Big data touches nearly every aspect of commerce. However, protecting big data as a form of intellectual property is complex. For instance, patents, copyrights and trade secrets provide limited protection for datasets. Moreover, the ownership of datasets can be uncertain. Additionally, datasets may be subject to numerous regulatory laws. In view of these complexities, contractual agreements play a pivotal role in protecting and commercializing big data.
Big data is a valuable asset
Big data can be in many forms. Such forms can include market data, consumer data, business records, health records, and experimental results.
Additionally, big data can find applications in numerous fields, including the healthcare and life sciences industries. For instance, in the healthcare industry, data extracted from electronic health records can be fed into software that uses artificial intelligence (AI) or machine-learning algorithms for diagnostic applications, such as detecting early heart failure and predicting surgical complications. Similarly, in the life sciences industry, DNA sequences generated through next-generation sequencing techniques can be supplied to AI-based software for the identification of potential drug targets.
Patents, copyrights and trade secrets provide limited big data protection
Despite a broad range of applications, protection of big data as a form of intellectual property can be complex. For instance, unless a minimal amount of creativity exists in the selection, coordination, and arrangement of datasets, datasets may not be protectable by copyrights, regardless of the laborious efforts involved in collecting and compiling the data. Furthermore, data compilation processes and compiled data may not qualify as patent eligible subject matter.
Datasets can be protected as trade secrets. However, trade secret protection may not be practical in some circumstances because it requires the datasets to be kept confidential. It also requires the proactive implementation of reasonable measures to maintain the secrecy of the datasets. Moreover, trade secret protection is narrow because it does not protect against reverse engineering or independent development of the datasets.
Big data ownership can be uncertain
Complexities also exist in ascertaining the ownership of datasets. For instance, in the absence of contractual agreements to the contrary, different individuals or entities may claim ownership of datasets, including the generators, compilers, users, purchasers, and guardians of the datasets. Such complexities may escalate further when different individuals or entities generate, compile, use, store or purchase datasets at different times, at different institutions, or at different locations.
Big data can be highly regulated
Additionally, the use, storage and distribution of datasets may be subject to numerous state and federal laws. For instance, if the datasets contain personally identifiable information, then the datasets could be subject to numerous state and federal data protection laws. As an example, the Health Insurance Portability and Accountability Act (HIPAA) mandates the protection of an individual’s health information that is held or transmitted by health plans, healthcare providers or healthcare clearinghouses.
Contractual agreements help protect and commercialize big data
In view of the aforementioned complexities in safeguarding big data intellectual property rights, ascertaining big data ownership, and complying with big data regulatory requirements, contractual agreements play a pivotal role in big data protection and commercialization. For instance, assignment agreements help establish the ownership of datasets while confidentiality agreements help maintain their confidentiality. Additionally, license agreements help establish the terms by which others can exploit and commercialize datasets.
Assignment agreements help establish the ownership of big data
Where applicable, assignment agreements can help establish ownership over datasets. For instance, in order to obtain clear title to a dataset, an entity that retains or employs individuals to generate or compile the dataset should execute comprehensive assignment agreements with those individuals. Preferably, such assignment agreements should require the individuals to assign all of their rights to the datasets (including intellectual property and commercialization rights) to the entity.
Confidentiality agreements help maintain the confidentiality of big data
Confidentiality agreements can help maintain the confidentiality of datasets and prevent their unauthorized disclosure. Confidentiality agreements can also help ensure compliance with numerous state and federal regulations by helping prevent the unauthorized disclosure of any protected information. Additionally, confidentiality agreements can help protect any trade secrets within a dataset.
In order to provide maximum protection of datasets, entities that own or control datasets should ensure that all individuals who have accessed or will access the datasets (including compilers, generators, users, and purchasers) have executed comprehensive confidentiality agreements. Such confidentiality agreements should clearly set forth the authorized and unauthorized uses of the datasets.
For instance, an entity that retains or employs individuals to generate or compile datasets should execute comprehensive confidentiality agreements with those individuals, where the individuals agree not to disclose the datasets to anyone other than the authorized representatives of the entity. Similarly, an entity that distributes a dataset to a programmer for the training of AI software should execute a comprehensive confidentiality agreement with the programmer, where the programmer agrees not to disclose or reverse engineer the dataset.
License agreements help establish the terms of commercializing and exploiting big data
An entity that owns or controls datasets (i.e., a licensor) can provide a third party (i.e., a licensee) with certain rights to the datasets by executing a database license agreement with the third party. Database license agreements can be standalone agreements that focus on the grant of rights to certain datasets. Database license agreements can also be part of a broader license agreement that includes the grant of rights beyond datasets, such as software.
- Level of exclusivity
Database license agreements should define the level of exclusivity that a licensee will obtain to the licensed datasets. For instance, an exclusive license can bar the owner or controller of the datasets from granting additional licenses to other parties for the licensed datasets. On the other hand, a non-exclusive license could allow the owner or controller of the datasets to grant additional licenses to other parties for the licensed datasets.
- Sublicensing rights
Database license agreements should also clearly define the ability of the licensee to sublicense the licensed datasets to third parties. For instance, sublicensing rights could provide the licensee with the right to partner with other parties in the commercialization or use of the licensed datasets. However, a lack of sublicensing rights could prevent the licensee from entering into such partnership agreements.
- Warranties and disclaimers
Database license agreements should also include numerous warranties and disclaimers in order to provide assurances between the parties and minimize liability. For instance, database licensors generally include disclaimers that they are providing the datasets “as is” without any warranties regarding their suitability for an intended purpose. However, licensees generally seek warranties and representations from a licensor that the licensor owns or controls the datasets at issue, and has sufficient rights to grant a license to the datasets.
Moreover, both the licensee and licensor usually represent and warrant that they are in compliance with applicable laws, such as applicable data security and privacy laws. For instance, if the licensed datasets contain protected health information, then both the licensor and licensee should provide assurances in the license agreement that they will comply with relevant regulatory laws, such as HIPAA. On the other hand, if the licensed datasets do not contain any regulated data, then the licensee may ask the licensor to provide assurances that the licensed datasets are devoid of any regulated data, such as protected health information.
Additionally, the database license agreement should clearly define the confidentiality obligations of the parties towards the licensed datasets. For instance, if the licensed datasets contain trade secrets, then both the licensor and the licensee should provide assurances that they will take reasonable measures in order to maintain the secrecy of the trade secrets.
Devising a proper strategy for protecting and commercializing big data is a fact-specific inquiry that depends on numerous factors, including the type of data, the origins of the data, the storage location of the data, the destination of the data, and the intended use of the data. Regardless, carefully drafted contractual agreements play a pivotal role in protecting datasets, maximizing their commercialization value, avoiding disputes between parties, and limiting liability.
 See, e.g., Kilic A. Artificial Intelligence and Machine Learning in Cardiovascular Health Care. Ann Thorac Surg. 2020 May;109(5):1323-1329. doi: 10.1016/j.athoracsur.2019.09.042. Epub 2019 Nov 7. PMID: 31706869.
 See, e.g., Dlamini Z et al., Artificial intelligence (AI) and big data in cancer and precision oncology. Comput Struct Biotechnol J. 2020 Aug 28;18:2300-2311. doi:10.1016/j.csbj.2020.08.019. PMID: 32994889; PMCID: PMC7490765.
 See Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340, 111 S. Ct. 1282 (1991).
See 35 U.S.C. § 101 (identifying patent eligible subject matter as “any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof,” which generally excludes standalone datasets). See also In re Board of Trustees of the Leland Stanford Junior University, No. 20-1012 (Fed. Cir. 2021) (holding that processes directed to generating datasets through the utilization of mathematical formulas were not eligible for patenting under 35 U.S.C. § 101). See also WinTech blog article entitled “Determining the Patent Eligibility of Inventions under the new USPTO Guidelines” (explaining that the patent eligibility of computer-implemented inventions, such as methods of generating datasets, remains an unsettled area of law).
See 18 U.S.C. § 1839(3) (identifying trade secrets under the Defend Trade Secrets Act as “all forms and types of financial, business, scientific, technical, economic, or engineering information, including patterns, plans, compilations, program devices, formulas, designs, prototypes, methods, techniques, processes, procedures, programs, or codes, whether tangible or intangible, and whether or how stored, compiled, or memorialized physically, electronically, graphically, photographically, or in writing.”)
See 18 U.S.C. § 1839(3)(A) (requiring the owner of a trade secret to take “reasonable measures to keep such information secret.”). See also WinTech blog article entitled “Protecting Your Most Valuable Assets: How to Identify and Maintain Your Institution’s Trade Secrets” (outlining the reasonable measures that a trade secret owner must take in order to maintain the secrecy of the trade secrets).
See 18 U.S.C. § 1839(6)(B) (indicating that trade secret misappropriation “does not include reverse engineering, independent derivation, or any other lawful means of acquisition.”)
 See https://www.hhs.gov/hipaa.
See WinTech blog article entitled “Protecting Your Most Valuable Assets: How to Identify and Maintain Your Institution’s Trade Secrets” (outlining examples of reasonable measures that a trade secret owner must take in order to maintain the secrecy of the trade secrets).