It is worth noting, however, a set of curious circumstances surrounding the paper itself. Pursuant to a tortured reading of the relevant Department of Health and Human Services (DHHS) regulations, 45 C.F.R. 46.102(f), the study qualified as “not human subjects research.” This is because, per the DHHS definition, “research performed on anonymized data with no contact between investigators and participants does not constitute research on human subjects” (Eriksson et al. 2010, 16). Thus, part of this bizarre conclusion rests on the fact that “participant names [were] anonymous with respect to the data seen by the investigators” (Gibson and Copenhaver 2010, 1). On the issue of participant anonymity, the study further explains:
The Consent and Legal Agreement stated that participants’ genotype data and whatever phenotype data they entered would be used for internal research after being coded and stripped of individually identifying information (“anonymized”). Individually identifying information refers to personal information that is collected during purchase, such as name, credit card information, billing and shipping addresses, and contact information such as an email address or telephone number. (Eriksson et al. 2010, 16)
Nothing more is provided on how participants’ information was “anonymized,” nor on how such anonymity was preserved during the study. Given that 23andMe plans to “provid[e] participants with well-explained descriptions of their genetic data,” it follows that not all links between participants and their information were severed: otherwise, how would 23andMe report back their individual results? That the participants’ data was not truly “anonymized” is no surprise, however, and hints at an underlying axiom of data analysis: “Data can either be useful or perfectly anonymous but never both” (Ohm 2009, 4).
In the last decade, increasingly sophisticated data mining technology has essentially negated the traditional concept of information “anonymization.” “At the very least,” one researcher concludes, “we must abandon the pervasively held idea that we can protect privacy by removing personally identifiable information” (Ohm 2009, 35). Re-identification, the converse of anonymization, relies on “pockets of surprising uniqueness remaining in . . . data. Just as human fingerprints can uniquely identify a single person and link that person with ‘anonymous’ information . . . so too do data generate ‘data fingerprints’—combinations of values of data shared by nobody else” (Ohm 2009, 21).
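The idea of a “data fingerprint” can be made concrete with a minimal sketch. The records below are invented for illustration, and the choice of ZIP code, birth date, and gender as the remaining fields is an assumption of the example, not a description of the 23andMe dataset. The hypothetical helper `unique_fraction` simply reports what share of rows carry a combination of values that no other row shares.

```python
from collections import Counter

# Hypothetical records with names already stripped; every value is invented.
records = [
    {"zip": "80302", "birth_date": "1975-03-14", "gender": "F"},
    {"zip": "80302", "birth_date": "1975-03-15", "gender": "F"},
    {"zip": "80302", "birth_date": "1982-11-02", "gender": "M"},
    {"zip": "94043", "birth_date": "1982-11-02", "gender": "M"},
    {"zip": "94043", "birth_date": "1990-07-21", "gender": "F"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose values for `fields` match no other row."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Adding fields shrinks the crowd each record can hide in.
for fields in (["gender"], ["zip", "gender"], ["zip", "birth_date", "gender"]):
    print(fields, unique_fraction(records, fields))
```

In this toy table no record is unique by gender alone, but every record becomes unique once all three fields are combined, even though no name or address appears anywhere in the data.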
“It is noteworthy that several respondents who had particular experience with biobanks or social science research,” the author of a study on biobank security writes, “were concerned about datasets that permitted identification of individuals with special characteristics despite the coding [anonymization] and the absence of any name or directly identifying item” (Elger et al. 2008, 181). For example, according to a recent study, 87% of the American population possesses a unique combination of ZIP code, birth date, and gender. In other words, over 250 million Americans can be uniquely identified by the combination of ZIP code, gender, and birth date (Ohm 2009, 4). The point here is that three pieces of otherwise non-identifying data can intersect to re-identify a once-anonymized individual. This raises the question: how could the 23andMe study participants give their informed consent if they were under the mistaken belief that their identities would be protected?
As the editors of PLoS Genetics acknowledge, further complicating this ethical quagmire is the fact that:
. . . the experience of 23andMe reflects an unfortunate loophole that applies to all research with human samples that is not, as above, formally designated to be “human subjects research.” For situations in which a study does not meet the aforementioned criteria but obtaining a consent form would still be desirable, there are no guidelines or policy with regard to how such a consent form should be developed and reviewed in an ethically responsible manner. (Gibson and Copenhaver 2010, 2)
Without informed consent guidelines, it is impossible to judge whether the 23andMe “Consent and Legal Agreement” properly advised study participants as to the risks associated with their participation. That document only speaks to the matter to the extent that it states that information withheld from the study dataset “include[s] identifying information you provided when you purchased the Personal Genome Service(TM) or created an account (such as name, address, e-mail address, or credit card information)” (23andMe, Inc. 2010). The study’s frivolous subject matter notwithstanding, at the very least its publication points up a glaring failure on the part of regulators to protect consumer privacy.
23andMe, Inc. 2010. Consent and Legal Agreement. https://www.23andme.com/about/consent/.
Elger, Bernice, et al. 2008. Ethical Issues in Governing Biobanks: Global Perspectives. Hampshire, UK: Ashgate Publishing, Ltd.
Eriksson, Nicholas, et al. 2010. Web-Based, Participant-Driven Studies Yield Novel Genetic Associations for Common Traits. PLoS Genet 6, no. 6 (June 24): e1000993.
Gibson, Greg, and Gregory P. Copenhaver. 2010. Consent and Internet-Enabled Human Genomics. PLoS Genet 6, no. 6 (June 24): e1000965.
Ohm, Paul. 2009. Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. University of Colorado Law Legal Studies Research Paper No. 09-12.