Open Access News

News from the open access movement


Tuesday, October 28, 2008

Open data vs. genetic privacy

Brenda Patoine, Speed Bump for Open Access to Genomic Data, Annals of Neurology blog, October 27, 2008.

... Genome-wide association studies have been used to great effect in recent years ...

To facilitate data sharing and accelerate genetic studies, the National Institutes of Health has made a concerted effort to ensure that summary data from genome-wide association studies is freely available to researchers, and to require researchers to bank genetic data from NIH-funded studies in online repositories. But in a policy change announced August 29, the National Human Genome Research Institute (NHGRI), along with the Wellcome Trust and the Broad Institute, took a cautionary step backward, limiting access to the very same data for which they’ve advocated greater sharing.

The move was prompted by the discovery that, with enough genomic data on an individual, it is possible to determine whether that individual participated in a given genetic study by analyzing pooled summary data, such as the data that were readily available on NIH’s dbGaP and CGEMS Web sites until recently. In the August 29 issue of PLoS Genetics, David W. Craig and colleagues at the Translational Genomics Research Institute (TGen) in Phoenix and the University of California, Los Angeles, spelled out a methodology by which an individual genotype could be detected, probabilistically, from a mix of DNA samples or from pooled data sets of aggregate single nucleotide polymorphism (SNP) data. ...
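The excerpt only names the methodology. As a rough illustration of why pooled allele frequencies can reveal membership, here is a minimal Python sketch under stated assumptions: it compares an individual's genotype, SNP by SNP, with a reference population's allele frequencies and with the published pool frequencies, then applies a one-sample t-test to the per-SNP differences. The statistic, the function names (distance_terms, membership_t_statistic, simulate) and the synthetic data are hypothetical illustrations, not the TGen/UCLA paper's exact method or code.

```python
import math
import random
from statistics import mean, stdev


def distance_terms(genotype, reference_freqs, mixture_freqs):
    """Per-SNP distance D_j = |Y_j - Ref_j| - |Y_j - Mix_j| (an illustrative choice).

    genotype:        the individual's allele frequency at each SNP (0.0, 0.5 or 1.0)
    reference_freqs: allele frequencies in a reference population (assumed available)
    mixture_freqs:   published aggregate (pooled) allele frequencies for the study
    A positive D_j means the individual sits closer to the pool than to the reference.
    """
    if not (len(genotype) == len(reference_freqs) == len(mixture_freqs)):
        raise ValueError("all inputs must cover the same SNPs")
    return [abs(y - r) - abs(y - m)
            for y, r, m in zip(genotype, reference_freqs, mixture_freqs)]


def membership_t_statistic(genotype, reference_freqs, mixture_freqs):
    """One-sample t-statistic for mean(D) > 0; large values suggest pool membership."""
    d = distance_terms(genotype, reference_freqs, mixture_freqs)
    if len(d) < 2:
        raise ValueError("need at least two SNPs")
    m, s = mean(d), stdev(d)
    if s == 0:
        # Degenerate case: every SNP gives the same distance.
        return math.copysign(math.inf, m) if m != 0 else 0.0
    return m / (s / math.sqrt(len(d)))


def simulate(seed=0, n_snps=10_000, pool_size=50):
    """Synthetic demo: returns (t for a pool member, t for a non-member)."""
    rng = random.Random(seed)
    # Hypothetical reference allele frequencies; the reference is assumed to
    # match the population the pool was drawn from.
    ref = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]

    def draw_genotype():
        # Two alleles per SNP, each carrying the counted allele with probability p.
        return [((rng.random() < p) + (rng.random() < p)) / 2 for p in ref]

    pool = [draw_genotype() for _ in range(pool_size)]
    # Aggregate data of the kind that was published: per-SNP pool frequencies.
    mix = [sum(column) / pool_size for column in zip(*pool)]
    member, outsider = pool[0], draw_genotype()
    return (membership_t_statistic(member, ref, mix),
            membership_t_statistic(outsider, ref, mix))


if __name__ == "__main__":
    member_t, outsider_t = simulate()
    print(f"pool member t = {member_t:.1f}, non-member t = {outsider_t:.1f}")
```

Run as a script, it prints the two t-statistics for one simulated pool. Each SNP shifts the pooled frequency only slightly toward a member's genotype, but summed over thousands of SNPs that small bias tends to push a member's statistic well above a non-member's, which is the intuition behind treating aggregate data as sensitive.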

In a letter to Science magazine published online September 4, NIH Director Elias Zerhouni and National Heart, Lung and Blood Institute Director Elizabeth Nabel said that, in addition to having important implications for forensics and genome-wide studies, the TGen/UCLA research “has also changed our understanding of the risks of making aggregate genomic data publicly available.”

“Sharing genomic data and, particularly, allele frequencies has become common practice, if not an imperative, in science,” Zerhouni and Nabel wrote. “Yet, the protection of participant privacy and the confidentiality of their data are of paramount importance.”

After Craig informed NIH, in advance of the paper’s publication, that the privacy of study participants’ genetic information could be compromised, NIH moved quickly to remove aggregate genomic data from public access. Such data is now sealed off behind a firewall, accessible to researchers only after an application and review process and subject to specific terms and conditions of use. The change essentially treats aggregate data as individual-level genotype/phenotype data, to which access was already controlled because of perceived privacy vulnerabilities. ...

The move has nonetheless caused ripple effects throughout the genetics research community, as universities mull whether to pull data from their own Web sites and grapple with issues of informed consent in the face of the apparent vulnerabilities to participant confidentiality. ...