As part of an ongoing effort begun in 2008, an NIH-funded research team collects and manages data for the Large National Survey, in which consented participants agree to have a public use data set made available to researchers. Data elements in the survey include preferences for self-reported health status. Participants also authorize the release of their Medicare claims data and its linkage to public data sets for future research. The informed consent authorizes NIH to publicly post data sets deidentified in accordance with the HIPAA Safe Harbor standard.
In 2013, the CDC funds Max Researcher to use the NIH-funded data set for exploratory research. Max integrates the restricted data set with publicly available geocoded data sets from the CDC and the Census Bureau about socio-economic and environmental health risk factors that might predict the best treatments for chronic disease with more patient-specific and subgroup-level accuracy.
Three years into Max’s exploratory research grant (2016), Harvey Hacker at Computer Science University demonstrates that he can apply linear programming methods to uniquely identify two of the individuals in the NIH-funded data set by combining it with voter registration records and the same geocoded data sets being used by Max Researcher. These two individuals represent 0.01% of the Large National Survey population. Harvey alerts the Large National Survey before he publishes his findings in Computer Science Journal and in a New York Times op-ed.
Approximately 20% of the participants in the Large National Survey see the story and exercise their right to withdraw their data by sending Max Researcher a signed letter with their designated subject ID, which is provided each time the survey is administered.
In response, the Large National Survey removes the publicly available online data set and changes its agreements to require assurances from users that they will not combine either restricted or public use data with other data sets. Max Researcher destroys the data she has been using, shuts down her lab, and takes a job at Venture Capital Drug Discovery Firm, which uses privately brokered data sets with greater utility for exploratory research. Max cannot publish her findings under the auspices of her new organization until patents for treatments are granted.
Questions:
Title | Response |
---|---|
Description | Under terms of funding from NIH, data sets collected with public funds must be made available while protecting privacy. Privacy researchers have shown that such data sets do not truly protect privacy, an issue that has received substantial public attention. This has resulted in more conservative approaches by data stewards, increasing barriers to data use by researchers. |
Primary actor/participant | Researcher, Data Stewards |
Support actor/participant | Funding agency |
Preconditions | |
Post conditions | |
Alternatives | |
Considerations | |
Data Elements Considered | Survey, Laboratory, Demographic, and Geocoded Data about Environmental Risks, consented data from administrative claims |
Purpose of the Data Collection | Research |
Purpose of Data Use | Research |
Terms of Transfer to the Data Holders | Consent |
Terms of Transfer to Researchers | IRB approval, agreements negotiated with data stewards |