As part of an ongoing effort begun in 2008, an NIH-funded research team collects and manages data for a Large National Survey in which consented participants agree to allow a public use data set to be made available to researchers. Data elements in the survey include preferences for self-reported health status. Participants also authorize the release of their Medicare claims data, the linkage of their survey data to administrative healthcare claims and to public data sets for future research, and the use of blood samples for laboratory testing of biomarkers for chronic disease. The informed consent authorizes NIH to publicly post data sets deidentified in accordance with the HIPAA Safe Harbor standard. Until 2014, the research team stewarding the data had two separate releases: a public release of an anonymized data set (Public Data Set) and a restricted release (Restricted Data Set) with the claims data and laboratory tests, but no direct identifiers. Participants consent to the terms of release for both of these data sets, with the understanding that greater scrutiny and safeguards are required for researchers using the restricted data, and that the Public Data Set will be available online.
In 2013, NIH funds Max Researcher to use the Restricted Data Set to conduct exploratory Precision Medicine research. Max integrates the Restricted Data Set with publicly available geocoded data sets from the CDC and the Census Bureau about socio-economic and environmental health risk factors that might predict the best treatments for chronic disease with more patient-specific and subgroup-level accuracy.
Max produces a number of findings that increase the precision with which physicians can select treatments, particularly for ethnic Samoans with asthma living in proximity to freeways and septuagenarians living at high altitudes with incident cancer.
Three years into the Exploratory Precision Medicine grant (2016), Harvey Hacker at Computer Science University demonstrates that he can apply linear programming methods to uniquely identify two of the individuals in the Public Data Set by combining it with voter registration records and the same geocoded data sets in use by Max Researcher. These two individuals represent 0.01% of the Large National Survey population. Harvey alerts the Large National Survey before he publishes his findings in Computer Science Journal and as a New York Times op-ed. Approximately 20% of the participants in the Large National Survey see the story and exercise their right to withdraw their data from the Large National Survey by sending Max Researcher a signed letter with their designated subject ID, which is provided each time the survey is administered.
In response, the Large National Survey removes the publicly available online data set and changes its agreements to require assurances from users that they will not combine either restricted or public use data with other data sets. Max Researcher destroys the data she has been using, shuts down her lab, and takes a job at Venture Capital Drug Discovery Firm, which uses privately brokered data sets with greater utility for precision medicine. Max cannot publish her findings under the auspices of her new organization until patents for precision treatments are granted.
Questions:
- What is the best way to reconcile funding agencies’ mandates for data sharing with privacy concerns?
- What ethical obligations does Harvey have to share or protect the algorithms he used? What obligations do cryptographers have to accurately communicate privacy risks? What obligations do scientific journals carry to publish or protect methods that might be used for unethical purposes?
- Should Large National Survey research participants be alerted to cryptographers’ findings as newly identified risks?
- With the newly published algorithms, several other publicly available research data sets are vulnerable: Should these data sets also be removed? Should participants be warned?
- The Large National Survey Data Stewards create a computing enclave where the Public Use data can be accessed and analyzed but not downloaded. The capacity limitations render many types of analysis infeasible. What alternatives exist?
- What standards or guidelines exist now for assessing tradeoffs between privacy risks and utility?
- Should researchers working under IRBs receive any special status or trust that would distinguish them from members of the public so that they might combine data sets to add value to the data?
- How can the risks of reidentification be balanced against the potential loss of valuable health insights that result from the removal of data sets from the public domain?
- How can the risks of reidentification be balanced against the burden of limited data access for researchers and potential loss of health insights (e.g., when a researcher removes themselves and their entire research program from the public domain)?
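
For the questions above about weighing reidentification risk against utility, one widely used yardstick is k-anonymity: the size of the smallest group of records that share the same combination of quasi-identifiers (fields such as age band or partial ZIP code that could be matched against outside sources like voter rolls). The sketch below is a minimal illustration of that measure only. It is not the linear programming method attributed to Harvey Hacker in the scenario, and the field names and records are hypothetical, made up purely for the example.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of indistinguishable records (0 if empty)."""
    sizes = class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_fraction(records, quasi_identifiers):
    """Return the share of records that are unique on the quasi-identifiers."""
    if not records:
        return 0.0
    sizes = class_sizes(records, quasi_identifiers)
    unique = sum(
        1 for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    )
    return unique / len(records)


# Hypothetical toy records; these are NOT survey data and the
# column names (age_band, zip3, sex) are assumptions for illustration.
records = [
    {"age_band": "70-79", "zip3": "805", "sex": "F"},
    {"age_band": "70-79", "zip3": "805", "sex": "F"},
    {"age_band": "70-79", "zip3": "805", "sex": "M"},
    {"age_band": "30-39", "zip3": "900", "sex": "M"},
    {"age_band": "30-39", "zip3": "900", "sex": "M"},
]

full = ["age_band", "zip3", "sex"]
coarse = ["age_band", "zip3"]  # drop one field to trade utility for privacy

print(k_anonymity(records, full), unique_fraction(records, full))
print(k_anonymity(records, coarse), unique_fraction(records, coarse))
```

In this toy data, dropping one field removes the unique record but also removes that field from any analysis, which is the privacy and utility tradeoff the questions point to. Note that a data set can look safe on its own fields and still become identifiable once it is joined to external geocoded or registration data, as in the scenario.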
Title | Response |
Description | Under terms of funding from NIH, data sets collected with public funds must be made available while protecting privacy. Privacy researchers have shown that such data sets do not truly protect privacy, an issue that has received substantial public attention. This has resulted in more conservative approaches by data stewards, increasing barriers to data use by researchers. |
Primary actor/participant | Researcher, Data Stewards |
Support actor/participant | Funding agency |
Preconditions |
|
...
Post conditions |
|
Alternatives |
|
...
| |
Considerations |
|
Data Elements Considered | Survey, Laboratory, Demographic, and Geocoded Data about Environmental Risks, consented data from administrative claims |
Purpose of the Data Collection |
...
Precision Medicine | |
Purpose of Data Use | Research |
Terms of Transfer to the Data Holders | Consent |
Terms of Transfer to Researchers | IRB approval, |
...
Agreements negotiated with |
...
Data Stewards |