Please provide any feedback regarding this scenario in the comment form below or by clicking here.

As part of an ongoing effort begun in 2008, an NIH-funded research team collects and manages data for a Large National Survey in which consented participants agree to allow linkage of their survey data to administrative healthcare claims and to blood samples used for laboratory testing of biomarkers for chronic disease. Until 2014, the research team stewarding the data had two separate releases: a public release of an anonymized data set (Public Data Set) and a restricted release (Restricted Data Set) with the claims data and laboratory tests, but no direct identifiers. Participants consent to the terms of release for both of these data sets, with the understanding that greater scrutiny and safeguards are required for researchers using the restricted data, and that the Public Data Set would be available online.

In 2013, NIH funds Max Researcher to use the Restricted Data Set to conduct exploratory Precision Medicine research. Max integrates the Restricted Data Set with publicly available geocoded data sets from the CDC and the Census Bureau about socio-economic and environmental health risk factors that might predict the best treatments for chronic disease with more patient-specific and subgroup-level accuracy.

Max produces a number of findings that increase the precision with which physicians can select treatments, particularly for ethnic Samoans with asthma living in proximity to freeways and septuagenarians living at high altitudes with incident cancer.

Three years into the Exploratory Precision Medicine grant (2016), Harvey Hacker at Computer Science University demonstrates that he can apply linear programming methods to uniquely identify two of the individuals in the Public Data Set by combining it with voter registration records and the same geocoded data sets in use by Max Researcher. These two individuals represent 0.01% of the Large National Survey population. Harvey alerts Large National Survey before he publishes his findings in Computer Science Journal and as a New York Times op-ed. Approximately 20% of the participants in the Large National Survey see the story and exercise their rights to withdraw their data from the Large National Survey by sending Max Researcher a signed letter with their designated subject ID, which is provided each time the survey is administered.
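
The kind of linkage described above can be illustrated with a minimal sketch. It is not Harvey's linear programming method, which the scenario does not specify. It is a much simpler exact-match join on shared attributes, shown only to make the mosaic effect concrete. All records, field names (`region`, `birth_year`, `sex`), and the helper `unique_links` are hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: a de-identified public survey extract and a
# voter-registration list. No field or value comes from the scenario.
public_survey = [
    {"subject_id": "S001", "region": "R1", "birth_year": 1948, "sex": "F"},
    {"subject_id": "S002", "region": "R1", "birth_year": 1948, "sex": "F"},
    {"subject_id": "S003", "region": "R2", "birth_year": 1944, "sex": "M"},
]
voter_records = [
    {"name": "A. Example", "region": "R2", "birth_year": 1944, "sex": "M"},
    {"name": "B. Example", "region": "R1", "birth_year": 1948, "sex": "F"},
    {"name": "C. Example", "region": "R1", "birth_year": 1948, "sex": "F"},
]

# Attributes assumed to appear in both data sets (quasi-identifiers).
QUASI_IDENTIFIERS = ("region", "birth_year", "sex")


def group_by_key(records):
    """Group records by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[f] for f in QUASI_IDENTIFIERS)].append(record)
    return groups


def unique_links(survey, voters):
    """Return (subject_id, name) pairs whose quasi-identifier combination
    occurs exactly once in each data set, i.e. an unambiguous match."""
    survey_groups = group_by_key(survey)
    voter_groups = group_by_key(voters)
    links = []
    for key, rows in survey_groups.items():
        matches = voter_groups.get(key, [])
        if len(rows) == 1 and len(matches) == 1:
            links.append((rows[0]["subject_id"], matches[0]["name"]))
    return links


links = unique_links(public_survey, voter_records)
print(links)  # [('S003', 'A. Example')]
```

In this toy example only the participant whose attribute combination is unique in both lists is linked to a name. The two participants who share a combination stay ambiguous, which is why only a small fraction of a survey population may be exposed by this kind of join.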

In response, the Large National Survey removes the publicly available online data set and changes its agreements to require assurances from users that they will not combine either restricted or public use data with other data sets. Max Researcher destroys the data she has been using, shuts down her lab, and takes a job at Venture Capital Drug Discovery Firm, which uses privately brokered data sets with greater utility for precision medicine. Max cannot publish her findings under the auspices of her new organization until patents for precision treatments are granted.

Questions:

  • What is the best way to reconcile funding agencies’ mandates for data sharing with privacy concerns?
  • What ethical obligations does Harvey have to share or protect the algorithms he used? What obligations do cryptographers have to accurately communicate privacy risks? What obligations do scientific journals carry to publish or protect methods that might be used for unethical purposes?
  • Should Large National Survey research participants be alerted to cryptographers’ findings as newly identified risks?
  • With the newly published algorithms, several other publicly available research data sets are vulnerable. Should these data sets also be removed? Should participants be warned?
  • The Large National Survey Data Stewards create a computing enclave where the Public Use data can be accessed and analyzed but not downloaded. The capacity limitations render many types of analysis infeasible. What alternatives exist?
  • What standards or guidelines exist now for assessing tradeoffs between privacy risks and utility?
  • Should researchers working under IRBs receive any special status or trust that would distinguish them from members of the public so that they might combine data sets to add value to the data?
  • How can the risks of reidentification be balanced against the burden of limited data access for researchers and the potential loss of valuable health insights (e.g., when data sets are removed from the public domain, or when a researcher removes themselves and their entire research program from it)?

 


Title

Response

Description

Under terms of funding from NIH, data sets collected with public funds must be made available while protecting privacy. Privacy researchers have shown that such data sets do not truly protect privacy, an issue that has received substantial public attention. This has resulted in more conservative approaches by data stewards, increasing barriers to data use by researchers.

Primary actor/participant

Researcher, Data Stewards

Support actor/participant

Funding agency

Preconditions

  • Data Sharing Policies from funding agencies exist
  • Participants have been consented
  • Public data repository can be accessed and combined with other public data

...

Post conditions

  • The researcher collects and analyzes the data for a specific research study.
  • Data sets are removed from public access

Alternatives

  • Cryptographers do not alert data stewards before results are released
  • Cryptographers post code online and make it available for unrestricted use
  • In addition to Drug Discovery, Venture Capital Drug Discovery Firm is selling data to marketers about the likely identities of individuals and their treating physicians for drug detailing.

Considerations

  • Conflicts between mandates of funding agencies for data sharing and privacy concerns
  • Conflicting interests between cryptography research publishing incentives, privacy of research participants, and
  • Public perception of risk vs. actual risk.

Data Elements Considered

Survey, Laboratory, Demographic, and Geocoded Data about Environmental Risks, consented data from administrative claims

Purpose of the Data Collection

...

Precision Medicine

Purpose of Data Use

Research

Terms of Transfer to the Data Holders

Consent

Terms of Transfer to Researchers

IRB approval, agreements negotiated with Data Stewards

 
