Changing the Culture of Data Management and Sharing: A Report on the NASEM Workshop
The National Academies of Sciences, Engineering, and Medicine (NASEM) hosted the two-day virtual workshop "Changing the Culture of Data Management and Sharing" on 28th-29th April 2021 to discuss the challenges and opportunities for establishing effective data management and sharing practices and to explore the question of universal availability of scientific data. In advance of the new NIH Data Management and Sharing Plan requirement that will be implemented in 2023, this virtual workshop was of great interest to researchers, funders, and publishers, and had over 1200 attendees. GigaScience Data Scientist Chris Armit attended the workshop and reports below on some of the major highlights. This post also collects together the video streams of the event now that they have gone live.
NIH Data Management and Sharing Policy
As GigaScience Editorial Board member Maryann Martone
(University of California, San Diego) explained in the opening session,
the NIH Data Management and Sharing Policy was released in Oct 2020, and
a Data Management and Sharing Plan will be required for all NIH-funded
research from 25th Jan 2023. The Policy goals are as
follows:
- Increase scientific transparency and public trust
- Improve reliability / reproducibility
- Enable reuse of valuable data
- Accelerate discoveries
Richard Nakamura (NIH Center for Scientific Review) followed on from this and presented "Goals for the NIH Data Management and Sharing Policy" which highlighted the benefits of Data Sharing and FAIR Data. These benefits included reproducibility, and the possibility of meta-analyses that would enable new conclusions to be drawn from existing datasets. Richard further highlighted the need for a culture shift to ensure universal availability of scientific data, but was swift to point out that this requires cooperation across a diverse research environment that includes researchers, funders, and publishers.
What Has the COVID-19 Pandemic Taught Us About Data Sharing and Open Science?
So what infrastructure is needed to enable Data Sharing? Patricia
Brennan (Director, National Library of Medicine) touched on this in her
Keynote Presentation entitled "What Has the COVID-19 Pandemic Taught
Us About Data Sharing and Open Science?" Patricia highlighted
critical steps taken by NIH – such as investing in data repositories,
hub-and-spoke frameworks with a Common Cloud Infrastructure, and data
life cycle plans that ensure consent and reuse – that have been
particularly helpful for promoting Data Sharing during the pandemic. As
Patricia explained, the two key projects that were most relevant for
COVID-19 were: 1) the Post-Acute Sequelae of SARS-CoV-2 Infection (PASC)
Initiative; and 2) Rapid Acceleration of Diagnostics (RADx), which
together cost NIH almost $2 billion and
which had core principles of Data Sharing – including patient
consent and patient de-identification – integrated into their project
plans from the very beginning. So what happens when COVID-19 data are
shared and science is open? Vaccines entered Phase 3 Clinical Trials
within 180 days. In addition, there was timely characterisation of
variants, rapid evaluation of therapeutics and real-time interventions
and, importantly, the public were reassured that steps were being taken
to combat the pandemic. Are there any messages here for future studies?
As Patricia explains, "consent is important" and this leads to
the question of whether there should be generalized consent for human
patient data to enable clinical data to be shared more easily.
The Need to Incentivise Data Sharing
So what is this culture shift that Richard Nakamura feels is
necessary? Additional perspectives on this topic were provided in the
panel discussion at the end of the opening session. In particular,
Alexander Ropelewski (Brain Image Library) hinted that there may be
barriers at the institutional level that prove troublesome for
the data-submitting labs who wish to submit their data to a centralized
repository. From a clinical data perspective, Joshua Wallach (Yale
School of Public Health) echoed this sentiment and highlighted the need
to incentivise data sharing in the clinical community. As Joshua
explained, in the clinical research environment it is more likely for
data to be 'kept within the group'. As a corollary to this,
Atul Butte (UCSF) was swift to point out that DOIs are insufficient as
an incentive, and that we should be thinking in terms of profit when
proposing motivations for a Data Sharing culture shift.
On a related note, in the later panel discussion in the session entitled "Data Quality and Other Factors that Make Data More Likely to be Reused", Jan Bjaalie (University of Oslo, EU Human Brain Project) further commented that researchers could benefit from "a business model" that incentivises data reuse.
Measuring Success in Data Sharing Strategies
So what does successful Data Sharing look like? In the session entitled
"Strategies for Managing and Sharing Data: Diverse Needs and
Challenges", David Haussler (UC Santa Cruz) explained how the
Global Alliance for Genomics and Health (GA4GH) strategy changed from a
Data Commons to a Data Federation. As a founding
member of GA4GH, David explained that the original idea of a Data
Commons, which would aggregate data from multi-national datasets, did
not work due to a fundamental 'lack of trust' in the
international community. The Data Federation approach – which would host
data locally – ensures privacy preservation and this has proven to be a
more successful strategy for Data Sharing. As David explains, "the
most important issue in Data Sharing is respect. Respect for those who
make the science possible by contributing data. But that respect has to
be earned with an equal amount of generosity." GigaScience is an
organisational member of GA4GH; for more, see the recent GigaBlog post on
the GA4GH 8th Plenary Meeting.
From a repository perspective, the talk by Rebecca Koskela (Research Data Alliance) was particularly informative. Rebecca highlighted a survey performed by the RDA that asked, "If your data are not available to others, why or why not?" Because the same survey has been conducted on a 4-year cycle, some interesting changes in perceptions are emerging over this 12-year period of investigation, with insufficient time and a lack of funding being cited less often in the more recent surveys. As Rebecca explains, the advantages of Research Data Repositories include: avoidance of data generation costs; efficiency of data management; long-term usability and reuse of data; transparency of scientific results; and value-added data products. Rebecca additionally highlighted the perspective of journals and publishers, and explained that Journal Open Data Sharing Policies "grew organically", whereby there was an early understanding that a citable DOI was needed, but over time adopting the FAIR Principles was additionally deemed of immense value.
Jeremy Wolfe (Brigham & Women's Hospital, Harvard Medical School) additionally highlighted his take on the Journal Editor perspective and pointed out "there is a tension between full reporting and the desire to publish science that moves the field forward". Jeremy pointed out that, from a researcher perspective, the structure of Basic Experimental Studies with Humans (BESH) "does not fit well with formats like ClinicalTrials.gov", but also that from a Journal Editor perspective he does not want journals flooded with extraneous data that dilute the core message of the scientific findings in a manuscript. Jeremy offers one potential solution to this dilemma by splitting research findings into Grant Progress Reports and Peer-Reviewed Publications.
Value and Costs of Managing and Sharing Data
In the session entitled "Value and Costs of Managing and Sharing
Data", John Borghi (Lane Medical Library, Stanford University)
explored data sharing in practice and offered insights into the
perceived value of data management and data sharing. Key motivations for
data management were the need "to ensure access for
collaborators", "to foster openness and reproducibility",
and "to prevent loss of data". Ana Van Gulick (Figshare)
further addressed this point and highlighted that "measuring data
impact is important", but was also swift to point out that
"data citation is still an emerging practice without clear
standards" and that it is "still early to see large-scale reuse
of open data". As Ana explained, the perception of data sharing is
dependent on recognition of value from both funders and host
institutions.
Is Data the New Oil…?
Daniel Goroff (Vice President and Program Director, Alfred P. Sloan
Foundation) was undoubtedly the best-dressed speaker at the workshop and
gave a very interesting talk from an economist's perspective that
highlighted the potential pitfalls of reusing sensitive data. In the
session entitled "Data Quality and Other Factors that Make Data More
Likely to be Reused", Daniel questioned whether "Data is the
New Oil" and highlighted that reference datasets, if open, can
potentially represent a public good. As Daniel explains, "Oil is
rival. Once you use up a barrel, no one else can use that same
barrelful". In contrast, a public good as a commodity is
'non-rival' and does not get used up. A public good has the additional
property that it is non-exclusive. However, as Daniel further explains,
"There are laws and expectations about people's data. Data reuse
threatens research validity if the results are not accurate. Data reuse
threatens research legality if the results are not privacy preserving.
Reusing data requires inevitable trade-offs between privacy and
accuracy".
In the panel discussion, Daniel highlighted the moral and legal considerations of data reuse as it relates to human data. Somewhat controversially, Daniel suggested that data reuse in this scenario may actually be cost-ineffective, and that generation of primary data tailored to a research question could be more beneficial. As Daniel explains, "If you tell me that we could spend 2 or 3% of the federal budget that normally goes to research on making data available for reuse, I think that sounds great… If you tell me that it's 20% of the federal budget, I begin to wonder… If it's 30 or 40% then I'm not sure how much it's worth it, as opposed to generating new research". Daniel highlighted that what is needed here are statistical measures of reuse to ensure that data science "discoveries" are 'true and not false', and that these are sadly lacking in journals and data repositories. For more details on this type of question, see the slides from one of Daniel Goroff's previous talks.
The Importance of Consent – an Issue for Citizen Science?
Consent to reuse data was a major theme at the workshop, as was the
issue of respect. Anita Allen (University of Pennsylvania Law School),
in the session entitled "Implementing the NIH Data Management and
Sharing Policy: The Evolving Ethics of Data Sharing" echoed this
sentiment in a very thought-provoking way and highlighted a poignant
legal case at the University of Pennsylvania where the human remains of a
young girl who died in the 1985 MOVE bombing were used as an
anthropological case study without consent from the parents. The
Penn Museum has now apologized for keeping the remains, and according to
The Philadelphia Inquirer,
"the museum said the remains should have been returned, and pledged
to reassess its practices".
On a related note, Mark Rothstein (Herbert F. Boehl Chair of Law and Medicine, Founding Director of the Institute for Bioethics, Health Policy and Law at the University of Louisville School of Medicine) – in the session entitled "Shaping a Culture of Data Sharing – Reducing Barriers and Increasing Incentives" – highlighted potential issues with Citizen Scientists who may wish to reuse data, but who are not subject to the Common Rule or FDA research regulations. From a legal perspective, Mark actually questions whether there should be restricted access for citizen scientists rather than the more familiar open access.
Data Availability in the Time of COVID-19: A Publisher's Perspective
So, from a publishing perspective, has the COVID-19 pandemic
incentivized the sharing of clinical data? As members of the data
working group of the C19 Rapid Review Consortium, we found this a very
interesting question. In the session entitled "Encouraging
Data Sharing Outside of Mandates", Ashley Farley (Bill &
Melinda Gates Foundation) highlighted the lack of Data Availability
Statements in COVID-19 related articles. Referring to a report by Georgina
Humphreys, who is Clinical Data Sharing Manager at Wellcome, Ashley
highlighted that only 9% of COVID-19 research articles in Europe PMC
have any data availability statement to indicate where, and under what
conditions, the data can be accessed. Referring to the original Wellcome
report, it is additionally noteworthy that this low percentage of data
availability statements in COVID-19 articles compares with "22% of
all research articles published in 2020". This data publisher's
perspective is an interesting counterpoint to the Keynote talk by
Patricia Brennan, which had only emphasised the positive aspects of Data
Sharing in the time of COVID-19.
In summary, this workshop explored many different facets of Data Sharing, and raised more questions than answers. I agree with Richard Nakamura that a culture shift is needed, and that there needs to be coordination between researchers, funders, and publishers, but there are invaluable insights from policy makers, economists, and legal teams that additionally need to be considered to ensure that this culture shift is ethical, and that due consideration is given to consent and patient privacy.
The webcast was recorded and the entire video playlist is made publicly available at the following link.
The post Changing the Culture of Data Management and Sharing: A Report on the NASEM Workshop appeared first on GigaBlog.
Additional details
Identifiers
- UUID
- 5198d5c3-c928-47e4-84c2-115fb469bbed
- GUID
- http://gigasciencejournal.com/blog/?p=3932
- URL
- https://wayback.archive-it.org/22098/2025-05-01T17:13:42Z/http://gigasciencejournal.com/blog/culture-of-data-sharing
Dates
- Issued
- 2021-05-26T12:53:37
- Updated
- 2021-05-26T12:53:39