Cancer Research Falls Short on Data and Code Sharing

Sharon Worcester, MA

December 02, 2022

Sharing data and code in medical research helps ensure the reliability and reproducibility of the findings, but this sharing occurs infrequently in oncology, a new study suggests.

Even when researchers declared that their data and code were available, the information was rarely readily accessible.

Specifically, fewer than 20% of the cancer-related articles analyzed stated that their study data were publicly available. Only 16% made the data accessible, and fewer than 1% complied with best sharing practices, such as posting the information to a recognized repository.

The current analysis found that data and code sharing occurs "at a lower rate than would be expected given the prevalence of mandatory sharing policies" and that there was "a large gap between those declaring data to be available, and those archiving data in a way that facilitates its reuse," say Daniel G. Hamilton, a PhD candidate at the University of Melbourne, Australia, and colleagues.

The findings were published online last month in BMC Medicine.

Concerns about the reliability of scientific findings continue to mount. One major obstacle has been the low rates of publicly available data, code, and other research materials that allow experts to analyze and reproduce research claims.

The importance of data and code sharing in medical research has become particularly clear given the high failure rates of clinical trials, the growing costs of drug development, and the increasing demand for more effective treatments.

The current study aimed to provide an accurate assessment of how often oncology researchers declared that their data and code were available and how often both actually were.

The cross-sectional analysis examined 306 randomly selected cancer-related articles indexed in PubMed in 2019. The authors found that about 1 in 5 articles (59/306) reported that their data were available and that 4% of articles that used inferential statistics reported that code was available (10/274).

However, only 16% (49 of 306) provided access to data, and fewer than 1% (1 of 306 articles) complied with key FAIR principles, which include posting data to a recognized repository, providing an outline of a data license, and presenting the data in a nonproprietary format.

Furthermore, although 88% (45 of 51 articles) included a data availability statement when required to do so by the journal, fewer than half of articles (14 of 29) that were published in journals that had a mandatory data sharing policy did in fact make their data available, and no articles in journals with mandatory code sharing policies (0 of 6) complied with those policies.
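The proportions reported above can be rechecked from the counts given in the article. The following is a minimal sketch for that purpose, not the study authors' analysis code; the dictionary labels and the `percent` helper are illustrative names introduced here.

```python
# Counts as reported in the article: (numerator, denominator).
# The labels are paraphrases chosen for this sketch, not the study's variable names.
reported = {
    "declared data available": (59, 306),
    "declared code available (articles using inferential statistics)": (10, 274),
    "data actually accessible": (49, 306),
    "complied with key FAIR principles": (1, 306),
    "included a data availability statement when required": (45, 51),
    "shared data under a mandatory data sharing policy": (14, 29),
    "shared code under a mandatory code sharing policy": (0, 6),
}


def percent(numerator, denominator):
    """Return the share as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


percentages = {label: percent(n, d) for label, (n, d) in reported.items()}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

Run as written, the sketch gives about 19.3% for declared data availability and about 0.3% for FAIR compliance, in line with the "about 1 in 5" and "fewer than 1%" figures quoted above.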

Notably, the authors also found that researchers publishing in journals with mandatory data sharing policies had nearly 10 times the odds of sharing data compared with those whose articles were published in journals that had no such policy (odds ratio [OR], 9.5). An association was also observed for articles published in journals that required authors to share data but under more limited circumstances (OR, 3.5). By contrast, in journals that encouraged but did not require data sharing, authors appeared no more likely to share data than authors whose studies were published in journals that did not have a data sharing policy (OR, 1.1).
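An odds ratio compares odds rather than probabilities, so an OR of 9.5 does not necessarily mean that data were shared 9.5 times as often. The sketch below converts each reported odds ratio into an implied sharing probability. The 10% baseline sharing rate for journals without a policy is a hypothetical value chosen only for illustration; the article does not report that figure, and the function name is likewise an assumption of this sketch.

```python
def probability_from_odds_ratio(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# HYPOTHETICAL baseline: share of articles with accessible data in journals
# that have no data sharing policy. This number is NOT reported in the article.
ASSUMED_BASELINE = 0.10

# Odds ratios as reported in the article.
reported_odds_ratios = {
    "mandatory policy": 9.5,
    "mandatory in limited circumstances": 3.5,
    "encouraged but not required": 1.1,
}

for policy, odds_ratio in reported_odds_ratios.items():
    implied = probability_from_odds_ratio(ASSUMED_BASELINE, odds_ratio)
    print(f"{policy}: OR {odds_ratio} -> implied sharing probability {implied:.2f}")
```

Under that assumed baseline, an OR of 9.5 corresponds to roughly half of articles sharing data, which is about five times the baseline rate rather than 10 times. The gap between the odds ratio and the ratio of probabilities widens as the baseline rate rises.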

The findings are consistent with other studies that show "low, but increasing, declaration rates," the authors of the current study say. However, previous studies have reported even lower declaration rates. One such study found that 6% of cancer researchers declared data, and none declared code.

"Encouragingly, we note [declaration] rates that were three to four times higher," the authors say. They note that this increase "is likely due to the growing number of medical journals that are adopting stronger policies on data and code sharing."

The authors recommend that journals adopt editorial workflows to ensure that researchers comply with these data and code sharing policies.

In addition, the authors suggest that cancer researchers who plan to publicly archive research data and code consult resources such as the Registry of Research Data Repositories to find the most appropriate repository and that they provide "as much clarity as possible on the conditions governing access and reuse of their research data and code."

The study was funded by Merck. Smith has received grant funding from Merck. Jones reports no relevant financial relationships.

BMC Med. Published online November 9, 2022.

Sharon Worcester, MA, is an award-winning medical journalist based in Birmingham, Alabama, writing for Medscape, MDedge, and other affiliate sites. She currently covers oncology, but she has also written on a variety of other medical specialties and healthcare topics. She can be reached on Twitter: @SW_MedReporter.
