Big Data in Oncology: Costly Fad or Invaluable Tool?

Nick Mulcahy

November 24, 2014

"Big data" is a buzz phrase that oncologists have been hearing for a while now.

Some of the biggest names in cancer — Memorial Sloan Kettering in New York City, M.D. Anderson in Houston, Johns Hopkins in Baltimore, and the American Society of Clinical Oncology — are part of big data initiatives.

And the technology and business partners in various high-profile big data projects include some giants in computing, such as IBM, Google, and Toshiba.

The assembled organizations and companies possess a lot of public relations power, and the media have responded. In just the past year, two major news outlets — US News & World Report and Fortune — have published stories that were both hyperbolically entitled Can Big Data Cure Cancer?

Some big claims by champions of big data have contributed to the big buzz.

For example, according to the American Society of Clinical Oncology (ASCO), its big data initiative and health information technology platform, known as CancerLinQ, "will revolutionize how we care for people with cancer."

"Shoppers have Amazon. Students have Google. Oncologists will have CancerLinQ," according to a 2013 CNN report, which suggested that the would-be big data tool will be a broad resource that meets many clinical needs.

But what exactly is big data, and what can it accomplish in the cancer clinic? The very people who are supposed to be the end-users of the data are not clear about that, said one prominent oncologist.

"There is a lot of confusion and uncertainty among oncologists about what big data means," Robert Carlson, MD, chief executive officer at the National Comprehensive Cancer Network (NCCN), told Medscape Medical News.

The term big data is used in a variety of ways, but its developers claim some common qualities and efforts. Those include attempts to standardize oncology clinical data, improve quality of care through the more widespread collection of patient and disease information, and manage the burgeoning results of molecular and genetic testing.

In fact, the lack of comprehension about big data by oncologists might have a simple explanation, another expert suggested.

"We don't have a working product yet," said Robert Miller, MD, oncology medical information officer at the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University.

Furthermore, there is "a lot of skepticism and cynicism" among clinicians about information technology tools in the clinic because of the hype about how electronic health records (EHRs) would cut costs and improve efficiencies, he said in an interview with Medscape Medical News.

Nevertheless, big data and EHRs go hand in hand, Dr Miller explained.

He said that clinicians will eventually receive big data insights through EHRs. In other words, big data initiatives could improve clinical decision-making at the point of care. Big data "has to be embedded in EHRs; that's a tall order," he said. "This type of information has to be presented to you seamlessly."

"The skepticism will persist until products work," he argued.

Dr Miller is, however, a believer in the potential of big data. "Medicine, including oncology, needs to catch up with virtually every other industry," he said.

In contemporary industries, "the power of data aggregation and analysis is fundamental to daily operations," Dr Miller explained.

Huge amounts of data on cancer patients have not been analyzed in depth because most patients (about 97%) in the United States are not part of large-scale clinical trials, he said, echoing the words of other big data advocates. A lot of clinical insights might be waiting to be discovered from data collected during routine clinical practice, the argument goes.

Big data should be part of the future of the practice of medicine, summarized Dr Miller, who will be leaving Johns Hopkins in December to become medical director of the ASCO Institute for Quality.

It is not difficult to find prominent basic science cancer researchers and academic clinicians who agree with Dr Miller.

For example, the nonprofit Cancer Commons, whose motto is Donate Your Data, is dedicated to recruiting individual patients to voluntarily participate in data collection that will be freely available to researchers. The organization's website is chock-full of video testimonials from big names in oncology, such as Frank McCormick, PhD, former president of the American Association for Cancer Research, and Laura Esserman, MD, MBA, a surgeon and breast cancer oncology specialist at the University of California, San Francisco.

In her video, Dr Esserman states that if a patient shares his or her data, scientists can analyze it "smarter, better, faster."

"Your story holds the cure, so share it," she concludes.

Not everyone in academia is sold on the idea that more data is either better data or transformative, including a preeminent cancer researcher.

Robert Weinberg, PhD, from the Massachusetts Institute of Technology in Cambridge, who is credited with discovering the first human oncogene, RAS, and the first tumor suppressor gene, Rb, said in an interview earlier this year that there has already been a great deal of mining of cancer data, but "relative to the effort that's been put into it, there's been little in take-home lessons" for clinicians.

Another critic said that big data can be understood, in part, by "following the money."

"The big data fad is the direct result of a quarter century of hype about electronic medical records promoted by the health policy elite with the encouragement and financial assistance of the computer industry," writes Kip Sullivan in an essay published on the Physicians for a National Health Program website. Sullivan blogs for the site and is the author of a book entitled The Health Care Mess: How We Got Into It and How We'll Get Out of It.

As a case in point, Sullivan notes that the theme of the July issue of the journal Health Affairs was "using big data to transform care," and that the issue was funded by vested interests such as IBM and the UnitedHealth Foundation.

"We need to start paying attention to financial incentives that influence health policy experts," he recommends.

Big Data Is Observational Data for the Most Part

Oncologists make treatment and patient management decisions based on a number of resources, said Dr Miller.

Those resources include their reading of published randomized controlled trials and meta-analyses. But "that's a very shallow pool in terms of numbers," he said, and is limited by the types of patients who typically enroll in clinical trials (i.e., younger, healthier, and whiter than the general cancer population).

Clinicians also rely on personal experience and the recommendations of colleagues. "We all recognize that this is subject to bias," he explained.

By collecting data on scores of patients — and the treatment and management preferences of their oncologists — from routine clinical practices, big data initiatives could expand this pool of colleagues, he said. "You will be relying on the experience of thousands of oncologists," he noted.

However, it is generally accepted that this kind of aggregation shows, at best, correlations, and is not considered high-level data, because it is not the fruit of a hypothesis and experimental design, such as a randomized clinical trial.

"At most, one can develop correlations between complex datasets...and prognosis, i.e., future behavior in the oncology clinic," writes MIT cancer researcher Dr Weinberg in an essay published in Cell earlier this year (2014;157:267-271).

In his essay, Dr Weinberg explains that the collection of data on the molecular characteristics of cancers has become "an almost addictive undertaking," as genomics and other research methods produce "staggering amounts" of cancer data.

Such data are observational by definition, and their limitations would also apply to big data collected on, for example, chemotherapy choice.

A lot of observational data has already been collected in oncology, especially in the Surveillance, Epidemiology, and End Results (SEER)–Medicare databases, which are regularly mined by researchers for clinical insights.

The SEER–Medicare project is sponsored by the National Cancer Institute (NCI), and has details on 1.6 million cancer patients and their treatments, outcomes, and second cancers, among other data, according to an NCI report.

However, a key difference between large datasets such as SEER–Medicare and big data is that the "volume and variety of big data is so much greater," said Dr Miller.

There are shortcomings with the SEER–Medicare data, including an absence of information on disease recurrence and no information about the molecular characteristics of tumors. Big data can help fill in the gaps, he said.

Gathering data on the molecular and genetic characteristics of cancers is part of the enterprise of getting oncology up to date, Dr Miller explained.

"We are still basing our clinical decisions on 1980s technology," he said, referring to tools that establish the phenotype of a cancer (as opposed to the genotype or another more sophisticated biomarker).

However, Dr Carlson, from the NCCN, pointed out that the clinical importance of the molecular characteristics of tumors has not been widely established.

Instead, molecular testing with proven clinical value is limited to a few specific cancer types, he argued. He cited, as examples, DNA-based KRAS testing in colorectal cancer, which is predictive of response to the targeted therapies cetuximab (Erbitux, Bristol-Myers Squibb) and panitumumab (Vectibix, Amgen); testing for EGFR mutations in lung cancer, which is highly predictive of response to targeted therapies such as erlotinib (Tarceva, Genentech/Astellas); and testing for HER2 status in breast cancer, which is predictive of response to HER2-targeted agents such as trastuzumab (Herceptin, Genentech).

The list is not long, Dr Carlson pointed out. High-quality evidence is not there for the vast majority of molecular drivers, he said.

When asked to provide an example of a potential insight that big data could provide to a practicing oncologist, Dr Carlson proposed the management of "unusual, unexpected toxicities."

He said a very large pool of cancer patients could provide insight into how to manage such rarities and predict outcomes. Or big data might be able to identify characteristics of patients who have, say, an increased likelihood of neuropathy.

"We've got a long way to go," Dr Carlson said about any implementation of big data insights into cancer care. "Hard-core computer systems engineering" will be needed to bring it about.

"Information technology tools need to facilitate care, not impede care," he added.

The NCCN is not in the big data business, but it is benefiting from the explosion of information technology in healthcare, Dr Carlson suggested.

The organization has 30 collaborative agreements with information technology vendors to provide its signature cancer management algorithms, which are a mix of data and opinion, for developers to use. That, in turn, will facilitate the embedding of guidelines into computer systems, including EHRs, he said.

The full range of NCCN guidelines has been licensed to the IBM Watson supercomputer program, and is part of the backbone of an experimental automated treatment algorithm in lung cancer, as previously reported by Medscape Medical News.

The supercomputer has synthesized an array of data gleaned from thousands of sources, including journal articles, national guidelines, individual hospitals' best practices, clinical trials, and even textbooks.

"To my knowledge, Watson is the only functioning artificial intelligence in oncology," said Dr Carlson. However, it is still a prototype, which was being used in two oncology clinics in the United States last year.

Despite being an enthusiast about the computerized uptake of the NCCN guidelines, Dr Carlson was reserved about what can be accomplished in the clinic with ever-more data. "The promise of big data has been greatly oversold," he said.

Dr Miller and Dr Carlson have disclosed no relevant financial relationships.

