Will Real-World 'Bias' of Big Data Improve Cancer Care?

Kathy D. Miller, MD; Peter Yu, MD


July 08, 2016

Kathy D. Miller, MD: Hi. I am Kathy Miller, professor of medicine at the Indiana University School of Medicine in Indianapolis, Indiana. Welcome to Medscape Oncology Insights, coming to you from the 2016 annual meeting of the American Society of Clinical Oncology (ASCO).

We have been hearing a lot about big data over the past year or so. We need to think carefully about what the clinical value is of huge compilations of patient information. What can big data tell us that clinical trials cannot? What might some of the pitfalls of big data be?

I have found the perfect person to help us sort through these questions. I am pleased to welcome Dr Peter Yu. Peter is physician-in-chief of the Hartford HealthCare Cancer Institute and head of health informatics at the Memorial Sloan Kettering Cancer Alliance. He is also past president of ASCO. Peter, welcome.

Peter Yu, MD: Thank you. It is a great pleasure to be here.

Big Data Too Important to Entrust to Non-oncologists

Dr Miller: I really appreciate you taking the time from your busy schedule to talk to us about big data. I know this has been one of your passions for a long time.

Dr Yu: You're right, Kathy. It is such an important issue and we need to get this front and center to a broader audience, so I very much welcome this conversation today.

Dr Miller: This is also an area in which ASCO, our professional society, has started some really important initiatives. The flagship initiative is probably the CancerLinQ™ program. Can you tell our listeners about CancerLinQ?

Dr Yu: Absolutely. ASCO feels that there are times when professionals, physicians, and oncologists need to take the lead in the conversation, because it is so important an issue that it should not be entrusted to others. We have a better understanding of what the needs are. Most people, at this point, have worked with electronic medical records (EMRs) and have had some experience and a lot of frustration with them.

Dr Miller: Many of us who have used them can tell which ones oncologists had a hand in developing, and maybe which ones were done elsewhere.

Dr Yu: I think so. A hand in developing, a hand in the input and language of the field, or domain experts who understand what doctors really need, what patients really need, and, more importantly, how work flows so doctors can work efficiently.

Dr Miller: Take us back to CancerLinQ. How did that initiative get started, and how far has it come?

Dr Yu: CancerLinQ traces back to the Institute of Medicine, now called the National Academy of Medicine, which held a series of workshops a few years ago about big data and what was called "rapid learning systems." The idea was, with the massive data that we will be acquiring in the decades to come, how can we learn from that data and create a system where we learn from real-world experiences, understand what is actually happening out in the field, and supplement what we learn from randomized clinical trials?

Dr Miller: How does all that data get brought together in CancerLinQ? Many of us use EMRs. We use a lot of different EMRs. They are not so good at talking to each other.

Dr Yu: They're very bad at talking to each other, in part because there isn't a trusted steward that can bring all of these elements together. That is one of the things that is lacking, and what ASCO is providing is that trusted steward, trusted by our members and by our doctors to bring the data together. We are asking our members to share their data and their electronic health records. Collecting the data and bringing it together is what ASCO will do. ASCO will begin to make sense of it and to explore it in a reasonable and trusted manner.

Big Data, Big Answers?

Dr Miller: When ASCO gets all of that data together, what are some of the questions that you envision this big compilation of data might help us answer?

Dr Yu: There are several. One is quality improvement; it is the most obvious and one of our strengths at ASCO. For many years now, we have had Quality Oncology Practice Initiative® (QOPI), and now an electronic version of QOPI is being developed called eQOPI. It allows us to look at the retrospective performance of practices to see how they measured up with generally accepted quality guidelines. We would like to make that real-time. Rather than looking back and saying, "In the past 6 months you did really great here, but maybe could have done better there; think about that for next time, please," we would rather have real-time clinical decision support so that practices can say, "Maybe I should consider this, maybe I have not done that yet; time is slipping by and I need to do this." The first task will be to take our initiatives in quality improvement and measurement and make that into a real-time tool.

The second is hypothesis generation: looking at the data without any preconceived notion. A typical randomized clinical trial is set up with a specific question, generally an interventional question, and is designed as a statistically sound trial to answer that question definitively.

When you look at big data, there is opportunity to take it from a different point of view, which is to let the data speak to us. What hypothesis can we think about? What trends do we see that we do not understand but want to understand better? How can we dive deep into data to understand that, and maybe develop a randomized clinical trial to definitively answer [the questions]? Real-world big data—observational data or whatever you want to call it—really complement the randomized clinical trial.

CancerLinQ and SEER: Comrades in Arms

Dr Miller: We have seen that some studies have used things like the Surveillance, Epidemiology, and End Results (SEER) database. How does CancerLinQ compare to the SEER database? What might it allow us to do that we have not been able to do with the SEER database?

Dr Yu: First, I think that the SEER database is the gold standard, or the best source of cancer observational data. Why is this? Let's spend a couple minutes talking about that. The SEER database was created by the National Cancer Institute (NCI). It was designed as a research tool to actually ask questions. The NCI is about 50 or 60 years old, so this is not a new idea. The NCI itself has said that observational data are important—important for research use. And we should encourage that use.

The SEER database does not cover the entire US population but a good chunk of it. Twenty-five percent of the US cancer population is captured in SEER, in a statistically and demographically representative manner that reflects the entire country's population. The data are extracted with intense human curation by cancer registrars who work in hospitals and practices.

Many of the problems with observational data come down to lack of data, inconsistency of data, and variability in collection. For the SEER database, this is as good as it gets. This has been massaged.

Dr Miller: For people like me who do not respond to the cancer registrar letters, were you looking for follow-up?

Dr Yu: That gets to another point, which is that data are only as good as what people enter. You can try to clean it up, but it takes a lot of human work, which is expensive and somewhat wasteful and inefficient. One of the things we are finding in CancerLinQ and other data sources is that we need to work with both the EHR vendors and our physicians to do a better job, frankly, of documentation so that it is more clear and accurate. We have all seen cutting and pasting; things like this really lead to the problem of not very good data.

SEER data are highly accurate. Besides working with our practices to get better-quality data into their records so that better data come up to CancerLinQ, we have been having discussions with SEER for over a year now about linking data. One of the ways to make observational data more useful, more accurate, more reliable and trustworthy is to link datasets so that you can triangulate, fill in the gaps, and have a more complete understanding.

Dr Miller: Maybe also verify some of those data fields, because the SEER data are curated by humans. That is a very labor-intensive process, but that might also point out some inaccuracies in the electronic data that we are able to get directly.

Dr Yu: Absolutely.
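
As an editorial illustration of the linkage idea discussed above, here is a minimal Python sketch of how two datasets might be linked to fill gaps and flag disagreements. Everything in it is hypothetical: the patient identifiers, field names, values, and the `link_records` helper are assumptions for illustration, and nothing here reflects how CancerLinQ or SEER actually store or link data. Real linkage usually cannot rely on a clean shared identifier.

```python
# Hypothetical records keyed by a shared patient identifier (an assumption;
# real-world linkage often requires matching on several imperfect fields).
ehr_records = {
    "P001": {"stage": "II", "diagnosis_year": 2014, "vital_status": None},
    "P002": {"stage": None, "diagnosis_year": 2015, "vital_status": "alive"},
}
registry_records = {
    "P001": {"stage": "II", "diagnosis_year": 2014, "vital_status": "alive"},
    "P002": {"stage": "III", "diagnosis_year": 2013, "vital_status": "alive"},
}


def link_records(primary, secondary):
    """Merge two record sets on their shared keys.

    Missing values (None) in one source are filled from the other.
    Fields where both sources have values that disagree are recorded
    as conflicts and left as None for human review.
    """
    linked = {}
    conflicts = []
    for pid in sorted(primary.keys() & secondary.keys()):
        merged = {}
        fields = sorted(primary[pid].keys() | secondary[pid].keys())
        for field in fields:
            a = primary[pid].get(field)
            b = secondary[pid].get(field)
            if a is not None and b is not None and a != b:
                conflicts.append((pid, field, a, b))
                merged[field] = None  # unresolved; do not guess
            else:
                merged[field] = a if a is not None else b
        linked[pid] = merged
    return linked, conflicts


linked, conflicts = link_records(ehr_records, registry_records)
print(linked)
print(conflicts)
```

In this sketch, gaps in one source are filled from the other, while disagreements are surfaced rather than silently resolved, which mirrors the point that a curated source can help expose inaccuracies in directly captured electronic data.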

Bias or a Real-World Reflection?

Dr Miller: There has been a hope that big data would get around some of the biases that might come into play in our clinical trial results. We have talked for years about the fact that only a tiny minority of adult cancer patients are treated on clinical trials, and they tend to be different in really important ways from our larger population. They tend to be younger, healthier, have a higher socioeconomic status, and are more geographically centered in urban rather than rural areas. Are there biases in big data that we are going to have to watch out for as well?

Dr Yu: Yes. "Bias" is a statistical term. There is an inherent implication that there is something wrong with bias, that it distorts what you are seeing in your conclusions. I think that it is not biases [in big data and clinical trials], because that suggests one is better than the other. I think it is more complementary. What we might call biases are actually reflections of what is happening in the real world.

Doctors use their clinical judgment. They read an article, they read a publication, they react to it, they go to yesterday's plenary session, they hear a discussion about extended length of treatment with aromatase inhibitors, and they come back to their clinics and talk to patients. They make judgments about how they interpret that data in light of their patient. They may decide that they are not going to follow the findings that were presented yesterday. They may decide, in this case, that they don't want to do extended aromatase inhibitor therapy. Is that a bias? Is that a bad thing or is that clinical judgment? The physician used his or her knowledge and experience to make a judgment. If you believe as I do, it is the latter. That is important to capture. That actually reflects the true value of yesterday's plenary session—the impact of that finding in the real world.

Dr Miller: I think you are right that "bias" is a loaded term that maybe is incorrect. Maybe it is better to say that big data are very good at showing us what we do and the outcome of what we do, but maybe not as good at telling us what we should do, because sometimes those clinical judgments that have been held for years might actually be wrong. It might take a clinical trial to show us that they are wrong. Reliance on big data might tend to reinforce those practices in areas that we have never studied.

Measuring Outcomes: A 'Big, Hairy Problem'

Dr Yu: Right, so that opens up another big, hairy problem. I know that you didn't want to do that, but you did, so I am going to go there.

Dr Miller: I think it is one, though, that we need to talk about: how to use this really powerful tool with all of this information in ways that are responsible and going to be helpful.

Dr Yu: Correct. I do not think that we can accomplish that unless we start measuring outcomes. Much of our quality improvement is measuring process. Did you do this? Did you order that test? Did this thing happen? We don't do so well in thinking about what was the result of what we did. Our biases come into play and we make judgments: This is the way I was taught, I've always done it this way, I trust a person who presented at the plenary session because I know them personally. Those are biases—true biases.

Dr Yu: At the end of the day, the question is, did your judgment lead to a better outcome? If it led to a better outcome, then we sometimes call that "warranted variation." You deviated from the pathway, you deviated from accepted rules, but you were right in this instance. In the rapid-learning system concept, we should capture those insights. We should understand when that override or decision not to follow the results of a clinical trial was, in fact, warranted and led to a better outcome. This can inform ASCO and others about our guidelines. We can say, "Okay, our guidelines did not quite catch this."

We need to go back and reread our guidelines to understand this, and then we need to put that into our computer systems and our decision-support systems. You begin to develop a virtuous cycle of the randomized clinical trial: studying what actually happens, looking at the outcomes, measuring its value, and then coming back and improving the process. This might lead to a decision that we need another randomized clinical trial to answer the question. That's fine; it's not this or that, and/or. These are the tools in the toolkit. Sometimes I need a hammer, sometimes I need a Phillips screwdriver, sometimes I need a drill. But it's good to have all of those tools in the toolkit.

Compilations of 'Rare Bird' Information

Dr Miller: In our last minute, we should not forget that the other area where the big compilation of data might help us is with those really rare tumors where you may see one or two in an entire career—the ones that I still sometimes have to look up in the Textbook of Uncommon Cancer. There is never going to be a clinical trial for those rare birds, but the compilation could at least allow you to begin to inform some discussions with the next patient—that we have now seen, globally over the past 5 years, 30 patients with this tumor that was treated in these ways. This seems to be a better choice.

Dr Yu: Rare tumors, less common toxicities, and long-term toxicities. Because of limitation of the size of the patient population, and limitation on the length of follow-up in randomized clinical trials, these are three areas where observational data really help to fill in the gaps.

Dr Miller: Peter, thank you so much for coming to talk about this. It really is an incredibly powerful technology, and one that will give us insights for years to come.

Dr Yu: It is absolutely my pleasure.

Dr Miller: To you, thank you so much for joining us for this edition of Medscape Oncology Insights. This is Kathy Miller, reporting from ASCO 2016.

