UNMC College of Public Health Grand Rounds
“The Importance of Reproducibility in High-Throughput Biology: Case Studies in Forensic Bioinformatics”
Keith Baggerly, Ph.D.
Professor, Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
Wednesday, February 15, 2012
Modern high-throughput biological assays let us ask detailed questions about how diseases operate, and promise to let us personalize therapy. Careful data processing is essential, because our intuition about what the answers “should” look like is very poor when we have to juggle thousands of things at once. Unfortunately, documentation of precisely what was done is often lacking. When such documentation is absent, we must apply “forensic bioinformatics” to infer from the raw data and reported results what the methods must have been. The issues are basic, but the implications are far from trivial.
We examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials were allocated to treatment arms on the basis of these results. However, we show in several case studies that the results incorporate several simple errors that may put patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. We briefly discuss steps we are taking to avoid such errors in our own investigations, and discuss reproducible research efforts more broadly.
These issues have recently led to the formation of an Institute of Medicine Review of the use of Omics-Based Signatures to Predict Patient Outcomes. Some of the issues raised and topics of debate will also be addressed.
Objectives: After this presentation, attendees should be able to:
(1) Describe several types of common errors that can affect, and have affected, high-throughput analyses.
(2) Describe steps that can be taken to avoid these errors.
(3) Identify types of data and supporting information to look for in high-throughput publications.