This post reviews a number of clinical trial anecdotes. If you’re less familiar with drug discovery and would like a refresher, check out this deep dive on drug discovery for some additional context.
Everyone knew “bad” LDL cholesterol caused heart disease and “good” HDL cholesterol prevented it. So when Eli Lilly developed evacetrapib and saw HDL-C increase by 130% while LDL-C decreased by 35%, the company thought it had a blockbuster. Much to their surprise, the 2016 trial showed no reduction in cardiovascular deaths at all. The conventional wisdom, backed by decades of genetic and observational studies, was misleading: while these biomarkers were correlated with cardiovascular health, they were apparently not causal. The drug was scrapped, Eli Lilly’s stock fell 8% in a day, and researchers were sent back to the drawing board to look for more nuanced biomarkers.
“I can see any failure as a chance. That result will teach you something else, something new.”
- Shinya Yamanaka, Nobel Laureate in Medicine
Numbers don’t lie (usually)
Clinical trial stats are grim. Trials can take more than a decade, cost upward of a billion dollars, and have a likelihood of success of around 10%. Every drug hunter has faced that moment of despair, knowing the odds are stacked against them and years of work may amount to nothing. They push through anyway, because on the other side, for the few drugs that succeed, lives are forever changed for the better. And the more we study those failures, the more we learn from them as we tackle the next drug.
Many studies1 have categorized trial failures and largely agree on three reasons:
60% Efficacy: The drug doesn’t work well enough.
20% Safety: The side effects are too toxic.
20% Commercial: Strategic portfolio shifts or financial difficulties prevent follow-up trials.
While these categories cover how a drug fails, they don’t answer why - and more importantly, what could be done differently. Clinical trials are preceded by a long discovery process focused on efficacy and safety, so it’s worth asking: Why did we believe the drug would be safe and effective, and what did we miss?
Every clinical trial is a hypothesis to be proven or disproven, different for every drug. Look through enough individual examples, and a few stories continually show up to explain the disconnect between hypothesis and reality.
Poor patient selection: Humans are heterogeneous - we have different genes, environments, diets, gut microbiomes, and more. So it’s no surprise that we also respond differently to drugs. In most clinical trials, only some of the patients show positive responses, and the challenge is figuring out beforehand which ones will respond well. Selecting the wrong patient population, or failing to collect the data needed to stratify patients, can prevent an effective drug from showing positive results, as the responding patients are drowned out by non-responding ones.
Inaccurate models: Scientists use preclinical models as a proxy for human physiology. Computational tools, cell assays, and animal models all feed the iteration cycle to find and improve drug candidates. But those models can fail to capture the full complexity of a system: cells behave differently in test tubes than in bodies, induced diseases look different from natural ones, and animals have different genes and proteins than humans do. Those disconnects can lead to false confidence, where a drug works well in the model but not in humans.
Non-causal biomarkers: We’ve never had more human data, from enormous health databases like the UK Biobank to clinical trial measurements kept at every pharma company. These datasets can correlate inputs like genetics, diets, and blood markers to outcomes like lifespan or disease risk, ultimately providing ideas for new drug targets and measurements to evaluate progress. But as with all data analysis, correlation does not mean causation. Even after taking precautions, it’s easy to think a biomarker drives a disease right up until a clinical trial disproves the hypothesis.
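To see how a non-causal biomarker can fool us, here is a toy simulation (all numbers and variable names are invented for illustration, not drawn from any real dataset): a hidden factor drives both a biomarker and a disease, so the biomarker strongly predicts disease in observational data - yet a hypothetical drug that suppresses only the biomarker leaves the disease rate untouched.

```python
import random

random.seed(1)
N = 100_000

def population(drug=False):
    """Each person: a hidden cause drives BOTH the biomarker and the disease.
    A hypothetical drug forces the biomarker low without touching the cause."""
    people = []
    for _ in range(N):
        hidden = random.random() < 0.3                  # unobserved driver
        marker = (hidden and random.random() < 0.9) or random.random() < 0.05
        if drug:
            marker = False                              # drug suppresses the marker only
        disease = hidden and random.random() < 0.5      # disease ignores the marker
        people.append((marker, disease))
    return people

obs = population()
rate = lambda ps: sum(d for _, d in ps) / len(ps)
high = [p for p in obs if p[0]]
low = [p for p in obs if not p[0]]

print(f"Disease rate, marker high: {rate(high):.3f}")  # marker 'predicts' disease
print(f"Disease rate, marker low:  {rate(low):.3f}")
print(f"Disease rate, no drug:     {rate(obs):.3f}")
print(f"Disease rate, with drug:   {rate(population(drug=True)):.3f}")  # unchanged
```

The observational comparison makes the marker look like a great target; the interventional comparison shows the drug accomplishes nothing - the same trap that caught evacetrapib.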
“All successful drugs are alike; each failed drug has failed in its own way.”
- Leo Tolstoy, almost
Of the nearly infinite stories to tell, let’s look at a few more examples.
Finding the right patients
Cancer is a notoriously challenging target because every tumor is different. So in the early 2000s, scientists were excited to find a pattern: the protein EGFR was overly abundant in many cancers and played an important role in cell growth. Yet hopes were dashed when the 2005 trial for gefitinib showed no advantage over standard chemotherapies. Undeterred, researchers segmented the patient population and found the drug worked better in non-smokers and Asian patients, but they couldn’t explain why. By the time a follow-up study was commissioned in 2009, a new technology had emerged that could help. DNA sequencing costs had fallen from $10,000,000 to $100,000 in those four years, just cheap enough that sequencing patient tumors was now feasible. Sequence analysis revealed the true cause: patients with specific EGFR mutations responded well to gefitinib, while those with mere overexpression did not. As it turned out, non-smokers and Asian patients were far more likely to carry those mutations. Approval and mutation-based screening quickly followed.
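The dilution effect that hid gefitinib’s benefit can be sketched with a quick simulation (made-up effect sizes, not trial data): when only a fifth of patients carry the relevant mutation, the responders’ signal is swamped in the pooled average but obvious once the population is stratified.

```python
import random

random.seed(0)

def simulate_trial(n=1000, responder_frac=0.2, effect=5.0, noise=10.0):
    """Toy trial: improvement scores for treated vs. control patients,
    where only a fraction of treated patients (marker carriers) respond."""
    treated, control, has_marker = [], [], []
    for _ in range(n):
        carrier = random.random() < responder_frac
        has_marker.append(carrier)
        treated.append((effect if carrier else 0.0) + random.gauss(0, noise))
        control.append(random.gauss(0, noise))
    return treated, control, has_marker

treated, control, has_marker = simulate_trial()
mean = lambda xs: sum(xs) / len(xs)

pooled = mean(treated) - mean(control)
stratified = mean([t for t, m in zip(treated, has_marker) if m]) - mean(control)

print(f"Treatment effect, all patients pooled:  {pooled:.2f}")
print(f"Treatment effect, marker carriers only: {stratified:.2f}")
```

In a real trial the marker might be an EGFR mutation; the point is only that the pooled comparison can look null while the stratified one does not - which is why collecting the right stratification data up front matters so much.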
Working with imperfect models
Inspired by the success of EGFR drugs, scientists tried again with another overexpressed protein in cancers, IGF-1R. Repeated clinical trials over the course of the 2010s failed to outperform chemotherapy, and unlike EGFR, retrospective analysis could not identify reliable biomarkers to segment the population. Yet animal models continued to work, leading researchers to wonder where the disconnect was. More mechanistic experiments revealed the problem: in humans, when IGF-1R is blocked, other proteins like IR-A step in to compensate, quickly ramping up expression to take over where IGF-1R left off. Animal models largely miss or suppress this compensatory mechanism, and the difference in animal biology prevented researchers from noticing the issue.
Toxic side effects can be even harder to predict than efficacy, as a fateful trial in 1999 showed. Jesse Gelsinger suffered from a rare genetic disease affecting his liver, and doctors suggested he try an experimental new gene therapy: a modified adenovirus intended to insert corrected versions of the mutated gene into his cells. After receiving the therapy, his health deteriorated rapidly, and four days later, he died from a massive immune reaction. Unbeknownst to the clinicians, Jesse had previously been exposed to adenovirus, likely from a common cold, priming his immune system to respond more strongly the second time. Jesse’s death sent shockwaves through the field2, leading to greater scrutiny and a decade of careful safety studies before experimentation resumed. Yet the central problem facing gene therapy remains much the same even today: unpredictable immune responses.
A bad biomarker, maybe?
The “amyloid hypothesis” has been the prevailing theory of Alzheimer’s disease for more than 30 years: that the accumulation of beta-amyloid plaques in the brain causes neurodegeneration. In that time, hundreds of clinical trials have tried and failed to treat Alzheimer’s by reducing these plaques. If this story had been written five years ago, the summary would be clear - amyloids are a symptom, not a cause. But three recent and contentious drug approvals for Alzheimer’s add some ambiguity, raising the question of whether amyloids are a bad biomarker or whether we simply haven’t had the right drug for them. Others have written extensively on the subject3, and this is a story still being written. When we finally have good treatments for Alzheimer’s, there will be much to learn from the amyloid approach as either a success or failure of biomarkers.
New questions
Finding accurate models, good biomarkers, and a relevant patient population is a daunting gauntlet of challenges to tackle. Yet these failures also provide us with a new set of questions to answer, ones that will hopefully lead to more successful clinical trials. Some of the questions I have are:
How do we segment patient populations?
- How can we identify which drugs can be repurposed for new patient populations?
- How can we design clinical trials to personalize medicines for each patient?
How do we build better preclinical models?
- How can we generate relevant training data at unprecedented scale, and will that be enough for AI models to scale our way into biological understanding?
- How can we represent increasingly complex systems with 3D organoids and organs-on-a-chip?
How do we find causal biomarkers?
- What data should we be collecting in clinical trials, and how can we make that easy to obtain?
- How can we break down data silos and make more data available to researchers?
- How can we more effectively separate causality from correlation, and correlation from noise?
Each of these questions is deserving of its own essay, and intrepid researchers and entrepreneurs are already tackling them head on. I’m excited to see what they’ll accomplish.
As always, if you’re exploring these ideas, I’d love to hear what you’re building or learning. Reach out anytime.
Arrowsmith, J. Phase III and submission failures: 2007–2010. Nat Rev Drug Discov 10, 87 (2011). https://doi.org/10.1038/nrd3375
Arrowsmith, J. Phase II failures: 2008–2010. Nat Rev Drug Discov 10, 328–329 (2011). https://doi.org/10.1038/nrd3439
Arrowsmith, J., Miller, P. Phase II and Phase III attrition rates 2011–2012. Nat Rev Drug Discov 12, 569 (2013). https://doi.org/10.1038/nrd4090
Harrison, R. Phase II and phase III failures: 2013–2015. Nat Rev Drug Discov 15, 817–818 (2016). https://doi.org/10.1038/nrd.2016.184
Hwang, T.J., Carpenter, D., Lauffenburger, J.C., Wang, B., Franklin, J.M. & Kesselheim, A.S. Failure of investigational drugs in late-stage clinical development and publication of trial results. JAMA Intern Med 176, 1826–1833 (2016). https://doi.org/10.1001/jamainternmed.2016.6008
For more on how Jesse’s death impacted the field, I highly recommend The Death of Jesse Gelsinger, 20 years later.
A few of the most highly referenced articles include: