→ Proof Points
Welcome back {{first_name}},
After a stereotypically wet and cold winter, the sun is finally shining over Alrewas. We’ve spent Easter 🐰 scootering, collecting more chocolate 🍫 than was necessary, and barely tolerating the new Super Mario Brothers movie (I have thoughts).
This week, normal business has resumed and this issue is a special one for our team: a success story worth emulating, a podcast conversation about mental health in AI, and an upcoming event I hope some of you will make it to.

Grateful for these bunnies!
Did you know that 18 years ago I wrote my first scientific paper co-authored with a patient? It's about to be cited for the 100th time ❤
– Paul
THE LATEST PODCAST
Woebot Health's Bet on Medical Rigour
Before anyone took mental health chatbots seriously, Dr. Alison Darcy ran a randomised controlled trial on one. That was back in 2016. The paper has since been cited over 2,000 times and found its way to the TED stage and the New York Times.
In this episode, I sit down with the founder of Woebot Health to find out what a decade of rigorous AI mental health research proves, including why AI can work better by being (yep) a robot, and why adding LLM capabilities made zero difference to user outcomes.
Listen on Your Favourite Platform:
→ YouTube
→ Spotify
→ Apple Podcasts
DEEP DIVE
What Five Years of an Evidence Engine Actually Produces

At your next board meeting, if an investor asks what evidence you have to prove your product works, what do you point to? A pilot study? An internal survey? A preprint from 2022 that never quite made it to peer review?
The AI-powered patient navigation platform Ada Health had just one publication when we started working together in 2019. Founded by doctors who wanted to drive improvements in the health system, the company built a probabilistic model fed by medical literature, epidemiology, and rigorous testing to help millions of users answer the question: “How serious is this symptom I’m experiencing, and where should I go now: the emergency room, urgent care, my GP/PCP, or can I handle this myself?”
By adopting the ProofStack Health approach, they now have over 25 peer-reviewed publications (showcased here) across journals including Nature Medicine, Lancet Digital Health, Nature Digital Medicine, BMJ Open, and, as of last month, the New England Journal of Medicine AI.
That last one is worth pausing on. The New England Journal of Medicine is the most cited medical journal on the planet. When they launched a dedicated AI edition, they set a high bar for what they would accept.
The three nodes of Ada Health’s Evidence Engine:
Intake — deciding which studies to do (go/no-go) and fund
Middle — policies that smooth high-quality work: identifying RACI, choosing which journals to target, project-managing the write-up, responses to reviewers, and the publication process
Output — spreading awareness once a publication is accepted through PR, press releases, LinkedIn, events, white papers, BD slides, and more
But what made this particular paper unusual is in how it was designed.
Almost every study you have seen evaluating how well AI health tools help patients decide what to do about a symptom, including the wave of ChatGPT benchmarks that have flooded preprint servers over the past 18 months, uses vignettes. Vignettes are made-up patient scenarios. AI eats these for breakfast. Nom nom nom.
The problem is that none of those patients actually exist. And when these studies are conducted with real people, the researchers typically just study intent, i.e. asking people what they plan to do… which is a notoriously poor predictor of what they actually do.
The ESSENCE study, conducted across CUF, Portugal's largest private healthcare network, tested Ada with real patients and then checked their medical records to see what they actually did next. Not what they said they would do. What they did.
The results were notable:
→ Uncertainty about what patients would do next dropped from 13% to 5% after using Ada.
→ Fewer people ended up at the emergency department unnecessarily.
→ More shifted toward appropriate GP visits and self-care.
→ 39 people who had initially planned to go to A&E changed course entirely.
The editor-in-chief released the study three weeks ahead of schedule.
That does not happen by accident.
What made this possible was not just a single brilliant study. It was five years of building something most digital health companies skip entirely: an evidence engine. A clear process for deciding which studies to run, who writes them, which journals to target, and how to amplify the work once it is published. Pre-registration for rigour. Preprints for transparency. Real-world design from the start.
Ada's head of evidence did not stumble into NEJM AI. They built towards it systematically.
The response online has reflected that.
Nikhil Krishnan, who runs Out of Pocket, one of the sharpest health tech newsletters around, highlighted it to his audience (I also appeared on Nikhil's recent podcast, check it out below!). Gilles Frydman, a well-known patient advocate, called it "finally, a great study of patients real use of AI." An emergency medicine physician has been sharing it widely. These are not people who praise things out of politeness. Gabe Wilson called it “The most important AI-in-medicine study” published this month (March 2026).
There is a wider point here. Ada's health intelligence system is purpose-built. It runs a continuous medical hypothesis engine using Bayesian probability, updating the likelihood of each diagnosis as patients answer questions, rather than pattern-matching text the way a large language model does. It was designed by over 50 scientists and clinicians to safely ask as few questions as possible, sticking to the most clinically relevant ones.
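To make the distinction concrete, here is a toy sketch of the Bayesian update idea: with each answer a patient gives, the probability of every candidate diagnosis is re-weighted by how well that diagnosis explains the answer. The condition names and numbers below are invented for illustration; Ada's actual model is vastly richer than this.

```python
def bayes_update(priors, likelihoods):
    """Update P(diagnosis) after one patient answer.

    priors: {diagnosis: P(diagnosis)} before the answer
    likelihoods: {diagnosis: P(answer | diagnosis)}
    Returns the normalised posterior distribution.
    """
    # Bayes' rule: posterior ∝ prior × likelihood
    unnormalised = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(unnormalised.values())
    return {d: p / total for d, p in unnormalised.items()}

# Invented toy example: patient answers "yes" to a symptom question.
priors = {"heartburn": 0.6, "muscle strain": 0.3, "cardiac event": 0.1}
likelihoods = {"heartburn": 0.5, "muscle strain": 0.2, "cardiac event": 0.9}

posterior = bayes_update(priors, likelihoods)
# Mass shifts toward conditions that better explain the answer;
# the next question can then be chosen to reduce remaining uncertainty.
```

In this sketch, the answer is far more probable under "cardiac event" than under "muscle strain", so its posterior probability rises even though its prior was small, which is exactly the behaviour that lets a system like this ask fewer, more clinically relevant questions.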
That design philosophy, combined with evidence that proves it works in the real world, is what builds trust with clinicians. The goal Ada is working towards is a world where a patient saying “Ada sent me” carries the same credibility as a referral from a GP.
That is not merely a marketing claim. That is a scientific foundation for clinical transformation.
How Ada Followed the ProofStack Playbook:
Conducted usability and acceptance testing early on
Pre-registered the study before collecting any data
Used preprints to make findings available before formal peer review
Focused on real-world patients, not artificial vignettes and stated ‘intentions’
Held every study to the highest levels of scientific rigour
Aimed for a top medical journal and dedicated resources to improving the manuscript in response to peer reviewers
— Paul
FROM OUR DESK
This Month at ProofStack
I recently appeared on Nikhil Krishnan's podcast Ops I Did It Again, which features digital health professionals talking honestly about failure modes. The topic was symptom tracking, which sounds straightforward until you realise that many “scientifically validated” questionnaires still ask patients how they’re feeling on a 0-10 scale, or ask, in a very 1990s kind of way, how much their disease stops them from going to the grocery store (replaced by deliveries), driving to work (hello WFH), or socialising (WhatsApp all the way).
We covered the legal risks of using copyrighted outcome measures without the right permissions, and the tension between scientific rigour and patient engagement; this is the digital health equivalent of choosing between Net Promoter Score and a smiley face rating you might find at the airport bathroom.
The goal of the episode is to help digital health leaders make a different set of mistakes when it comes to collecting data from patients. This episode is worth it if patient-reported outcomes are anywhere near your product roadmap! Check it out here.
ONE QUESTION
What's the hardest part of building evidence?
UPCOMING CONFERENCES
What We're Attending
I am proud to be keynoting the Hardian Health Summit at the end of April, with a special session titled "Move Fast and Prove Things" — covering:
The goldilocks zone of evidence: how much is the right amount to publish 🤔
The difference between what a journal editor wants vs. a regulator
The key tension between regulatory approval & customers actually paying for your product
The Hardian Health Summit will highlight over 100 companies in the MedTech and Digital Health space. If you'll be there, reply to this email!

P.S. Here are 3 ways I can help you:
Take the Evidence Scorecard Quiz. Answer 15 questions and we’ll send you a personalised report with feedback tailored to your specific needs.
Follow or connect with me on LinkedIn. I publish top resources and in-depth insights related to building your evidence stack.
Book a strategy session. Uncover the gaps in your evidence and marketing in your Digital Health/MedTech startup.


