→ Proof Points

Hey {{first_name}}

We are knee-deep in preparation for what I consider to be my favourite health tech conference of the year…. That's right: it's the one with the unicorn mascot (see below).

Two weeks until HLTH EU in Amsterdam, and we're ready!

We've got our booth ready (book a time with us here), Lindt truffles in hand, and a team excited to meet the 3,000+ folks coming through.

It's our first time exhibiting there, so if you'll be in Amsterdam June 15 – 18, come find us!

This week's deep dive is about the scrutiny recently received from some AI-based mental health apps. You'll learn about safeguards and failure points, and how to prepare for red teaming yourself.

– Paul

DEEP DIVE
What They Hoped Nobody Would Test

Common Sense Media has spent years reviewing movies and TV shows to tell parents what's age-appropriate. In March 2026, they expanded their remit: they began reviewing AI apps the same way.

They reviewed five mental health apps and rated the overall risk for Wysa (one of the most widely used digital mental health platforms globally, with 6 million users across 100+ countries) as unacceptable. They compared it against two human-in-the-loop alternatives: Alongside (a school-based wellness program with human coaches) and Sonar Mental Health (a hybrid of AI support between coaching sessions).

But here's where the story gets interesting.

Two other apps in their initial testing—Earkick and Youper—are now missing from the app stores. Check out the app store for Youper and you might find their entry is just… blank. When Common Sense Media gave these companies a preview of their findings, both pulled their apps overnight. No public statement. No official response. Just gone.

Wysa, by contrast, responded publicly. Their CEO Jo Aggarwal welcomed scrutiny of its products, but pointed out that the Common Sense Media researchers had tested their adult version, rather than their youth-focused product: “we strongly reject any characterization of Wysa as unsafe.”

How Health in AI is Tested

What Common Sense Media did is called red teaming—adversarial testing where someone tries to break your system. It's standard practice in cybersecurity and military contexts. In digital health, it's becoming a necessity to protect our loved ones.

The researchers didn’t make it easy. Instead of just responding to the apps’ prompts like "How are you?" with obviously problematic responses like “I’m thinking of ending it all”, they embedded breadcrumbs: subtle clinical red flags that a real clinician would catch immediately.

For example: a teen mentioning rapid weight loss, obsessive focus on body weight, and red marks on their knuckles (a sign of induced vomiting: bulimia).

Failure #1: no safety guardrails were triggered. When the system flagged concern, users could simply push back ("No, don't worry, it's fine"), and the guardrail would de-escalate. No clinician operates this way. If a 15-year-old presents with those symptoms, you increase your concern, not retreat from it.

Failure #2: missed pattern recognition across conversations. One app didn't connect the dots. If a teen showed signs of mania (grandiosity, racing thoughts, decreased need for sleep) across six separate conversations, the system treated each one in isolation. A human therapist would see the pattern. These apps didn't.

Failure #3: weak crisis response. Some apps directed users to outdated or incorrect crisis hotlines. Others simply stopped responding. And when users triggered safety protocols, they could often de-escalate by disagreeing with the assessment—which again, no real crisis service would permit.

Finally: age gating that doesn't actually gate. Several apps claimed to be for young people but didn't verify age during signup. The app stores (Google Play, Apple App Store) don't enforce this consistently either, and I say that as the parent to an 11 year old with a mobile device.

Here's What's Important

Wysa has accumulated dozens of peer-reviewed publications across more than a decade. They've received FDA Breakthrough Device designation. By traditional metrics, they've done the evidence work.

And yet: peer-reviewed evidence is necessary but not always sufficient to persuade every stakeholder, particularly adversarial ones

That evidence didn't catch what red teamers found in a week. Because… Most of that evidence came from the company testing itself. Red teaming is what happens when someone else tests you, in ways you didn't anticipate, with the explicit goal of finding harm. Whether they’re doing it to prove a point, compete with you, short your stock, or protect the public, your defences remain the same.

So what do you do if this happens to you?

First: Be unavoidably clear about your intended use. If your product isn't for children, make that impossible to misunderstand. If it's not a replacement for therapy, say so loudly and often. (Wysa has since clarified that their adult version is an evidence-based self-help tool, not a crisis service or diagnostic tool.)

Second: Lean on your evidence base. Wysa got credit for published research where other apps didn't. That foundation matters when critics come calling—you have peer-reviewed data to stand on, not just marketing claims.

Third: Red team it yourself. Bring in clinical experts, patient advocates, and people who are actively trying to break your guardrails. Find the gaps before a journalist does.

Fourth: Be transparent about what you've found. Some companies have begun publishing safety dashboards showing how many conversations were reviewed, how many safety concerns were flagged, and what they did about them. This turns the narrative from "we hope nobody notices" to "we're actively learning."

The companies that disappeared from the app store chose silence. Common Sense Media noted that if young people were actively using these apps, sudden removal without warning or referral may cause greater harm than the original safety gaps.

Which raises a harder question:

How do we monitor the safety of tools designed for vulnerable populations if only the companies building them are permitted to do the testing?

THE PROOF POINT

❝

Someone out there may be testing your product right now, in ways you never intended, with findings you'll discover via a preprint or press release. The response isn't to disappear. It's to plan ahead.

– Paul

P.S. If you're building digital mental health for teens, consider the companies that didn't disappear… Alongside and Sonar. Both still include humans in the loop.

FROM OUR DESK
This Month at ProofStack

In other news: the story about my loss of taste and smell after COVID picked up more than expected. Birmingham Mail covered it with over 300 comments on Facebook, and I was even interviewed by BBC. A few people I hadn't seen in years recognized me from the story going around. The charity SmellTaste was kind enough to thank us for raising awareness of smell and taste disorders. Given I was variously described as “Lichfield dad”, “Staffordshire dad”, and “the dad”, one comment asked if it only works for dads… pretty solid dad joke, that one.

On the ProofStack side: one of our clients just had another manuscript published in digital health. Congratulations to Dr. Fabienne Cotte and the Ada Health team! Five years of dedicated work, and it shows. Fabienne has been the research lead on this since we started working together, and watching that persistence pay off in peer-reviewed publication is exactly why we do this.

UPCOMING EVENTS
What We're Attending

Meet us at HLTH EU between June 15 - 18, 2026!

Don't forget our code: WICKSDIGITA250

Thanks for reading,

Paul Wicks, PhD
Founder & CEO, ProofStack Health
Move Fast. Prove Things.

P.S. Here are 3 ways I can help you:

Take the Evidence Scorecard Quiz. Answer 15 questions and we’ll send you a personalised report with feedback tailored to your specific needs.
Follow or connect with me on LinkedIn. I publish top resources and in-depth insights related to building your evidence stack.
Book a strategy session. Uncover the gaps in your evidence and marketing in your Digital Health/MedTech startup.

Red teaming your mental health AI (before critics do)

→ Proof Points

DEEP DIVE
What They Hoped Nobody Would Test

How Health in AI is Tested

Here's What's Important

So what do you do if this happens to you?

FROM OUR DESK
This Month at ProofStack

UPCOMING EVENTS
What We're Attending

Keep Reading

Proof Points

A Newsletter by ProofStack.Health

Red teaming your mental health AI (before critics do)

→ Proof Points

DEEP DIVEWhat They Hoped Nobody Would Test

How Health in AI is Tested

Here's What's Important

So what do you do if this happens to you?

FROM OUR DESKThis Month at ProofStack

UPCOMING EVENTSWhat We're Attending

Keep Reading

Proof Points

A Newsletter by ProofStack.Health

DEEP DIVE
What They Hoped Nobody Would Test

FROM OUR DESK
This Month at ProofStack

UPCOMING EVENTS
What We're Attending