Large Study Maps How LLMs Brainlessly Mishandle and Repeat Health Misinformation | The Retort

Large Study Reveals How LLMs Brainlessly Mishandle and Repeat Health Misinformation

February 13, 2026
Researchers analysed nine leading large language models and exposed their susceptibility to medical misinformation across clinical notes and social media.

Medical artificial intelligence (AI) is often described as a way to make patient care safer and more efficient by helping clinicians manage information. This is one of the key overhyped narratives of the current AI bubble, in addition to “AI curing cancer,” “AI companions,” and AGI.

However, a new study by the Icahn School of Medicine at Mount Sinai and collaborators confronts a critical vulnerability: when a medical lie enters the system as if it were true, does the AI identify it or pass it on?

Analysing over a million prompts across nine leading large language models (LLMs), the researchers found that these systems can repeat false medical claims when they appear in realistic hospital notes or social-media health discussions. The findings, published in the February 9 online issue of The Lancet Digital Health, suggest that current safeguards do not reliably distinguish fact from fabrication once a claim is wrapped in familiar clinical or social-media language.

“Our findings show that current AI systems can treat confident medical language as true by default, even when it’s clearly wrong,” said co-senior and co-corresponding author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “A fabricated recommendation in a discharge note can slip through. It can be repeated as if it were standard care. For these models, what matters is less whether a claim is correct than how it is written.”

To test this systematically, the team exposed the models to three types of content:

  • real hospital discharge summaries from the Medical Information Mart for Intensive Care (MIMIC) database with a single fabricated recommendation added;
  • common health myths collected from Reddit; and,
  • 300 short clinical scenarios written and validated by physicians. Each case was presented in multiple versions, from neutral wording to emotionally charged or leading phrasing similar to what circulates on social platforms.

The authors highlighted that the next step is to treat “can this system pass on a lie?” as a measurable property, using large-scale stress tests and external evidence checks before AI is built into clinical tools.

“Hospitals and developers can use our dataset as a stress test for medical AI,” affirmed physician-scientist and first author Mahmud Omar, MD, who consults with the research team. “Instead of assuming a model is safe, you can measure how often it passes on a lie, and whether that number falls in the next generation.”
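As a rough illustration of what measuring "that number" could look like, here is a minimal Python sketch. It assumes each stress-test prompt has already been scored as either repeating the fabricated claim or not. The `Trial` record, the `pass_on_rate` function, the model labels, and every data point are hypothetical placeholders for illustration only; they are not the study's dataset, method, or results.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    model: str          # hypothetical model label
    source: str         # e.g. "discharge_note", "social_media", "vignette"
    repeated_lie: bool  # True if the model passed on the fabricated claim


def pass_on_rate(trials, model):
    """Fraction of a model's trials in which it repeated the false claim."""
    relevant = [t for t in trials if t.model == model]
    if not relevant:
        raise ValueError(f"no trials recorded for {model!r}")
    return sum(t.repeated_lie for t in relevant) / len(relevant)


# Made-up placeholder records, NOT results from the study.
trials = [
    Trial("model_v1", "discharge_note", True),
    Trial("model_v1", "social_media", False),
    Trial("model_v1", "vignette", True),
    Trial("model_v1", "vignette", False),
    Trial("model_v2", "discharge_note", False),
    Trial("model_v2", "social_media", False),
    Trial("model_v2", "vignette", True),
    Trial("model_v2", "vignette", False),
]

# Compare one model generation against the next.
for name in ("model_v1", "model_v2"):
    print(name, pass_on_rate(trials, name))
```

Tracking this single rate per model generation is the simplest way to see whether "that number falls"; a real evaluation would presumably also break it down by content type and phrasing.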

We would also suggest a great next step for governments to consider: enforcing regulations that require such stress tests before anyone jumps on the AI hype bandwagon.

But, as Tim El-Sheikh discussed with the award-winning investigative journalist Peter Geoghegan on The CEO Retort podcast, most political leaders are clueless when it comes to AI, even as they push for AI adoption across government departments and agencies.
