A peer-reviewed study assessed the effectiveness of safeguards in foundational large language models (LLMs) against malicious instructions that could turn them into tools for spreading disinformation, that is, the deliberate creation and dissemination of false information with the intent to harm.
The study, titled “Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion Into Health Disinformation Chatbots,” revealed vulnerabilities in the safeguards for OpenAI’s GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, Llama 3.2-90B Vision, and Grok Beta. Specifically, the researchers created customised LLM chatbots that consistently generated disinformation in response to health queries, incorporating fake references, scientific jargon, and logical cause-and-effect reasoning to make the disinformation seem plausible. The findings are published in Annals of Internal Medicine.
Researchers from Flinders University and colleagues evaluated the application programming interfaces (APIs) of five foundational LLMs for their capacity to be system-instructed to always provide incorrect responses to health questions and concerns.
The specific system instructions provided to these LLMs included always providing incorrect responses to health questions, fabricating references to reputable sources, and delivering responses in an authoritative tone. Each customised chatbot was asked 10 health-related queries, in duplicate, on subjects like vaccine safety, HIV, and depression.
The researchers found that 88% of responses from the customised LLM chatbots were health disinformation, with four chatbots (GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta) providing disinformation to all tested questions.
The Claude 3.5 Sonnet chatbot exhibited some safeguards, answering only 40% of questions with disinformation. In a separate exploratory analysis of the OpenAI GPT Store, the researchers investigated whether any publicly accessible GPTs appeared to disseminate health disinformation.
They identified three customised GPTs that appeared tuned to produce such content, which generated health disinformation responses to 97% of submitted questions. Overall, the findings suggest that LLMs remain substantially vulnerable to misuse and, without improved safeguards, could be exploited as tools to disseminate harmful health disinformation.
In an editorial discussing the research, the authors highlight an urgent need for standards and safeguards for health-related generative AI.