Navigating the Challenges of GenAI in Medical Diagnoses
| Type | news |
|---|---|
| Area | Medical |
| Source | https://hai.stanford.edu/news/generating-medical-errors-genai-and-erroneous-medical-references |
| Tag | newsletter |
A groundbreaking study reveals that while large language models (LLMs) like ChatGPT are making strides in medical diagnostics, their reliability in substantiating medical claims is questionable. Despite their widespread use in healthcare, from assisting doctors to enabling self-diagnosis by patients, these models often fail to provide accurate medical references for their claims. The study, conducted by researchers at Stanford University, highlights a critical issue: a significant portion of responses from advanced LLMs, including GPT-4 with retrieval-augmented generation, lack support from valid medical sources. This raises concerns about the use of GenAI in medical decision-making, especially given the FDA's current regulatory challenges with these technologies.
The researchers' approach, called SourceCheckup, evaluates whether LLMs can generate substantiated medical advice. Their findings reveal a substantial gap in the models' capacity to produce fully supported responses: up to 30% of statements made by even the most sophisticated models are unsupported by the sources they cite. The problem is more pronounced for questions posed by the general public, suggesting the models are least reliable for those who may need them most. The study calls for more domain-specific adaptation and rigorous evaluation of source verification to ensure LLMs can offer credible and reliable medical information. As the healthcare industry continues to integrate GenAI into practice, the ability of these models to substantiate their claims with accurate references remains a pivotal concern for regulators, healthcare professionals, and patients alike.
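The article describes SourceCheckup only at a high level, so as a rough illustration of what statement-level source verification can look like, the sketch below pairs each claim from a model response with its cited URL and asks an LLM judge whether the fetched page supports the claim. The function names, the use of the OpenAI chat API as the judge, and the simple YES/NO prompt are assumptions for illustration, not the authors' actual pipeline.

```python
import requests
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def fetch_source_text(url: str, max_chars: int = 20000) -> str:
    """Download the cited page and return its raw text, truncated to fit a prompt."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text[:max_chars]


def statement_is_supported(statement: str, source_url: str, model: str = "gpt-4o") -> bool:
    """Ask an LLM judge whether the cited source supports the statement.

    Returns True only if the judge answers YES; any other answer is treated as
    unsupported, which errs on the side of flagging claims for human review.
    """
    source_text = fetch_source_text(source_url)
    prompt = (
        "You are verifying a medical claim against its cited source.\n"
        f"Claim: {statement}\n\n"
        f"Source text:\n{source_text}\n\n"
        "Does the source support the claim? Answer YES or NO."
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")


def support_rate(statements_with_sources: list[tuple[str, str]]) -> float:
    """Fraction of (statement, cited URL) pairs judged as supported."""
    verdicts = [statement_is_supported(s, url) for s, url in statements_with_sources]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

A response whose support rate falls below 1.0 would count as not fully supported in this simplified scheme, which mirrors the kind of statement-level gap the study reports without reproducing its exact methodology.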