Fragility of Language Models to Drug Name Variants

Type: research
Area: AI, Medical
Published (YearMonth): 2406
Source: https://arxiv.org/abs/2406.12066
Tag: newsletter

This paper shows that large language models (LLMs) are surprisingly fragile to variations in drug names, such as brand versus generic forms, within biomedical benchmarks. The authors introduce RABBITS, a benchmark built by swapping brand and generic drug names in medical QA datasets using physician-verified pairs, and observe a consistent 1-10% accuracy drop when names are swapped. They attribute this fragility to test-data contamination in pre-training corpora, raising concerns about the robustness of LLMs in critical medical applications.
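The evaluation idea is straightforward to reproduce: rewrite each benchmark question with drug names swapped, then compare accuracy on the original and swapped variants. Below is a minimal Python sketch of that idea; the drug pairs, `swap_drug_names`, and `answer_fn` are illustrative placeholders, not the authors' RABBITS pipeline, which relies on expert-verified mappings.

```python
import re

# Hypothetical brand <-> generic pairs for illustration only; the paper
# uses physician-verified mappings, not this toy list.
GENERIC_TO_BRAND = {
    "acetaminophen": "Tylenol",
    "ibuprofen": "Advil",
    "atorvastatin": "Lipitor",
}

def swap_drug_names(text: str, mapping: dict[str, str]) -> str:
    """Replace each generic drug name with its brand name
    (case-insensitive, whole-word matches only)."""
    for generic, brand in mapping.items():
        text = re.sub(rf"\b{re.escape(generic)}\b", brand, text,
                      flags=re.IGNORECASE)
    return text

def robustness_gap(qa_items, answer_fn, mapping=GENERIC_TO_BRAND):
    """Score a model on original and name-swapped questions.

    qa_items:  list of (question, gold_answer) pairs
    answer_fn: callable mapping a question string to the model's answer
    Returns (accuracy_original, accuracy_swapped).
    """
    orig = sum(answer_fn(q) == a for q, a in qa_items) / len(qa_items)
    swapped = sum(answer_fn(swap_drug_names(q, mapping)) == a
                  for q, a in qa_items) / len(qa_items)
    return orig, swapped
```

A large gap between the two accuracies suggests the model is keying on surface forms memorized from pre-training data rather than on underlying drug knowledge, which is the contamination concern the paper raises.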