Trajectory Volatility: A Novel Approach to Out-of-Distribution Detection in Mathematical Reasoning

Type	research
Area	AI
Published(YearMonth)	2410
Source	https://arxiv.org/abs/2405.14039
Tag	newsletter
Date(of entry)	@December 1, 2024

This study introduces the TV score, a trajectory-volatility-based method for out-of-distribution (OOD) detection in mathematical reasoning tasks handled by generative language models (GLMs). Established OOD methods for GLMs, chiefly uncertainty estimation and embedding distance measurement, struggle in this setting because the output space of mathematical reasoning is high-density, which poses difficulties for embedding-based detection. The same property, however, produces larger differences between samples in how their embeddings shift through latent space. The TV score exploits these differences by measuring the volatility of each sample's embedding trajectory and using it to detect OOD samples. The reported experiments show the approach outperforming existing algorithms on OOD detection for mathematical reasoning, and it shows promise for other tasks with high-density output spaces, such as multiple-choice questions. The work is aimed at improving the robustness and security of GLMs in complex reasoning tasks.
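To make the idea of "trajectory volatility" concrete, here is a minimal sketch in Python. It is a simplified illustration, not the paper's TV score definition: it assumes a sample's trajectory is given as a list of per-layer embedding vectors and takes volatility to be the mean Euclidean distance between consecutive layers. The function name `trajectory_volatility` and the toy trajectories are hypothetical.

```python
import math


def trajectory_volatility(trajectory):
    """Mean Euclidean step length between consecutive layer embeddings.

    `trajectory` is a sequence of equal-length vectors, one per layer.
    This is an illustrative stand-in for a volatility measure, not the
    formula used in the paper.
    """
    if len(trajectory) < 2:
        raise ValueError("a trajectory needs at least two layers")
    # Distance moved by the embedding from each layer to the next.
    steps = [math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])]
    return sum(steps) / len(steps)


# Toy 2-D trajectories (made up for illustration only).
smooth = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
jumpy = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]

print(trajectory_volatility(smooth))
print(trajectory_volatility(jumpy))
```

In a real detector, a score of this kind would be compared against a threshold calibrated on in-distribution data. How the score is defined, and which direction of deviation indicates an OOD sample, should be taken from the paper rather than from this sketch.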