DeepSeek-R1: Reinforcement Learning for Enhanced LLM Reasoning

Type: research
Area: AI
Published (YearMonth): 2501
Source: https://arxiv.org/abs/2501.12948
Tag: newsletter

DeepSeek-AI introduces DeepSeek-R1, a new line of reasoning-focused large language models (LLMs) trained with reinforcement learning (RL). The first model, DeepSeek-R1-Zero, is developed solely through large-scale RL without supervised fine-tuning (SFT) and exhibits emergent reasoning capabilities, but it suffers from poor readability and language mixing. To address these issues, DeepSeek-R1 adds multi-stage training and cold-start data before RL, achieving performance on par with OpenAI-o1-1217 on reasoning tasks. To advance open research, DeepSeek-AI has open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models ranging from 1.5B to 70B parameters, distilled from DeepSeek-R1 onto Qwen- and Llama-based architectures. This release aims to push the boundaries of RL-based reasoning in LLMs while fostering transparency and collaboration in AI research.
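A key detail behind the SFT-free training of DeepSeek-R1-Zero is that the paper uses simple rule-based rewards (an accuracy reward on the final answer plus a format reward enforcing a think/answer template) rather than a learned reward model. Below is a minimal sketch of that idea; the `<think>`/`<answer>` tags match the paper's template, but the exact-match parsing and the plain sum of the two rewards are illustrative assumptions, not the paper's exact implementation.

```python
import re

# Completions are expected to follow the template:
# <think>...reasoning...</think><answer>...final answer...</answer>
THINK_ANSWER = re.compile(
    r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think/answer template, else 0.0."""
    return 1.0 if THINK_ANSWER.search(completion) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted final answer matches the reference, else 0.0."""
    match = THINK_ANSWER.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    # Simple unweighted sum; the paper's actual combination may differ.
    return format_reward(completion) + accuracy_reward(completion, ground_truth)

# Usage: a well-formatted, correct completion earns both rewards.
sample = (
    "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>"
    "<answer>408</answer>"
)
print(total_reward(sample, "408"))  # -> 2.0
```

Because both signals are computed by rules rather than a neural reward model, they are cheap to evaluate at scale and hard to reward-hack, which is part of what makes large-scale RL without SFT tractable.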