DeepSeek-R1: Reinforcement Learning for Enhanced LLM Reasoning
| Type | research |
| --- | --- |
| Area | AI |
| Published(YearMonth) | 2501 |
| Source | https://arxiv.org/abs/2501.12948 |
| Tag | newsletter |
| Checkbox | |
| Date(of entry) | |
DeepSeek-AI introduces DeepSeek-R1, a family of reasoning-focused large language models (LLMs) trained with reinforcement learning (RL). The first model, DeepSeek-R1-Zero, is trained solely through large-scale RL without supervised fine-tuning (SFT) and exhibits emergent reasoning capabilities, but it suffers from poor readability and language mixing. To address these issues, DeepSeek-R1 adds a multi-stage training pipeline with cold-start data before RL, reaching performance on par with OpenAI-o1-1217 on reasoning tasks. To support open research, DeepSeek-AI has open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models ranging from 1.5B to 70B parameters, distilled from DeepSeek-R1 and built on the Qwen and Llama architectures. The release aims to push the boundaries of RL-based reasoning in LLMs while fostering transparency and collaboration in AI research.
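The RL recipe behind both models is Group Relative Policy Optimization (GRPO), which samples a group of completions per prompt, scores them with rule-based rewards, and normalizes each reward against its own group rather than training a separate critic. Below is a minimal, illustrative sketch of that group-relative advantage computation; the function name and the example reward values are hypothetical, not taken from the paper.

```python
# Illustrative sketch of GRPO's group-relative advantage (hypothetical
# helper; rewards shown are made-up examples, not from the paper).
import statistics


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled completion's reward against its group:
    A_i = (r_i - mean(r)) / std(r). No learned value model is needed."""
    mean_r = statistics.mean(rewards)
    std_r = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean_r) / std_r for r in rewards]


# Example: rule-based rewards (e.g., 1.0 if the final answer is correct,
# 0.0 otherwise) for a group of four sampled completions to one prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Dropping the learned value model is the key design choice: advantages come from within-group comparison, which keeps large-scale RL substantially cheaper than PPO-style training with a critic.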