s1: Simple test-time scaling
Minimal recipe for test-time scaling and strong reasoning performance, matching o1-preview with just 1,000 examples and budget forcing
Paper:
https://arxiv.org/pdf/2501.19393
Test-time scaling is a promising new approach to language modeling that uses extra test-time compute
to improve performance. Recently, OpenAI’s o1 model showed this capability but did not publicly
share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance.
First, we curate a small dataset s1K of 1,000 questions paired with