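[OpenAI-O1-Preview: Hands-On Test of the Inference Scaling Law] Which is larger, 9.11 or 9.8? What is a complete, executable, practical step-by-step plan for programmers to achieve financial freedom? (A test link usable in mainland China is attached at the end)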

Article link: https://blog.csdn.net/universsky2015/article/details/142216842

OpenAI-O1-Preview hands-on test (a test link usable in mainland China is attached at the end):

OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in The Bitter Lesson, there are only two techniques that scale indefinitely with compute: learning and search. It's time to shift focus to the latter.

You don't need a huge model to perform reasoning. Many parameters are dedicated to memorizing facts in order to perform well on benchmarks such as trivia QA. It is possible to factor reasoning out from knowledge, i.e. a small "reasoning core" that knows how to call tools such as a browser and a code verifier. Pre-training compute may decrease as a result.
A huge amount of compute is shifted to serving inference instead of pre-training and post-training. LLMs are text-based simulators. By rolling out many possible strategies and scenarios in the simulator, the model will eventually converge to good solutions. The process is a well-studied problem, like AlphaGo's Monte Carlo tree search (MCTS).
OpenAI must have figured out the inference scaling law a long time ago, while academia is only recently discovering it. Two papers came out on arXiv a week apart last month:

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. find that DeepSeek-Coder increases from 15.9% with one sample to 56% with 250 samples on SWE-Bench, beating Sonnet-3.5.
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. Snell et al. find that PaLM 2-S beats a 14x larger model on MATH with test-time search.
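
The repeated-sampling idea behind the first paper can be illustrated with a minimal Go sketch. It is not code from either paper: `coverage`, `solveWithSampling`, the toy "model" and the toy verifier are all hypothetical names invented for this illustration, and the coverage formula assumes that samples are independent, which real model outputs are not guaranteed to be.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// coverage returns the probability that at least one of k samples is
// correct when each sample succeeds with probability p.
// Assumption for this sketch: samples are independent.
func coverage(p float64, k int) float64 {
	return 1 - math.Pow(1-p, float64(k))
}

// solveWithSampling draws up to k candidates from sample and returns the
// first one accepted by verify, together with the number of draws used.
func solveWithSampling(k int, sample func() int, verify func(int) bool) (answer, draws int, ok bool) {
	for i := 1; i <= k; i++ {
		candidate := sample()
		if verify(candidate) {
			return candidate, i, true
		}
	}
	return 0, k, false
}

func main() {
	rng := rand.New(rand.NewSource(1))
	// Hypothetical "model": guesses an integer in [0, 20).
	sample := func() int { return rng.Intn(20) }
	// Hypothetical verifier: accepts the non-negative square root of 49.
	verify := func(x int) bool { return x*x == 49 }

	// A weak sampler (5% per-sample success) gains coverage as k grows.
	for _, k := range []int{1, 10, 100} {
		fmt.Printf("k=%3d  predicted coverage=%.3f\n", k, coverage(0.05, k))
	}

	answer, draws, ok := solveWithSampling(250, sample, verify)
	fmt.Println("answer:", answer, "draws:", draws, "solved:", ok)
}
```

The sketch only works because a cheap verifier exists. Without a way to check candidates automatically, more samples raise coverage but do not tell you which sample to pick.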

Productionizing o1 is much harder than nailing the academic benchmarks. For reasoning problems in the wild, how do you decide when to stop searching? What is the reward function? What is the success criterion? When should tools such as a code interpreter be called in the loop? How do you factor in the compute cost of those CPU processes? Their research post didn't share much.
Strawberry easily becomes a data flywheel. If the answer is correct, the entire search trace becomes a mini dataset of training examples, which contains both positive and negative rewards.

This in turn improves the reasoning core for future versions of GPT, similar to how AlphaGo's value network (used to evaluate the quality of each board position) improves as MCTS generates more and more refined training data.
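
One way to picture the flywheel is the Go sketch below. It is purely illustrative: OpenAI has not published how search traces are turned into training data, so the `Attempt` and `Example` types, the `harvest` function and the +1/-1 reward values are all assumptions made for this example.

```go
package main

import "fmt"

// Attempt is one candidate produced during search (hypothetical shape).
type Attempt struct {
	Text    string
	Correct bool // verdict from a verifier, assumed to be available
}

// Example is one labeled training record (hypothetical shape).
type Example struct {
	Prompt   string
	Response string
	Reward   float64
}

// harvest turns a search trace into training examples, but only when the
// search found at least one correct answer; otherwise it returns nil.
// Correct attempts get reward +1 and incorrect ones get reward -1.
func harvest(prompt string, trace []Attempt) []Example {
	solved := false
	for _, a := range trace {
		if a.Correct {
			solved = true
			break
		}
	}
	if !solved {
		return nil
	}
	examples := make([]Example, 0, len(trace))
	for _, a := range trace {
		reward := -1.0
		if a.Correct {
			reward = 1.0
		}
		examples = append(examples, Example{Prompt: prompt, Response: a.Text, Reward: reward})
	}
	return examples
}

func main() {
	trace := []Attempt{
		{Text: "9.11 is larger because 11 > 8", Correct: false},
		{Text: "9.80 > 9.11, so 9.8 is larger", Correct: true},
	}
	for _, ex := range harvest("Which is larger, 9.11 or 9.8?", trace) {
		fmt.Printf("%+.0f  %s\n", ex.Reward, ex.Response)
	}
}
```

Discarding unsolved traces mirrors the "if the answer is correct" condition in the text. A real pipeline would presumably score intermediate steps as well, which this sketch does not attempt.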

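Source code for a DAG task scheduling system implemented in Go

Below is the source code for a simple DAG (directed acyclic graph) task scheduling system implemented in Go: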

// dag_scheduler.go
package main

import (
    "errors"
    "fmt"
)

// Task is a single unit of work in the DAG.
type Task struct {
    ID        string       // task ID
    DependsOn []string     // IDs of the tasks this task depends on
    Run       func() error // function that executes the task
}

// DAG holds the set of tasks to schedule.
type DAG struct {
    Tasks map[string]*Task // maps task ID to task
}

// AddTask adds a task to the DAG and rejects duplicate IDs.
func (dag *DAG) AddTask(task *Task) error {
    if dag.Tasks == nil {
        dag.Tasks = make(map[string]*Task)
    }
    if _, exists := dag.Tasks[task.ID]; exists {
        return errors.New("task already exists: " + task.ID)
    }
    dag.Tasks[task.ID] = task
    return nil
}

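// TopologicalSort determines the execution order so that every task comes
// after all of its dependencies. Go randomizes map iteration order, so
// independent tasks may appear in a different order from run to run.
func (dag *DAG) TopologicalSort() ([]*Task, error) {
    var result []*Task
    visited := make(map[string]bool)    // tasks that are fully processed
    tempMarked := make(map[string]bool) // tasks on the current DFS path, used to detect cycles

    // Depth-first search from a single task
    var visit func(string) error
    visit = func(id string) error {
        if tempMarked[id] {
            return errors.New("cyclic dependency detected at task: " + id)
        }
        if !visited[id] {
            tempMarked[id] = true
            task, exists := dag.Tasks[id]
            if !exists {
                return errors.New("task not found: " + id)
            }
            for _, depID := range task.DependsOn {
                if err := visit(depID); err != nil {
                    return err
                }
            }
            visited[id] = true
            tempMarked[id] = false
            result = append(result, task)
        }
        return nil
    }

    // Run the DFS from every task that has not been visited yet
    for id := range dag.Tasks {
        if !visited[id] {
            if err := visit(id); err != nil {
                return nil, err
            }
        }
    }

    return result, nil
}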

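// Run executes the tasks sequentially in topological order and stops at the
// first task that returns an error.
func (dag *DAG) Run() error {
    sortedTasks, err := dag.TopologicalSort()
    if err != nil {
        return err
    }

    // Execute the tasks in order
    for _, task := range sortedTasks {
        fmt.Printf("Executing task: %s\n", task.ID)
        if err := task.Run(); err != nil {
            return fmt.Errorf("task %s failed: %w", task.ID, err)
        }
    }
    return nil
}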

// main shows how to use the scheduler
func main() {
    dag := &DAG{}

    // Task A has no dependencies
    taskA := &Task{
        ID:        "A",
        DependsOn: []string{},
        Run: func() error {
            fmt.Println("Running task A")
            return nil
        },
    }

    // Task B depends on task A
    taskB := &Task{
        ID:        "B",
        DependsOn: []string{"A"},
        Run: func() error {
            fmt.Println("Running task B")
            return nil
        },
    }

    // Task C depends on task A
    taskC := &Task{
        ID:        "C",
        DependsOn: []string{"A"},
        Run: func() error {
            fmt.Println("Running task C")
            return nil
        },
    }

    // Task D depends on tasks B and C
    taskD := &Task{
        ID:        "D",
        DependsOn: []string{"B", "C"},
        Run: func() error {
            fmt.Println("Running task D")
            return nil
        },
    }

    // Add the tasks to the DAG, stopping at the first error
    for _, task := range []*Task{taskA, taskB, taskC, taskD} {
        if err := dag.AddTask(task); err != nil {
            fmt.Println("Failed to add task:", err)
            return
        }
    }

    // Run all tasks in the DAG
    if err := dag.Run(); err != nil {
        fmt.Println("Execution failed:", err)
    }
}