打破CloseAI的垄断,开源一个多模态版本的OpenAI o1!

关注公众号,发现CV技术之美

本文为粉丝投稿。

项目简介

OpenAI 的 O1 通过引入推理链条来增强逻辑推理能力,受到了广泛关注。然而,CloseAI 并没有对 OpenAI O1 进行开源,甚至没有公布足够的技术细节。最近,社区内尝试了一些低成本实现 OpenAI O1 的方案,比如 G1 和 O1。

为了应对图像理解这一关键能力的需求,我们开发了 Multimodal Open-O1(MO1)项目,利用开源多模态模型(如 Qwen2VL 7B)作为测试模型,来实现多模态版本的 OpenAI O1。值得一提的是,MO1 项目中的模型可以灵活替换为其他开源或闭源的多模态大模型(MLLM),以满足不同的应用需求。

目前,该项目还处于开发初期阶段,我们真诚地欢迎更多开发者和研究者加入我们,共同参与开发和优化。请给项目加星(star),并贡献你的智慧和力量!

项目链接

https://github.com/xmu-xiaoma666/Multimodal-Open-O1

实验设置

我们使用 Qwen2VL 7B 作为测试模型,该模型可以灵活替换为其他开源或闭源的多模态大模型(MLLM)。实验结果显示,在某些案例中,利用推理链策略,模型性能确实优于原有方案。

实验记录

多模态场景

Case1: 物体计数

fcb33d8b0d53c5ad9d2c6ef5046e3c6c.png
image

问题:How many fruits are there in the picture?

思考过程:

Thinking Step 1:The task is to count the number of fruits in the picture. The image contains eight different fruits.
Thinking Step 2:To count the fruits, I will examine each one individually. There are 8 fruits in total.
Final AnswerThere are 8 fruits in the picture.

最后的回答:

There are 8 fruits in the picture.

Case2: 方向感知

523154726c953cf9359e8b0b20d85ab5.png
image 1

问题:右下角的水果是什么?

思考过程:

StepContent
Thinking Step 1:The image shows a collection of fruits. The bottom right corner contains a cluster of dark purple grapes. This is the fruit we need to identify.
Final AnswerThe fruit in the bottom right corner of the image is a cluster of dark purple grapes.

最后的回答:

The fruit in the bottom right corner of the image is a cluster of dark purple grapes.

Noting:在相同的问题下,QWen2-VL的答案是:右下角的水果是石榴。

5d4a9e8279e09e36871ba58d159f990e.png
image 2

Case3: 行列感知

975fce972f8226f41db623f8da52870b.png
image 3

问题:What is the second fruit in the first line?

思考过程:

StepContent
Thinking Step 1:To find the second fruit in the first line, I need to analyze the image and determine the sequence of fruits. The first fruit in the first line is an apple. The second fruit should be the next in the sequence. Let's examine the options.
Thinking Step 2:Upon examining the sequence of fruits in the image, I can see that the first fruit is an apple. The second fruit in the first line should logically follow the sequence. Let's consider the options available in the image.
Thinking Step 3:After analyzing the sequence of fruits in the image, I can conclude that the second fruit in the first line is a strawberry. This is based on the logical progression of the sequence and the presence of the strawberry in the image.
Final AnswerThe second fruit in the first line is a strawberry.

最后的回答:

The second fruit in the first line is a strawberry.

纯文本场景

Case1: 字母计数(英文)

bc893e5451768a318a0ca0e4ab17f90d.png
image 4

问题:How many 'R's are in the word strawberry?

思考过程:

StepContent
Thinking Step 1:The problem is to count the number of 'R's in the word 'strawberry'. To solve this, I need to examine each letter in the word and determine if it is an 'R'.
Thinking Step 2:Now, I will go through each letter in the word 'strawberry' and check if it is an 'R'. I will start with the first letter and continue until the end of the word.
Thinking Step 3:I have now examined each letter in the word 'strawberry'. The letters are: s, t, r, a, w, b, r, r, b, r, r, y. Among these, the letters 'R' appear three times.
Final AnswerThe word 'strawberry' contains three 'R's.

最后的回答:

The word 'strawberry' contains three 'R's.

Noting:在相同的问题下,QWen2-VL的答案是:The word "strawberry" contains one 'R'.

158056e9af9e54afefd138d09b238a13.png
image 5

Case2: 六年级数学题

a6893c544cc989a11dfa5c9697fbee73.png
image 6

问题:A tank of water, using 50% and 5 barrels, leaving 30%. How many barrels of water are in this tank?

思考过程:

StepContent
Thinking Step 1:The problem involves a tank of water with a certain amount of water initially. The tank is filled with 50% of its capacity and then 5 barrels of water are added. After this addition, the tank is filled to 30% of its capacity. The task is to determine the total number of barrels of water in the tank.
Thinking Step 2:Failed to generate step after 3 attempts. Error: JSON decode error : {"title": "Analyzing the Problem", "content": "Let's denote the total capacity of the tank as C barrels. Initially, the tank is filled with 50% of its capacity, which means it contains 0.5C barrels of water. Then, 5 barrels of water are added to the tank, increasing the amount of water to 0.5C + 5 barrels. After this addition, the tank is filled to 30% of its capacity, which means it contains 0.3C barrels of water. We can set up the equation 0.5C + 5 =
Final AnswerTo find the total number of barrels of water in the tank, we can set up the equation 0.5C + 5 = 0.3C, where C represents the total capacity of the tank in barrels. Solving for C, we get C = 25 barrels. Therefore, the tank contains 25 barrels of water.

最后的回答:

To find the total number of barrels of water in the tank, we can set up the equation 0.5C + 5 = 0.3C, where C represents the total capacity of the tank in barrels. Solving for C, we get C = 25 barrels. Therefore, the tank contains 25 barrels of water

Noting:在相同的问题下,QWen2-VL的答案是错误的:It is not possible to determine the total number of barrels of water in the tank based on the information provided. The statement "using 50% and 5 barrels, leaving 30%" does not provide enough information to determine the total number of barrels in the tank.

7515582b18870ef708ccdbf50452de29.png
image 7

Case3: 一年级数学题(中文)

536dd157d0034f8403881e8a0abfca2f.png
image 8

问题:小明折了9只纸飞机,比小军少折3只,小军折了几只纸飞机?

思考过程:

StepContent
Thinking Step 1:小明折了9只纸飞机,比小军少折3只。这意味着小军折的纸飞机数量比小明多3只。为了找出小军折了多少只纸飞机,我们需要从小明折的纸飞机数量中加上这个差值。
Thinking Step 2:根据题目,小明折了9只纸飞机,比小军少折3只。这意味着小军折的纸飞机数量比小明多3只。因此,我们可以用小明折的纸飞机数量加上3只来计算小军折的纸飞机数量。所以,小军折了9 + 3 = 12只纸飞机。
Final Answer小军折了12只纸飞机。

最后的回答:

The word 'strawberry' contains three 'R's.

Case4: 一年级数学题(英文)

d74aa52f91ea7cad1a3819f1f90a9f77.png
image 9

问题:There are 10 apples in the basket, one of which was eaten by Little Ding Ding and two by Dad. How many more are there now?

思考过程:

StepContent
Thinking Step 1:There are 10 apples in the basket. One apple was eaten by Little Ding Ding and two apples were eaten by Dad. To find out how many apples are left, I need to subtract the apples eaten from the total number of apples. This involves basic arithmetic operations.
Thinking Step 2:Little Ding Ding ate 1 apple and Dad ate 2 apples. To find out how many apples are left, I need to subtract the total number of apples eaten from the initial number of apples. This involves basic arithmetic operations.
Thinking Step 3:To find out how many apples are left, I need to subtract the total number of apples eaten from the initial number of apples. This involves basic arithmetic operations. After subtracting, I will provide the final answer.
Final AnswerThere are 7 apples left in the basket.

最后的回答:

There are 7 apples left in the basket.

Case5: 三年级数学题(中文)

ba2945c461b36b8e5cc8d96015c3723e.png
image 10

问题:商店有 4筐苹果,每筐 55 千克,已经卖出 135 千克,还剩多少千克苹果?

思考过程:

StepContent
Thinking Step 1:首先,我们需要确定商店总共有多少千克苹果。由于有4筐苹果,每筐55千克,我们可以计算出总重量。然后,我们需要从总重量中减去已经卖出的135千克,以找出剩余的苹果重量。
Thinking Step 2:To find out how many kilograms of apples are left, we need to subtract the amount sold from the total amount available. The total amount available is 4 baskets * 55 kilograms/basket = 220 kilograms. After selling 135 kilograms, the remaining amount is 220 kilograms - 135 kilograms = 85 kilograms.
Final AnswerThere are 85 kilograms of apples left.

最后的回答:

There are 85 kilograms of apples left.

实现思路

我们参考了 G1 和 O1 项目的方法,通过在提示(prompt)设计上促进模型的推理链(Chain of Thought, CoT)过程。具体步骤如下:

  1. 问题分析:提示模型首先对问题进行详细分析。

  2. JSON格式响应:模型被要求以 JSON 格式响应,包含以下键值:

  • title: 分析步骤的标题。

  • content: 分析的具体内容。

  • next_action: 接下来的操作,值为 continue 或 **final_answer**。

如果 next_action 为 **continue**,模型将继续思考并提供下一步的分析;如果为 **final_answer**,则输出最终答案。

(完整的提示设计和具体实现细节请参见代码)

致谢:

https://github.com/win4r/o1

https://github.com/bklieger-groq/g1

8231288b74aa2c7e3692dd515b15c642.jpeg

END

欢迎加入「LLM交流群👇备注:LLM

4016696ddfd60c665c6bf3bdeb1643d2.png

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值