[Qwen Variant] Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

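🎯 Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well suited to reinforcement learning (RL), but also places greater emphasis on open-ended solutions. Our goal is to address the following question: "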

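Currently, the Marco-o1 large language model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, and it is optimized for complex real-world problem-solving tasks.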

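Limitations: We would like to emphasize that this research is inspired by OpenAI's o1 (from which the name is also derived). The work aims to explore potential approaches that shed light on the currently unclear technical roadmap for large reasoning models. In addition, our focus is on open-ended questions, and we have observed interesting phenomena in multilingual applications. However, we must acknowledge that the current model primarily exhibits o1-like reasoning characteristics, and its performance still falls short of a fully realized "o1" model. This is not a one-time effort, and we remain committed to continuous optimization and ongoing improvement.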


🚀 Highlights

Currently, our work has the following highlights:

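  • 🍀 Fine-tuning with CoT data: We perform full-parameter fine-tuning of the base model on open-source CoT datasets combined with our own synthetic data, which yields Marco-o1-CoT.
  • 🍀 Solution space expansion via MCTS: We integrate the LLM with MCTS (Marco-o1-MCTS), using the model's output confidence to guide the search and expand the solution space.
  • 🍀 Reasoning action strategy: We implement novel reasoning action strategies and a reflection mechanism (Marco-o1-MCTS mini-step), including exploring different action granularities within the MCTS framework and prompting the model to self-reflect, which significantly improves its ability to solve complex problems.
  • 🍀 Application to translation tasks: We are the first to apply a large reasoning model (LRM) to machine translation, exploring inference-time scaling laws in the multilingual and translation domains.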
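The idea of different action granularities can be pictured with a small sketch. It is only an illustration: the newline delimiter for a "step" and the fixed chunk size for a "mini-step" are assumptions made here, and the function names are hypothetical rather than taken from the Marco-o1 code.

```python
from typing import List


def split_into_steps(text: str) -> List[str]:
    """Coarse actions: one reasoning step per non-empty line (assumed delimiter)."""
    return [line for line in text.split("\n") if line.strip()]


def split_into_mini_steps(tokens: List[str], size: int) -> List[List[str]]:
    """Fine actions: consecutive chunks of `size` tokens (hypothetical parameter)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    trace = "First, write out the word.\n\nThen count the letters."
    print(split_into_steps(trace))
    print(split_into_mini_steps(trace.split(), 3))
```

In a tree search, each element returned by either function would correspond to one node expansion, so smaller chunks give the search more branching points at the cost of more expansions.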

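OpenAI recently introduced the groundbreaking o1 model, renowned for its exceptional reasoning capabilities. The model has performed strongly on platforms such as AIME and CodeForces, surpassing other leading models. Inspired by this success, we aim to push LLMs further, enhancing their reasoning abilities to tackle complex real-world challenges.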

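🌍 Marco-o1 uses advanced techniques such as CoT fine-tuning, MCTS, and reasoning action strategies to strengthen its reasoning. As shown in Figure 2, fine-tuning Qwen2-7B-Instruct on a combination of the filtered Open-O1 CoT dataset, the Marco-o1 CoT dataset, and the Marco-o1 instruction dataset improves Marco-o1's handling of complex tasks. MCTS allows multiple reasoning paths to be explored, guided towards the best solution by confidence scores obtained by applying softmax to the log probabilities of the top-k alternative tokens. In addition, our reasoning action strategy varies the granularity of actions between steps and mini-steps to optimize search efficiency and accuracy.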
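The confidence score described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the log probability of the chosen token and those of the top-k alternatives (including the chosen one) are already available, and it assumes that the per-token confidences of a rollout are averaged into a single score. The function names are hypothetical.

```python
import math
from typing import Sequence


def token_confidence(chosen_logprob: float, topk_logprobs: Sequence[float]) -> float:
    """Softmax of the chosen token's log-prob over the top-k alternatives.

    `topk_logprobs` is assumed to contain the chosen token's own log-prob.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(topk_logprobs)
    denom = sum(math.exp(lp - m) for lp in topk_logprobs)
    return math.exp(chosen_logprob - m) / denom


def rollout_score(confidences: Sequence[float]) -> float:
    """Aggregate per-token confidences; the mean is an assumption of this sketch."""
    return sum(confidences) / len(confidences)


if __name__ == "__main__":
    topk = [math.log(0.5), math.log(0.3), math.log(0.2)]
    print(token_confidence(topk[0], topk))
    print(rollout_score([0.5, 1.0]))
```

A score close to 1 means the model strongly preferred the token it emitted over the alternatives, which is the kind of signal a search procedure can use to rank candidate reasoning paths.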

🌏 As shown in Figure 3, Marco-o1 improves accuracy by +6.17% on the MGSM (English) dataset and by +5.60% on the MGSM (Chinese) dataset, demonstrating stronger reasoning capabilities.

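🌎 In addition, in translation tasks we found that Marco-o1 excels at translating slang expressions. For example, it translates "这只鞋拥有踩屎感" (literally, "This shoe offers a stepping-on-poop sensation") as "This shoe has a comfortable sole", which shows a strong grasp of colloquial nuance.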


Usage

Here I used the Qwen2 demo code.

from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" lets transformers decide where to place the model weights.
model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Marco-o1",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")

prompt = "How many r in strawberry."
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into the prompt string the model expects.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to the model's device instead of hard-coding "cuda".
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Unpacking model_inputs passes the attention mask along with the input ids.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Output:

Alright, I need to determine how many 'r's are in the word "strawberry." Let's break this down step by step to ensure accuracy.

First, I'll write out the word "strawberry" to visualize it clearly:
s - t - r - a - w - b - e - r - r - y

Now, I'll identify each occurrence of the letter 'r':
1. The first 'r' is between 's' and 't'.
2. The second 'r' is between 'a' and 'w'.
3. The third 'r' is between 'e' and 'y'.

So, counting these, there are three instances of the letter 'r' in "strawberry."

To double-check, I'll consider the pronunciation of the word. In English, the 'r' sounds are typically pronounced at the beginning and sometimes in the middle of words. In "strawberry," the 'r' sounds are present in the first syllable ("str-") and again in the second syllable ("erry"). This further confirms that there are two distinct 'r's in the pronunciation, but when looking at the written form, there are actually three 'r's as I identified earlier.

Additionally, considering the etymology of the word, "strawberry" comes from the Old French "estrawebere," which eventually evolved into Middle English and then Modern English. The presence of multiple 'r's can be traced back to these historical linguistic changes, indicating that the current spelling with three 'r's is consistent with its etymological roots.

Moreover, if I were to compare "strawberry" with other similar words like "strawberry" and "strawberry," the pattern remains consistent, reinforcing that there are three 'r's in the word.

In conclusion, after thorough analysis and verification through multiple methods—written count, pronunciation, etymology, and comparison—it's clear that the word "strawberry" contains three 'r's.

Q: How many e in reinforence.
Attempt 1

I’m not sure what you mean by “reinforence.” It seems like there might be a typo or a misspelling. Could you please clarify the term? Is it “reinforcement” or perhaps something else? Once I have the correct spelling, I can help you determine how many 'e’s are in that word.

Attempt 2

I apologize, but I’m not entirely sure what you’re asking about with “how many e in reinforcement.” It seems like there might be a typo or a specific term you’re referring to. Could you please clarify or provide more context? Are you asking about the number of letters ‘e’ in the word “reinforcement,” or perhaps something related to reinforcement learning in computer science? Providing more information will help me give you an accurate and helpful response.

Q: How many e in reinforcement.

I'm sorry, but the question "How many e in reinforcement" seems incomplete or unclear. It appears to be asking about the number of letters 'e' in the word "reinforcement," but it's missing the context or the specific aspect related to reinforcement that you're inquiring about. 

If you're referring to counting the number of times the letter 'e' appears in the word "reinforcement," I can help with that. Let's break it down:

1. **Identify the word**: The word is "reinforcement."
2. **List all letters**: r, e, i, n, f, o, r, c, e, n, c, e, m, e, n, t.
3. **Count the 'e's**:
   - First 'e' at position 2
   - Second 'e' at position 7
   - Third 'e' at position 15
4. **Total count**: There are three 'e's in "reinforcement."

However, if your question is related to something else, such as the number of 'e's in a specific context of reinforcement learning or another field, please provide more details so I can assist you accurately.

In the Community section I came across a letter-counting question about the word "Mississippi":
Q: How may S’s are there in Mississippi

Alright, I need to determine how many 'S's are in the word "Mississippi." Let's break this down step by step to ensure accuracy.

First, I'll start by writing out the word "Mississippi" to visualize it clearly:

M-I-S-S-I-S-S-I-P-P-I

Now, I'll count each letter one by one:

1. **First 'M'**: This is not an 'S', so I'll move on.
2. **First 'I'**: Not an 'S'.
3. **First 'S'**: This is the first 'S'.
4. **Second 'S'**: This is the second 'S'.
5. **Third 'S'**: This is the third 'S'.
6. **First 'I'**: Not an 'S'.
7. **Second 'S'**: This is the fourth 'S'.
8. **Second 'I'**: Not an 'S'.
9. **Third 'S'**: This is the fifth 'S'.
10. **First 'P'**: Not an 'S'.
11. **Second 'P'**: Not an 'S'.
12. **Second 'I'**: Not an 'S'.

After carefully counting each letter, I've identified five 'S's in total within the word "Mississippi."

To double-check, I'll recount:

- Starting from the beginning: M, I, S, S, I, S, S, I, P, P, I.
- Counting the 'S's: 1 (third letter), 2 (fifth letter), 3 (seventh letter), 4 (ninth letter), 5 (eleventh letter).

Yes, that confirms there are indeed five 'S's in "Mississippi."

Additionally, considering the pronunciation, the word "Mississippi" has a distinct 'S' sound in each of these positions, further validating the count.

Therefore, after thorough examination, I conclude that there are five 'S's in the word "Mississippi."

On this question, its fifth item, "5. **Third 'S'**: This is the third 'S'.", is one too many.
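For reference, the three counting questions asked above can be checked with a short script. This is only a ground-truth check written for this post, not part of either model's tooling; `letter_positions` is a hypothetical helper name.

```python
from typing import List


def letter_positions(word: str, letter: str) -> List[int]:
    """Return the 1-based positions of `letter` in `word`, ignoring case."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=1) if ch == target]


if __name__ == "__main__":
    for word, letter in [("strawberry", "r"), ("reinforcement", "e"), ("Mississippi", "s")]:
        positions = letter_positions(word, letter)
        print(f"{word}: {len(positions)} x '{letter}' at positions {positions}")
```

The script reports three r's in "strawberry" (positions 3, 8, 9), three e's in "reinforcement" (positions 2, 9, 11), and four s's in "Mississippi" (positions 3, 4, 6, 7). Marco-o1's totals for the first two words are therefore right even though the positions it lists are not, while its count of five for "Mississippi" is wrong.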

By contrast, here is QwQ-32B-Preview:

So I have this question: "How many S's are there in Mississippi?" Hmm, seems straightforward at first glance. But let's think about it carefully.

First, I need to look at the word "Mississippi" and count how many times the letter "S" appears in it. Okay, let's spell out the word: M-i-s-s-i-s-s-i-p-p-i.

Now, I'll go through each letter one by one:

- M: Not S

- I: Not S

- S: This is an S

- S: Another S

- I: Not S

- S: Another S

- S: Another S

- I: Not S

- P: Not S

- P: Not S

- I: Not S

So, from this, I can see that there are S's in the third, fourth, sixth, and seventh positions. That's four S's in total.

Wait a minute, but sometimes people might be tricky with questions like this. Is there any trick here? Let me think. The question is about the word "Mississippi," which is a word in English, so I should consider it as it's spelled.

Alternatively, if this was about the actual number of S's in the state of Mississippi, that would be different, but the question specifies "in Mississippi," referring to the word.

Also, considering that the question is "How may S's are there in Mississippi?" there seems to be a typo in the question; it should probably be "How many S's are there in Mississippi?" But I think I understand what is being asked.

So, sticking to the word "Mississippi," and counting the S's, I have M-i-s-s-i-s-s-i-p-p-i, which has four S's.

Is there any other way to interpret this? Maybe considering capitalization? But in the word "Mississippi," the S's are all lowercase except for the first letter, which is capital M, but in counting letters, we usually consider both uppercase and lowercase as the same letter.

Wait, but in programming, when you compare letters, uppercase and lowercase are different. But here, since it's a word in English, I think we treat them as the same.

Alternatively, if this was in a different context, like in programming or something, case sensitivity might matter, but in this case, it's just about the word "Mississippi," and all S's are lowercase.

Also, is there a possibility that the word has more S's that I'm missing? Let's spell it again: M-i-s-s-i-s-s-i-p-p-i. Clearly, four S's.

Alternatively, sometimes people might miscount because of the repeated S's, but in this case, it's straightforward.

So, I think the answer is four.

But to be thorough, let me consider if there's any other interpretation. For example, if the question was about the number of S's in the state's name, which is "Mississippi," then again, it's the same as the word.

Alternatively, if it was about the number of S's in the state's nickname or something, but I don't think that's the case here.

Another thought: is there a variant spelling of Mississippi with more or fewer S's? I don't think so; "Mississippi" is consistently spelled with four S's.

Also, considering that Mississippi is a river and a state, but again, the spelling remains the same.

So, in conclusion, there are four S's in Mississippi.

**Final Answer**

\[ \boxed{4} \]

Now that is an accurate bot.
