360 Team Open-Sources the 360Zhinao Model, Letting Anyone Build a 360 AI Knowledge Base

Article link: https://blog.csdn.net/weixin_41446370/article/details/147493113

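🎉🎉🎉 Qihoo 360 recently released an open-source upgrade of its in-house 7B-parameter model, 360Zhinao3-7B. The model is now available in the 360zhinao3 repository on GitHub and is free for commercial use. Its capabilities have improved across the board: among models with fewer than 10B parameters, 360Zhinao3-7B ranks first on several benchmarks.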

360Zhinao3-7B
360Zhinao3-7B-Instruct
360Zhinao3-7B-O1.5

Notable features of our 360Zhinao3 models:

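360Zhinao3-7B was obtained by continued pretraining on 700B high-quality tokens, starting from 360Zhinao2-7B. The two models share exactly the same architecture; the performance gain comes mainly from higher-quality training data.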


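Model Evaluation

Base Model

We evaluated the model along multiple dimensions with the open-source tool opencompass. Its average benchmark score ranks first among models with fewer than 10B parameters, and it is highly competitive among models of similar size.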

Type	Datasets	language	glm4-9b	Qwen2.5-7B	internlm2.5-7b	Yi1.5-9B	gemma2-9b	Llama3.1-8B	360Zhinao2-7B	360Zhinao3-7B
Exam	ceval	zh	75.83	81.41	77.71	73.51	56.36	51.67	83.04	84.7
	mmlu	en	75.5	75.5	71.55	71.43	72.22	66.75	67.84	75.42
	cmmlu	zh	74.24	81.79	78.77	74.2	58.89	52.49	73.8	82.17
	ARC-c	en	94.92	80	85.08	87.46	77.63	80.68	87.12	88.14
	ARC-e	en	98.41	84.83	95.24	94.53	78.84	89.77	92.77	94
Language	WiC	en	51.57	52.82	50.78	50.63	50.47	50	49.84	50.31
Language	WSC	en	68.27	68.27	69.23	66.35	68.27	67.31	65.38	71.15
Knowledge	BoolQ	en	81.8	83.88	89.51	84.46	85.6	82.2	88.29	88.38
Knowledge	commonsense_qa	en	71.17	73.22	68.55	71.58	68.47	71.25	69.78	71.33
Understanding	C3	zh	91.51	92	93.04	85.86	81.64	83.51	93.26	92.77
	race-middle	en	91.99	91.02	92.06	91.16	88.09	81.69	90.46	90.04
	race-high	en	90.71	87.91	90.08	88.34	82.08	78.73	86.74	85.96
	lcsts	zh	18.29	15.82	15.96	16.49	10.62	17.29	18.61	18.85
	eprstmt-dev	zh	91.88	86.88	91.25	91.88	48.12	83.12	90	92.50
	lambada	en	71.67	71.14	69.98	70.64	75.43	74.23	72.56	68.17
Reasoning	hellaswag	en	70.25	72.76	70.38	71.55	66.83	74.65	71.49	73.61
	siqa	en	81.73	72.52	78.97	76.2	58.96	64.18	77.12	79.02
	bbh	en	73.68	54.63	59.43	67.86	68.45	59.9	46.54	73.74
Code	humaneval	en	69.51	75	60.37	26.22	5.49	27.44	60.98	64.63
Code	mbpp	en	60	60	43.6	56.8	51.2	42.6	54	67.80
Math	math	en	26.86	38	27.14	27.06	28.52	15.32	38.34	37.60
Math	gsm8k	en	78.54	79.76	52.54	71.11	73.09	56.25	75.51	78.77
Overall	avg_zh		70.35	71.58	71.35	68.39	51.13	57.62	71.74	74.20
Overall	avg_all		73.11	71.78	69.60	68.88	61.60	62.32	70.61	74.83
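
As a quick sanity check on the summary rows, the avg_zh figure for 360Zhinao3-7B can be reproduced as the unweighted mean of the five zh datasets in the table (ceval, cmmlu, C3, lcsts, eprstmt-dev). That averaging rule is inferred from the numbers and is not stated in the source; the sketch below only recomputes it from the values listed above, and `unweighted_mean` is a helper name introduced here.

```python
# Scores copied from the 360Zhinao3-7B column of the table above,
# restricted to rows whose language is "zh".
zh_scores = {
    "ceval": 84.7,
    "cmmlu": 82.17,
    "C3": 92.77,
    "lcsts": 18.85,
    "eprstmt-dev": 92.50,
}


def unweighted_mean(scores):
    """Return the plain arithmetic mean of the dict's values."""
    return sum(scores.values()) / len(scores)


avg_zh = unweighted_mean(zh_scores)
print(f"avg_zh = {avg_zh:.2f}")  # the table reports 74.20
```

The same calculation on the 360Zhinao2-7B and Qwen2.5-7B columns also matches their avg_zh entries (71.74 and 71.58).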

Instruct Model

We evaluated 360Zhinao3-7B-Instruct on three popular benchmarks: IFEval, MT-bench, and CF-Bench. On MT-bench and CF-Bench it ranks first among open-source models of comparable size. On IFEval (prompt strict) it is second only to GLM4-9B-Chat and scores highest among 7B models.

Model	MT-bench	IFEval (strict prompt)	CFBench CSR	CFBench ISR	CFBench PSR
Qwen2.5-7B-Instruct	8.07	0.556	0.81	0.46	0.57
Yi-9B-16k-Chat	7.44	0.455	0.75	0.4	0.52
GLM4-9B-Chat	8.08	0.634	0.82	0.48	0.61
InternLM2.5-7B-Chat	7.39	0.540	0.78	0.4	0.54
360Zhinao2-7B-Chat-4k	7.86	0.577	0.8	0.44	0.57
360Zhinao3-7B-Instruct	8.17	0.626	0.83	0.52	0.64

Long COT Model

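Using our previously open-sourced Light-R1 method, we continued fine-tuning 360Zhinao3-7B-Instruct on long CoT data, followed by RFT and GRPO. A gap remains relative to the latest OpenThinker2-7B, but the model already surpasses all earlier models built on the general-purpose Qwen2.5-7B-Instruct.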

Model	Date	Base Model	AIME24	AIME25	GPQA Diamond
OpenThinker2-7B	25.4.3	Qwen2.5-7B-Instruct	50	33.3	49.3
OpenThinker-7B	25.1.28	Qwen2.5-7B-Instruct	31.3	23.3	42.4
360Zhinao3-7B-O1.5	25.4.14	360Zhinao3-7B-Instruct	54.2	36.3	40.0
OpenR1-Qwen-7B	25.2.11	Qwen2.5-Math-7B-Instruct	48.7	34.7	21.2
DeepSeek-R1-Distill-Qwen-7B	25.1.20	Qwen2.5-Math-7B-Instruct	57.3	33.3	47.3
Light-R1-7B-DS	25.3.12	DeepSeek-R1-Distill-Qwen-7B	59.1	44.3	49.4
Areal-boba-RL-7B	25.3.31	DeepSeek-R1-Distill-Qwen-7B	61.9	48.3	47.6

Quick Start

A simple example showing how to quickly use 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and 360Zhinao3-7B-O1.5 with 🤗 Transformers.

Base Model Inference Demo

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 1024

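# Prompt: the start of a numbered list of the 24 Chinese solar terms, for the base model to continue
inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')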
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Instruct Model Inference Demo

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048

messages = []

# Round 1: ask for a brief introduction to Andy Lau (刘德华)
print(f"user: 简单介绍一下刘德华")
messages.append({"role": "user", "content": "简单介绍一下刘德华"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")


# Round 2: follow-up question ("What are his representative works?"); the model sees the round-1 history
print(f"user: 他有什么代表作?")
messages.append({"role": "user", "content": "他有什么代表作?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")
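
The two rounds above follow the same pattern: append the user message, generate from the full history, then append the reply so the next round can see it. A minimal sketch of that loop follows. `chat_turn` and `echo_generate` are hypothetical names introduced here, and `echo_generate` is only a stand-in for the `apply_chat_template`, `model.generate`, and `decode` steps, so the snippet runs without loading the model.

```python
def chat_turn(messages, user_text, generate_fn):
    """Run one round: record the user message, generate, record the reply.

    generate_fn is any callable that maps the full message history to a
    reply string (in the demo above, the tokenizer and model calls).
    """
    messages.append({"role": "user", "content": user_text})
    response = generate_fn(messages)
    messages.append({"role": "assistant", "content": response})
    return response


def echo_generate(messages):
    """Hypothetical stand-in for the model: reports how much history it saw."""
    return f"(reply after seeing {len(messages)} message(s))"


history = []
first = chat_turn(history, "简单介绍一下刘德华", echo_generate)
second = chat_turn(history, "他有什么代表作?", echo_generate)

# After two rounds the history alternates user/assistant, four entries in total.
print([m["role"] for m in history])
```

Because the second call receives three messages rather than one, a follow-up such as "他有什么代表作?" can be resolved against the earlier question.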

Long CoT Model Inference Demo

import re
import json
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-O1.5"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048


def extract_thinking_and_answer(input_string):
    thinking, answer = "", ""
    # Extract the answer: everything after the last </think>
    pattern_answer = r'.*</think>(.*)$'
    match_answer = re.search(pattern_answer, input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        return thinking, input_string

    # Extract the thinking process: the text between <think> and </think>
    pattern_thinking = r'<think>(.*?)</think>'
    match_thinking = re.search(pattern_thinking, input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)

    return thinking, answer


messages = []
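# Prompt: the classic chickens-and-rabbits puzzle (14 heads and 38 legs in one cage; how many of each?)
messages.append({"role": "user", "content": "现有一笼子，里面有鸡和兔子若干只，数一数，共有头14个，腿38条，求鸡和兔子各有多少只？"})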
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
thinking, answer = extract_thinking_and_answer(response)
messages.append({"role": "assistant", "content": answer, "reasoning_content": thinking})
print(json.dumps(messages, ensure_ascii=False, indent=4))
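
To see how `extract_thinking_and_answer` behaves without loading the model, the function can be run on hand-written strings. The sample responses below are made up for illustration and are not real model output; the function body repeats the logic of the demo above.

```python
import re


def extract_thinking_and_answer(input_string):
    # Same logic as in the demo above.
    thinking, answer = "", ""
    match_answer = re.search(r'.*</think>(.*)$', input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        # No closing tag: treat the whole string as the answer.
        return thinking, input_string
    match_thinking = re.search(r'<think>(.*?)</think>', input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)
    return thinking, answer


# Hypothetical responses, written by hand for this example.
tagged = "<think>\n2c + 4r = 38 and c + r = 14, so r = 5, c = 9.\n</think>\n\n9 chickens and 5 rabbits."
untagged = "9 chickens and 5 rabbits."
close_only = "r = 5, c = 9.</think>9 chickens and 5 rabbits."

for sample in (tagged, untagged, close_only):
    thinking, answer = extract_thinking_and_answer(sample)
    print(repr(thinking.strip()), "|", repr(answer.strip()))
```

The function does not trim whitespace itself, which is why the loop calls `strip()` before printing. A response with only a closing tag still yields an answer, with an empty thinking string.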