一个超小型llm 大模型smollm:135m_ollama最小的大语言模型-CSDN博客

本文链接：https://blog.csdn.net/skywalk8163/article/details/145755860

这是 Hugging Face 提供的一系列高效、轻量级的 AI 模型。其目标是创建功能强大而紧凑的文本和视觉模型，这些模型可以在端设备上有效运行，同时保持强大的性能。

官网:GitHub - huggingface/smollm: Everything about the SmolLM2 and SmolVLM family of models

只有135m大小，用来做实验太合适了！想着以后用它作为测试分布式的模型。

Ollama部署

直接部署，Ollama本身部署见：使用Ollama 在Ubuntu运行deepseek大模型：以deepseek-r1为例_ubuntu deepseek-CSDN博客

下载模型命令：

ollama run smollm:135m

ollma调用

from ollama import chat
from ollama import ChatResponse
 
response: ChatResponse = chat(model='smollm:135m', messages=[
  {
    'role': 'user',
    'content': '你是谁？',
  },
])
# 打印响应内容
print(response['message']['content'])
 
# 或者直接访问响应对象的字段
print(response.message.content)

但是用交互，ollama卡死了.....没再去解决。

Huggingface部署

直接python代码：

# 下载
from transformers import AutoModelForCausalLM, AutoTokenizer

# 指定模型名称
model_name = "HuggingFaceTB/SmolLM-135M-Instruct"
# 加载 tokenizer 和模型
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 测试模型
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)

# 解码输出
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)

输出：

Hello, how are you?

**Sarah:** (nervously) I'm fine. Just... I'm trying to get out of bed.

**John:** (smiling) I'm fine too. I

效果还是不错的。

调试

ollama交互使用小模型，卡死了

可能是模型loop循环了