Running MLX Models Locally: Model Inference Made Easy

dfvcbipanjr

Published 2024-10-02 18:17:03


Tags: python

Article link: https://blog.csdn.net/dfvcbipanjr/article/details/142683429


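Introduction

As machine learning becomes more widespread, a growing number of models are available to the community. Models published by the MLX community can be run locally through LangChain's MLXPipeline class. This article shows how to use MLX models for local inference, so that developers can run and test open-source models in their own environment.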

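Main Content

The MLX community hosts more than 150 open-source models on the Hugging Face Model Hub, and developers can fetch them by model ID. With the MLXPipeline class, these models can run inference locally. To get started, a few Python packages need to be installed.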

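Environment Setup

First, make sure the following Python packages are installed. The %pip form is a Jupyter notebook magic; in a regular shell, run pip install without the leading %. The langchain-community package is included because the examples import MLXPipeline from it:

%pip install --upgrade --quiet mlx-lm transformers huggingface_hub langchain-community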
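
Before loading a model, it can help to confirm that the required packages are actually importable in the current environment. The following is a minimal sketch rather than part of the original workflow: the helper name missing_packages is hypothetical, and the module names are assumed to match the pip packages used in this article (mlx-lm is imported as mlx_lm, langchain-community as langchain_community).

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names assumed to correspond to the packages this article relies on.
required = ["mlx_lm", "transformers", "huggingface_hub", "langchain_community"]

missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

The check only looks up each module without importing it, so it runs quickly and does not load any model code.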

Loading the Model

A model can be loaded with the from_model_id method:

from langchain_community.llms.mlx_pipeline import MLXPipeline

# Fetch the model by its Hugging Face Hub ID and wrap it in a pipeline
pipe = MLXPipeline.from_model_id(
    "mlx-community/quantized-gemma-2b-it",
    pipeline_kwargs={"max_tokens": 10, "temp": 0.1},
)
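
Loading a model involves downloading and reading its weights, so it is usually worth doing only once per process. The sketch below is one possible way to do that, not something the original article prescribes: get_pipeline is a hypothetical helper that caches the result of from_model_id, and the import is placed inside the function so the helper can be defined even where langchain_community is not installed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_pipeline(model_id: str, max_tokens: int = 10, temp: float = 0.1):
    """Load an MLX pipeline once per distinct argument set and reuse it afterwards."""
    # Imported lazily; requires the langchain-community and mlx-lm packages.
    from langchain_community.llms.mlx_pipeline import MLXPipeline

    return MLXPipeline.from_model_id(
        model_id,
        pipeline_kwargs={"max_tokens": max_tokens, "temp": temp},
    )


# Nothing is loaded until get_pipeline is first called, for example:
# pipe = get_pipeline("mlx-community/quantized-gemma-2b-it")
```

Repeated calls with the same arguments return the cached pipeline object instead of loading the model again. Note that max_tokens of 10, as used in the article, yields very short completions; a larger value is likely needed for full answers.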

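Alternatively, load the model and tokenizer with the mlx_lm library and pass them to MLXPipeline directly:

from langchain_community.llms.mlx_pipeline import MLXPipeline
from mlx_lm import load

# load() returns the model together with its tokenizer
model, tokenizer = load("mlx-community/quantized-gemma-2b-it")
pipe = MLXPipeline(model=model, tokenizer=tokenizer)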

Building an Inference Chain

Once the model is loaded, it can be combined with a prompt to form an inference chain. For example:

from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | pipe

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))
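
To make the prompt | pipe composition above more concrete, here is a rough plain-Python sketch of what such a chain does conceptually: fill in the template, then pass the resulting text to the model. It is not LangChain's implementation. The names run_chain and fake_llm are hypothetical, and fake_llm is only a stand-in so that the example runs without downloading a model.

```python
from typing import Callable

TEMPLATE = """Question: {question}

Answer: Let's think step by step."""


def run_chain(template: str, llm: Callable[[str], str], **variables: str) -> str:
    """Fill the template with the given variables, then hand the prompt to the model."""
    prompt_text = template.format(**variables)
    return llm(prompt_text)


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for the MLX pipeline: it only reports the prompt length.
    return f"[model received {len(prompt_text)} characters]"


result = run_chain(TEMPLATE, fake_llm, question="What is electroencephalography?")
print(result)
```

In the real chain, the role of fake_llm is played by the MLXPipeline instance, and the dictionary passed to chain.invoke supplies the template variables.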