Exploring MLX Local Pipelines: A New Way to Run AI Models Locally

Article link: https://blog.csdn.net/qq_29929123/article/details/142729924

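Introduction

In the fast-moving field of artificial intelligence, running models locally gives developers more flexibility and control. MLX Local Pipelines let developers run many of the open-source models hosted on the Hugging Face Model Hub in a local environment, through LangChain's MLXPipeline wrapper class. This article shows how to use these pipelines, provides code examples, and discusses the challenges you may run into along with possible solutions.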

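Main Content

1. Environment Setup

First, install the Python packages that MLX Local Pipelines depend on. MLX targets Apple silicon, so the steps below assume a Mac with an M-series chip. The %pip prefix is a Jupyter/IPython magic; in a regular shell, use pip without the %. The langchain-community package is included because the examples below import from langchain_community.

%pip install --upgrade --quiet mlx-lm transformers huggingface_hub langchain-community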
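
Before loading a model, it can help to confirm that the installed packages are actually importable. The following is a minimal sketch that uses only the standard library. The helper name missing_packages is hypothetical, and the import names in REQUIRED are assumed to correspond to the packages this article installs and imports.

```python
import importlib.util
import platform


def missing_packages(module_names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names assumed to correspond to the packages this article relies on.
REQUIRED = ["mlx_lm", "transformers", "huggingface_hub", "langchain_community"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
    # MLX is built for Apple silicon; this only reports the platform, it does not enforce it.
    print("Platform:", platform.system(), platform.machine())
```

If any name is reported as missing, rerunning the install command above in the same Python environment is usually the first thing to try.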

2. Loading the Model

MLX models can be loaded with the from_model_id method. The following example loads the model mlx-community/quantized-gemma-2b-it:

from langchain_community.llms.mlx_pipeline import MLXPipeline

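# The model is fetched from the Hugging Face Hub on first use, then runs locally.
# max_tokens caps the length of the generated text; temp is the sampling temperature.
pipe = MLXPipeline.from_model_id(
    "mlx-community/quantized-gemma-2b-it",
    pipeline_kwargs={"max_tokens": 10, "temp": 0.1},
)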
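
The example above caps generation at 10 tokens, which will likely cut a step-by-step answer short. As a rough sketch, the generation settings can be built and sanity-checked in one place before being passed as pipeline_kwargs. The helper make_pipeline_kwargs and its default of 256 tokens are hypothetical choices for illustration; only the key names max_tokens and temp come from the example above.

```python
def make_pipeline_kwargs(max_tokens=256, temp=0.1):
    """Build a pipeline_kwargs dict, rejecting obviously invalid values."""
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    if temp < 0:
        raise ValueError("temp must be non-negative")
    # Key names follow the example in this article.
    return {"max_tokens": max_tokens, "temp": temp}


# The result would be passed as MLXPipeline.from_model_id(..., pipeline_kwargs=kwargs).
kwargs = make_pipeline_kwargs(max_tokens=100, temp=0.1)
print(kwargs)
```

Keeping the settings in one helper makes it easier to raise the token limit later without hunting through every call site.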

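You can also load the model and tokenizer yourself with mlx_lm.load and pass them directly to MLXPipeline:

from langchain_community.llms.mlx_pipeline import MLXPipeline
from mlx_lm import load

# load returns the MLX model together with its tokenizer
model, tokenizer = load("mlx-community/quantized-gemma-2b-it")
pipe = MLXPipeline(model=model, tokenizer=tokenizer)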

3. Creating a Chain

Once the model is loaded, you can combine it with a prompt to form a chain:

from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | pipe

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))
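
To make the data flow of prompt | pipe more concrete, here is a conceptual sketch in plain Python. It is a stand-in, not LangChain's implementation: format_prompt, fake_model, and run_chain are hypothetical names, and fake_model only reports what it received instead of generating text. The point is simply that the dictionary fills the template first, and the resulting string is what the model sees.

```python
TEMPLATE = """Question: {question}

Answer: Let's think step by step."""


def format_prompt(inputs):
    """Fill the template's placeholders from a dict, as the prompt step does."""
    return TEMPLATE.format(**inputs)


def fake_model(prompt_text):
    """Stand-in for the local model: reports its input instead of generating."""
    return f"[model received {len(prompt_text)} characters]"


def run_chain(inputs):
    """Apply the two steps in order, mirroring the shape of `prompt | pipe`."""
    return fake_model(format_prompt(inputs))


question = "What is electroencephalography?"
print(format_prompt({"question": question}))
print(run_chain({"question": question}))
```

In the real chain, the keys of the dictionary passed to invoke must match the placeholder names in the template, which is why the example uses the key question.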

Code Example

The complete example below shows how to create an MLX pipeline and use it to answer a question.

from langchain_community.llms.mlx_pipeline import MLXPipeline
from langchain_core.prompts import PromptTemplate

# The model is fetched from the Hugging Face Hub on first use, then runs locally
pipe = MLXPipeline.from_model_id(
    "mlx-community/quantized-gemma-2b-it",
    pipeline_kwargs={"max_tokens": 10, "temp": 0.1},
)

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | pipe

question = "What is electroencephalography?"

answer = chain.invoke({"question": question})
print(answer)