Running a Large Language Model on a MacBook M1 (32 GB)


Using large language models online is inconvenient because of company restrictions. A locally hosted chatbot 🤖️ covers drafting documents and polishing English without paying for GPT-4, and it is a fun way to kill time. Thanks to the unified memory of Apple's 🍎 Arm architecture, and with both TensorFlow and PyTorch now supporting Arm, a portable large language model can be deployed on a MacBook at fairly low cost.


Environment setup:

Installing the LLaMA model on a MacBook mostly follows Andrew's blog post at https://agi-sphere.com/install-llama-mac/#Step_1_Install_Homebrew (the screenshots there are excellent). Two prerequisites: Python 3.10.x, installed via pyenv, and PyTorch for Mac, downloaded as the latest release from the official site. With those in place, following the blog's steps you will still hit two snags: 1) the model download links are dead, and 2) some Python extension modules need to be rebuilt for Apple Silicon.
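The prerequisite setup can be sketched roughly as follows. This assumes Homebrew is already installed, and 3.10.11 is just an example version; any 3.10.x works:

```shell
# Install pyenv and a Python 3.10.x (example version: 3.10.11)
brew install pyenv
pyenv install 3.10.11
pyenv global 3.10.11

# PyTorch ships Apple-Silicon wheels on PyPI
pip install torch

# Sanity checks: interpreter major version, and whether the Metal (MPS) backend is visible
python3 -c "import sys; print(sys.version_info.major)"
python3 -c "import torch; print(torch.backends.mps.is_available())"
```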

1) The model download links are dead

You can request access from Meta, but approvals seem very slow these days. Fortunately the community has mirrors of the weights, which come in four sizes: 7B, 13B, 30B, and 65B parameters. The download commands are listed below:

wget https://agi.gpt4.org/llama/LLaMA/tokenizer.model -O ./tokenizer.model
wget https://agi.gpt4.org/llama/LLaMA/tokenizer_checklist.chk -O ./tokenizer_checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth -O ./7B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/7B/params.json -O ./7B/params.json
wget https://agi.gpt4.org/llama/LLaMA/7B/checklist.chk -O ./7B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.00.pth -O ./13B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.01.pth -O ./13B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/params.json -O ./13B/params.json
wget https://agi.gpt4.org/llama/LLaMA/13B/checklist.chk -O ./13B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.00.pth -O ./30B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.01.pth -O ./30B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.02.pth -O ./30B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.03.pth -O ./30B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/params.json -O ./30B/params.json
wget https://agi.gpt4.org/llama/LLaMA/30B/checklist.chk -O ./30B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.00.pth -O ./65B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.01.pth -O ./65B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.02.pth -O ./65B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.03.pth -O ./65B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.04.pth -O ./65B/consolidated.04.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.05.pth -O ./65B/consolidated.05.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.06.pth -O ./65B/consolidated.06.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.07.pth -O ./65B/consolidated.07.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/params.json -O ./65B/params.json
wget https://agi.gpt4.org/llama/LLaMA/65B/checklist.chk -O ./65B/checklist.chk
Source: https://blog.csdn.net/u014297502/article/details/129829677
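Two practical notes on the commands above: the -O paths assume the target directories already exist, and each directory ships a checklist.chk that can verify the downloads. A minimal sketch, assuming checklist.chk is a standard md5 manifest (md5sum is the Linux tool; on macOS, brew install md5sha1sum or use md5 -r):

```shell
# Create the target directories the -O paths expect
mkdir -p 7B 13B 30B 65B

# After downloading, verify one directory's files against its md5 manifest
cd 7B
md5sum -c checklist.chk    # prints "filename: OK" for each verified file
```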

2) Python extension modules need to be rebuilt for Apple Silicon

During the rest of the install you will hit some missing-package errors; just pip-install whatever is missing. You may then run into cannot import name '_itree' from partially initialized module 'itree' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e))), which means a prebuilt x86_64 wheel was installed on an arm64 machine. The full traceback:

python3 -m llama.download        
Traceback (most recent call last):
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py", line 5, in <module>
    from . import _itree
ImportError: cannot import name '_itree' from partially initialized module 'itree' (most likely due to a circular import) (/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/llama/__init__.py", line 4, in <module>
    from .model_single import ModelArgs, Transformer
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/llama/model_single.py", line 8, in <module>
    import hiq
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/hiq/__init__.py", line 57, in <module>
    from .tree import (
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/hiq/tree.py", line 9, in <module>
    import itree
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py", line 7, in <module>
    import _itree
ImportError: dlopen(/Users/paulo/Library/Python/3.9/lib/python/site-packages/_itree.cpython-39-darwin.so, 0x0002): tried: '/Users/paulo/Library/Python/3.9/lib/python/site-packages/_itree.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))

The fix is to uninstall the prebuilt wheel with pip uninstall py-itree, then build it from source:

pip install https://github.com/juncongmoo/itree/archive/refs/tags/v0.0.18.tar.gz
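To confirm the rebuilt extension is actually native, you can inspect the compiled .so with file and check the interpreter's architecture; on Apple Silicon both should report arm64. A quick check, assuming _itree installed under the active interpreter as in the traceback above:

```shell
# Locate the compiled extension and print the architecture(s) it contains
python3 -c "import _itree; print(_itree.__file__)" | xargs file

# The interpreter itself should also be running natively
python3 -c "import platform; print(platform.machine())"   # expect arm64, not x86_64
```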

Bingo: a somewhat dim-witted 7B-parameter LLaMA model is now deployed on the MacBook Pro. For smarter answers you would need the 65B model, at a much larger memory cost.
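Note that the chat session below runs through llama.cpp and loads a Q4_0-quantized ggml file, a conversion step the raw .pth weights still need. At the time, the workflow looked roughly like this; script and binary names varied between llama.cpp versions, so treat the exact invocations as assumptions:

```shell
# Build llama.cpp, then convert and quantize the 7B weights (paths are examples)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
python3 convert.py ../pyllama_data/7B/                      # produces ggml-model-f16.bin
./quantize ../pyllama_data/7B/ggml-model-f16.bin \
           ../pyllama_data/7B/ggml-model-q4_0.bin q4_0

# Rough arithmetic: 7B weights at ~4 bits (0.5 bytes) per parameter
echo $((7000000000 / 2 / 1024 / 1024)) MB                   # ~3337 MB for the weights alone
```

The ~3.3 GB for the quantized weights is consistent with the ~5.4 GB "mem required" in the log below, which additionally covers quantization scales, activations, and per-state overhead.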

conda_llm@Mac% ./examples/chat.sh
main: build = 801 (3e08ae9)
main: seed  = 1688828046
llama.cpp: loading model from ./pyllama_data/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 5439.94 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
main: interactive mode on.
Reverse prompt: 'User:'
sampling: repeat_last_n = 64, repeat_penalty = 1.000000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 48


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: Who is the president of France?
Bob: Nicolas Sarkozy.
User: Who am I?
Bob: You are a User.
User: Where am I?
Bob: You are in your home in Los Angeles, California.
User: Where are you?
Bob: I am in my home in Seattle, Washington.
User: How you doing?
Bob: I am doing fine, thank you.