[Pioneer Log] Pitfalls of deploying falcon-mamba-7b-instruct locally with huggingface transformers

tiiuae recently released a mamba version of the falcon-7b model. Mamba is a new model architecture introduced last year that does away with self-attention in favor of a state space framework. One of its selling points is support for very long prompt inputs without a large VRAM footprint. Appealing enough to be worth a try.

The theory has already been covered by plenty of experts, and this author is still only half-qualified, so I will not go into it here. For details, see "Mamba: Linear-Time Sequence Modeling with Selective State Spaces".

0. Downloading the model

apt update
apt install git-lfs
git lfs install
git clone https://hf-mirror.com/tiiuae/falcon-mamba-7b-instruct

Alternatively, grab each file's download link directly from https://hf-mirror.com/tiiuae/falcon-mamba-7b-instruct and wget them one by one; this tends to be more stable than git clone.
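If neither route cooperates, the huggingface_hub package can also pull the whole repository through the mirror. A minimal sketch, assuming huggingface_hub is installed (pip install huggingface_hub):

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must be set before the import below

from huggingface_hub import snapshot_download

# fetches every file in the repo into ./falcon-mamba-7b-instruct;
# if the transfer is interrupted, rerunning the call resumes it
snapshot_download(
    repo_id="tiiuae/falcon-mamba-7b-instruct",
    local_dir="falcon-mamba-7b-instruct",
)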
According to the material on huggingface, falcon can be called through transformers just like many other models, in which case there would be nothing worth writing a log about.
There is a catch, though: mamba has moved away from the transformer, yet it still lives in a library named transformers, and that is where the first pitfall appears.

1. Downloading transformers

The usual route is pip install transformers.
At this point in time, however, you are very likely to hit the following error:
ValueError: The checkpoint you are trying to load has model type falcon_mamba but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

When this log was written, the major domestic mirrors likely had not caught up yet; the highest version they carried was 4.44.
To solve this, head to https://github.com/huggingface/transformers, where the instructions are fairly specific: transformers >= 4.45.0 is required.
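A quick way to check which version you actually have installed:

import transformers
print(transformers.__version__)  # must be >= 4.45.0 for the falcon_mamba architecture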
So: git clone https://github.com/huggingface/transformers. And with that, the second pitfall arrives.

2. git clone

Cloning directly from github may hit obstacles, which sends you looking for alternatives such as:
https://kkgithub.com/huggingface/transformers
https://gitclone.com/github.com/huggingface/transformers
However, neither of these mirrors carried the updated transformers yet either. So it comes down to retrying git clone https://github.com/huggingface/transformers over and over, or going through a proxy.

Once transformers has downloaded successfully, cd into the transformers directory and run

pip install -e .

to complete the installation of the transformers package.
"pip install -e ." (an editable install) looks for setup.py in the current directory, but the installed package is not copied into site-packages, so you can modify the package at any time. More usefully here, the package does not have to live under the default path. That is particularly handy on a cloud server when the package is large and the default path is short on space with no way to expand it: you can keep the package on whatever volume has room.
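A quick sanity check that the editable install took effect: the imported package should resolve to the cloned repository rather than site-packages.

import transformers
# with an editable install, this prints a path inside the cloned repo,
# not one under site-packages
print(transformers.__file__)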

With that done, the model can basically run. If you are working in a jupyter notebook, you may also need to restart the kernel (or simply reboot) so the fresh install gets picked up. Now we can face the third pitfall.

3. First run of falcon-mamba-7b-instruct

Since it is supposed to excel at long text, let's see just how far that goes. The grandstrings in the code below comes from English text found online, currently 2,350 words. The GPU is an RTX 4090 with 24 GB of VRAM.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "autodl-tmp/falcon-mamba-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
print('falcon-mamba-7b-instruct ready')

print('inferencing')
# grandstrings holds the ~2,350-word English text mentioned above
prompts = f"""summarise the following texts in 100 words.
text:{grandstrings}"""
inputs = tokenizer(prompts, return_tensors="pt").to(0)

output = model.generate(**inputs, max_new_tokens=4096, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
After a long, long wait: OOM. Evidently, the longer the prompt, the more VRAM it consumes, so with limited VRAM the input text for falcon-mamba-7b-instruct cannot be too long either. Repeated testing showed the OOM only went away once the text was under about 2,100 words. A quantized model can of course be used later on to stretch the input length.
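A stopgap in the meantime is to cap the prompt at tokenization time rather than discovering the OOM during generate. A minimal sketch, reusing the tokenizer and model above; max_length=2048 is my own guess based on the roughly 2,100-word ceiling observed here, so tune it for your GPU:

# truncate the prompt to a fixed token budget before it reaches the model
inputs = tokenizer(
    prompts,
    return_tensors="pt",
    truncation=True,
    max_length=2048,  # assumed budget; adjust to your VRAM
).to(0)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))

And then came the fourth pitfall.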

4. The parrot problem

Since this falcon's primary language is English, the tests use English articles.
With a relatively short prompt, it does more or less answer the question. [screenshot]
But when the prompt is constructed the following way:
grandstrings = """The Great Crusade was the period in the early history of the Imperium of Man, lasting from approximately 798.M30 into the first decades of the 31st Millennium (approximately a little over two standard centuries), when the Imperium was expanding as it sought to bring all of human-settled space back under the control of a single ruler, the Emperor of Mankind.

The Great Crusade marked a brief era in human history of rebuilding, reunification and rapid technological advancement following the complete regression of Mankind into techno-barbarism during the 5,000 Terran years of the Age of Strife. It was a time when the Emperor of Mankind still lived in the conventional sense and led His people in person. The Great Crusade era is considered the Golden Age of the Imperium before the long, slow slide into cultural stagnation, political repression and increasing warfare that marked the ten millennia after the Great Crusade and the Horus Heresy had ended.
"""
prompts = f"""This is a description of a story.{grandstrings} What is the description about?"""
[screenshot]
Building the prompt this way turns the answer into a continuation: the model first repeats the prompt, then improvises on top of it, and the final answer reads like an unfinished continuation.

5. The correct prompt template

This is why the official material constructs prompts the following way:

messages = [
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
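The official snippet stops at input_ids; generation then proceeds as usual (max_new_tokens=100 below is just an arbitrary illustration value):

outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))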

So I downloaded the Great Crusade article, turned it into a prompt, quantized the model to int4, and tested the model's loyalty, er, answer quality.

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from bs4 import BeautifulSoup
import requests
headers = {
  "Accept": "application/json, text/javascript, */*; q=0.01",
  "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
  "Connection": "keep-alive",
  "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 Edg/118.0.2088.57",
  "X-Requested-With": "XMLHttpRequest"
}
url = 'https://warhammer40k.fandom.com/wiki/Great_Crusade'
response = requests.get(url, headers=headers)
bser = BeautifulSoup(response.content, "html.parser")
response.close()
# grab the article body and concatenate every paragraph's text
stfs = bser.find_all('div', {"class": "mw-parser-output"})
ps = stfs[0].find_all('p')
grandstrings = ''
for p in ps:
    grandstrings += p.getText()

print('string contents ready')
# rough word count via the number of spaces
print(f"about {grandstrings.count(' ')} words")


tokenizer = AutoTokenizer.from_pretrained("autodl-tmp/falcon-mamba-7b-instruct")
model = AutoModelForCausalLM.from_pretrained("autodl-tmp/falcon-mamba-7b-instruct", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True))

thequestion = "Why did the Emperor launch the Crusade?"
messages = [
    {"role": "user", "content": f"Answer my question according to the content.\ncontent: {grandstrings}\nQuestion: {thequestion}"},
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_new_tokens=4096)
answer = tokenizer.decode(outputs[0])
refined_answer = answer[answer.find('<|im_start|>assistant'):]  # keep everything from the assistant tag onward
print(refined_answer)

[screenshots]
With int4 quantization, the model handled a roughly 5,000-word English article using just under 19 GB of VRAM.
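To reproduce that figure on your own machine, torch can report the peak VRAM the process has allocated; run this right after generate():

import torch
# peak VRAM allocated on GPU 0 by this process, in GiB, as seen by
# PyTorch's caching allocator (actual reserved memory may be a bit higher)
print(f"{torch.cuda.max_memory_allocated(0) / 1024**3:.1f} GiB")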

"Why did the Emperor launch the Crusade?"
"<|im_start|>assistant
The Emperor launched the Great Crusade to unite all of humanity under a single banner, the Imperial banner, and bring peace and order to the galaxy. He believed that the human race had to unite if it was to survive the threat of extinction that it faced from the many dangers that existed in the galaxy, such as the Orks, the Dark Eldar, and the Tyranids. The Emperor also believed that the human race had to unite if it was to progress and achieve its full potential. The Great Crusade was a massive undertaking that involved the creation of vast fleets of starships and the deployment of millions of soldiers and Space Marines to conquer new worlds and bring them into the fold of the Imperium. The Great Crusade was a massive undertaking that involved the deployment of vast fleets of starships and the deployment of millions of soldiers and Space Marines to conquer new worlds and bring them into the fold of the Imperium.<|im_end|>
<|end_of_text|>"
Even so, falcon's answers still tend to parrot as much of the prompt as they can before adding their own understanding; perhaps that is down to how falcon was trained. One upside of this prompt template is that it inserts markers such as "<|im_start|>", "<|im_end|>" and "<|end_of_text|>" into the output text, which makes downstream processing easier.
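For instance, a minimal post-processing sketch that uses those markers to slice out just the assistant's turn, reusing the answer string from the code above:

start_tag, end_tag = "<|im_start|>assistant", "<|im_end|>"
start = answer.find(start_tag) + len(start_tag)
end = answer.find(end_tag, start)
print(answer[start:end].strip())  # the reply alone, markers stripped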
"who is the leader of space wolves?"
"<|im_start|>assistant
The leader of the Space Wolves is not explicitly mentioned in the given content. However, the Space Wolves are one of the many Legions of the Great Crusade, and their leader during the Great Crusade would have been their Primarch, Leman Russ.<|im_end|>
<|end_of_text|>"
By the looks of it, falcon-mamba-7b-instruct does have a loyal store of lore after all.

"Which Chaos God contributed to Horus Heresy?"
"<|im_start|>assistant
The Chaos God that contributed to the Horus Heresy was primarily the god of change and destruction, Tzeentch. Tzeentch, also known as the Everchangin…<|im_end|>
<|end_of_text|>"
Horus was wounded by a Nurgle-cursed blade and could not be healed. In the end it was the Word Bearers Legion that restored him through special means, but those means also corrupted the Warmaster's mind, planting the seed of the great heresy. And the Word Bearers had long since been beguiled by Tzeentch. So this answer is actually quite good; it reads as though the model grasped several layers of the story's relationships.

That wraps up this pitfall log of deploying falcon-mamba-7b-instruct locally with huggingface transformers, along with some simple tests. Looking forward to more new models, including the KAN-architecture models that surfaced a few months ago.
