OneKE Setup and Usage Notes


Summary up front: if you just want a taste, go straight to option 3; for long-term use, go with option 2; option 1 I still haven't gotten working. Also, the current version of OneKE is only suited to short texts and simple schemas; according to the author's reply in an issue, document-level extraction will be considered for the next release, so let's wait and see.
OneKE

Model download

Just follow the official instructions; there are several channels. The download address I used is this one: model page (ModelScope). I downloaded with git:

git clone https://www.modelscope.cn/zjunlp/oneke.git

There are plenty of other download methods to choose from.

Environment setup

1. Windows local version

First I wrestled with the NVIDIA driver, CUDA, and PyTorch for a while.
Then I installed the required packages using the requirements.txt file provided in the official GitHub repo.
One problem here: the official bitsandbytes package does not support Windows by default, so running the quick-start code from the README throws an error. Following online tutorials, I first installed bitsandbytes-windows (apparently 0.37.5 is the latest version) and changed the relevant imports from bitsandbytes to bitsandbytes-windows, but I still got the following error:

Traceback (most recent call last):
  File "F:\OneKE\try.py", line 26, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "C:\Users\Administrator\.conda\envs\OneKE\lib\site-packages\transformers\models\auto\auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\Administrator\.conda\envs\OneKE\lib\site-packages\transformers\modeling_utils.py", line 3165, in from_pretrained
    hf_quantizer.validate_environment(
  File "C:\Users\Administrator\.conda\envs\OneKE\lib\site-packages\transformers\quantizers\quantizer_bnb_4bit.py", line 62, in validate_environment
    raise ImportError(
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

I then tried reinstalling the transformers package; that failed too, with the same error.
Next I tried following another tutorial and installing a Windows-compatible build of bitsandbytes (see the original poster's GitHub). It is still slowly downloading as I write this.
Screenshot: download failed
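Before attempting the quantized load at all, a cheap preflight check can save a doomed run. This guard is my own sketch, not something from the OneKE repo: on Windows (or when bitsandbytes isn't importable), skip `quantization_config` and load full-precision weights instead, at the cost of more memory.

```python
import importlib.util
import platform

def bnb_usable() -> bool:
    """Return True only if 4-bit loading via bitsandbytes is plausible.

    Upstream bitsandbytes wheels historically did not ship for Windows,
    which is exactly the ImportError above. When this returns False,
    call from_pretrained() without quantization_config instead.
    """
    if platform.system() == "Windows":
        return False
    return importlib.util.find_spec("bitsandbytes") is not None
```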

2. Ubuntu server version

Next I dusted off a paid server I hadn't used up yet, with a single A40 (48 GB) GPU, and set up the environment there. I kept hitting socket timeout errors along the way, which was infuriating.
I tried the fix from an issue, namely adding two lines of code:

import os
# Point Hugging Face downloads at a mirror; set this before importing transformers
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

This time that error was gone, but I had uploaded my locally downloaded model files directly, and because of network problems not all of them uploaded intact, so it failed like this:

bin /root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /root/miniconda3/envs/OneKE did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
  warn(msg)
/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//autodl-container-8ce5118fae-4922fd94'), PosixPath('8888/jupyter'), PosixPath('http')}
  warn(msg)
/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('Asia/Shanghai')}
  warn(msg)
/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('6443'), PosixPath('//u291766-8fae-4922fd94.neimeng.seetacloud.com')}
  warn(msg)
/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//hf-mirror.com'), PosixPath('https')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/miniconda3/envs/OneKE/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
Loading checkpoint shards:   0%|                                                                              | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/transformers/modeling_utils.py", line 488, in load_state_dict
    return torch.load(checkpoint_file, map_location=map_location)
  File "/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/torch/serialization.py", line 797, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/torch/serialization.py", line 283, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/transformers/modeling_utils.py", line 492, in load_state_dict
    if f.read(7) == "version":
  File "/root/miniconda3/envs/OneKE/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/autodl-tmp/try.py", line 33, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3175, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3548, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/root/miniconda3/envs/OneKE/lib/python3.9/site-packages/transformers/modeling_utils.py", line 504, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'zjunlp/OneKE/pytorch_model-00001-of-00003.bin' at 'zjunlp/OneKE/pytorch_model-00001-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

The model is 24 GB, after all... Why not just git clone on the server? Because downloads there are painfully slow; even setting up the environment took forever. In the end, it ran!!!
Screenshot: server run result
Next I changed the schema and input to try my own target scenario, with an input text of about 800 characters, and then the errors began:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.25 GiB (GPU 0; 47.37 GiB total capacity; 41.32 GiB already allocated; 2.87 GiB free; 43.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
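The 32 GiB allocation in the error above makes sense once you note that raw attention scores grow quadratically with prompt length. A back-of-envelope helper (my own sketch; the head count and dtype below are illustrative, not OneKE's actual config):

```python
def attn_scores_gib(batch: int, heads: int, seq_len: int, bytes_per: int = 4) -> float:
    """Rough size of one layer's attention-score tensor (batch, heads, seq, seq)."""
    return batch * heads * seq_len ** 2 * bytes_per / 1024 ** 3

# Doubling the prompt length quadruples this term, which is why trimming
# the schema (part of every prompt) helped where max_new_tokens did not.
```

For example, `attn_scores_gib(1, 40, 4096)` is 2.5 GiB per layer before any optimization.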

I was sweating bullets, frankly. I'm on the 4-bit version. I tried changing the max_length=1024, max_new_tokens=512 parameters: same error. Then I shortened the input text: same error. Only when I shortened the schema did it run, though the extraction result was still not great.
The instruction, schema, and input:

sintruct = "{" \
           "\"instruction\": \"你是一个知识图谱实体知识结构化专家。根据输入具有唯一实体标识码(entity_id)的实体类型(entity_type)以及实体类型间关系(relationship_type)的schema描述,从文本中抽取出相应的实体实例和其属性信息,不存在的属性不输出, 属性存在多值就返回列表,并输出为可解析的json格式。\", " \
           "\"schema\": [{'entity_type': '桥梁', 'entity_id': '2687'}, {'relationship_type': '包含', 'relationship_id': '3385', 'start': {'entity_type': '桥梁', 'entity_id': '2687'}, 'end': {'entity_type': '桥墩', 'entity_id': '2929'}}, {'entity_type': '桥墩', 'entity_id': '2929', 'attributes': {'标识码': '', '分类编码': '', '桩号': '如K5+230', '高度': '', '桥墩类型': '如单柱墩、双柱墩、多柱墩、桁架式墩等', '截面类型': '如矩形、圆形、尖端形、圆端形等', '防撞形式': '如桩支撑系统、人工岛系统、漂浮式保护系统、系缆桩保护系统、防护板系统', '受力特点': '如刚性、柔性', '其他要求': ''}}], " \
           "\"input\":  \1、本桥上部构造采用小箱梁。2、桥墩为钢筋混凝土等截面实心墩,钻孔灌注桩基础,桥墩尺寸为1.5x1.5米,下设钻孔桩直径为1.8米,桩基长度均按摩擦计算。3、左右幅0号桥台为肋板式桥台,左右幅12号桥台为桩柱式桥台,钻孔灌注桩基础,桩基长度按摩擦桩计算。4、本桥支座采用:左右幅桥台及左右幅6号桥墩处均采用 LNR-d345x118mm 圆形滑动型水平力分散型橡胶支座,其余桥墩处采用LNR-d670x199mm 圆形固定型水平力分散型支座。5、桥墩盖梁及桥台帽梁设支座垫石及防震挡块。\"}"

Output:

{"桥梁": {}, "桥墩": {"分类编码": "钢筋混凝土等截面实心墩", "桩号": "1.5x1.5米", "桥墩类型": "实心墩", "截面类型": "矩形", "防撞形式": "钻孔灌注桩基础", "受力特点": "无"}}}
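Note the unbalanced trailing braces in that output: json.loads rejects the string outright. A defensive parser of my own (not part of OneKE) that trims stray trailing characters until the output parses:

```python
import json

def parse_extraction(text: str):
    """Parse model output as JSON, trimming stray trailing characters.

    Defensive wrapper: generation sometimes emits an extra closing
    brace, which makes json.loads raise "Extra data" immediately.
    Returns None if nothing parseable remains.
    """
    while text:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            text = text[:-1].rstrip()
    return None
```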

3. Notebook quick trial (free tier)

Hmm, there are 36+64 hours of free trial, which is plenty for trying out the model. Downloading with git is buttery smooth here (other methods work too). Pros: smooth downloads, no environment setup; for the cons, see the next paragraph. Enter from here.

Screenshot: free instance environment
Important warning: once the instance is shut down, everything else is gone!!! Only these two files survive!! So don't bother uploading files!!!! Before shutting an instance down, make sure you've downloaded everything you need!!!

Re-downloading the model (very fast, done in about a minute):

modelscope download --model ZJUNLP/OneKE


It finally ran here too!!!
Screenshot: ModelScope notebook run result
