Recently I came across OneKE, a large-model knowledge extraction project. It claims relatively strong results on a number of fully supervised and zero-shot entity/relation/event extraction tasks, and it open-sources a fully fine-tuned version based on Chinese-Alpaca-2-13B.
I have long been curious about how well large language models handle information extraction, so my first thought on seeing this project was to check how an open-source base model, with no task-specific fine-tuning, performs on these tasks. So I ran a quick, admittedly subjective test.
The test targets the llama3-8b model, loaded and run for inference through the huggingface transformers library.
The implementation code is as follows:
- Loading the model
The test runs on a device with 8 GB of VRAM, so the model is quantized; after quantization it actually occupies about 6 GB.
import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM, GenerationConfig, BitsAndBytesConfig

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# use a raw string so the Windows backslashes are not treated as escape sequences
model_path = r"J:\llm_model\meta-llama_Meta-Llama-3-8B"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# 4-bit NF4 quantization with double quantization, so the 8b model fits in 8 GB of VRAM
# (the llm_int8_* options only apply to 8-bit loading and are omitted here)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4',
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    device_map="auto",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.eval()
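To sanity-check the roughly 6 GB figure, one can query PyTorch's allocator right after loading. This is just a quick probe, and the exact numbers will vary with the bitsandbytes and driver versions:

# rough check of how much VRAM the quantized weights actually occupy
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")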
- Building the prompt
The test targets the relatively simple NER task; the sample data here comes from the CoNLL2003 dataset.
prompt_template = ("'instruction':'You are an expert in named entity recognition. "
                   "Please extract entities that match the schema definition from the input. "
                   "Return an empty list if the entity type does not exist. "
                   "Please respond in the format of a JSON string.', \n"
                   "'schema':{schema} \n"
                   "'input':'{text}' \n"
                   "'result':")
data = {"text": "Only France and Britain backed Fischler 's proposal .",
"entity": [{"entity": "Fischler", "entity_type": "person", "pos": [31, 39]},
{"entity": "France", "entity_type": "location", "pos": [5, 11]},
{"entity": "Britain", "entity_type": "location", "pos": [16, 23]}],
"task": "NER"}
schema = ['person','location']
sinstruct = prompt_template.format(schema=schema, text=data['text'])
print(sinstruct)
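For reference, print(sinstruct) renders the filled-in template as follows (the schema list is inserted via its Python repr):

'instruction':'You are an expert in named entity recognition. Please extract entities that match the schema definition from the input. Return an empty list if the entity type does not exist. Please respond in the format of a JSON string.',
'schema':['person', 'location']
'input':'Only France and Britain backed Fischler 's proposal .'
'result':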
- Running inference
input_ids = tokenizer.encode(sinstruct, return_tensors='pt').to(device)
input_length = input_ids.size(1)
generation_output = model.generate(input_ids=input_ids,
                                   generation_config=GenerationConfig(max_length=512,
                                                                      max_new_tokens=256,
                                                                      return_dict_in_generate=True),
                                   pad_token_id=tokenizer.eos_token_id,
                                   eos_token_id=tokenizer.eos_token_id,
                                   repetition_penalty=1.2,
                                   # no_repeat_ngram_size=2,
                                   )
generation_output = generation_output.sequences[0]
generation_output = generation_output[input_length:]  # drop the prompt tokens, keep only the completion
output = tokenizer.decode(generation_output, skip_special_tokens=True)
print(output)
- Model output:
Different repetition_penalty values produced several different results:
['France','Britain']
[{'type': "person", 'name': ["Fischer"]}, {'type": "location","name":["France"]}]}
[{'type': "Person", value: ["Fischer"]}, {'Type": location, Value :["France","Britain"]}]}
- Summary
Even from this single example, it is clear that on a relatively simple NER task the quantized 8b model with default parameters cannot guarantee stable output quality.
Problems encountered:
① VRAM limit: solved via quantization.
② Generation degenerating into an endlessly repeated string: after consulting chatglm and gpt, solved by adding the eos_token_id, repetition_penalty, and no_repeat_ngram_size parameters.
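Since the CoNLL2003 sample already carries gold labels in data['entity'], the eyeball comparison above could be made mechanical. A minimal scoring sketch of my own, assuming the model output has been parsed into a flat list of entity strings (e.g. by the hypothetical parse_extraction helper above):

def score_entities(predicted, gold):
    # precision/recall over entity surface forms, type-insensitive and set-based
    pred, ref = set(predicted), {e['entity'] for e in gold}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall

# first output above: both predicted entities are correct, but 'Fischler' is missed
print(score_entities(['France', 'Britain'], data['entity']))  # -> (1.0, 0.666...)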