Implementing a text summarization model with PaddleNLP

bianchencainiao

Last modified on 2023-04-23 15:31:32

Tags: natural language processing, python

First published on 2023-04-10 17:43:33

Article link: https://blog.csdn.net/bianchencainiao/article/details/130057787

1. Create a virtual environment

conda create -n paddle python=3.8
conda activate paddle

2. Install the libraries

# paddlepaddle is the deep learning framework; for a GPU build install paddlepaddle-gpu instead
pip install paddlepaddle
pip install paddlenlp

To upgrade a library:

pip3 install --upgrade <package-name>

Reference: PaddleNLP documentation

Code:

from paddlenlp import Taskflow
 
text ="1912年,河南大学的前身河南留学欧美预备学校,在古城开封清代贡院旧址诞生,首任校长为著名教育家林伯襄先生。1936年，河南大学南大门建成后，学校就将校训用柳体金字镌刻在正门内侧的门楣之上，正中上额横书“止于至善”，左书“明德”，右书“新民”，八字校训耀眼夺目，发人深省，予河大学子以光大学术，恢宏文化的启示，一入校门便油然而生对国家、民族崇高无上的历史责任感。"
 
# Word segmentation
def word_segmentation():
    # Without a GPU, set device_id=-1 to run on the CPU
    seg = Taskflow("word_segmentation", device_id=-1)
    my_list = seg(text)
    print(my_list)
 
# Named entity recognition
def entity_recognition():
    ner = Taskflow("ner", device_id=-1)
    my_list = ner(text)
    print(my_list)
 
 
if __name__ == '__main__':
    # word_segmentation()
    # entity_recognition()
    # Text summarization with the Randeng-Pegasus Chinese summary model
    summarizer = Taskflow('text_summarization', model='IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese-V1')
    res = summarizer('你们有没有感觉这个车子空调功率有问题 手机远程启动空调，或者不猜刹车只按启动键，这2种情况下得空调效果很弱，制冷效果差。踩住刹车再按启动键(可以开始驾驶)，空调的风速和出风温度立马就加强了，感觉是之前只用了-小半功率在运行。问4S店，说这个是正常现象，有没有其他车友也遇到这种情况的')
    
    # Text correction: uncomment to try (the sample sentences contain deliberate typos)
    # corrector = Taskflow('text_correction')
    # res = corrector(['人生就是如此,经过磨练才能让自己更加拙壮','遇到逆竟时，我们必须勇于面对'])
    print(res)
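
The entity_recognition() function above only prints the raw result. According to the PaddleNLP Taskflow documentation, the default NER mode returns a list of (word, label) tuples for a single input string. Assuming that shape, a small helper can group the recognized words by label. The helper name group_entities, the choice to skip the label "w" (assumed to mark punctuation), and the sample tuples are illustrative assumptions, not real model output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_entities(
    tokens: Iterable[Tuple[str, str]],
    skip_labels: Tuple[str, ...] = ("w",),
) -> Dict[str, List[str]]:
    """Group (word, label) pairs by label, keeping first-seen order.

    Assumes the (word, label) tuple shape returned by Taskflow("ner");
    labels listed in skip_labels (by default "w", assumed punctuation) are dropped.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for word, label in tokens:
        if label in skip_labels:
            continue
        # Avoid listing the same word twice under one label
        if word not in grouped[label]:
            grouped[label].append(word)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample shaped like Taskflow("ner") output; labels are illustrative
    sample = [
        ("1912年", "时间类"),
        ("，", "w"),
        ("河南大学", "组织机构类"),
        ("林伯襄", "人物类_实体"),
        ("河南大学", "组织机构类"),
    ]
    print(group_entities(sample))
```

In the program above, this could be applied to the list returned by the NER Taskflow instead of printing it directly.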

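Error fix: if openpyxl raises "Value must be either numerical or a string containing a wildcard", downgrade openpyxl to 3.0.9:

pip install openpyxl==3.0.9

Error fix: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc'. Force-reinstall charset-normalizer 3.1.0: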

pip install --force-reinstall charset-normalizer==3.1.0 --user
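
After pinning openpyxl and charset-normalizer as above, it can help to confirm which versions the active environment actually has. Below is a minimal sketch using the standard-library importlib.metadata module (available from Python 3.8, which matches the environment created earlier). The helper names installed_version and check_pins are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions pinned in the two error fixes
PINNED = {"openpyxl": "3.0.9", "charset-normalizer": "3.1.0"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str]) -> Dict[str, bool]:
    """Map each package to whether its installed version equals the pinned one."""
    return {pkg: installed_version(pkg) == wanted for pkg, wanted in pins.items()}


if __name__ == "__main__":
    for pkg, wanted in PINNED.items():
        print(f"{pkg}: installed={installed_version(pkg)}, pinned={wanted}")
```

Run it inside the activated paddle environment. A missing package is reported as None instead of raising an error.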

Information extraction:

from pprint import pprint
from paddlenlp import Taskflow

schema = {'评价维度': ['观点词', '情感倾向[正向，负向]']} # Schema for opinion extraction: aspect -> opinion word and sentiment (positive/negative)
ie = Taskflow('information_extraction', schema=schema)
pprint(ie("店面干净，很清静，服务员服务热情，性价比很高，发现收银台有排队")) # Better print results using pprint
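
The extraction result is a nested structure. The output format shown in the PaddleNLP UIE documentation for this kind of schema is a list with one dict per input text, in which each 评价维度 entry carries a text field and a relations dict keyed by 观点词 and 情感倾向[正向，负向]. Assuming that layout, a helper can flatten the result into (aspect, opinion, sentiment) triples. The field layout should be checked against your PaddleNLP version, and to_triples is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

# Keys must match the schema passed to Taskflow exactly (note the full-width comma)
ASPECT = "评价维度"
OPINION = "观点词"
SENTIMENT = "情感倾向[正向，负向]"


def to_triples(result: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Flatten assumed UIE opinion-extraction output into (aspect, opinion, sentiment)."""
    triples: List[Tuple[str, str, str]] = []
    for doc in result:  # one dict per input text
        for aspect in doc.get(ASPECT, []):
            relations = aspect.get("relations", {})
            # An aspect may have several opinion words, or none ("" is a placeholder)
            opinions = [o["text"] for o in relations.get(OPINION, [])] or [""]
            sentiments = relations.get(SENTIMENT, [])
            # Keep the first sentiment label, if present
            sentiment = sentiments[0]["text"] if sentiments else ""
            for opinion in opinions:
                triples.append((aspect["text"], opinion, sentiment))
    return triples
```

With the code above, this would be called as to_triples(ie("店面干净，很清静，服务员服务热情，性价比很高，发现收银台有排队")).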

Implementation with another framework (Hugging Face Transformers):

from transformers import PegasusForConditionalGeneration
# Need to download tokenizers_pegasus.py and other Python scripts from the Fengshenbang-LM GitHub repo in advance,
# or you can download tokenizers_pegasus.py and data_utils.py from https://huggingface.co/IDEA-CCNL/Randeng_Pegasus_523M/tree/main
# It is strongly recommended to git clone the Fengshenbang-LM repo:
# 1. git clone https://github.com/IDEA-CCNL/Fengshenbang-LM
# 2. cd Fengshenbang-LM/fengshen/examples/pegasus/
# There you will find tokenizers_pegasus.py and data_utils.py, which the Pegasus model needs

from fengshen.examples.pegasus.tokenizers_pegasus import PegasusTokenizer

# Load the summarization model and its tokenizer from the Hugging Face Hub
model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese-V1")
tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese-V1")

text = "在北京冬奥会自由式滑雪女子坡面障碍技巧决赛中，中国选手谷爱凌夺得银牌。祝贺谷爱凌！今天上午，自由式滑雪女子坡面障碍技巧决赛举行。决赛分三轮进行，取选手最佳成绩排名决出奖牌。第一跳，中国选手谷爱凌获得69.90分。在12位选手中排名第三。完成动作后，谷爱凌又扮了个鬼脸，甚是可爱。第二轮中，谷爱凌在道具区第三个障碍处失误，落地时摔倒。获得16.98分。网友：摔倒了也没关系，继续加油！在第二跳失误摔倒的情况下，谷爱凌顶住压力，第三跳稳稳发挥，流畅落地！获得86.23分！此轮比赛，共12位选手参赛，谷爱凌第10位出场。网友：看比赛时我比谷爱凌紧张，加油！"
# Truncate inputs longer than 1024 tokens
inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")

# Generate the summary, then decode the token IDs back into text
summary_ids = model.generate(inputs["input_ids"])
summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(summary)

# Model output: 自由式滑雪女子坡面障碍技巧决赛谷爱凌摘银
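
The tokenizer call above sets max_length=1024, so an article much longer than that will not fit into a single pass of the model. A common workaround, not part of the original post, is to split the text at sentence boundaries into chunks and summarize each chunk separately. The following is a minimal sketch that assumes character count is an acceptable rough proxy for token count. The name split_into_chunks and the 500-character default are hypothetical choices, not values from the model card.

```python
import re
from typing import List

# Zero-width split point after Chinese or ASCII sentence-ending punctuation
_SENTENCE_END = r"(?<=[。！？；!?;])"


def split_into_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Splitting after each terminator keeps the punctuation with its sentence
    sentences = [s for s in re.split(_SENTENCE_END, text) if s]
    chunks: List[str] = []
    current = ""
    for sent in sentences:
        # A single over-long sentence is hard-wrapped at max_chars
        while len(sent) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sent[:max_chars])
            sent = sent[max_chars:]
        if len(current) + len(sent) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current += sent
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then go through tokenizer and model.generate as above (or through the PaddleNLP summarizer), and the partial summaries could be concatenated. A token-based budget would be more precise, but it would tie the helper to a specific tokenizer.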