LLM Basics 01: Using the Core Hugging Face Modules

奥乐米拉oo

Tags: language models

Article link: https://blog.csdn.net/weixin_47100065/article/details/137395769

Installing Hugging Face Transformers

Install the package directly with pip:

pip install transformers

If network restrictions prevent you from reaching huggingface, you can either go through a proxy or use the hf-mirror mirror.
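One common way to use the mirror is to point the `HF_ENDPOINT` environment variable at it before importing `transformers`. The sketch below assumes the mirror is served at `https://hf-mirror.com`; adjust the URL if your mirror differs.

```python
import os

# Assumption: the hf-mirror service is reachable at https://hf-mirror.com.
# Set the variable before importing transformers, because the hub client
# reads it when it is first imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

print(os.environ["HF_ENDPOINT"])
```

Any `from transformers import ...` line that follows in the same process should then download from the mirror instead of the default endpoint.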

A quick sentiment analysis

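from transformers import pipeline

# With no model specified, pipeline() falls back to a default sentiment-analysis checkpoint.
classifier = pipeline("sentiment-analysis")
results = classifier(
    [
        "I love YUN.",
        "I hate this movie.",
    ]
)
# One {"label": ..., "score": ...} dict per input sentence.
print(results)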

Basic workflow

Raw text --> [Tokenizer] --> Input IDs --> [Model] --> Logits --> [Post-processing] --> Predictions

"This course is amazing!" --> [101, 2023, 2607, 2003, 6429, 999, 102] --> [-4.3630, 4.6859] --> [POSITIVE: 99.99%, NEGATIVE: 0.01%]
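The post-processing step is essentially a softmax over the logits. Here is a minimal sketch in plain Python using the logits from the example above; the label order (index 0 is NEGATIVE, index 1 is POSITIVE) is an assumption about the checkpoint's configuration.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-4.3630, 4.6859]
labels = ["NEGATIVE", "POSITIVE"]  # assumed label order

probs = softmax(logits)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.2%}")
```

In practice you would apply `torch.nn.functional.softmax` to the tensor the model returns; the arithmetic is the same.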

Tokenizer

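The tokenizer splits the text into words, subwords, and special tokens, then maps each token to an ID. It also returns auxiliary information, such as which sentence each token belongs to.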

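from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
raw_inputs = [
    "I love YUN.",
    "I hate this movie.",
]
# Pad to the longest sentence, truncate to the model's maximum length, and return PyTorch tensors.
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)  # contains input_ids and attention_mask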

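Decoding a row of input IDs maps it back to text; 101, 102, and 0 are the [CLS], [SEP], and [PAD] special tokens:

print(tokenizer.decode([101, 1045, 2293, 22854, 1012, 102, 0]))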
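To make the token-to-ID mapping and padding concrete, here is a toy sketch that runs without downloading anything. The vocabulary and the `encode` / `encode_batch` helpers are hypothetical and far simpler than the real WordPiece tokenizer (unknown words become `[UNK]` instead of being split into subwords), but the shape of the output mirrors what `tokenizer(...)` returns.

```python
# Hypothetical toy vocabulary; a real checkpoint ships its own vocabulary file.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "i": 4, "love": 5, "hate": 6, "this": 7, "movie": 8, ".": 9}

def encode(text):
    """Lowercase, split off the period, add special tokens, and map tokens to IDs."""
    tokens = text.lower().replace(".", " .").split()
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]

def encode_batch(texts):
    """Pad every sequence to the longest one and build the matching attention mask."""
    encoded = [encode(t) for t in texts]
    max_len = max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        pad = max_len - len(ids)
        input_ids.append(ids + [VOCAB["[PAD]"]] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = encode_batch(["I love YUN.", "I hate this movie."])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The shorter sentence gets one trailing padding ID, and its attention mask ends in 0 at the same position.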

Loading the model

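from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
# AutoModel loads the base Transformer without the classification head,
# so its output is hidden states rather than logits.
model = AutoModel.from_pretrained(checkpoint)

# Print the model architecture.
print(model)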

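P.S. Pay attention to attention_mask: pass it to the model together with the input IDs, otherwise the padding tokens take part in the computation.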
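Why the mask matters can be shown with a small masked softmax. This is only an illustration of the idea in plain Python, not the library's implementation; the scores and the `attention_weights` helper are made up for the example.

```python
import math

def attention_weights(scores, mask=None):
    """Softmax over attention scores; positions where mask == 0 are excluded."""
    if mask is not None:
        scores = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5, 0.0]  # hypothetical raw scores; the last position is padding
mask = [1, 1, 1, 0]

unmasked = attention_weights(scores)
masked = attention_weights(scores, mask)
print(unmasked)  # the padding position receives a non-zero weight
print(masked)    # the padding position receives exactly zero weight
```

Without the mask, part of the attention weight leaks onto the padding position, so the same sentence can produce different outputs depending on how much padding its batch adds.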
The next post will cover basic model training.
