A summary of two ways to access Hugging Face models.
Both of the methods below load a Hugging Face model to your local machine, and the access flow is the same (a small snippet for inspecting the cache directory follows the list):
1) First, check the path "C:\Users\<username>\.cache\huggingface\hub" for a cached copy of the model;
2) If no cached model is found, download it through the mirror (or proxy) into "C:\Users\<username>\.cache\huggingface\hub";
3) On the next access, the model is read directly from "C:\Users\<username>\.cache\huggingface\hub".
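If you want to confirm what is already cached, you can simply list that hub directory. The snippet below is a minimal sketch assuming the default cache location (no HF_HOME override); on Windows it resolves to "C:\Users\<username>\.cache\huggingface\hub":

import os

# Default Hugging Face hub cache (assumption: HF_HOME has not been changed)
cache_dir = os.path.expanduser(os.path.join('~', '.cache', 'huggingface', 'hub'))

if os.path.isdir(cache_dir):
    # Each cached repo shows up as a folder such as "models--moka-ai--m3e-large"
    for name in os.listdir(cache_dir):
        print(name)
else:
    print('No Hugging Face cache found yet:', cache_dir)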
1. Using a mirror
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
PS: Be sure to place the code above at the very top of your script, before any Hugging Face modules are imported.
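As a quick sanity check of the mirror setting, you can pull a model with huggingface_hub. This is a minimal sketch, not the only way to do it; snapshot_download is the standard huggingface_hub helper, and moka-ai/m3e-large is simply the model used in the test further below:

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # set this before importing huggingface_hub

from huggingface_hub import snapshot_download

# Downloads the repo (or reuses the cached copy) under C:\Users\<username>\.cache\huggingface\hub
local_path = snapshot_download(repo_id='moka-ai/m3e-large')
print(local_path)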
2. Using a proxy (if you have a way to reach the site directly)
import os
os.environ['HTTP_PROXY'] = 'http://proxy_ip_address:port'
os.environ['HTTPS_PROXY'] = 'http://proxy_ip_address:port'
For example, mine is set as follows:
import os
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:33210'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:33210'
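To verify that the proxy variables are actually picked up, a quick connectivity check is enough. This is a hedged sketch using the requests library, which reads HTTP_PROXY/HTTPS_PROXY from the environment by default; 127.0.0.1:33210 is just the example address above:

import os
import requests

os.environ['HTTP_PROXY'] = 'http://127.0.0.1:33210'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:33210'

# requests picks up the proxy settings from the environment automatically
resp = requests.get('https://huggingface.co', timeout=10)
print(resp.status_code)  # 200 means huggingface.co is reachable through the proxy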
[Test it with the code below]
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('moka-ai/m3e-large')
sentences = [
    '山东人爱吃煎饼',
    '山东人爱吃煎饼已经是一个众所周知的事情',
    '山东省没有一个市不卖煎饼',
    '山东人不爱吃煎饼',
    '山东人爱吃煎饼果子'
]
# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)
# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(embeddings, embeddings)
# Add all pairs to a list with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])
# Sort list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)
print("Top-5 most similar pairs:")
for score, i, j in all_sentence_combinations[0:10]:
print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], cos_sim[i][j]))