A summary of two ways to access Hugging Face models.
Both of the methods below load a Hugging Face model to your local machine, and the access flow is the same (a small snippet for inspecting the cache directory follows the list):
1) First, check the path "C:\Users\<username>\.cache\huggingface\hub" for a cached copy of the model;
2) If no cached model is found, download it through the mirror (or proxy) into "C:\Users\<username>\.cache\huggingface\hub";
3) On the next access, the model is read directly from "C:\Users\<username>\.cache\huggingface\hub".
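If you want to confirm what is already cached, you can simply list that hub directory. The snippet below is a minimal sketch assuming the default cache location (no HF_HOME override); on Windows it resolves to "C:\Users\<username>\.cache\huggingface\hub":

import os

# Default Hugging Face hub cache (assumption: HF_HOME has not been changed)
cache_dir = os.path.expanduser(os.path.join('~', '.cache', 'huggingface', 'hub'))

if os.path.isdir(cache_dir):
    # Each cached repo shows up as a folder such as "models--moka-ai--m3e-large"
    for name in os.listdir(cache_dir):
        print(name)
else:
    print('No Hugging Face cache found yet:', cache_dir)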
1. Using a mirror
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
PS: Be sure to place the code above at the very top of your script, before any Hugging Face modules are imported.
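As a quick sanity check of the mirror setting, you can pull a model with huggingface_hub. This is a minimal sketch, not the only way to do it; snapshot_download is the standard huggingface_hub helper, and moka-ai/m3e-large is simply the model used in the test further below:

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # set this before importing huggingface_hub

from huggingface_hub import snapshot_download

# Downloads the repo (or reuses the cached copy) under C:\Users\<username>\.cache\huggingface\hub
local_path = snapshot_download(repo_id='moka-ai/m3e-large')
print(local_path)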
2. Using a proxy (if you have a way to reach the site directly)
import os
os.environ['HTTP_PROXY'] = 'http://proxy_ip_address:port'
os.environ['HTTPS_PROXY'] = 'http://proxy_ip_address:port'
For example, mine is set as follows:
import os
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:33210'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:33210'
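To verify that the proxy variables are actually picked up, a quick connectivity check is enough. This is a hedged sketch using the requests library, which reads HTTP_PROXY/HTTPS_PROXY from the environment by default; 127.0.0.1:33210 is just the example address above:

import os
import requests

os.environ['HTTP_PROXY'] = 'http://127.0.0.1:33210'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:33210'

# requests picks up the proxy settings from the environment automatically
resp = requests.get('https://huggingface.co', timeout=10)
print(resp.status_code)  # 200 means huggingface.co is reachable through the proxy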
[Test it with the code below]
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('moka-ai/m3e-large')
sentences = [
    '山东人爱吃煎饼',
    '山东人爱吃煎饼已经是一个众所周知的事情',
    '山东省没有一个市不卖煎饼',
    '山东人不爱吃煎饼',
    '山东人爱吃煎饼果子'
]
# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)
# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(embeddings, embeddings)
# Add all pairs to a list with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])
# Sort list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)
print("Top-5 most similar pairs:")
for score, i, j in all_sentence_combinations[0:10]:
print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], cos_sim[i][j]))