Using the BERT Model in Practice: Running Data to Get Representation Vectors

ywq9696

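The previous post was mostly about installation. Two more problems came up afterwards: Anaconda Prompt threw an error after TensorFlow was installed, and Anaconda itself would not open.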

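Fixing the problems
1. Anaconda Prompt error
An error appeared every time the prompt was opened, and no commands could be entered. The only fix I found was to uninstall and reinstall Anaconda, which meant redoing the whole setup from scratch.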

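I did find the likely cause, though: during installation I had used this command:

pip install --ignore-installed --upgrade tensorflow
The plain command below would have been enough. That said, the failure seems to be a matter of chance and is not always triggered; I may simply have been unlucky.

pip install tensorflow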
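
Before running pip again in an environment that already has packages in it, it can help to see what is installed. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The helper name installed_version and the list of package names are assumptions for illustration, not part of the original setup.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names here are only examples; adjust them to your environment.
for name in ("tensorflow", "numpy", "pandas"):
    print(name, installed_version(name))
```

If a package prints None, it is not installed in the interpreter that ran the script, which may differ from the one Anaconda Prompt uses.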
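2. Anaconda opens but the main window never appears
After clicking the icon, a green loading circle appears and then nothing happens; the main window never opens. Clicking the green circle makes it disappear and raises an error saying, roughly, that an instance of Anaconda is already running. I found fixes for this on CSDN, which lists many of them, for example:

Editing a file

Turning off the proxy

Editing a file and then upgrading

Uninstalling completely, clearing leftover files, and reinstalling

What finally worked for me was changing a version number, but I have forgotten the exact steps. It was something I came across buried in a web page and tried on a whim; if I find it again, I will update this post.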

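Using BERT in practice
I ran a .csv file provided by my teacher and generated representation vectors for each column. The teacher gave us only a piece of source code and asked us to adapt it to our own project. This is the version I ended up with: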

from bert_serving.client import BertClient
import numpy as np
from pandas import read_csv

# Connect to a bert-serving server that is already running on this machine.
bc = BertClient(ip="localhost", check_length=False)

def write_txt(root_dir, content):
    with open(root_dir, 'a+', encoding='utf-8') as f:
        f.write(content)
# Parameters: root_dir is the path of the file to write to (a string); content is the text to write (a string).
# 'a+' appends after the existing content; 'utf-8' is the encoding used for writing and can be changed to 'utf-16', etc.

def generate_text(data_path):
    items = read_csv(data_path)
    items.to_csv('routeName.txt', sep='\t', index=False, header=False, columns=['routeName'], encoding='utf-8')
# Writes only the 'routeName' column, one value per line, without the header row or the index, encoded as UTF-8.

generate_text('data/Travel Package Information.csv')

def read_txt(data_path):
    with open(data_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return lines
# Reads the text file and returns its lines as a list.

def embedding_item(data_path):
    lines = read_txt(data_path)
    content_list = []
    for line in lines:
        content_list.append(line.strip("\n"))

    vec = bc.encode(content_list)
    print("vec shape:", vec.shape)
    np.save("data/ic routeName.npy", vec)
    print("Done")

embedding_item(data_path='routeName.txt')
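This is only a first pass. It handles one column per run; with a little over 60,000 rows it took more than twenty minutes on my machine, while on the teacher's computer it took about a second... I'm envious.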
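
With this many rows, one option is to send the text to the encoder in smaller batches, so that progress is visible during a long run. The sketch below is not from the teacher's code. The names encode_in_batches and fake_encode and the batch size are assumptions for illustration. The encoder is passed in as a function so the example runs without a BERT server.

```python
from typing import Callable, List, Sequence


def encode_in_batches(
    texts: Sequence[str],
    encode_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 1024,
) -> List[List[float]]:
    """Encode texts chunk by chunk and collect the vectors in input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(encode_fn(batch))
        done = min(start + batch_size, len(texts))
        print(f"encoded {done}/{len(texts)}")
    return vectors


def fake_encode(batch):
    # Hypothetical stand-in for bc.encode: one 2-dimensional "vector" per text.
    return [[float(len(text)), 1.0] for text in batch]


vectors = encode_in_batches(["a", "bb", "ccc"], fake_encode, batch_size=2)
```

With a real client, encode_fn would presumably be bc.encode, and the collected rows could be stacked with np.array(vectors) before calling np.save. Batching like this does not by itself make encoding faster; it only shows progress and keeps each request small.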

Below is the code the teacher gave us. It outputs all four columns in one run and reports the elapsed time at the end.

from bert_serving.client import BertClient
import numpy as np
from pandas import read_csv
import time

def write_txt(root_dir, content):
    with open(root_dir, 'a+', encoding='utf-8') as f:
        f.write(content)

def generate_text(data_path):
    # Write each of the four columns to its own text file, one value per line.
    items = read_csv(data_path)
    items.to_csv('data/routeName.txt', sep='\t', index=False, columns=['routeName'], encoding="utf_8", header=False)
    items.to_csv('data/destination.txt', sep='\t', index=False, columns=['destination'], encoding="utf_8", header=False)
    items.to_csv('data/destinationLarge.txt', sep='\t', index=False, columns=['destinationLarge'], encoding="utf_8",
                 header=False)
    items.to_csv('data/type.txt', sep='\t', index=False, columns=['type'], encoding="utf_8", header=False)

def read_txt(data_path):
    with open(data_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return lines

def embedding_item(feature):
    # Encode every line of data/<feature>.txt and save the vectors to data/<feature>.npy.
    lines = read_txt('data/' + feature + '.txt')
    content_list = []
    for line in lines:
        content_list.append(line.strip("\n"))
    vec = bc.encode(content_list)
    print(feature + " vec shape:", vec.shape)
    np.save("data/" + feature + '.npy', vec)
    print(feature + " embedding done!")

if __name__ == "__main__":
    starttime = time.time()
    # bc is created at module level here, so embedding_item can use it as a global.
    bc = BertClient(ip="localhost", check_length=False)
    generate_text('data/Travel Package Information.csv')
    embedding_item(feature='routeName')
    embedding_item(feature='destination')
    embedding_item(feature='destinationLarge')
    embedding_item(feature='type')
    endtime = time.time()
    running_time = endtime - starttime
    print('Running Time:', running_time / 60.0, 'min')
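
The script above reports one total running time. To see how long each column takes, the four calls could be put in a loop and timed one by one. This is a minimal sketch, not the teacher's code: the timed helper and fake_embedding_item are hypothetical names, and the placeholder only sleeps instead of calling a BERT server.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_embedding_item(feature):
    # Hypothetical placeholder for embedding_item(feature); it only sleeps briefly.
    time.sleep(0.01)


timings = {}
features = ["routeName", "destination", "destinationLarge", "type"]
for feature in features:
    with timed(feature, timings):
        fake_embedding_item(feature)

for feature, seconds in timings.items():
    print(f"{feature}: {seconds / 60.0:.4f} min")
```

Replacing fake_embedding_item with the real embedding_item would give a per-column breakdown of the run.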