Notes on problems encountered while writing word2vec

dy20174530

Article link: https://blog.csdn.net/dy20174530/article/details/101036651

1. Encoding problem

Code:

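import jieba

def read_data(filename):
    # Read the corpus line by line and segment each line with jieba
    #with zipfile.ZipFile(filename) as f:
    data=[]
    f=open(filename,"r")  # no encoding is specified here
    for line in f:
        line=line.strip("\n")
        data.append(jieba.lcut(line,False))  # convert the line into a list of words
    #data=tf.compat.as_str(list(jieba.cut(f)))
    return data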

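Running this raised an error while reading the file.

A search online turned up the suggestion to specify the encoding explicitly,

so I changed the open call to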

f=open(filename,"r",encoding='UTF-8')

After that, the file was read correctly.
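
One likely explanation, stated here as an assumption because the original error message is not shown: when `encoding` is omitted, `open` falls back to a locale-dependent default (often GBK on a Chinese-language Windows system), which may not match a UTF-8 file. Below is a minimal sketch of reading with an explicit encoding, using a `with` block so the file is always closed. The names `read_lines` and `corpus.txt` are hypothetical and used only for illustration.

```python
import os
import tempfile


def read_lines(filename, encoding="utf-8"):
    """Return the lines of a text file, decoded with an explicit encoding."""
    lines = []
    # Passing encoding explicitly avoids depending on the platform default.
    with open(filename, "r", encoding=encoding) as f:
        for line in f:
            lines.append(line.rstrip("\n"))
    return lines


# Write a small UTF-8 sample file, then read it back.
sample_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("我爱自然语言处理\n今天天气很好\n")

sample_lines = read_lines(sample_path)
print(sample_lines)
```

Each returned line could then be passed to `jieba.lcut` as in the function above.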

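2. AttributeError: 'str' object has no attribute 'decode'

This appears to come from a difference between Python 2 and Python 3: in Python 3, str is already Unicode text and has no decode() method; only bytes objects can be decoded.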
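
A small sketch of the difference, assuming the failing code called `.decode()` on a value that was already a `str` (the post does not show the offending line). The helper `to_text` is a hypothetical name, not part of any library.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = "word2vec 词向量".encode("utf-8")  # bytes, e.g. read from a binary file
text = to_text(raw)                       # bytes -> str
same = to_text(text)                      # already str, returned unchanged

# In Python 3, str has no decode method, while bytes does.
print(hasattr(text, "decode"))
print(hasattr(raw, "decode"))
```

Code ported from Python 2 can usually either drop the `.decode()` call on text or guard it with a type check like the one above.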

3. TypeError: unhashable type: 'list'

The code that raised the error:

import collections

vocabulary_size=5000  # assumed from the original "top 5000" comment

def build_dataset(words):
    count=[['UNK',-1]]
    # count word frequencies and keep the most common words
    count.extend(collections.Counter(words).most_common(vocabulary_size-1))
    dictionary=dict()  # vocabulary: word -> index
    for word,_ in count:  # index words in order of decreasing frequency
        dictionary[word]=len(dictionary)
    data=list()
    unk_count=0
    for word in words:
        if word in dictionary:
            index = dictionary[word]
        else:  # words outside the vocabulary get index 0 (unknown)
            index=0
            unk_count+=1
        data.append(index)
    count[0][1]=unk_count
    reverse_dictionary=dict(zip(dictionary.values(),dictionary.keys()))
    # encoded data, per-word frequency counts, vocabulary, reverse vocabulary
    return data,count,dictionary,reverse_dictionary

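I looked up how each function is used.

collections.Counter(word)

Returns the number of occurrences of each element in word. Counter is a dict subclass, so the elements must be hashable.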

from collections import Counter
colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
c = Counter(colors)
print(c)
print(dict(c))

Output:
Counter({'blue': 3, 'red': 2, 'green': 1})
{'red': 2, 'blue': 3, 'green': 1}

most_common(n)

Returns the n most frequent elements together with their counts.

colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
c = Counter(colors)
print(c.most_common(2))

Output:
[('blue', 3), ('red', 2)]

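After some investigation, it turned out the mistake was in how I stored the words:

data.append(jieba.lcut(line,False))  # convert the line into a list of words

With append, each new element added to data is the whole list produced by segmenting one sentence, so data becomes a list of lists. Counter then tries to hash each inner list and raises the TypeError.

Changing it to extend() fixes this, because extend adds the elements of the list rather than the list itself.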
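
The difference can be reproduced without jieba. In this sketch `str.split` stands in for `jieba.lcut` (an assumption made only so the example has no third-party dependency), and the sample sentences are made up.

```python
from collections import Counter

sentences = ["the cat sat", "the dog sat"]

nested = []  # built with append: a list of lists
flat = []    # built with extend: a flat list of words
for sentence in sentences:
    tokens = sentence.split()  # stand-in for jieba.lcut(sentence, False)
    nested.append(tokens)
    flat.extend(tokens)

# Counter must hash every element, and lists are not hashable.
try:
    Counter(nested)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
flat_counts = Counter(flat)
print(flat_counts.most_common(2))
```

If the sentence boundaries are needed later, the nested form can be kept for that purpose and flattened only before counting.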
