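Homework 4 from Professor Hung-yi Lee's course is just as tough as the earlier ones. I'm implementing it in TensorFlow again and recording the pitfalls I hit along the way.
To spare my laptop, I wrote the code for this assignment on Kaggle, where the dataset is also readily available.
Here is the Kaggle page for course Homework 4: click to go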
Introduction to the Homework
The provided data folder contains three files:
- training_label.txt: labeled training data. Each line pairs a sentence with a label, 0 (negative) or 1 (positive), and +++$+++ is the separator.
  e.g., 1 +++$+++ are wtf … awww thanks !
- training_nolabel.txt: unlabeled training data (sentences only), used for semi-supervised learning.
  e.g., hates being this burnt !! ouch
- testing_data.txt: the testing data; you have to predict whether each sentence is 0 or 1.
  id,text
  0,my dog ate our dinner . no , seriously … he ate it .
  1,omg last day sooon n of primary noooooo x im gona be swimming out of school wif the amount of tears am gona cry
  2,stupid boys … they ’ re so … stupid !
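To make the two line formats concrete, here is a minimal sketch of parsing one line of each kind. The names `SEPARATOR`, `parse_labeled_line`, and `parse_test_line` are hypothetical helpers for illustration and are not part of the course's ExampleCode. The sketch also assumes that cutting a testing line at the first comma only is acceptable, since the sentences themselves can contain commas.

```python
# Hypothetical helpers for illustration; not taken from the course ExampleCode.
SEPARATOR = " +++$+++ "  # delimiter between label and sentence in training_label.txt


def parse_labeled_line(line):
    """Split one labeled training line into (label, tokens)."""
    label, text = line.rstrip("\n").split(SEPARATOR, 1)
    return int(label), text.split()


def parse_test_line(line):
    """Split one testing line into (id, tokens).

    Only the first comma separates the id from the text, so commas
    inside the sentence are kept as tokens.
    """
    idx, text = line.rstrip("\n").split(",", 1)
    return int(idx), text.split()


label, tokens = parse_labeled_line("1 +++$+++ are wtf … awww thanks !\n")
print(label, tokens)

idx, tokens = parse_test_line("0,my dog ate our dinner . no , seriously … he ate it .\n")
print(idx, tokens)
```

When reading testing_data.txt this way, the header line `id,text` has to be skipped first, because `int("id")` would raise a `ValueError`.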
Load Data and Word2Vector
Copy the functions from the ExampleCode:
def load_training_data(path='data/training_label.txt'):
    # Labeled file: each line is "<label> +++$+++ <sentence>"
    if 'training_label' in path:
        with open(path, 'r') as f:
            lines = f.readlines()
            lines = [line.strip('\n').split(' ') for line in lines]
        # Tokens 0 and 1 are the label and the separator; the rest is the sentence
        x = [line[2:] for line in lines]
        y = [line[0] for line in lines]
        return x, y
    # Unlabeled file: each line is just a sentence
    else:
        with open(path, 'r') as f:
            lines = f.readlines()
            x = [line.strip('\n').split(' ') for line in lines]
        return x
def load_testing_data(path='data/testing_data'):
    with open(path, 'r') as f:
        lines = f.readlines()
        X = ["".join(line.strip('\n').split(",")[1:])