1、FastText
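FastText is a typical deep-learning approach to word-vector representation. The idea is very simple: an Embedding layer maps each word into a dense vector space, the vectors of all words in a sentence are averaged in that space, and the averaged vector is then used for classification.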
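The embed, average, classify pipeline can be sketched in plain Python. This is only a minimal illustration under made-up assumptions, not the fastText library's implementation: the toy embedding table, the classifier weights, and the helper names `average_embedding`, `softmax`, and `classify` are all hypothetical.

```python
import math

# Hypothetical toy embedding table: word id -> 3-dimensional vector.
embeddings = {
    0: [0.0, 0.0, 0.0],
    1: [1.0, 2.0, 3.0],
    2: [3.0, 2.0, 1.0],
}

# Hypothetical classifier weights: one row per class, plus one bias per class.
weights = [[0.5, 0.0, -0.5], [-0.5, 0.0, 0.5]]
biases = [0.0, 0.0]


def average_embedding(word_ids):
    """Average the embedding vectors of all words in a sentence."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word_id in word_ids:
        for i, value in enumerate(embeddings[word_id]):
            total[i] += value
    return [value / len(word_ids) for value in total]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(word_ids):
    """Embed, average, then apply a linear layer followed by softmax."""
    sentence_vector = average_embedding(word_ids)
    scores = [
        sum(w * x for w, x in zip(row, sentence_vector)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


sentence = [1, 2]
sentence_vector = average_embedding(sentence)  # [2.0, 2.0, 2.0]
probabilities = classify(sentence)
```

Because the word vectors are averaged, word order is discarded: in this sketch `classify([1, 2])` and `classify([2, 1])` give the same probabilities.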

1.1、FastText network structure
# FastText network structure
from keras.models import Sequential
from keras.layers import Embedding
from keras.layers import GlobalAveragePooling1D
from keras.layers import Dense

vocab_size = 2000     # number of distinct word ids
embedding_dim = 100   # size of each word vector
max_word = 500        # number of word ids per input sequence
class_num = 5         # number of target classes


def build_fastText():
    model = Sequential()
    # Map each word id to a dense vector of size embedding_dim.
    model.add(Embedding(vocab_size, embedding_dim, input_length=max_word))
    # Average the word vectors over the sequence.
    model.add(GlobalAveragePooling1D())
    # Classify the averaged vector.
    model.add(Dense(class_num, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])
    return model


if __name__ == '__main__':
    model = build_fastText()
    print(model.summary())
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 100)          200000
global_average_pooling1d_1   (None, 100)               0
(GlobalAveragePooling1D)
dense_1 (Dense)              (None, 5)                 505
=================================================================
Total params: 200,505
Trainable params: 200,505
Non-trainable params: 0
_________________________________________________________________
None
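The parameter counts in the summary can be reproduced by hand from the three hyperparameters. The short check below is only a sketch of that arithmetic; the variable names `embedding_params`, `pooling_params`, `dense_params`, and `total_params` are introduced here for illustration.

```python
vocab_size = 2000
embedding_dim = 100
class_num = 5

# Embedding layer: one vector of size embedding_dim per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# GlobalAveragePooling1D only averages over the sequence axis; it has no weights.
pooling_params = 0

# Dense layer: one weight per (input feature, class) pair plus one bias per class.
dense_params = embedding_dim * class_num + class_num

total_params = embedding_params + pooling_params + dense_params
print(embedding_params, dense_params, total_params)  # 200000 505 200505
```

The sequence length `max_word` does not appear in the calculation: it changes the output shape of the Embedding layer but not the number of weights.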
1.2、Text classification with fastText
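# Text classification with fastText
import pandas as pd
from sklearn.metrics import f1_score
import fasttext

# Load the first 15,000 rows of the tab-separated training set.
train_df = pd.read_csv('新建文件夹/天池—新闻文本分类/train_set.csv', sep='\t', nrows=15000)
# fastText expects every label to carry the '__label__' prefix.
train_df['label_ft'] = '__label__' + train_df['label'].astype(str)
# Write all but the last 5,000 rows to the training file.
train_df[['text', 'label_ft']].iloc[:-5000].to_csv('train.csv', index=False, header=False, sep='\t')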
model = fasttext.train_supervised('train.csv', lr=1.0, wordNgrams=2,
                                  verbose=2, minCount=2, epoch=25)
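The training file written above uses fastText's supervised format, in which every label carries the `__label__` prefix, and the `iloc[:-5000]` slice keeps the last 5,000 rows out of training, presumably for validation. Below is a minimal sketch of the same two ideas in plain Python, without pandas or fastText. The sample rows and the helper names `to_fasttext_line` and `parse_label` are hypothetical and only stand in for the real dataset and workflow.

```python
LABEL_PREFIX = '__label__'


def to_fasttext_line(text, label):
    """Build one training line: the text, a tab, then the prefixed label."""
    return f'{text}\t{LABEL_PREFIX}{label}'


def parse_label(prefixed_label):
    """Map a prefixed label such as '__label__3' back to the integer 3."""
    return int(prefixed_label[len(LABEL_PREFIX):])


# Hypothetical rows standing in for the (text, label) columns of the dataset.
rows = [('12 85 7 301', 2), ('9 44 18', 0), ('77 3 5 61 20', 4), ('8 8 15', 1)]

# Same slicing idea as `iloc[:-5000]`, scaled down: hold out the last row.
holdout_size = 1
train_rows = rows[:-holdout_size]
valid_rows = rows[-holdout_size:]

# Lines in the layout that the training file above uses.
train_lines = [to_fasttext_line(text, label) for text, label in train_rows]
```

Predicted labels come back from fastText in the same prefixed form, so a helper like `parse_label` is one way to turn them into integers before comparing them with the held-out `label` column, for example with `f1_score`.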
