Practical Deep Learning Models for Text Data: FastText, CNN, RNN, LSTM, RCNN, Seq2Seq, Attention
The deep network models are implemented in Keras.
Keras provides a Sequential model API. It is a relatively simple way to build a deep learning model: create an instance of Keras's Sequential class, then create layers and add them to it, one layer at a time.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Build the model by stacking layers onto a Sequential instance
model = Sequential()
model.add(Dense(2, input_shape=(1,)))  # hidden layer: 2 units, 1-dim input
model.add(Dense(1))                    # output layer: 1 unit
Some of the models in this article involve multiple inputs or multiple outputs, so they are built with the Keras functional API instead.
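As a quick illustration (a minimal sketch with made-up layer sizes, not a model from this article), the functional API builds a graph by calling layers on tensors, which makes multi-input models such as the RCNN below possible:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Concatenate

# Two inputs merged into a single output: not expressible with Sequential
input_a = Input(shape=(16,))
input_b = Input(shape=(16,))
merged = Concatenate()([Dense(8, activation='relu')(input_a),
                        Dense(8, activation='relu')(input_b)])
output = Dense(1, activation='sigmoid')(merged)
model = Model(inputs=[input_a, input_b], outputs=output)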
FastText expects its input data in a different format from the other models; that data preprocessing is not covered here.
FastText
fastText is a word-vector and text-classification tool open-sourced by Facebook. It offers little academic novelty; its appeal is that the model is simple and training is very fast.
fastText maps every word in a document to a vector through a lookup table, averages the vectors, and feeds the result directly to a linear classifier. It closely resembles the deep averaging network (DAN) from ACL 2015, essentially a simplified version with the hidden layers removed.
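To make that structure concrete, here is a minimal Keras sketch of the same idea (the vocabulary size, sequence length, dimensions, and class count below are placeholder assumptions; real fastText additionally uses n-gram features and can use a hierarchical softmax):
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

# fastText-style classifier: lookup table -> average -> linear classifier
inputs = Input(shape=(100,))                  # padded sequences of word ids
x = Embedding(20000, 100)(inputs)             # lookup table: word id -> vector
x = GlobalAveragePooling1D()(x)               # average all word vectors
outputs = Dense(5, activation='softmax')(x)   # linear classifier
model = Model(inputs=inputs, outputs=outputs)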
Installation: pip install fasttext. If this errors out, download the corresponding .whl file and install it instead.
For text classification, fastText requires the text to be stored in the following format:
__label__1 , The high-level seminar, attended by Luo Huining, director of the Liaison Office of the Central People's Government in the HKSAR, and other high-ranking national and local officials, was held to celebrate the 100th anniversary of the Party, which falls on July 1.
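A labeled dataset can be written into this format with a few lines of Python (the file name and sample data below are placeholders):
# Write (label, text) pairs in the __label__ format fastText expects
samples = [(1, "The high-level seminar was held to celebrate ..."),
           (2, "Another example sentence ...")]
with open('train_data.txt', 'w', encoding='utf-8') as f:
    for label, text in samples:
        f.write(f"__label__{label} , {text}\n")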
Code implementation: fastText supports both supervised and unsupervised training. Here we train a supervised model, which returns a model object:
import fasttext
classifier = fasttext.train_supervised(input='train_data.txt', dim=100, epoch=5,
                                       lr=0.1, wordNgrams=2, loss='softmax')
classifier.save_model('classifier.model')
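The saved model can be reloaded for prediction; a short usage sketch (the input sentence is made up):
import fasttext

classifier = fasttext.load_model('classifier.model')
# predict returns the top-k labels and their probabilities
labels, probs = classifier.predict("The seminar celebrated the anniversary", k=1)
print(labels, probs)  # e.g. ('__label__1',) [0.97...]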
CNN code implementation
The code below implements a TextCNN: parallel 1-D convolutions with kernel sizes 3, 4, and 5 run over the embedded sequence, each is reduced by global max pooling, and the pooled features are concatenated and fed to a softmax classifier.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, Dense, Conv1D, GlobalMaxPooling1D, Concatenate


class TextCNN(object):
    def __init__(self, maxlen, max_features, embedding_dims,
                 class_num=5,
                 last_activation='softmax'):
        self.maxlen = maxlen                  # sequence length after padding
        self.max_features = max_features      # vocabulary size
        self.embedding_dims = embedding_dims  # word-vector dimension
        self.class_num = class_num
        self.last_activation = last_activation

    def get_model(self):
        input = Input((self.maxlen,))
        embedding = Embedding(self.max_features, self.embedding_dims, input_length=self.maxlen)(input)
        convs = []
        # Parallel convolutions with different kernel sizes capture n-grams of different lengths
        for kernel_size in [3, 4, 5]:
            c = Conv1D(128, kernel_size, activation='relu')(embedding)
            c = GlobalMaxPooling1D()(c)
            convs.append(c)
        x = Concatenate()(convs)
        output = Dense(self.class_num, activation=self.last_activation)(x)
        model = Model(inputs=input, outputs=output)
        return model
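A short usage sketch (the hyperparameters and the x_train/y_train arrays are placeholders; the same compile-and-fit pattern applies to the RNN variants below):
from tensorflow.keras.preprocessing.sequence import pad_sequences

maxlen, max_features, embedding_dims = 100, 20000, 100
model = TextCNN(maxlen, max_features, embedding_dims, class_num=5).get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# x_train: lists of word ids, y_train: one-hot labels (placeholders)
x_train = pad_sequences(x_train, maxlen=maxlen)
model.fit(x_train, y_train, batch_size=32, epochs=5, validation_split=0.1)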
RNN code implementation
The code below implements a TextRNN: the embedded sequence is fed through a single LSTM layer, and the final hidden state is passed to a softmax classifier.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, Dense, LSTM


class TextRNN(object):
    def __init__(self, maxlen, max_features, embedding_dims,
                 class_num=5,
                 last_activation='softmax'):
        self.maxlen = maxlen
        self.max_features = max_features
        self.embedding_dims = embedding_dims
        self.class_num = class_num
        self.last_activation = last_activation

    def get_model(self):
        input = Input((self.maxlen,))
        embedding = Embedding(self.max_features, self.embedding_dims, input_length=self.maxlen)(input)
        x = LSTM(128)(embedding)  # only the final hidden state is returned
        output = Dense(self.class_num, activation=self.last_activation)(x)
        model = Model(inputs=input, outputs=output)
        return model
Bidirectional RNN structure
The code below wraps the LSTM in a Bidirectional layer, so the text is read in both directions and the two final states are concatenated (a 256-dimensional representation for 128 units):
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, Dense, Bidirectional, LSTM


class TextBiRNN(object):
    def __init__(self, maxlen, max_features, embedding_dims,
                 class_num=5,
                 last_activation='softmax'):
        self.maxlen = maxlen
        self.max_features = max_features
        self.embedding_dims = embedding_dims
        self.class_num = class_num
        self.last_activation = last_activation

    def get_model(self):
        input = Input((self.maxlen,))
        embedding = Embedding(self.max_features, self.embedding_dims, input_length=self.maxlen)(input)
        x = Bidirectional(LSTM(128))(embedding)  # forward and backward states are concatenated
        output = Dense(self.class_num, activation=self.last_activation)(x)
        model = Model(inputs=input, outputs=output)
        return model
Combining RNN and CNN (RCNN)
For the underlying principle, see Lai et al., "Recurrent Convolutional Neural Networks for Text Classification" (AAAI 2015).
The code below implements the network: left and right context sequences are encoded by forward and backward SimpleRNNs, concatenated with the word embeddings of the current sequence at each time step, then passed through a kernel-size-1 convolution with tanh activation and global max pooling.
from tensorflow.keras import Input, Model
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Embedding, Dense, SimpleRNN, Lambda, Concatenate, Conv1D, GlobalMaxPooling1D


class RCNN(object):
    def __init__(self, maxlen, max_features, embedding_dims,
                 class_num=5,
                 last_activation='softmax'):
        self.maxlen = maxlen
        self.max_features = max_features
        self.embedding_dims = embedding_dims
        self.class_num = class_num
        self.last_activation = last_activation

    def get_model(self):
        input_current = Input((self.maxlen,))
        input_left = Input((self.maxlen,))
        input_right = Input((self.maxlen,))
        # One shared embedding layer for all three inputs
        embedder = Embedding(self.max_features, self.embedding_dims, input_length=self.maxlen)
        embedding_current = embedder(input_current)
        embedding_left = embedder(input_left)
        embedding_right = embedder(input_right)
        x_left = SimpleRNN(128, return_sequences=True)(embedding_left)
        x_right = SimpleRNN(128, return_sequences=True, go_backwards=True)(embedding_right)
        # go_backwards outputs in reversed time order, so flip it back
        x_right = Lambda(lambda x: K.reverse(x, axes=1))(x_right)
        # [left context; word embedding; right context] at each time step
        x = Concatenate(axis=2)([x_left, embedding_current, x_right])
        x = Conv1D(64, kernel_size=1, activation='tanh')(x)
        x = GlobalMaxPooling1D()(x)
        output = Dense(self.class_num, activation=self.last_activation)(x)
        model = Model(inputs=[input_current, input_left, input_right], outputs=output)
        return model
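The left and right inputs are the token sequence shifted one step in each direction; one way to build them (a sketch, assuming x_train is a (samples, maxlen) array of word ids with 0 reserved for padding):
import numpy as np

# Left context: each position sees the previous word (shift right, pad front)
x_left = np.hstack([np.zeros((len(x_train), 1), dtype=int), x_train[:, :-1]])
# Right context: each position sees the next word (shift left, pad back)
x_right = np.hstack([x_train[:, 1:], np.zeros((len(x_train), 1), dtype=int)])

model.fit([x_train, x_left, x_right], y_train, batch_size=32, epochs=5)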
TextAttBiRNN: bidirectional LSTM with attention for text classification
TextAttBiRNN builds on the bidirectional LSTM classifier by adding an attention mechanism. Over the per-step representations produced by the bidirectional LSTM, attention lets the model focus on the information most relevant to the final decision; the mechanism follows Raffel et al. (arXiv:1512.08756).
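Concretely, writing h_t for the hidden state at step t, the layer below scores each step as e_t = tanh(w·h_t + b_t), normalizes the scores into weights a_t = exp(e_t) / Σ_k exp(e_k), and returns the weighted average c = Σ_t a_t·h_t, so the classifier sees a summary that emphasizes the most informative time steps.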
The code below implements this attention layer. Attention is not available in Keras as a fully packaged layer that can simply be imported, so we define it from source here; once defined, it is used by just creating an Attention object.
from tensorflow.keras import backend as K
from tensorflow.keras import initializers, regularizers, constraints
from tensorflow.keras.layers import Layer


class Attention(Layer):
    def __init__(self, step_dim,
                 W_regularizer=None, b_regularizer=None,
                 W_constraint=None, b_constraint=None,
                 bias=True, **kwargs):
        """
        Keras Layer that implements an Attention mechanism for temporal data.
        Supports Masking.
        Follows the work of Raffel et al. [https://arxiv.org/abs/1512.08756]
        # Input shape
            3D tensor with shape: `(samples, steps, features)`.
        # Output shape
            2D tensor with shape: `(samples, features)`.
        :param kwargs:
        Just put it on top of an RNN Layer (GRU/LSTM/SimpleRNN) with return_sequences=True.
        The feature dimension is inferred from the output shape of the RNN; pass the
        number of time steps as step_dim.
        Example:
            # 1
            model.add(LSTM(64, return_sequences=True))
            model.add(Attention(maxlen))
            # next add a Dense layer (for classification/regression) or whatever...
            # 2
            hidden = LSTM(64, return_sequences=True)(words)
            sentence = Attention(maxlen)(hidden)
            # next add a Dense layer (for classification/regression) or whatever...
        """
        self.supports_masking = True
        self.init = initializers.get('glorot_uniform')
        self.W_regularizer = regularizers.get(W_regularizer)
        self.b_regularizer = regularizers.get(b_regularizer)
        self.W_constraint = constraints.get(W_constraint)
        self.b_constraint = constraints.get(b_constraint)
        self.bias = bias
        self.step_dim = step_dim
        self.features_dim = 0
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = self.add_weight(shape=(input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer,
                                 constraint=self.W_constraint)
        self.features_dim = input_shape[-1]
        if self.bias:
            self.b = self.add_weight(shape=(input_shape[1],),
                                     initializer='zero',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)
        else:
            self.b = None
        self.built = True

    def compute_mask(self, input, input_mask=None):
        # do not pass the mask to the next layers
        return None

    def call(self, x, mask=None):
        features_dim = self.features_dim
        step_dim = self.step_dim
        # score per time step: e_t = w . h_t (computed via reshape + matmul)
        e = K.reshape(K.dot(K.reshape(x, (-1, features_dim)), K.reshape(self.W, (features_dim, 1))),
                      (-1, step_dim))  # e = K.dot(x, self.W)
        if self.bias:
            e += self.b
        e = K.tanh(e)
        a = K.exp(e)
        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            # cast the mask to floatX to avoid float64 upcasting in theano
            a *= K.cast(mask, K.floatx())
        # in some cases especially in the early stages of training the sum may be almost zero
        # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        a = K.expand_dims(a)
        # weighted sum of time-step features: c = sum_t a_t * h_t
        c = K.sum(a * x, axis=1)
        return c

    def compute_output_shape(self, input_shape):
        return input_shape[0], self.features_dim
"""
TextAttBiRNN使用
"""
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, Dense, Bidirectional, LSTM


class TextAttBiRNN(object):
    def __init__(self, maxlen, max_features, embedding_dims,
                 class_num=5,
                 last_activation='softmax'):
        self.maxlen = maxlen
        self.max_features = max_features
        self.embedding_dims = embedding_dims
        self.class_num = class_num
        self.last_activation = last_activation

    def get_model(self):
        input = Input((self.maxlen,))
        embedding = Embedding(self.max_features, self.embedding_dims, input_length=self.maxlen)(input)
        # return_sequences=True keeps every time step for the attention layer
        x = Bidirectional(LSTM(128, return_sequences=True))(embedding)  # LSTM or GRU
        x = Attention(self.maxlen)(x)  # step_dim = maxlen
        output = Dense(self.class_num, activation=self.last_activation)(x)
        model = Model(inputs=input, outputs=output)
        return model
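A quick smoke test with random data verifies that the custom Attention layer wires up correctly (all numbers here are arbitrary placeholders):
import numpy as np

maxlen, max_features, embedding_dims = 100, 20000, 100
model = TextAttBiRNN(maxlen, max_features, embedding_dims, class_num=5).get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

x_dummy = np.random.randint(1, max_features, size=(8, maxlen))
print(model.predict(x_dummy).shape)  # (8, 5): one probability per class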
Summary
A few personal takeaways from this article:
Keras is flexible enough to build quite complex deep learning networks.
Almost every deep network topology can be traced back to a paper.
Understanding the principles behind each network topology, and where each one applies, is the surest way to build real depth of skill.