项目总结一：情感分类项目（emojify）

最新推荐文章于 2022-07-19 23:39:07 发布

weixin_34209851

最新推荐文章于 2022-07-19 23:39:07 发布

阅读量562

点赞数

文章标签：人工智能

一、Emojifier-V1 模型

1、模型

（1）前向传播过程：

（2）损失函数：计算the cross-entropy cost

（3）反向传播过程：计算dW,db

dz = a - Y_oh[i]
dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
db = dz

(4) 参数更新： the stochastic gradient descent algorithm

W = W - learning_rate * dW
b = b - learning_rate * db

2、输入输出数据类型

(1)输入输出数据

输入数据X:句子

输出数据Y：表情

例如：

I am proud of your achievements ?

（2）训练样本、检验样本

训练样本：X contains 127 sentences (strings)，Y contains a integer label between 0 and 4 corresponding to an emoji for each sentence

检验样本：X contains 56 sentences (strings)，Y contains a integer label between 0 and 4 corresponding to an emoji for each sentence

3、结果

（1）训练与检验结果

Training set:
Accuracy: 0.977272727273   #训练集准确率
Test set:
Accuracy: 0.857142857143   #检验集准确率

（2）测试结果：

i adore you ❤️

i love you ❤️

funny lol ?

lets play with a ball ⚾

food is ready ?

not feeling happy ? #错误

（1）the word "adore" does not appear in the training set.Because adore has a similar embedding as love, the algorithm has generalized correctly even to a word it has never seen before. Words such as heart, dear, beloved or adore have embedding vectors similar to love.

（2）ote though that it doesn't get "not feeling happy" correct. This algorithm ignores word ordering, so is not good at understanding phrases like "not happy."

4、总结：

Even with a 127 training examples, you can get a reasonably good model for Emojifying. This is due to the generalization power word vectors gives you.
Emojify-V1 will perform poorly on sentences such as "This movie is not good and not enjoyable" because it doesn't understand combinations of words--it just averages all the words' embedding vectors together, without paying attention to the ordering of words. You will build a better algorithm in the next part.

二、Emojifier-V2 模型

1、模型

（1）Keras and mini-batching

　　most deep learning frameworks require that all sequences in the same mini-batch have the same length.The common solution to this is to use padding. Specifically, set a maximum sequence length, and pad all sequences to the same length.

（2）The Embedding layer

Embedding layer. This example shows the propagation of two examples through the embedding layer. Both have been zero-padded to a length of max_len=5. The final dimension of the representation is (2,max_len,50) because the word embeddings we are using are 50 dimensional.

（3）Building the Emojifier-V2

【Building model】

# GRADED FUNCTION: Emojify_V2

def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape = input_shape, dtype =np.int32)
    
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices) 
    
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences. 
    # return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
    X = LSTM(128, return_sequences=True)(embeddings)
    
    # Add dropout with a probability of 0.5
    X = Dropout(rate=0.5)(X)
    
    # Propagate X trough another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    # units: Positive integer, dimensionality of the output space.
    # return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
    X = LSTM(128, return_sequences=False)(X)   
    
    # Add dropout with a probability of 0.5
    X = Dropout(rate=0.5)(X)
    
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    #units: Positive integer, dimensionality of the output space.
    X = Dense(5)(X)
    
    # Add a softmax activation
    X =Activation('softmax')(X)
    
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=sentence_indices, outputs=X)
    
    ### END CODE HERE ###
    
    return model

【model summary】

Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         (None, 10)                0         
_________________________________________________________________
embedding_5 (Embedding)      (None, 10, 50)            20000050  
_________________________________________________________________
lstm_5 (LSTM)                (None, 10, 128)           91648     
_________________________________________________________________
dropout_4 (Dropout)          (None, 10, 128)           0         
_________________________________________________________________
lstm_6 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout_5 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 645       
_________________________________________________________________
activation_2 (Activation)    (None, 5)                 0         
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________

【model compile】

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

【model fit】

model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)