A PyTorch Implementation of Hierarchical Attention Networks

Hierarchical Attention Networks for Document Classification

We know that documents have a hierarchical structure: words combine to form sentences, and sentences combine to form documents. We can try to learn that structure, or we can feed this hierarchical structure into the model and see whether it improves the performance of existing models. This paper exploits that structure to build a classification model.

This is a (close) implementation of the model in PyTorch.

Key points

The network uses a bidirectional GRU to capture contextual information about a word.

There are two levels of attention: one at the word level and another at the sentence level.

It uses word2vec for word embeddings.

Negative Log Likelihood is used as the loss function.

The dataset was split 8:1:1 into training, validation, and test sets.

Note: If you are using NLLLoss from PyTorch, make sure to use the log_softmax function from torch.nn.functional, not softmax; the sketch after these points shows the pairing.
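
As a minimal sketch of how these points fit together, assuming illustrative layer sizes and module names (the exact values in the notebook may differ): words are embedded, a word-level bidirectional GRU plus attention builds sentence vectors, a sentence-level bidirectional GRU plus attention builds the document vector, and log_softmax produces the log-probabilities that NLLLoss expects.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Additive attention: score each position, softmax, weighted sum."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.context = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, h):                          # h: (batch, seq_len, hidden_dim)
        u = torch.tanh(self.proj(h))
        alpha = F.softmax(self.context(u), dim=1)  # attention weights over seq_len
        return (alpha * h).sum(dim=1)              # (batch, hidden_dim)

class HAN(nn.Module):
    def __init__(self, vocab_size, embed_dim=200, hidden_dim=50, num_classes=29):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.word_gru = nn.GRU(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.word_attn = Attention(2 * hidden_dim)
        self.sent_gru = nn.GRU(2 * hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.sent_attn = Attention(2 * hidden_dim)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, docs):                       # docs: (batch, sents, words) word ids
        b, n_sents, n_words = docs.shape
        words = self.embedding(docs.view(b * n_sents, n_words))
        h_word, _ = self.word_gru(words)           # word-level BiGRU
        sent_vecs = self.word_attn(h_word).view(b, n_sents, -1)  # word-level attention
        h_sent, _ = self.sent_gru(sent_vecs)       # sentence-level BiGRU
        doc_vec = self.sent_attn(h_sent)           # sentence-level attention
        # log_softmax, not softmax: NLLLoss expects log-probabilities.
        return F.log_softmax(self.fc(doc_vec), dim=1)

criterion = nn.NLLLoss()
```

For the 8:1:1 split, torch.utils.data.random_split is one option (assuming dataset is a Dataset of padded documents and labels):

```python
from torch.utils.data import random_split

n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n - n_train - n_val])
```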

Notebook

The notebook was trained on the Yelp dataset taken from here.

The best accuracy that I got was around 64.6%. This dataset has only 10,000 samples and 29 classes.

Plots: training loss, training accuracy, and validation accuracy.

Attachments

You can find the word2vec model trained on this dataset here, and the trained weights of the HAN model here.
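
As a sketch of how these attachments might be loaded back (assuming the word2vec model was trained with gensim; the file names are placeholders, and HAN is the class sketched under Key points):

```python
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# Load the word2vec model trained on this dataset (placeholder path).
w2v = Word2Vec.load("word2vec_yelp.model")
vectors = torch.FloatTensor(w2v.wv.vectors)

# Seed the HAN embedding layer with the pretrained vectors;
# freeze=False lets them keep training with the rest of the network.
model = HAN(vocab_size=vectors.shape[0], embed_dim=vectors.shape[1])
model.embedding = nn.Embedding.from_pretrained(vectors, freeze=False)

# Restore the trained HAN weights for inference (placeholder path).
model.load_state_dict(torch.load("han_weights.pt", map_location="cpu"))
model.eval()
```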
