fastNLP Source Code Analysis

数学工具构造器

Category: NLP

Article link: https://blog.csdn.net/TQCAI666/article/details/112250096

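This post takes a close look at the implementation details of fastHAN, CNNCharEmbedding, and BertEmbedding in the fastNLP library, including code analysis and the author's own ideas. It also covers the forward pass and likelihood computation of ConditionalRandomField, and the practical use of combined character and word features for text classification. Finally, the author mentions plans to try improving charCNN with attention and CNN.

(Summary generated automatically by CSDN.)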

fastHAN

self.label_vocab
Out[6]: 
{'POS': Vocabulary(['S-root', 'B-NR', 'M-NR', 'E-NR', 'B-NN']...),
 'CWS': Vocabulary(['S', 'B', 'E', 'M']...),
 'NER': Vocabulary(['O', 'B-NT', 'M-NT', 'E-NT', 'B-NR']...),
 'Parsing': Vocabulary(['APP', 'nn', 'nsubj', 'rcmod', 'cpm']...),
 'pos': Vocabulary(['root', 'NR', 'NN', 'VV', 'DEC']...)}

self.char_vocab
Out[10]: Vocabulary(['[unused12]', '有', '的', '厂', '长']...)
len(self.char_vocab)
Out[11]: 8675
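
The CWS vocabulary above contains the four tags S, B, E, M. As a minimal sketch, assuming the usual reading of this scheme (S = single-character word, B / M / E = beginning / middle / end of a multi-character word), a tag sequence can be turned back into words as shown below. The function name bmes_to_words is hypothetical and is not part of fastHAN.

```python
def bmes_to_words(chars, tags):
    """Group characters into words according to B/M/E/S tags."""
    words, current = [], []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        # S closes a one-character word, E closes a multi-character word.
        if tag in ("S", "E"):
            words.append("".join(current))
            current = []
    # Flush a dangling partial word if the sequence ends without E or S.
    if current:
        words.append("".join(current))
    return words


chars = list("中央人民政府")
tags = ["B", "E", "B", "E", "B", "E"]
print(bmes_to_words(chars, tags))  # ['中央', '人民', '政府']
```

The POS and NER vocabularies appear to use the same position prefixes with a label attached (for example B-NR), so the same grouping idea would apply after splitting each tag on the hyphen.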

There is too much to cover here and I am running out of steam, so I will park this thread for now.

https://zhuanlan.zhihu.com/p/67106791

CNNCharEmbedding

Let's look at the code.

txt = ["中华 人民 共和国",
       "中央 人民 政府"]

words
Out[2]: 
tensor([[3, 2, 4],
        [5, 2, 6]])

The word indices are converted to character indices through the words_to_chars_embedding lookup table:

chars.shape
Out[3]: torch.Size([2, 3, 7])
# batch of 2 sentences, 3 words per sentence, 7 character slots per word
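
Here is a minimal sketch of how such a lookup table might be built, written in plain Python lists rather than tensors. It assumes that every word is wrapped in a start marker and an end marker (called <bow> and <eow> here) and padded to the longest vocabulary entry. Under that assumption, the 5-character entries <pad> and <unk> plus the two markers would give the 7 slots seen above, and the markers would be one possible source of a +2 in the word length. The vocabulary order and the character indices are made up for illustration and do not reproduce fastNLP's actual values.

```python
PAD = 0  # index of the padding character (assumed)

# Word vocabulary ordered so that the indices match the `words` tensor above.
word_vocab = ["<pad>", "<unk>", "人民", "中华", "共和国", "中央", "政府"]

# Character vocabulary: padding first, then the two markers, then every character seen.
char_vocab = ["<pad>", "<bow>", "<eow>"]
for word in word_vocab:
    for ch in word:
        if ch not in char_vocab:
            char_vocab.append(ch)
char_to_idx = {ch: i for i, ch in enumerate(char_vocab)}

# Longest word plus two slots for the start/end markers.
max_word_len = max(len(w) for w in word_vocab) + 2

words_to_chars = []  # one padded row of character ids per word
word_lengths = []    # number of real (non-padding) slots per word
for word in word_vocab:
    ids = [char_to_idx["<bow>"]] + [char_to_idx[c] for c in word] + [char_to_idx["<eow>"]]
    word_lengths.append(len(ids))
    words_to_chars.append(ids + [PAD] * (max_word_len - len(ids)))

# Look up every word index of the batch in the table.
words = [[3, 2, 4], [5, 2, 6]]
chars = [[words_to_chars[w] for w in sentence] for sentence in words]
print(len(chars), len(chars[0]), len(chars[0][0]))  # 2 3 7
```

With tensors the whole lookup collapses to a single indexing operation on the table, which is why the result has one extra trailing dimension compared with words.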

Truncate to the maximum word length

To be honest, I don't know where the +2 comes from.

max_word_len = word_lengths.max()
chars = chars[:, :, :max_word_len]

chars.shape
Out[4]: torch.Size([2, 3, 5])
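
The truncation step can be reproduced with plain PyTorch. The tensors below are toy values chosen only to match the shapes above: word_lengths holds the number of used character slots for each word in the batch, and index 0 stands for padding. Both are assumptions for illustration.

```python
import torch

# Toy batch: 2 sentences, 3 words each, every word padded to 7 character slots.
chars = torch.zeros(2, 3, 7, dtype=torch.long)
# Per-word slot counts (illustrative values; the longest word uses 5 slots).
word_lengths = torch.tensor([[4, 4, 5],
                             [4, 4, 4]])
for b in range(2):
    for w in range(3):
        n = int(word_lengths[b, w])
        chars[b, w, :n] = torch.arange(1, n + 1)  # nonzero ids stand in for real characters

# Keep only as many slots as the longest word in this batch needs.
max_word_len = int(word_lengths.max())
truncated = chars[:, :, :max_word_len]

# Only padding columns were dropped, so no real character is lost.
assert chars[:, :, max_word_len:].eq(0).all()
print(truncated.shape)  # torch.Size([2, 3, 5])
```

Trimming to the batch maximum rather than the vocabulary maximum means later layers do less work on pure padding.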

Apply the embedding

chars = self.char_embedding(chars)

chars.shape
Out[6]: torch.Size([2, 3, 5, 50])
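
An embedding layer maps each integer index to a vector, so it simply appends one trailing dimension to the input shape. A small sketch with torch.nn.Embedding follows. The embedding size of 50 is taken from the shape above, while the vocabulary size of 100 and padding_idx=0 are placeholder assumptions.

```python
import torch
import torch.nn as nn

num_chars, embed_dim = 100, 50  # vocabulary size is an arbitrary placeholder
char_embedding = nn.Embedding(num_chars, embed_dim, padding_idx=0)

chars = torch.randint(0, num_chars, (2, 3, 5))  # (batch, words, chars per word)
embedded = char_embedding(chars)
print(embedded.shape)  # torch.Size([2, 3, 5, 50])

# The padding index is initialised to an all-zero vector.
pad_vector = char_embedding(torch.tensor([0]))
assert torch.all(pad_vector == 0)
```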

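Merge the batch and word dimensions into a single batch dimension, so that each word becomes its own sequence of characters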

reshaped_chars = chars.reshape(batch_size * max_len, max_word_len, -1)

reshaped_chars.shape
Out[5]: torch.Size([6, 5, 50])
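
The reshape can be checked in isolation. The sketch below uses random values with the same shapes and verifies which word ends up in which row of the flattened tensor. Treating every word as an independent sequence is presumably what lets a per-word layer, such as a convolution over characters, run on the whole batch at once. That motivation is my reading, not something confirmed above.

```python
import torch

batch_size, max_len, max_word_len, embed_dim = 2, 3, 5, 50
chars = torch.randn(batch_size, max_len, max_word_len, embed_dim)

# Fold (batch, words) into one leading dimension of size batch * words.
reshaped_chars = chars.reshape(batch_size * max_len, max_word_len, -1)
print(reshaped_chars.shape)  # torch.Size([6, 5, 50])

# Row i of the flat tensor is word (i % max_len) of sentence (i // max_len).
assert torch.equal(reshaped_chars[4], chars[1, 1])

# The original layout is recovered by the inverse reshape.
restored = reshaped_chars.reshape(batch_size, max_len, max_word_len, -1)
assert torch.equal(restored, chars)
```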
