NLP
步步咏凉天
Training word2vec on People's Daily: an experiment
Dataset: People's Daily (人民日报), 2020-10-04 to 2021-10-04
Overview: 25590 articles; 742362 sentences; 0.021 billion words; 294730 tokens; 182004942 pairs (window size: 5)
Word cloud (120 words)
Training parameters: vector dimension: 100; window size: 5; K: 5; batch size: 50; epoch: 10; learning rate: 0.025
Training results
Original · 2021-11-16 08:56:33 · 517 views · 2 comments
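The pair count above depends on how (center, context) pairs are drawn from each sentence. The excerpt does not show the author's procedure, so the following is only a minimal sketch of one common skip-gram convention with a symmetric window of 5. The function name `skipgram_pairs`, the `HPARAMS` dictionary, and the toy sentence are illustrative assumptions; only the parameter values come from the listing.

```python
# Hyperparameters as listed in the excerpt; the dictionary itself is illustrative.
HPARAMS = {
    "vector_dimension": 100,
    "window_size": 5,
    "K": 5,
    "batch_size": 50,
    "epoch": 10,
    "learning_rate": 0.025,
}


def skipgram_pairs(tokens, window=HPARAMS["window_size"]):
    """Return (center, context) pairs for one tokenized sentence.

    Assumption: every token within `window` positions on either side of
    the center token counts as one context, and pairs never cross
    sentence boundaries.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Toy sentence (hypothetical, not taken from the corpus).
toy_sentence = ["a", "b", "c", "d"]
toy_pairs = skipgram_pairs(toy_sentence)
narrow_pairs = skipgram_pairs(toy_sentence, window=1)
```

With a window wider than the sentence, every token is paired with every other token, so short sentences contribute fewer pairs than the window size alone would suggest.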
[Paper reading] Hybrid model for Chinese character recognition based on Tesseract-OCR
Tesseract-OCR engine + KNN + LSTM
Introduction
Chinese OCR is more difficult:
English has only 26 letters, but about 2,500 Chinese characters are in common use.
The strokes of Chinese characters are complex and similar.
The dif…
Original · 2021-11-15 19:34:13 · 1525 views · 0 comments
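[Paper reading] Knowledge representation and automatic sentence segmentation for classical Chinese based on deep language models
Overview: BERT + CRF/CNN for knowledge representation and sentence segmentation of classical Chinese texts
2 Automatic sentence segmentation model for classical Chinese
The conditional random field (CRF) is a classic sequence labeling model that is widely used in natural language processing tasks such as Chinese word segmentation, part-of-speech tagging, and named entity recognition.
Zheng X, Chen J, Shang G. Deep neural network-based Chinese semantic role labeling [J/OL]. ZTE Communications, 2018: 1-12. http://kns.cnki.net/kcms/detail/34.1294.TN.20180102.1045
Original · 2021-11-11 16:29:46 · 1272 views · 0 comments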
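Treating sentence segmentation as sequence labeling means giving every character a tag that says whether a break follows it. The excerpt does not state the paper's tag set, so this is a minimal sketch under an assumed two-tag scheme ("E" for a character followed by a break, "O" otherwise). The names `to_labels`, `segment`, and `PUNCT` are hypothetical; a BERT + CRF model would be trained to predict such labels from the unpunctuated characters.

```python
# Punctuation marks treated as break positions (an assumed, incomplete set).
PUNCT = set(",。;:?!、")


def to_labels(text):
    """Strip punctuation and return (chars, labels) as training targets.

    A character immediately followed by punctuation is tagged "E";
    every other character is tagged "O".
    """
    chars, labels = [], []
    for ch in text:
        if ch in PUNCT:
            if labels:
                labels[-1] = "E"
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels


def segment(chars, labels, sep="/"):
    """Rebuild a segmented string from characters and predicted labels."""
    out = []
    for ch, lab in zip(chars, labels):
        out.append(ch)
        if lab == "E":
            out.append(sep)
    return "".join(out)


# Illustrative input: a well-known line from the Analects.
chars, labels = to_labels("学而时习之,不亦说乎?")
restored = segment(chars, labels)
```

Here `labels` plays the role of the gold sequence; at inference time the same `segment` step would be applied to the model's predicted labels.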
Implementing BiLSTM in PyTorch

    import torch.nn as nn

    class BidirectionalLSTM(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super(BidirectionalLSTM, self).__init__()
            self.rnn = nn.LSTM(input_size, hidden_size, bidirectional=True, batch_first…

Original · 2021-11-07 10:41:24 · 2719 views · 1 comment
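The excerpt cuts the class off inside the `nn.LSTM` call, so the rest of the author's code is not visible here. As a hedged sketch of how such a module is often completed, the version below assumes `batch_first=True` and adds a hypothetical linear projection `fc` plus a `forward` method; none of these are confirmed by the source.

```python
import torch
import torch.nn as nn


class BidirectionalLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Assumption: batch_first=True, so inputs are (batch, seq_len, input_size).
        self.rnn = nn.LSTM(input_size, hidden_size,
                           bidirectional=True, batch_first=True)
        # Hypothetical projection: a bidirectional LSTM concatenates the
        # forward and backward hidden states, hence hidden_size * 2.
        self.fc = nn.Linear(hidden_size * 2, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)   # (batch, seq_len, 2 * hidden_size)
        return self.fc(out)    # (batch, seq_len, output_size)


# Shape check with random data (the sizes are arbitrary).
model = BidirectionalLSTM(input_size=8, hidden_size=16, output_size=4)
x = torch.randn(2, 5, 8)
y = model(x)
```

Projecting every time step keeps one output vector per input position, which suits per-token tasks such as sequence labeling; a sequence-level classifier would instead pool or select from `out` before the linear layer.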