Reading the paper "Natural Language Processing (Almost) from Scratch"

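This post covers an approach to natural language processing built (almost) from scratch: word embeddings pre-trained with a neural network are used for part-of-speech tagging, chunking, named entity recognition and semantic role labeling. The paper uses multi-task learning with shared embedding-layer parameters and proposes two models, the window approach and the sentence approach, each suited to different NLP tasks. It also discusses unsupervised training of word embeddings and the optimization objective, as well as what the approach implies for future research.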



Original post: http://blog.csdn.net/qq_31456593/article/details/77504902

introduction

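This paper is another classic on neural network language models and word embeddings. It differs from the earlier "A Neural Probabilistic Language Model" in that its core goal is to train good word embeddings and use them for part-of-speech tagging (POS), chunking (CHUNK), named entity recognition (NER) and semantic role labeling (SRL).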

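The language-model network in the paper is used only to pre-train the word embeddings. The embeddings then serve as the parameters of the first layer of each task-specific network (the layer that turns a word's one-hot representation into its word embedding), and training continues on that task. All of these tasks share the same goal, which is tagging. The result is good performance on POS, CHUNK, NER and SRL.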
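To make the "one-hot to word embedding" layer concrete, here is a minimal sketch in plain Python. The vocabulary, the matrix values and the helper names are made up for illustration and do not come from the paper; the point is only that multiplying a one-hot vector by the embedding matrix gives the same result as reading one row of that matrix.

```python
# Toy vocabulary and embedding matrix; all values are illustrative assumptions.
vocab = ["the", "cat", "sat"]
embedding = [
    [0.1, 0.2],  # row for "the"
    [0.3, 0.4],  # row for "cat"
    [0.5, 0.6],  # row for "sat"
]


def one_hot(index, size):
    """Return a one-hot list with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def one_hot_times_matrix(vec, matrix):
    """Multiply a row vector by a matrix using plain Python lists."""
    cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(matrix))) for c in range(cols)]


def lookup(word):
    """Direct row lookup, which is how an embedding layer is usually implemented."""
    return embedding[vocab.index(word)]


via_matmul = one_hot_times_matrix(one_hot(vocab.index("cat"), len(vocab)), embedding)
via_lookup = lookup("cat")
print(via_matmul, via_lookup)
```

Because the two paths agree, pre-trained embeddings can be dropped in as the initial values of this first layer and then fine-tuned like any other weights.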

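The paper also uses multi-task training: the parameters of the layer that maps one-hot vectors to word embeddings are shared, and that layer is trained on several tasks.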
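The parameter sharing described above can be sketched as follows. This is a toy setup under stated assumptions, not the paper's network: each task gets a hypothetical `TaskHead` (a single linear layer with softmax) on top of one shared lookup table, and the sizes, tag counts and learning rate are arbitrary. It shows that a gradient step taken for one task changes the embeddings that the other task reads.

```python
import math
import random

random.seed(0)
EMB_DIM = 3
vocab = {"john": 0, "lives": 1, "in": 2, "paris": 3}

# One lookup table shared by every task (toy sizes, random initial values).
shared_embedding = [
    [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)] for _ in vocab
]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class TaskHead:
    """Hypothetical task-specific linear scorer on top of the shared embeddings."""

    def __init__(self, n_tags):
        self.weights = [
            [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)] for _ in range(n_tags)
        ]

    def scores(self, word):
        vec = shared_embedding[vocab[word]]
        return [sum(w * v for w, v in zip(row, vec)) for row in self.weights]

    def sgd_step(self, word, gold_tag, lr=0.1):
        """One cross-entropy SGD step; updates this head AND the shared table."""
        vec = shared_embedding[vocab[word]]
        probs = softmax(self.scores(word))
        grad_scores = [p - (1.0 if k == gold_tag else 0.0) for k, p in enumerate(probs)]
        # Gradient w.r.t. the embedding, computed before the head weights change.
        grad_vec = [
            sum(g * row[d] for g, row in zip(grad_scores, self.weights))
            for d in range(EMB_DIM)
        ]
        for k, row in enumerate(self.weights):
            for d in range(EMB_DIM):
                row[d] -= lr * grad_scores[k] * vec[d]
        for d in range(EMB_DIM):
            vec[d] -= lr * grad_vec[d]  # in-place update of the shared row


pos_head = TaskHead(n_tags=3)  # e.g. a POS-like task
ner_head = TaskHead(n_tags=2)  # e.g. an NER-like task

before = list(shared_embedding[vocab["paris"]])
ner_scores_before = ner_head.scores("paris")
pos_head.sgd_step("paris", gold_tag=1)  # train only the POS-like head
after = list(shared_embedding[vocab["paris"]])
ner_scores_after = ner_head.scores("paris")
```

After the single POS-like step, `after` differs from `before`, and the NER-like head's scores for the same word have moved even though that head was never trained. This is the mechanism through which the tasks influence each other.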

method

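The paper designs two networks for these NLP tasks, one called the window approach and the other the sentence approach. The original post shows both architectures in a figure.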

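The window approach is an adaptation of the n-gram model: the window has size n, the word in the middle is the center word, and there are (n-1)/2 context words on each side. The sentence approach uses convolution to capture context over the whole sentence and turns it into an intermediate representation of fixed size (in the paper, by taking a max over time after the convolution layer). Both models are finally trained to maximize the softmax probability of the correct tag.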
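As a minimal sketch of the window approach's input, the helper below builds one window per word with (n-1)/2 context words on each side. The function name `windows` and the `<PAD>` token are illustrative assumptions; the idea of padding the sentence boundaries so that every word gets a full window follows the paper.

```python
PAD = "<PAD>"  # hypothetical name for the padding token


def windows(tokens, n):
    """Return (center_word, window) pairs with (n - 1) // 2 context words per side."""
    if n % 2 == 0:
        raise ValueError("window size n must be odd")
    half = (n - 1) // 2
    padded = [PAD] * half + list(tokens) + [PAD] * half
    return [(tokens[i], padded[i:i + n]) for i in range(len(tokens))]


sentence = ["the", "cat", "sat", "on", "the", "mat"]
result = windows(sentence, 3)
for center, window in result:
    print(center, window)
```

Each window would then be mapped to embeddings, concatenated, and fed to the tagging network to predict the tag of the center word.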

The window approach suits POS, CHUNK and NER; the sentence approach suits SRL.
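The sentence approach has to handle sentences of any length, so it needs a representation whose size does not depend on the number of words. A rough sketch of that idea, assuming a convolution followed by a max over time and using hand-picked toy weights (none of these numbers come from the paper):

```python
def conv_layer(word_vectors, weights, window):
    """Apply the same linear map to every window of `window` consecutive word vectors."""
    dim = len(word_vectors[0])
    half = (window - 1) // 2
    zero = [0.0] * dim
    padded = [zero] * half + list(word_vectors) + [zero] * half
    outputs = []
    for t in range(len(word_vectors)):
        # Concatenate the vectors in the window centered on position t.
        flat = [x for vec in padded[t:t + window] for x in vec]
        outputs.append([sum(w * x for w, x in zip(row, flat)) for row in weights])
    return outputs


def max_over_time(outputs):
    """Take the maximum of each hidden unit across all sentence positions."""
    return [max(column) for column in zip(*outputs)]


# Two hidden units over windows of 3 words with 2 features each (toy weights).
weights = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # picks the center word's first feature
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],  # sums the second feature over the window
]

short = [[1.0, 2.0], [3.0, 0.5]]
long = [[1.0, 2.0], [3.0, 0.5], [0.0, 4.0], [2.0, 1.0]]

fixed_short = max_over_time(conv_layer(short, weights, window=3))
fixed_long = max_over_time(conv_layer(long, weights, window=3))
print(fixed_short, fixed_long)
```

Both sentences end up as a vector with one entry per hidden unit, which is what allows the layers after it to have a fixed input size.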

word embedding

The paper pre-trains the word embeddings with an unsupervised method in order to improve results on the specific tasks.
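For the unsupervised objective, the paper trains the language model with a pairwise ranking criterion: a genuine text window should score at least a margin higher than the same window with its center word replaced. A minimal sketch of that loss follows. The function names and the toy vocabulary are assumptions, the scores are plain numbers standing in for a network's output, and excluding the original word from the replacement candidates is a simplification chosen for this sketch.

```python
import random


def hinge_ranking_loss(score_real, score_corrupt, margin=1.0):
    """max(0, margin - f(x) + f(x_corrupt)): zero once the real window wins by `margin`."""
    return max(0.0, margin - score_real + score_corrupt)


def corrupt_center(window, vocabulary, rng):
    """Return a copy of `window` whose center word is replaced by a different word."""
    center = len(window) // 2
    candidates = [w for w in vocabulary if w != window[center]]
    corrupted = list(window)
    corrupted[center] = rng.choice(candidates)
    return corrupted


rng = random.Random(0)
vocabulary = ["the", "cat", "sat", "on", "mat", "paris"]
real_window = ["the", "cat", "sat", "on", "the"]
fake_window = corrupt_center(real_window, vocabulary, rng)

# Stand-in scores; in a real model these would come from the scoring network.
loss_satisfied = hinge_ranking_loss(score_real=2.0, score_corrupt=0.5)
loss_violated = hinge_ranking_loss(score_real=0.2, score_corrupt=0.5)
print(fake_window, loss_satisfied, loss_violated)
```

No labels are needed: any window taken from raw text is a positive example, and the corrupted copy is its negative, which is why large unlabeled corpora can be used for pre-training.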
