[Weekly Paper] Natural Language Processing (almost) From Scratch

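This post introduces a unified neural network architecture for NLP tasks such as part-of-speech tagging, chunking, named entity recognition and semantic role labeling. The architecture builds word vectors, uses either a window-based or a sentence-based approach, and pairs these with a training strategy to solve sequence labeling tasks. The standard pipeline gets close to the best results, but reaching them requires extra tricks such as multi-task learning and model combination.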

Overview

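The paper presents a unified neural network architecture for a wide range of natural language processing tasks, mainly sequence labeling: part-of-speech tagging (POS), chunking, named entity recognition (NER) and semantic role labeling (SRL). It explains how to build this unified network and how to apply a few tricks to improve its results. The conclusion is that state-of-the-art results can be reached without task-specific feature engineering.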

Notation

To make the rest of the discussion easier to follow, we first introduce the notation.
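NN: neural network
Feed-forward neural network with $L$ layers: $f_\theta(\cdot) = f_\theta^L(f_\theta^{L-1}(\dots f_\theta^1(\cdot)\dots))$
Element of matrix $A$: $[A]_{i,j}$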
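Vector formed from columns of a matrix $A \in \mathbb{R}^{d_1 \times d_2}$: $\langle A\rangle_i^{d_{win}}$ denotes the vector obtained by concatenating the $d_{win}$ columns around the $i$-th column,
$$\left[\langle A\rangle_i^{d_{win}}\right]^T = \left([A]_{1,\,i-d_{win}/2} \dots [A]_{d_1,\,i-d_{win}/2},\ \dots,\ [A]_{1,\,i+d_{win}/2} \dots [A]_{d_1,\,i+d_{win}/2}\right)$$

That is, the $d_{win}$ column vectors are stacked, from left to right, into a single column vector.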
Element of vector $x$: $[x]_i$
A sequence of vectors $x_1, x_2, \dots, x_T$ is written as $[x]_1^T$.
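A minimal Python sketch of two pieces of the notation above: the layer composition $f_\theta$ and the window operator $\langle A\rangle_i^{d_{win}}$. It is illustrative only. Matrices are plain row-major lists, indices are 0-based (the notation is 1-based), $d_{win}$ is assumed odd, and windows that run past the matrix edge are rejected rather than padded. The helper names `compose`, `column` and `window` are hypothetical.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]  # row-major: A[r][c] corresponds to [A]_{r+1, c+1}


def compose(layers: Sequence[Callable]) -> Callable:
    """Return f_theta = f^L(f^{L-1}(... f^1(x) ...)); layers[0] plays the role of f^1."""
    def f(x):
        for layer in layers:  # apply f^1 first, f^L last
            x = layer(x)
        return x
    return f


def column(A: Matrix, j: int) -> List[float]:
    """The j-th column of A (0-based), i.e. <A>_j^1."""
    return [row[j] for row in A]


def window(A: Matrix, i: int, d_win: int) -> List[float]:
    """<A>_i^{d_win}: concatenate the d_win columns centred on column i (0-based).

    Assumes d_win is odd and that the whole window lies inside A (no padding).
    """
    if d_win % 2 != 1:
        raise ValueError("d_win must be odd")
    half = d_win // 2
    n_cols = len(A[0])
    if i - half < 0 or i + half >= n_cols:
        raise IndexError("window runs past the matrix boundary")
    out: List[float] = []
    for j in range(i - half, i + half + 1):
        out.extend(column(A, j))  # whole columns appended left to right
    return out


A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]  # d_1 = 2 rows, 4 columns
print(window(A, 1, 3))  # columns 0, 1, 2 -> [1, 5, 2, 6, 3, 7]
f = compose([lambda x: x + 1, lambda x: 2 * x])
print(f(3))  # f^2(f^1(3)) = 2 * (3 + 1) = 8
```

Appending whole columns in order is what makes the result match the transpose formula: all $d_1$ entries of the leftmost column come first.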

Building the network

Word vectors

Each word is represented as a vector, i.e. a point in a continuous space, instead of the raw one-hot representation.
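To see why a lookup table can stand in for the one-hot representation, note that multiplying $W$ by the one-hot vector of word $w$ simply selects the $w$-th column of $W$. The sketch below checks this on a toy table; the values and the helper names `one_hot`, `matvec` and `lookup` are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]  # W has d_wrd rows and |D| columns


def one_hot(w: int, vocab_size: int) -> List[float]:
    """Sparse one-hot vector for word index w."""
    v = [0.0] * vocab_size
    v[w] = 1.0
    return v


def matvec(W: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product W @ x."""
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in W]


def lookup(W: Matrix, w: int) -> List[float]:
    """Dense vector of word w: the w-th column of W."""
    return [row[w] for row in W]


# Toy table with d_wrd = 2 and |D| = 3 (arbitrary illustrative values)
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]
for w in range(3):
    # one-hot followed by a linear layer == direct column lookup
    assert matvec(W, one_hot(w, 3)) == lookup(W, w)
print(lookup(W, 2))  # [0.3, 0.6]
```

Because the product only picks out one column, indexing the table directly avoids building a $|\mathcal{D}|$-dimensional vector for every word.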
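1. Each word's vector is stored in a lookup table $LT_W(\cdot)$ that is queried by word index. The vector of word $w$ is $LT_W(w) = \langle W\rangle_w^1$, i.e. the $w$-th column of the large table $W \in \mathbb{R}^{d_{wrd} \times |\mathcal{D}|}$, where $d_{wrd}$ is the word-vector dimension and $\mathcal{D}$ is the dictionary.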
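2. An input sequence $[w]_1^T$ is represented by the matrix $LT_W([w]_1^T) = \left(\langle W\rangle_{w_1}^1 \ \dots \ \langle W\rangle_{w_T}^1\right)$, whose $t$-th column is the vector of the $t$-th word.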
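3. The same idea extends to any discrete feature: if each word carries several discrete features, each feature gets its own lookup table, and the word's representation is the concatenation of the vectors returned by all the tables.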
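The three steps above can be sketched as follows, again with plain Python lists and hypothetical helper names. The second table `W_caps` stands for an assumed extra discrete feature (a two-valued capitalization flag with 1-dimensional vectors); its values, like those of `W_word`, are arbitrary.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # d rows x |vocabulary| columns, row-major


def lookup(W: Matrix, idx: int) -> List[float]:
    """LT_W(idx): column idx of W."""
    return [row[idx] for row in W]


def lookup_sequence(W: Matrix, seq: Sequence[int]) -> Matrix:
    """LT_W([w]_1^T): a d x T matrix whose t-th column is the vector of word w_t."""
    cols = [lookup(W, w) for w in seq]
    # transpose the list of columns back into a row-major d x T matrix
    return [list(r) for r in zip(*cols)]


def lookup_multi(tables: Sequence[Matrix], features: Sequence[int]) -> List[float]:
    """Word with K discrete features: concatenate LT_{W^k}(feature_k) for k = 1..K."""
    out: List[float] = []
    for W_k, f_k in zip(tables, features):
        out.extend(lookup(W_k, f_k))  # one table per feature
    return out


W_word = [[0.1, 0.2, 0.3],
          [0.4, 0.5, 0.6]]  # d_wrd = 2, 3-word toy dictionary
W_caps = [[1.0, -1.0]]      # hypothetical capitalization feature: 2 values, 1 dimension
print(lookup_sequence(W_word, [2, 0, 1]))      # [[0.3, 0.1, 0.2], [0.6, 0.4, 0.5]]
print(lookup_multi([W_word, W_caps], [2, 0]))  # [0.3, 0.6, 1.0]
```

Keeping one table per feature lets each feature have its own vector size; the concatenated vector is what the next layer sees.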
