Notes on the pointer_generate_network model for text generation on dialogue data

1 Data

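The data consists of conversations between drivers and technicians: a driver whose car has a problem asks a technician what might be wrong. The label is a summary of the driver-technician conversation. After the data is processed into batches, the conversation between technician and driver is stored as encoder_input, and its mask (1 for non-zero ids, 0 for zeros) as encoder_mask. Besides encoder_input there is also encoder_input_en (also written encoder_input_toen in these notes). It differs from encoder_input in that tokens mapped to unk in the vocabulary are instead recorded with an index that points to the position of the corresponding OOV word. For the label, decoder_input has the <start> marker prepended; if the sequence is shorter than the maximum decoder length, <stop> is appended, and if it is longer, it is truncated to the maximum decoder length. The matching decoder_en is the same as decoder_input except that it has no leading <start>. decoder_input also has its own mask, decode_mask. The number of OOV words for each batch is recorded as well.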
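The preprocessing described above can be sketched roughly as follows. This is a minimal illustration and not the original program: the function names, the toy vocabulary, and the convention that OOV words get ids starting at len(vocab) are assumptions, as is the pad id of 0 (chosen so that the "non-zero means 1" mask rule works).

```python
PAD, UNK, START, STOP = "<pad>", "<unk>", "<start>", "<stop>"

def encode_source(tokens, vocab):
    """Return (encoder_input, extended ids, list of OOV words) for one sample."""
    ids, ext_ids, oovs = [], [], []
    for tok in tokens:
        if tok in vocab:
            ids.append(vocab[tok])
            ext_ids.append(vocab[tok])
        else:
            if tok not in oovs:
                oovs.append(tok)
            ids.append(vocab[UNK])
            # Assumption: OOV words get temporary ids just past the vocabulary.
            ext_ids.append(len(vocab) + oovs.index(tok))
    return ids, ext_ids, oovs

def pad_to(ids, max_len, pad_id=0):
    """Truncate to max_len, then right-pad with pad_id."""
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

def encode_target(tokens, vocab, oovs, max_len):
    """Return (decoder_input, decoder target, mask), each of length max_len."""
    dec_in = [vocab[START]]   # input starts with <start>
    dec_tgt = []              # target has no <start> and ends with <stop>
    for tok in tokens:
        dec_in.append(vocab.get(tok, vocab[UNK]))
        if tok in vocab:
            dec_tgt.append(vocab[tok])
        elif tok in oovs:
            dec_tgt.append(len(vocab) + oovs.index(tok))
        else:
            dec_tgt.append(vocab[UNK])
    dec_tgt.append(vocab[STOP])
    dec_in, dec_tgt = pad_to(dec_in, max_len), pad_to(dec_tgt, max_len)
    mask = [1 if i != 0 else 0 for i in dec_tgt]  # non-zero -> 1, zero -> 0
    return dec_in, dec_tgt, mask

vocab = {PAD: 0, UNK: 1, START: 2, STOP: 3, "engine": 4, "noise": 5}
enc_in, enc_ext, oovs = encode_source(["engine", "rattle", "noise", "rattle"], vocab)
dec_in, dec_tgt, dec_mask = encode_target(["rattle", "noise"], vocab, oovs, max_len=5)
```

Here "rattle" is not in the toy vocabulary, so it appears as unk in the plain ids and as id 6 in the extended ids, which is what lets the model copy it later.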

2 Model

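Encoder: an embedding layer followed by a bidirectional GRU. Its purpose is to capture the semantic relationships among the tokens of each sample. It outputs a three-dimensional tensor holding a contextual representation of every token in the batch, plus a hidden_state that can represent the meaning of the whole sentence.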

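Decoder: an embedding layer followed by a GRU that runs one time step at a time, producing the semantic features of the current token and the corresponding hidden_state. Attention against the encoder is then computed, giving the context that relates encoder and decoder. The context and the GRU output for the current time step are summed, then passed through a linear layer to produce the final output of dimension vocab_size.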

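Attention: this uses the encoder output, the decoder GRU output, and the coverage. Coverage is the sum of the attention distributions from all previous time steps; it records how much attention each token has already received, which discourages attending to the same tokens again and so reduces repetition. At the first time step there is no previous attention, so coverage is 0. The first decoder time step uses the encoder's hidden_state as its initial state. The encoder's encoder_out, the decoder's decoder_out, and the coverage are each linearly transformed to produce attention scores of matching dimension; a softmax over the token dimension turns them into a weight for each token in the sentence, and these weights are combined with encoder_out to give, for each sample in the batch, the weighted sum of its token representations. Finally the previous coverage and the current attention are added to obtain the coverage for the current time step.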
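A minimal sketch of one attention step with coverage, for a single sample and in plain Python lists. The additive scoring form v · tanh(W_h h_i + W_s s + w_c c_i) follows the usual pointer-generator formulation and is an assumption about the original program, as are all parameter names (W_h, W_s, w_c, v).

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def coverage_attention(enc_out, dec_state, coverage, W_h, W_s, w_c, v):
    """Return (attention weights, context vector, updated coverage)."""
    dec_feat = matvec(W_s, dec_state)
    scores = []
    for h_i, c_i in zip(enc_out, coverage):
        enc_feat = matvec(W_h, h_i)
        # Encoder, decoder, and coverage features are added, then squashed.
        hidden = [math.tanh(e + d + wc * c_i)
                  for e, d, wc in zip(enc_feat, dec_feat, w_c)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    attn = softmax(scores)  # one weight per source token
    dim = len(enc_out[0])
    context = [sum(a * h[j] for a, h in zip(attn, enc_out)) for j in range(dim)]
    new_coverage = [c + a for c, a in zip(coverage, attn)]
    return attn, context, new_coverage

enc_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three source tokens
identity = [[1.0, 0.0], [0.0, 1.0]]
coverage0 = [0.0, 0.0, 0.0]                       # first step: no history yet
attn1, ctx1, cov1 = coverage_attention(
    enc_out, [0.5, -0.5], coverage0, identity, identity, [1.0, 1.0], [1.0, 1.0])
attn2, ctx2, cov2 = coverage_attention(
    enc_out, [0.5, -0.5], cov1, identity, identity, [1.0, 1.0], [1.0, 1.0])
```

After the first step the coverage equals the first attention distribution; after the second it is the sum of both, so its entries add up to 2.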

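pgen decides, at the current time step, between the decoder's prediction over the vocabulary and the prediction obtained by copying source tokens according to their attention weights. pgen lies between 0 and 1. It is computed by applying learned linear weights to context, decoder_x, and decoder_hidden_state, and then squashing the result into that range (a sigmoid in the standard pointer-generator formulation, since pgen is a single scalar).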
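As a small illustration, pgen can be written as a sigmoid over a weighted sum of the three inputs. The weight names (w_c, w_s, w_x, b) are hypothetical, and using a sigmoid is an assumption based on the standard pointer-generator formulation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def compute_pgen(context, dec_hidden, dec_x, w_c, w_s, w_x, b):
    """Scalar gate in (0, 1): near 1 favours generating, near 0 favours copying."""
    z = dot(w_c, context) + dot(w_s, dec_hidden) + dot(w_x, dec_x) + b
    return sigmoid(z)

# With all weights at zero the gate is undecided.
p_neutral = compute_pgen([1.0, 2.0], [0.5], [0.1, 0.2],
                         [0.0, 0.0], [0.0], [0.0, 0.0], 0.0)
# A large positive bias pushes the gate towards generating from the vocabulary.
p_generate = compute_pgen([1.0, 2.0], [0.5], [0.1, 0.2],
                          [0.0, 0.0], [0.0], [0.0, 0.0], 10.0)
```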

3 Training

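The encoder_input of the driver-technician conversation goes through the encoder to give encoder_out and en_hidden_state. For the decoder, the length of decoder_input is the number of time steps to loop over. At each time step, the batch's tokens for that step are passed through the decoder: the embedding gives decoder_x, and the GRU gives decoder_hidden and decoder_out. Then decoder_hidden, encoder_out, and coverage go through a small network to compute attention. Some positions in a sample are padding rather than real tokens and should get weight 0, so the attention is also transformed with encoder_input_mask so that positions without a token end up with weight 0. Adding up the attention of the previous steps gives the current coverage. The attention scores are passed through softmax to get the token weights for each sample, which are then combined with encoder_output to give the context. The context and the current decoder_out features are summed and passed through a further layer to obtain a probability distribution of dimension vocab_size.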
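The masking step mentioned above can be sketched as: zero out the attention at padded positions, then rescale so the remaining weights still sum to 1. The function name is hypothetical, and the original program may apply the mask in a different but equivalent way (for example before the softmax).

```python
def mask_and_renormalize(attn, mask):
    """Zero the attention on padded positions and rescale the rest to sum to 1."""
    masked = [a * m for a, m in zip(attn, mask)]
    total = sum(masked)
    if total == 0:
        return masked  # nothing to attend to; avoid dividing by zero
    return [a / total for a in masked]

# The third source position is padding, so its weight is redistributed.
attn = mask_and_renormalize([0.5, 0.3, 0.2], [1, 1, 0])
```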

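Compute pgen, which weighs, at the current time step, the probability from decoder_out against the probability of the tokens given by attention. decoder_x, decoder_hidden, and context are projected through a network and passed through a squashing function to give a value in the range 0 to 1.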

The softmax-normalized attention, decoder_out, and pgen are stored in lists.

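Once all time steps of decoder_input are done, the decoder_output and attention of each time step are multiplied by pgen: decoder_output*pgen and attention*(1-pgen). The pgen-weighted decoder_out is extended from vocab_size to vocab_size + oov_len, with the positions beyond vocab_size set to 0. The attention probability of each source token is mapped to the position of that token's id, giving probabilities in the same vocab_size + oov_len space. The transformed decoder_out and attention for each time step are then added to give final_dist.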
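For one sample and one time step, the combination just described can be sketched like this. The function and argument names are illustrative; ext_ids stands for the extended source ids in which OOV words have indices at or above vocab_size.

```python
def final_distribution(vocab_dist, attn, ext_ids, pgen, oov_len):
    """Mix the generation and copy distributions over vocab_size + oov_len ids."""
    # Generation part: scaled by pgen, with zeros appended for the OOV slots.
    final = [pgen * p for p in vocab_dist] + [0.0] * oov_len
    # Copy part: each source token adds (1 - pgen) * attention to its own id.
    for a, idx in zip(attn, ext_ids):
        final[idx] += (1.0 - pgen) * a
    return final

vocab_dist = [0.1, 0.2, 0.3, 0.4]   # vocab_size = 4
attn = [0.5, 0.25, 0.25]            # three source tokens
ext_ids = [2, 4, 2]                 # id 4 is the first OOV word
final_dist = final_distribution(vocab_dist, attn, ext_ids, pgen=0.5, oov_len=1)
```

Token id 2 appears twice in the source, so it collects attention from both positions, and the OOV word at id 4 gets probability only through copying.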

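From the final_dist of each time step, the probability at the position of the current label token in decoder_en is extracted (as a negative log-likelihood in the standard formulation), summed over time steps, then summed over the samples in the batch and averaged to give averge_loss. For coverage, at each step the element-wise minimum of the coverage accumulated over previous steps and the current attention is taken and summed to give that step's loss; this is summed over the batch samples and over all steps to give the coverage loss. Adding it to averge_loss gives the decoder loss for the current batch, and the same computation is repeated for each batch.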
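A rough sketch of the loss for a single sample, under a few assumptions: the token loss is the negative log of the gold token's probability, the coverage loss weight cov_weight is a free parameter, and the result is averaged over unmasked steps. None of these names come from the original program.

```python
import math

def sequence_loss(final_dists, targets, attns, mask, cov_weight=1.0, eps=1e-12):
    """Average per-step loss: negative log-likelihood plus coverage loss."""
    coverage = [0.0] * len(attns[0])  # no attention history before step 0
    total, steps = 0.0, 0
    for dist, tgt, attn, m in zip(final_dists, targets, attns, mask):
        if m:
            nll = -math.log(dist[tgt] + eps)
            # Coverage loss: overlap between past attention and current attention.
            cov_loss = sum(min(a, c) for a, c in zip(attn, coverage))
            total += nll + cov_weight * cov_loss
            steps += 1
        coverage = [c + a for c, a in zip(coverage, attn)]
    return total / max(steps, 1)

final_dists = [[0.5, 0.5], [0.25, 0.75]]
targets = [0, 1]
attns = [[1.0, 0.0], [1.0, 0.0]]   # attends to the same token twice
loss = sequence_loss(final_dists, targets, attns, mask=[1, 1])
```

In this toy case the second step attends to the same source position as the first, so it pays the full coverage penalty of 1 on top of its token loss.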

4 Testing

For generation at prediction time, I currently know of two methods: greedy decoding and beam search (beam_size).

Greedy method:

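This method maximizes the probability at the current step. Take any batch of data and pass its encoder_input through the GRU encoder to get encoder_output and hidden_state. Then loop up to the configured maximum output length; when the loop ends, the tokens produced form a sentence. At the start, batch copies of the <start> marker are used as decoder_input. The decoder embedding gives decoder_x, and the GRU gives decoder_output and decoder_hidden. decoder_hidden and encoder_output are used to compute the encoder-decoder attention. After the attention is computed, positions where the sample's token id is 0 (padding) must have their attention zeroed out. The context is then computed from the attention and the encoder output, and the attention is added to the previous coverage to give the coverage for the current time step. context and decoder_out go through a network to give final_output of size vocab_size. The current context, decoder_x, and decoder_hidden go through a network and a squashing function to give pgen in the range 0 to 1. final_output and attention are then weighted by pgen and summed to give the final output for the time step, finals_output. For each sample in the batch, the token with the highest probability is taken as the prediction, and its index is used to look up the corresponding text.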

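The loop continues, feeding the current token in as the decoder_input of the next time step to get that step's final output. When all time steps are done, the generated summary is complete. If a <stop> marker appears in the text, the text is truncated at that marker.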
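The greedy loop can be sketched for a single sample as below. The decoder is abstracted into a hypothetical step_fn(token, state) that returns the final distribution and the next state (hidden state plus coverage); toy_step is a stand-in used only to make the example runnable.

```python
def greedy_decode(step_fn, init_state, start_id, stop_id, max_len):
    """Pick the most probable token at every step, then cut at the first <stop>."""
    token, state, output = start_id, init_state, []
    for _ in range(max_len):
        dist, state = step_fn(token, state)
        # Argmax over the distribution; the result is the next decoder input.
        token = max(range(len(dist)), key=dist.__getitem__)
        output.append(token)
    if stop_id in output:
        output = output[:output.index(stop_id)]
    return output

def toy_step(token, state):
    """Stand-in decoder: always prefers the id after the current one (mod 5)."""
    dist = [0.1] * 5
    dist[(token + 1) % 5] = 0.6
    return dist, state + 1

# Raw output over 6 steps is 1, 2, 3, 4, 0, 1; id 3 plays the role of <stop>.
summary_ids = greedy_decode(toy_step, init_state=0, start_id=0, stop_id=3, max_len=6)
```

As in the description, the loop runs for the full maximum length and the truncation at <stop> happens afterwards; stopping the loop as soon as <stop> is produced would give the same result with less work.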

Beam search (beam_size) method:

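In this program, beam search is designed so that beam_size equals the batch size: each sample is copied batch times, and that batch goes through the encoder to give encoder_out and encoder_hidden. For the decoder, the program defines a class that stores the tokens generated so far, their probability, and the coverage and hidden state that go with the current token. The tokens of the beam_size hypotheses are used as decoder_input: the embedding gives decoder_x, the GRU gives decoder_output and decoder_hidden, and decoder_hidden, encoder_out, and the coverage for the current time step are used to compute the attention. After the attention is computed, it is renormalized according to the current encoder_mask, and the coverage is then updated from the attention. The attention and encoder_out give the context. The decoder output and the context are summed to get richer semantic features, which go through a network to give the decoder's final output, final_output. context, decoder_x, and decoder_hidden go through a network and a squashing function to give pgen in the range 0 to 1. pgen, the attention for the current time step, and final_output are combined in the extended vocabulary space to give the final probability output for the time step, final_dist.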

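From final_dist, the k most probable tokens are taken, and their indices and probabilities are assigned to new variables. New hypotheses are created, with the hidden state, probability, and coverage of each of the top-k tokens stored in the class. From the resulting hypotheses, the beam_size most probable ones are kept, and their tokens, hidden states, and coverage are the input for the next round. This repeats step by step. When a hypothesis reaches the maximum decode_length or hits <stop> early, its tokens are added to the result set. Once the configured number of results has been collected, the top sentences are ranked by probability, and the best one is converted into the final text summary.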
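A compact sketch of this beam search for one sample. Several details are assumptions rather than the original program: the class name Hypothesis, the hypothetical step_fn(token, state) whose state would carry the hidden state and coverage, taking k = beam_size candidates per hypothesis, and ranking by average log probability.

```python
import math
from dataclasses import dataclass

@dataclass
class Hypothesis:
    tokens: list      # tokens generated so far, starting with <start>
    log_prob: float   # sum of log probabilities of those tokens
    state: object     # decoder state (hidden state and coverage in the real model)

    def extend(self, token, log_prob, state):
        return Hypothesis(self.tokens + [token], self.log_prob + log_prob, state)

    @property
    def avg_log_prob(self):
        return self.log_prob / len(self.tokens)

def beam_search(step_fn, init_state, start_id, stop_id, beam_size, max_len):
    hyps = [Hypothesis([start_id], 0.0, init_state)]
    results = []
    for _ in range(max_len):
        candidates = []
        for h in hyps:
            dist, state = step_fn(h.tokens[-1], h.state)
            top = sorted(range(len(dist)), key=dist.__getitem__, reverse=True)
            for tok in top[:beam_size]:
                candidates.append(h.extend(tok, math.log(dist[tok] + 1e-12), state))
        candidates.sort(key=lambda h: h.avg_log_prob, reverse=True)
        hyps = []
        for h in candidates:
            # Finished hypotheses go to the result set, the rest stay on the beam.
            (results if h.tokens[-1] == stop_id else hyps).append(h)
            if len(hyps) == beam_size or len(results) == beam_size:
                break
        if len(results) == beam_size or not hyps:
            break
    best = max(results or hyps, key=lambda h: h.avg_log_prob)
    return [t for t in best.tokens[1:] if t != stop_id]

# Toy decoder: ids are 0 = <start>, 1 = <stop>, 2 and 3 = ordinary words.
TABLE = {0: [0.0, 0.1, 0.6, 0.3], 2: [0.0, 0.7, 0.1, 0.2], 3: [0.0, 0.9, 0.05, 0.05]}

def toy_step(token, state):
    return TABLE[token], state

best_ids = beam_search(toy_step, None, start_id=0, stop_id=1, beam_size=2, max_len=4)
```

Keeping the decoder state inside each hypothesis is what allows different beams to carry their own coverage, which is the role of the class described above.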
