RBM-An approach for text summarization using deep learning algorithm

Padmapriya G, Duraiswamy K. AN APPROACH FOR TEXT SUMMARIZATION USING DEEP LEARNING ALGORITHM[J]. Journal of Computer Science, 2014, 10(1):1-9.

Abstract

The Restricted Boltzmann Machine (RBM) is widely used; this work builds its summarizer on an RBM.
Experiments were conducted on documents from three different knowledge domains.

Introduction

  • Developed a multi-document summarization system using deep learning algorithm Restricted Boltzmann Machine (RBM).
  • Solved the ranking problem by computing the intersection between the user query and each sentence.
  • Sentences are selected on the basis of compression rate entered by the user.

Motivation

With the explosion of information, finding the information we need within a huge volume of data has become essential; summarization is an important way to obtain key information quickly.

Model

- Restricted Boltzmann Machine
A Restricted Boltzmann Machine is a stochastic neural network (that is, a network of neurons where each neuron has some random behavior when activated).
It is a stochastic bipartite network: information flows in both directions, both during training and during use of the network, and the weights are the same in both directions.
[Figure: RBM structure (image not reproduced)]
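The bipartite, stochastic structure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the layer sizes, the sigmoid activation, and binary units are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 4
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))  # one shared weight matrix
b_v = np.zeros(n_visible)   # visible-layer bias
b_h = np.zeros(n_hidden)    # hidden-layer bias

def sample_hidden(v):
    # upward pass: visible -> hidden, stochastic binary activation
    p = sigmoid(v @ W + b_h)
    return (rng.random(p.shape) < p).astype(float), p

def sample_visible(h):
    # downward pass: hidden -> visible, using the SAME weights W (transposed)
    p = sigmoid(h @ W.T + b_v)
    return (rng.random(p.shape) < p).astype(float), p

v0 = rng.integers(0, 2, n_visible).astype(float)
h, _ = sample_hidden(v0)     # information flows up...
v1, _ = sample_visible(h)    # ...and back down through the same weights
```

Note how the same matrix `W` carries information in both directions, which is exactly the bidirectional, shared-weight property of the bipartite graph.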

Term weight

See: A Survey of Document Summarization.

Concept feature

[Concept-feature formula (given as an image in the source; not reproduced)]
where:
P(wi, wj) = joint probability that both keywords wi and wj appear together in a text window
P(wi) = probability that keyword wi appears in a text window, computed as:

P(wi) = s_wi / |sw|

where:
s_wi = number of windows containing the keyword wi
|sw| = total number of windows constructed from the text document

The sentence matrix generated by the above steps is S = (s1, s2, ..., sn), where si = (f1, f2, ..., f4), i <= n, is the feature vector of sentence si.
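The window probabilities defined above can be sketched directly from their definitions: P(wi) = s_wi / |sw|, and P(wi, wj) counted over the same windows. The window size and the non-overlapping windowing are assumptions; the paper does not specify them.

```python
def windows(tokens, size=4):
    # split the token stream into non-overlapping text windows (size assumed)
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def p_keyword(word, wins):
    # P(wi) = (number of windows containing wi) / (total number of windows)
    s_wi = sum(1 for w in wins if word in w)
    return s_wi / len(wins)

def p_joint(w1, w2, wins):
    # P(wi, wj) = fraction of windows containing both keywords
    both = sum(1 for w in wins if w1 in w and w2 in w)
    return both / len(wins)

tokens = "deep learning builds a summary deep model of text data".split()
wins = windows(tokens)          # 3 windows from 10 tokens
p_deep = p_keyword("deep", wins)            # 2 of 3 windows contain "deep"
p_pair = p_joint("deep", "learning", wins)  # 1 of 3 windows contains both
```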

Deep Learning Algorithm

  • The Restricted Boltzmann machine contains two hidden layers, and two sets of bias values, H0 and H1, are selected for them.
  • These sets of bias values are selected randomly.
  • [The layer-wise update formulas are given as images in the source and are not reproduced here]
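Since the source gives the update formulas only as images, here is a hedged sketch of one contrastive-divergence (CD-1) update for a single RBM layer, which is the standard way such layers are trained; CD-1 itself, the learning rate, and the layer sizes are assumptions, not the paper's stated method.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, lr=0.1):
    # positive phase: hidden activations driven by the data
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: one reconstruction step back down and up
    pv1 = sigmoid(h0 @ W.T + b_v)
    ph1 = sigmoid(pv1 @ W + b_h)
    # update weights and biases toward the data statistics
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b_v += lr * (v0 - pv1)
    b_h += lr * (ph0 - ph1)
    return W, b_v, b_h

n_v, n_h = 4, 3
W = rng.normal(0, 0.1, (n_v, n_h))
b_v, b_h = np.zeros(n_v), np.zeros(n_h)
v = np.array([1.0, 0.0, 1.0, 0.0])  # one training pattern
for _ in range(50):
    W, b_v, b_h = cd1_step(v, W, b_v, b_h)
```

Stacking a second hidden layer (as the paper does) would repeat this greedily, treating the first layer's hidden activations as the visible input of the next RBM.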

Optimal Feature Vector Set Generation

  • Fine-tune the obtained feature vector set by adjusting the weights of the RBM units.
  • The back-propagation algorithm is used to fine-tune the feature vector set optimally.
  • Cross-entropy error is used as the objective. For example, the term-weight feature of a sentence is reconstructed using a formula given (as an image) in the source.
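The cross-entropy error driving the back-propagation fine-tuning can be sketched as below; applying it element-wise over feature values in [0, 1] is an assumption, since the source shows the reconstruction formula only as an image.

```python
import numpy as np

def cross_entropy(target, reconstruction, eps=1e-12):
    # standard binary cross-entropy between a target feature vector and
    # its reconstruction; eps guards against log(0)
    t = np.asarray(target, dtype=float)
    r = np.clip(np.asarray(reconstruction, dtype=float), eps, 1 - eps)
    return -np.sum(t * np.log(r) + (1 - t) * np.log(1 - r))

err = cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])  # ≈ 0.657
```

Minimizing this error across sentences pushes the reconstructed feature vectors toward the originals, which is what "fine-tuning the feature vector set" amounts to.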

Sentence Score

Sc = |S ∩ Q| / Wc

Where:
Sc = sentence score of a sentence
S = sentence (its set of words)
Q = user query (its set of words)
Wc = total word count of the text
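From the variable definitions (and the query–sentence intersection mentioned in the Introduction), the score can be sketched as the number of words common to the sentence and the query, divided by the total word count; treating S and Q as lowercase word sets is an assumption.

```python
def sentence_score(sentence, query, total_word_count):
    # Sc = |S ∩ Q| / Wc, with S and Q as sets of lowercased words
    common = set(sentence.lower().split()) & set(query.lower().split())
    return len(common) / total_word_count

text = "deep learning summarizes text . RBM ranks sentences ."
wc = len(text.split())  # Wc = 9
score = sentence_score("RBM ranks sentences", "how does RBM rank sentences", wc)
# common words: {"rbm", "sentences"} -> 2 / 9
```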

Ranking of Sentence

To find the number of top sentences to select from the matrix, the following formula, based on the compression rate entered by the user, is applied:

N = (compression rate / 100) × total number of sentences
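Putting the score and the compression rate together, selection can be sketched as below; reading the formula as (compression_rate / 100) × total sentences is my interpretation, since the source shows it only as an image, and rounding with a minimum of one sentence is an assumption.

```python
def select_top(sentences, scores, compression_rate):
    # number of sentences to keep, from the compression rate (percent)
    k = max(1, round(compression_rate / 100 * len(sentences)))
    # rank sentences by score, highest first, and keep the top k
    ranked = sorted(zip(scores, sentences), reverse=True)
    return [s for _, s in ranked[:k]]

sents = ["s1", "s2", "s3", "s4"]
summary = select_top(sents, [0.2, 0.9, 0.5, 0.1], 50)  # keep top 50% -> 2 sentences
```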
