Notes on Datasets for Text Generation

Summarization datasets

CNN/Daily Mail

Gigaword
The Gigaword corpus [Graff and Cieri, 2003], preprocessed identically to [Rush et al., 2015], yields around 3.8M training samples, 190K validation samples, and 1951 test samples for evaluation. Each input-summary pair consists of the first sentence of the source article (the input) and its headline (the summary).
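
To make the pair construction concrete, here is a minimal sketch of how one input-summary pair could be built from a headline and an article body. The function name `make_pair` and the naive regex sentence splitter are assumptions for illustration only; the actual preprocessing in [Rush et al., 2015] involves further filtering and normalization not reproduced here.

```python
import re


def make_pair(headline, article):
    """Build one (input, summary) pair: first sentence of the article, headline.

    Hypothetical helper; the sentence split is a naive assumption,
    not the original preprocessing pipeline.
    """
    # Split once on whitespace that follows '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip(), maxsplit=1)
    first_sentence = sentences[0]
    return first_sentence, headline.strip()


# Made-up example text, not taken from the corpus.
pair = make_pair(
    "Stocks rally on rate cut",
    "Stocks rose sharply on Tuesday. Analysts cited the rate cut.",
)
print(pair)
```

The model is then trained to generate the second element of the pair (the headline) from the first (the leading sentence).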

Chinese summarization dataset
LCSTS, a large-scale Chinese short text summarization dataset [Hu et al., 2015], collected and constructed from the Chinese microblogging website Sina Weibo.

Essay generation dataset

Dataset and code links
Paper: Topic-to-Essay Generation with Neural Networks
Dataset description:
In order to guarantee the quality of the crawled text, we only crawl the compositions which contain some reviews and scores. The process of the data collection is summarized as follows: a) We crawl 228,110 a
