Notes on Datasets for Text Generation


仲夏199603

Category: Natural Language Processing

Article link: https://blog.csdn.net/qq_32458499/article/details/81084980


Summarization datasets

CNN/Daily Mail

Gigaword
Gigaword corpus [Graff and Cieri, 2003], preprocessed identically to [Rush et al., 2015], which yields around 3.8M training samples, 190K validation samples, and 1951 test samples for evaluation. Each input-summary pair consists of the first sentence of the source article (the input) and its headline (the summary).
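
To make the pairing concrete, here is a minimal Python sketch of building one such pair from a headline and an article body. The function names `first_sentence` and `make_pair` and the sample text are hypothetical, and the regex-based sentence splitter is a naive assumption. It does not reproduce the preprocessing of Rush et al. (2015); it only illustrates the shape of the data.

```python
import re
from typing import Tuple


def first_sentence(article: str) -> str:
    """Return the text up to and including the first '.', '!' or '?'.

    Naive splitter (an assumption for illustration): it will mis-split on
    abbreviations such as "U.S." and is not the original preprocessing.
    """
    match = re.search(r"[.!?](?:\s|$)", article)
    return article[: match.end()].strip() if match else article.strip()


def make_pair(headline: str, article: str) -> Tuple[str, str]:
    """Build one (input, summary) pair: first sentence in, headline out."""
    return first_sentence(article), headline.strip()


# Made-up sample text, not taken from Gigaword.
pair = make_pair(
    "Storm closes coastal roads",
    "A storm closed several coastal roads on Monday. Crews expect to reopen them by Friday.",
)
print(pair)
```

A real pipeline would likely use a proper sentence tokenizer and additional filtering; those steps are left out here because the source does not describe them.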

Chinese summarization dataset
LCSTS, a large corpus for Chinese short text summarization [Hu et al., 2015], collected and constructed from the Chinese microblogging website Sina Weibo.

Essay generation dataset

Dataset and code location
Paper: Topic-to-Essay Generation with Neural Networks
Dataset description:
In order to guarantee the quality of the crawled text, we only crawl the compositions which contain some reviews and scores. The process of the data collection is summarized as follows: a) We crawl 228,110 a
