D2T dataset: Dialogue (restaurants and hotels)

How the Dialogue dataset was built

The dataset has two domains, covering restaurants and hotels respectively.

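To build the training corpus for each domain, dialogues collected in an earlier user trial of a statistical dialogue manager (Gašić et al., 2015) were randomly sampled and shown to workers recruited through the Amazon Mechanical Turk (AMT) service. Workers were shown each dialogue turn by turn and asked to type, in natural English, an appropriate system response for each system dialogue act (DA). For each domain, about 5,000 system utterances were collected from about 1,000 randomly sampled dialogues. Each categorical value was replaced by a token representing its slot, and slots that appeared multiple times in one DA were merged into one. After processing and grouping every utterance by its delexicalized DA, we obtained 248 distinct DAs in the restaurant domain and 164 in the hotel domain. The average number of slots per DA is 2.25 for restaurants and 1.95 for hotels. The system was implemented with the Theano library (Bergstra et al., 2010; Bastien et al., 2012) and trained by splitting each collected corpus into training, validation, and test sets in a 3:1:1 ratio.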
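The preprocessing described above (delexicalize each DA and utterance, merge repeated slots, group utterances by delexicalized DA, split 3:1:1) can be sketched as follows. This is a minimal illustration, not the original authors' code. It assumes DAs are written as act(slot=value,...), the same form as the src field of the sample record in the next section. The SLOT_<NAME> token format, the set of values treated as non-categorical (none, yes, no, dontcare), the random-shuffle split, the function names, and the toy restaurant DAs are all hypothetical.

```python
import random
import re
from collections import defaultdict

# Assumption: values treated as non-categorical and therefore kept verbatim.
KEEP_VALUES = {"none", "yes", "no", "dontcare"}

# Matches a DA written as act(slot=value,...); the inner part may be empty.
DA_PATTERN = re.compile(r"^(\w+)\((.*)\)$")


def parse_da(da):
    """Split a DA such as inform(name='Red Door',food=thai) into (act, [(slot, value), ...])."""
    match = DA_PATTERN.match(da.strip())
    if match is None:
        raise ValueError(f"not a dialogue act: {da!r}")
    act, body = match.groups()
    pairs = []
    # Naive comma split: assumes slot values never contain commas.
    for part in filter(None, body.split(",")):
        slot, _, value = part.partition("=")
        pairs.append((slot.strip(), value.strip().strip("'")))
    return act, pairs


def delexicalize(da, utterance):
    """Replace categorical values by slot tokens and merge repeated slots."""
    act, pairs = parse_da(da)
    merged = []  # ordered (slot, value) pairs with duplicates removed
    for slot, value in pairs:
        if value and value.lower() not in KEEP_VALUES:
            token = "SLOT_" + slot.upper()  # hypothetical token format
            utterance = utterance.replace(value, token)  # plain substring match
            value = token
        if (slot, value) not in merged:
            merged.append((slot, value))
    body = ",".join(f"{s}={v}" if v else s for s, v in merged)
    return f"{act}({body})", utterance


def group_by_da(corpus):
    """Group (da, utterance) pairs by their delexicalized DA."""
    groups = defaultdict(list)
    for da, utterance in corpus:
        key, delex_utterance = delexicalize(da, utterance)
        groups[key].append(delex_utterance)
    return groups


def split_311(items, seed=0):
    """Shuffle items and split them into train/valid/test in a 3:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 3 // 5
    n_valid = len(items) // 5
    return items[:n_train], items[n_train:n_train + n_valid], items[n_train + n_valid:]
```

For example, inform(name='Red Door',food=thai) with "Red Door serves thai food ." becomes the key inform(name=SLOT_NAME,food=SLOT_FOOD) with "SLOT_NAME serves SLOT_FOOD food .", so any other restaurant with the same slot pattern lands in the same group. Because two name values both map to SLOT_NAME, de-duplicating (slot, value) pairs after delexicalization is one simple way to realize "merged into one".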

What does the dataset look like?
{0: {'src': "inform(name=none,area=citycentre,near='X')",
     'sys_summ': 'There is sorry no information matching constraints near X .',
     'scores': {'informativeness': 6.0, 'naturalness': 4.0, 'quality': 5.0},
     'ref_summs': ['I am sorry but there are no venues near X in the city centre .',
                   'I am sorry but there are no venues near X in the city centre .',
                   'I am sorry but there are no venues near X in the city centre .',
                   'I am sorry but there are no venues near X in the city centre .',
                   'I am sorry but there are no venues near X in the city centre .',
                   'I am sorry but there are no venues near X in the city centre .',
                   'There are no places you are looking for near X in the centre of town .',
                   'There are no places you are looking for near X in the centre of town .',
                   'There are no places you are looking for near X in the centre of town .',
                   'There are no places you are looking for near X in the centre of town .',
                   'There are no places you are looking for near X in the centre of town .',
                   'There are no places you are looking for near X in the centre of town .',
                   'I am sorry but there are no venues near X in the city centre .',
                   'I am sorry but there are no venues near X in the city centre .',
                   'I am sorry but there are no venues near X in the city centre .',
                   'There are no places you are looking for near X in the centre of town .',
                   'There are no places you are looking for near X in the centre of town .',
                   'There are no places you are looking for near X in the centre of town .']},
 1: {}}

Each source corresponds to 18 references. This sample comes from one of the two domains (hotel or restaurant).
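To make the record layout concrete, here is a minimal sketch that inspects the sample record shown above: it parses the src DA, counts the references (18 in total, but only 2 distinct sentences), and averages the human scores. It assumes the data is already loaded into a Python dict shaped like the sample; how it is stored on disk is not stated here. Reading sys_summ as the system output, ref_summs as the human references, and scores as human ratings is an interpretation of the sample, and skipping the empty 1: {} entry is an assumption. The function names are hypothetical.

```python
from statistics import mean

# The sample record from this section; the 18 references are two distinct sentences.
REF_A = "I am sorry but there are no venues near X in the city centre ."
REF_B = "There are no places you are looking for near X in the centre of town ."
SAMPLE = {
    0: {
        "src": "inform(name=none,area=citycentre,near='X')",
        "sys_summ": "There is sorry no information matching constraints near X .",
        "scores": {"informativeness": 6.0, "naturalness": 4.0, "quality": 5.0},
        "ref_summs": [REF_A] * 6 + [REF_B] * 6 + [REF_A] * 3 + [REF_B] * 3,
    },
    1: {},
}


def parse_src(src):
    """Split "act(slot=value,...)" into the act name and a slot -> value dict."""
    act, _, body = src.partition("(")
    body = body.rstrip(")")
    slots = {}
    # Naive comma split: assumes values contain no commas or trailing ')'.
    for part in filter(None, body.split(",")):
        slot, _, value = part.partition("=")
        slots[slot] = value.strip("'")
    return act, slots


def describe(record):
    """Summarize one record: act, slots, reference counts and human scores."""
    act, slots = parse_src(record["src"])
    refs = record["ref_summs"]
    return {
        "act": act,
        "slots": slots,
        "n_refs": len(refs),
        "n_distinct_refs": len(set(refs)),
        "scores": record["scores"],
    }


def mean_scores(data):
    """Average each human score over all non-empty records."""
    records = [r for r in data.values() if r]  # assumption: {} marks an empty entry
    if not records:
        return {}
    return {k: mean(r["scores"][k] for r in records) for k in records[0]["scores"]}
```

Counting distinct references is worth doing on this data: 18 references sound like a lot, but in this sample they collapse to 2 unique sentences, which matters for any multi-reference metric.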

Author: YingJingh

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值