Tutorial: Preprocessing the sentence-compression dataset for paraphrase generation


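Download the dataset

Dataset link: sentence-compression.tsv.gz
Paper: Overcoming the Lack of Parallel Data in Sentence Compression (Filippova and Altun, EMNLP 2013)
Dataset contents: 10 training files of 20,000 records each (200,000 in total) and 1 test file of 10,000 records. For paraphrase training we keep only the "sentence" and "headline" fields of each record as a training pair. An example record is shown below.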

```json
{
  "graph": {
    "id": "0",
    "sentence": "Five people have been taken to hospital with minor injuries following a crash on the A17 near Sleaford this morning.",
    "node": [ {
      "form": "ROOT",
      "word": [ {
        "id": -1,
        "form": "ROOT",
        "stem": "ROOT",
        "tag": "ROOT"
      } ],
      "gender": 0,
      "head_word_index": 0
    }, {
      "form": "Five people",
      "word": [ {
        "id": 13,
        "form": "Five",
        "stem": "five",
        "tag": "CD"
      }, {
        "id": 14,
        "form": "people",
        "stem": "person",
        "tag": "NNS"
      } ],
      "gender": 0,
      "head_word_index": 1
    }, {
      "form": "have been taken",
      "word": [ {
        "id": 15,
        "form": "have",
        "stem": "have",
        "tag": "VBP"
      }, {
        "id": 16,
        "form": "been",
        "stem": "be",
        "tag": "VBN"
      }, {
        "id": 17,
        "form": "taken",
        "stem": "take",
        "tag": "VBN"
      } ],
      "gender": 0,
      "head_word_index": 2
    }, {
      "form": "to hospital",
      "word": [ {
        "id": 18,
        "form": "to",
        "stem": "to",
        "tag": "IN"
      }, {
        "id": 19,
        "form": "hospital",
        "stem": "hospital",
        "tag": "NN"
      } ],
      "gender": 0,
      "head_word_index": 1
    }, {
      "form": "minor",
      "word": [ {
        "id": 21,
        "form": "minor",
        "stem": "minor",
        "tag": "JJ"
      } ],
      "gender": 0,
      "head_word_index": 0
    }, {
      "form": "with injuries",
      "word": [ {
        "id": 20,
        "form": "with",
        "stem": "with",
        "tag": "IN"
      }, {
        "id": 22,
        "form": "injuries",
        "stem": "injury",
        "tag": "NNS"
      } ],
      "gender": 0,
      "head_word_index": 1
    }, {
      "form": "following a crash",
      "word": [ {
        "id": 23,
        "form": "following",
        "stem": "follow",
        "tag": "VBG"
      }, {
        "id": 24,
        "form": "a",
        "stem": "a",
        "tag": "DT"
      }, {
        "id": 25,
        "form": "crash",
        "stem": "crash",
        "tag": "NN"
      } ],
      "gender": 0,
      "head_word_index": 2
    }, {
      "form": "on the A17",
      "type": "LOC",
      "mid": "/m/08tthd",
      "word": [ {
        "id": 26,
        "form": "on",
        "stem": "on",
        "tag": "IN"
      }, {
        "id": 27,
        "form": "the",
        "stem": "the",
        "tag": "DT"
      }, {
        "id": 28,
        "form": "A17",
        "stem": "A17",
        "tag": "NNP"
      } ],
      "gender": 0,
      "head_word_index": 2
    }, {
      "form": "near Sleaford",
      "type": "LOC",
      "mid": "/m/01cfbw",
      "word": [ {
        "id": 29,
        "form": "near",
        "stem": "near",
        "tag": "IN"
      }, {
        "id": …
```
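The selection step described above can be sketched in Python. This is a minimal sketch rather than the author's script, and it rests on assumptions not stated in the original: each downloaded file is gzip-compressed and contains records in the JSON format shown above, written one after another (pretty-printed, separated only by whitespace); `sentence` sits under `graph` as in the sample; and `headline` is a top-level key of each record (the sample above is cut off before that field appears). The function names `iter_records` and `iter_pairs` are hypothetical.

```python
import gzip
import json
from typing import Iterator, Tuple


def iter_records(path: str) -> Iterator[dict]:
    """Yield each JSON object from a gzip file of concatenated JSON records.

    Assumption: records are complete JSON objects written back to back
    (pretty-printed or not), separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    # For simplicity the whole decompressed file is read into memory.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # raw_decode(s, idx) parses one object starting at idx and returns
        # (object, absolute end index), so we can continue from there.
        record, pos = decoder.raw_decode(text, pos)
        yield record


def iter_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (sentence, headline) pairs, skipping records where either is empty.

    Assumption: 'sentence' is under 'graph' (as in the sample record) and
    'headline' is a top-level key of the record.
    """
    for record in iter_records(path):
        sentence = record.get("graph", {}).get("sentence", "").strip()
        headline = record.get("headline", "").strip()
        if sentence and headline:
            yield sentence, headline
```

For example, `pairs = list(iter_pairs(path))` collects the (sentence, headline) pairs of one file, where `path` points to a downloaded training or test file; processing the 10 training files and the test file separately keeps the original train/test split. Parsing with `raw_decode` and an explicit offset avoids depending on exactly how the records are separated inside the file.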