Dependency Syntax: the CoNLL-U Format (CoNLL2014)

Dependency-Based Word Embeddings needs dependency parses in CoNLL format as input; the project page is here:


https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/


The following is excerpted from:

http://universaldependencies.github.io/docs/format.html


Sentences consist of one or more word lines, and word lines contain the following fields:

  1. ID: Word index, integer starting at 1 for each new sentence; may be a range for tokens with multiple words.
  2. FORM: Word form or punctuation symbol.
  3. LEMMA: Lemma or stem of word form.
  4. CPOSTAG: Universal part-of-speech tag drawn from our revised version of the Google universal POS tags.
  5. POSTAG: Language-specific part-of-speech tag; underscore if not available.
  6. FEATS: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
  7. HEAD: Head of the current token, which is either a value of ID or zero (0).
  8. DEPREL: Universal Stanford dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
  9. DEPS: List of secondary dependencies (head-deprel pairs).
  10. MISC: Any other annotation.
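The ten fields above map naturally onto a small record type. The following is a minimal sketch, not part of the quoted specification: the names `WordLine`, `parse_feats` and `parse_word_line` are made up for illustration, and it assumes fields are tab-separated, that an underscore marks a missing value, and that FEATS is a `|`-separated list of `Name=Value` pairs as in the example table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WordLine:
    """One word line; field names follow the ten columns listed above."""
    id: str              # kept as a string because ID may be a range
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: Dict[str, str]
    head: Optional[int]  # None when the column is "_"
    deprel: str
    deps: str
    misc: str


def parse_feats(raw: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Plur' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_word_line(line: str) -> WordLine:
    """Parse one word line (assumption: exactly ten tab-separated fields)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(cols)}")
    id_, form, lemma, cpostag, postag, feats, head, deprel, deps, misc = cols
    return WordLine(
        id=id_, form=form, lemma=lemma, cpostag=cpostag, postag=postag,
        feats=parse_feats(feats),
        head=None if head == "_" else int(head),
        deprel=deprel, deps=deps, misc=misc,
    )


word = parse_word_line(
    "1\tThey\tthey\tPRON\tPRN\tCase=Nom|Number=Plur\t2\tnsubj\t_\t_"
)
print(word.form, word.head, word.feats)
```

ID is left as a string on purpose: the field description allows a range for multi-word tokens, which would not convert to an integer.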



ID  FORM   LEMMA  CPOSTAG  POSTAG  FEATS                            HEAD  DEPREL  DEPS  MISC
1   They   they   PRON     PRN     Case=Nom|Number=Plur             2     nsubj   4     nsubj
2   buy    buy    VERB     VBP     Number=Plur|Person=3|Tense=Pres  0     root    _     _
3   and    and    CONJ     CC      _                                2     cc      _     _
4   sell   sell   VERB     VBP     Number=Plur|Person=3|Tense=Pres  2     conj    0     root
5   books  book   NOUN     NNS     Number=Plur                      2     dobj    4     dobj
6   .      .      PUNCT    .       _                                2     punct   _     _
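To make the HEAD and DEPREL columns of the example concrete, the sketch below lists each dependency arc as `deprel(head, dependent)`. It is only an illustration: the `arcs` helper and the `"ROOT"` label for HEAD = 0 are assumptions of this sketch, and the rows are the ID, FORM, HEAD and DEPREL values copied from the example.

```python
# (ID, FORM, HEAD, DEPREL) copied from the example sentence "They buy and sell books."
ROWS = [
    (1, "They", 2, "nsubj"),
    (2, "buy", 0, "root"),
    (3, "and", 2, "cc"),
    (4, "sell", 2, "conj"),
    (5, "books", 2, "dobj"),
    (6, ".", 2, "punct"),
]


def arcs(rows):
    """Return (deprel, head_form, dependent_form) triples.

    HEAD 0 has no word of its own, so it is shown as the made-up label "ROOT".
    """
    forms = {word_id: form for word_id, form, _, _ in rows}
    forms[0] = "ROOT"
    return [(deprel, forms[head], form) for _, form, head, deprel in rows]


for deprel, head_form, dep_form in arcs(ROWS):
    print(f"{deprel}({head_form}, {dep_form})")
```

Every word except `buy` points at word 2, so `buy` is the head of the whole sentence and is itself attached to the artificial root.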

The following is excerpted from:

http://hanlp.linrunsoft.com/doc/_build/html/dependency_parser.html


The CoNLL annotation format has 10 columns:
———————————————————————————
ID   FORM    LEMMA   CPOSTAG POSTAG  FEATS   HEAD    DEPREL  PHEAD   PDEPREL
———————————————————————————


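Only the first 8 columns are used. Their meanings are:

1    ID      Index of the current word in the sentence, starting at 1.
2    FORM    The current word or punctuation mark.
3    LEMMA   Lemma or stem of the current word (or punctuation mark); for Chinese this column is identical to FORM.
4    CPOSTAG Part-of-speech tag of the current word (coarse-grained).
5    POSTAG  Part-of-speech tag of the current word (fine-grained).
6    FEATS   Syntactic features; this column is not used in this evaluation and is filled with underscores throughout.
7    HEAD    Head word of the current word.
8    DEPREL  Dependency relation between the current word and its head.

In CoNLL format each word occupies one line. A column with no value is filled with an underscore '_', columns are separated by a tab '\t', and lines are separated by a newline '\n'. Sentences are separated from one another by a blank line.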
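The layout rules above (one word per line, tab-separated columns, blank line between sentences) are enough to write a small reader. This is a minimal sketch rather than HanLP's own loader: `read_conll` is a hypothetical helper, and `SAMPLE` is made-up input for illustration. It keeps only the first 8 columns, as described above.

```python
def read_conll(text):
    """Split CoNLL text into sentences.

    Each sentence is a list of words; each word is a list holding its
    first 8 columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL).
    """
    sentences, current = [], []
    for line in text.split("\n"):
        if line.strip() == "":
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t")[:8])
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


# Made-up two-sentence input, for illustration only.
SAMPLE = (
    "1\tThey\tthey\tPRON\tPRN\t_\t2\tnsubj\t_\t_\n"
    "2\tbuy\tbuy\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tbooks\tbook\tNOUN\tNNS\t_\t0\troot\t_\t_\n"
)

for sentence in read_conll(SAMPLE):
    print([(cols[1], cols[6], cols[7]) for cols in sentence])
```

HEAD is left as a string here; convert it with `int()` only after checking for the underscore placeholder.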


