Sinica Treebank (中文句結構樹資料庫, Chinese Sentence Structure Treebank)

http://rocling.iis.sinica.edu.tw/CKIP/treebank.htm

 

 

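  The Sinica Treebank has been under development since 1997 (ROC year 86) by the Chinese Knowledge and Information Processing (CKIP) group at Academia Sinica. Sentences are extracted from the Academia Sinica Balanced Corpus of Modern Chinese (Sinica Corpus), parsed automatically into structure trees using Information-based Case Grammar (ICG) as the representational framework, and then manually corrected and verified. The treebank is currently at version 3.0 and comprises 6 files, 61,087 Chinese trees, and 361,834 words. It is open for online search and data transfer, as a reference for scholars studying Chinese syntax and semantic relations. In addition, 1,000 structure trees are available for download.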

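  The main purpose of building the Sinica Treebank is to give Chinese natural language processing research a corpus annotated with sentence structure. Grammatical knowledge can be extracted from the treebank, and extracting and understanding that knowledge in turn helps improve the parsing system.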

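  The syntactic structure of Chinese sentences is represented according to the Head-Driven Principle. When a Chinese sentence is parsed, the type of each phrase is determined by its head, and the syntactic and semantic information recorded on the head and on the other constituents is consulted to express the syntactic structure and the semantic role relations among the words of the sentence. We also propose three auxiliary principles: small-and-elegant word classes (詞類小而美原則), left-to-right association (由左至右聯併原則), and flat structure (扁平原則). For details of the representation principles and auxiliary principles, the notation, the semantic roles, and the phrase structures of the treebank, see 陳鳳儀、蔡碧芳、陳克健、黃居仁, 《中文句結構樹資料庫 (Sinica Treebank)的構建》 [The Construction of the Sinica Treebank].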
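  The Head-Driven Principle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Node` class, the `PHRASE_OF_HEAD` mapping, the role names, and the example sentence are hypothetical and are not the treebank's actual file format or label inventory. The sketch only shows the idea that a phrase takes its type from the child marked as its head, with the other constituents attached as flat siblings carrying semantic roles.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mapping from the first letter of a head's category
# to a phrase label. Illustrative only, not the treebank's real inventory.
PHRASE_OF_HEAD = {"N": "NP", "V": "VP", "P": "PP"}


@dataclass
class Node:
    role: str                      # semantic role within the parent, or "Head"
    pos: str = ""                  # word class (leaves only)
    word: str = ""                 # surface form (leaves only)
    children: List["Node"] = field(default_factory=list)


def phrase_label(node: Node) -> str:
    """Return the word class of a leaf, or the phrase type that a
    non-leaf node inherits from its single child with role 'Head'."""
    if not node.children:
        return node.pos
    heads = [c for c in node.children if c.role == "Head"]
    if len(heads) != 1:
        raise ValueError("each phrase needs exactly one head")
    head_label = phrase_label(heads[0])
    return PHRASE_OF_HEAD[head_label[0]]


# Toy flat structure for "我 吃 飯" (I eat rice): the verb is the head,
# and both arguments are its siblings, each labeled with a role.
sentence = Node(role="", children=[
    Node(role="agent", children=[Node(role="Head", pos="Nh", word="我")]),
    Node(role="Head", pos="VC", word="吃"),
    Node(role="goal", children=[Node(role="Head", pos="Na", word="飯")]),
])

print(phrase_label(sentence))              # VP
print(phrase_label(sentence.children[0]))  # NP
```

  Keeping the head lookup recursive means a phrase whose head is itself a phrase still resolves to the type of the innermost head word, which is one simple way to read "the phrase type is determined by the head".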

  
Research Results
 
  
Publications
 

陳克健、黃居仁. 1989. “訊息為本的格位語法: 一個適用於表達中文的語法模式” [Information-based Case Grammar: A Grammar Formalism Suited to Representing Chinese]. Proceedings of ROCLING II, pp. 97-119.

Chen Keh-Jiann. 1992. “Design Concepts for Chinese Parsers.” 3rd International Conference on Chinese Information Processing, pp.1-22.

林甫雯. 1992. ICG中的論旨角色 [Thematic Roles in ICG]. CKIP-92-01, 中文詞知識庫.

中文詞知識庫小組. 1993. 中文詞類分析 [Analysis of Chinese Word Classes]. CKIP-93-05, 中文詞知識庫.

Chen Keh-Jiann, Chu-Ren Huang. 1994. “Features Constraints in Chinese Language Parsing.” Proceedings of ICCPOL '94, pp. 223-228.

Chen Keh-Jiann. 1996. “A Model for Robust Chinese Parser.” Computational Linguistics and Chinese Language Processing, Vol. 1, No. 1. pp.183-204.

Chen Keh-Jiann, Chu-Ren Huang, Li-Ping Chang, Hui-Li Hsu. 1996. “Sinica Corpus: Design Methodology for Balanced Corpora.” Proceedings of the 11th Pacific Asia Conference on Language, Information, and Computation (PACLIC 11), Seoul, Korea, pp. 167-176.

Chen Keh-Jiann, et al. 1999. “The CKIP Chinese Treebank: Guidelines for Annotation.” ATALA Workshop – Treebanks, Paris, June 18-19, 1999, pp. 85-96.

陳鳳儀、蔡碧芳、陳克健、黃居仁. 1999. 中文句結構樹資料庫 (Sinica Treebank)的構建 [The Construction of the Sinica Treebank]. Computational Linguistics and Chinese Language Processing, Vol. 4, No. 2, pp. 87-104.

Huang Chu-Ren, Feng-Yi Chen, Keh-Jiann Chen, Zhao-Ming Gao and Kuang-Yu Chen. 2000. “Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface.” Proceedings of the 2nd Chinese Language Processing Workshop (held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, ACL-2000), pp. 29-37. October 7, 2000, Hong Kong.

Chen Keh-Jiann, Chu-Ren Huang, Feng-Yi Chen, Chi-Ching Luo, Ming-Chung Chang, Chao-Jan Chen, and Zhao-Ming Gao. 2003. “Sinica Treebank: Design Criteria, Representational Issues and Implementation.” In Anne Abeillé (Ed.), Treebanks: Building and Using Parsed Corpora. Language and Speech series. Dordrecht: Kluwer, pp. 231-248.

Chen Keh-Jiann, Yu-Ming Hsieh. 2004. “Chinese Treebanks and Grammar Extraction.” Proceedings of IJCNLP-04, pp. 560-565.

Li Shih-Min, Su-Chu Lin, Keh-Jiann Chen. 2004. “Feature Representations and Logical Compatibility between Temporal Adverbs and Aspects”. 5th Chinese Lexical Semantics Workshop (CLSW-5). Singapore (14-16 June, 2004) & Genting Highland, Malaysia (17-19 June, 2004).

Lin Su-Chu, Shu-Ling Huang, Keh-Jiann Chen. 2004. “Taxonomy of Fine-grain Semantic Roles for Nominal Modifiers”. 5th Chinese Lexical Semantics Workshop (CLSW-5). Singapore (14-16 June, 2004) & Genting Highland, Malaysia (17-19 June, 2004).

You Jia-Ming, Keh-Jiann Chen. 2004. “Automatic Semantic Role Assignment for a Tree Structure.” Proceedings of SIGHAN workshop.

Li Shih-Min, Su-Chu Lin, Keh-Jiann Chen. 2005. “A Probe into Ambiguities of Determinative-Measure Compounds.” The 17th ROCLING Conference on Computational Linguistics and Speech Processing, September 15-16, 2005, National Cheng Kung University, Tainan, Taiwan, ROC.

Li Shih-Min, Su-Chu Lin and Keh-Jiann Chen. 2005. “Feature Representations and Logical Compatibility between Temporal Adverbs and Aspects.” International Journal of Computational Linguistics & Chinese Language Processing, Vol. 10, No. 4, pp. 445-457.

Li Shih-Min, Su-Chu Lin, Chia-Hung Tai and Keh-Jiann Chen. 2006. “A Probe into Ambiguities of Determinative-Measure Compounds.” International Journal of Computational Linguistics & Chinese Language Processing, Vol. 11, No. 3, pp. 245-280.

  
Members
 林素朱
  
Former Members
 

魏文真、駱季青、邱智銘、張秀娟、蔡碧芳、李明懿、王世煜、蔡宜妮、蔡佩庭、邱珮玲、李詩敏

  
Contact
 林素朱 ( jess at hp.iis.sinica.edu.tw )
  
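Other research topics of this lab
 Chinese parsing (中文剖析), Chinese word segmentation system (中文斷詞系統), Sinica Corpus (現代漢語平衡語料庫), Concept Net (概念網)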
