Adversarial Multi-Criteria Learning for Chinese Word Segmentation
paper
code
Venue: ACL 2017
Authors: Xinchi Chen, Zhan Shi, Xipeng Qiu∗, Xuanjing Huang
Affiliation: Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
School of Computer Science, Fudan University
825 Zhangheng Road, Shanghai, China
Main contribution (what problem it solves):
Main method (what it uses):
Datasets:
Results:
Limitations:
Future work:
Abstract
Most existing methods focus on improving the performance for each single criterion. However, it is interesting to exploit these different criteria and mine their common underlying knowledge. In this paper, we propose adversarial multi-criteria learning for CWS by integrating shared knowledge from multiple heterogeneous segmentation criteria. Experiments on eight corpora with heterogeneous segmentation criteria show that the performance on each corpus improves significantly compared to single-criterion learning.
1 Introduction
Currently, the state-of-the-art methods are based on statistical supervised learning algorithms and rely on large-scale annotated corpora, which are extremely expensive to build. Although there have been great achievements in building CWS corpora, they are somewhat incompatible because they follow different segmentation criteria. As shown in Table 1, given the sentence “YaoMing reaches the final”, the two commonly used corpora, PKU’s People’s Daily (PKU) (Yu et al., 2001) and Penn Chinese Treebank (CTB) (Fei, 2000), use different segmentation criteria. In a sense, it is a waste of resources if we fail to fully exploit these corpora.
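To make the incompatibility concrete, here is a minimal Python sketch of how one sentence can produce different character-level training labels under two criteria. The two segmentations, the names `seg_a` and `seg_b`, and the helper `to_bmes` are illustrative assumptions: they are not copied from Table 1 and are not attributed to a specific corpus. The B/M/E/S tag scheme (begin, middle, end, single) is a common way to cast word segmentation as sequence labeling.

```python
def to_bmes(words):
    """Convert a list of segmented words into per-character B/M/E/S tags."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first char is B, last char is E, anything between is M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


# Hypothetical segmentations of the same sentence under two criteria.
sentence = "姚明进入总决赛"
seg_a = ["姚明", "进入", "总决赛"]          # coarser criterion (assumed)
seg_b = ["姚", "明", "进入", "总", "决赛"]  # finer criterion (assumed)

tags_a = to_bmes(seg_a)
tags_b = to_bmes(seg_b)

# Character positions where the two criteria assign different labels.
diff = [i for i, (a, b) in enumerate(zip(tags_a, tags_b)) if a != b]

for ch, a, b in zip(sentence, tags_a, tags_b):
    print(ch, a, b)
```

The characters on which the two tag sequences agree hint at the knowledge the criteria share. The positions in `diff` are the ones where simply concatenating the two corpora would give a single model conflicting supervision.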
Recently, some efforts have been made to exploit heterogeneous annotation data for Chinese word segmentation or part-of-speech tagging. These methods adopted stacking or multi-task architectures and showed that heterogeneous corpora can help each other. However