Bibliography of Chinese Word Segmentation

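Maintained by Zhang Kaixu (Chinese word segmentation experiment environment).
Comments and suggestions are welcome; please contact the author.
Page generated on 2012-09-28, produced automatically by the bibpage tool from a bib-format bibliography.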
2012

Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
Sun, Xu and Wang, Houfeng and Li, Wenjie
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
Sun, Weiwei and Wan, Xiaojun
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Sun, Weiwei and Uszkoreit, Hans
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Joint Chinese Word Segmentation, POS Tagging and Parsing
Qian, Xian and Liu, Yang
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
Li, Zhongguo and Zhou, Guodong
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation
Jiang, Wenbin and Meng, Fandong and Liu, Qun and Lü, Yajuan
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2011

Improving Chinese Word Segmentation and POS Tagging with Semi-supervised Methods Using Large Auto-Analyzed Data
Wang, Yiou and Kazama, Jun'ichi and Tsuruoka, Yoshimasa and Chen, Wenliang and Zhang, Yujie and Torisawa, Kentaro
Proceedings of 5th International Joint Conference on Natural Language Processing
Enhancing Chinese Word Segmentation Using Unlabeled Data
Sun, Weiwei and Xu, Jia
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
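Feature engineering: uses in-domain unlabeled data to help Chinese word segmentation. The added features are mutual information, accessor variety, punctuation-based features, and document-level features. Another finding is that real-valued feature values work less well than binary ones.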
A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Sun, Weiwei
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
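Uses stacked learning, a meta-learning algorithm, with a mechanism that keeps the two layers from training on overlapping data while still making the fullest use of the data. The first layer uses three models: a word-based model, a character sequence labeling model, and a single-character classification model.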
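The note above does not spell out the mechanism, so the following is only a minimal sketch of one common way to keep the layers from overlapping: K-fold cross prediction, where each training example is labeled by a first-layer model that never saw it. The function names, the `train_fn` callback, and the choice of K-fold itself are assumptions for illustration, not the paper's confirmed procedure.

```python
def cross_fold_assignments(n_examples, k):
    """Yield (train_idx, heldout_idx) pairs whose held-out parts partition range(n_examples)."""
    folds = [list(range(i, n_examples, k)) for i in range(k)]
    for i, heldout in enumerate(folds):
        train = [j for f_id, fold in enumerate(folds) if f_id != i for j in fold]
        yield train, heldout


def stacked_level1_predictions(examples, k, train_fn):
    """Label every example with a level-1 model trained on the other k-1 folds.

    train_fn (hypothetical) takes a list of examples and returns a callable model.
    The returned predictions can then serve as features for the level-2 model.
    """
    preds = [None] * len(examples)
    for train_idx, heldout_idx in cross_fold_assignments(len(examples), k):
        model = train_fn([examples[j] for j in train_idx])
        for j in heldout_idx:
            preds[j] = model(examples[j])
    return preds
```

Every example still contributes to training k-1 of the first-layer models, so no data is set aside permanently.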
Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
Li, Zhongguo
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
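Combines lexical analysis with syntactic parsing. Uses different "constituent" labels within a single tree, and decodes with a parsing algorithm.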
Syntactic Processing using the Generalized Perceptron and Beam Search
Zhang, Y. and Clark, S.
Computational Linguistics
A summary of the authors' earlier work. Applies the averaged perceptron to Chinese lexical analysis and syntactic parsing, with beam search.
A New Unsupervised Approach to Word Segmentation
Wang, H. and Zhu, J. and Tang, S. and Fan, X.
Computational Linguistics
2010

A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model
Zhang, Yue and Clark, Stephen
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Decoding speed rises from 2.24 sentences per second to 24.94 sentences per second.
Joint Tokenization and Translation
Xiao, Xinyan and Liu, Yang and Hwang, YoungSook and Liu, Qun and Lin, Shouxun
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
A Character-Based Joint Model for Chinese Word Segmentation
Wang, Kun and Zong, Chengqing and Su, Keh-Yih
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
Integrates a generative model with a discriminative model. Also finds that adjusting the weights of certain binary feature values improves results.
A Local Generative Model for Chinese Word Segmentation
Zhang, K. and Sun, M. and Xue, P.
Information Retrieval Technology
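Proposes a segmentation method based on local language models. Also proposes a way to build a binary segmentation tree to handle segmentation granularity; the tree can be built directly from CRF output as well.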
Joint training and decoding using virtual nodes for cascaded segmentation and tagging tasks
Qian, X. and Zhang, Q. and Zhou, Y. and Huang, X. and Wu, L.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
2009

Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study
Jiang, Wenbin and Huang, Liang and Liu, Qun
Proceedings of the 47th ACL
Perceptron; joint segmentation and POS tagging. Transfers parameters learned under one annotation standard for use under another.
Character-Level Dependencies in Chinese: Usefulness and Learning
Zhao, Hai
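Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)
Uses character dependency trees for segmentation. In the final system, relations inside a word are morphological character dependencies, and relations between words are linear dependencies. The final results do not match the best existing systems.
基于字依存树的中文词法-句法一体化分析 [Integrated Chinese Lexical-Syntactic Analysis Based on Character Dependency Trees]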
赵, 海 and 揭, 春雨 and 宋, 彦
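中国计算机语言学研究前沿进展 (2007-2009) [Frontiers of Chinese Computational Linguistics Research (2007-2009)]
基于 CRFs 的中文分词和短文本分类技术 [Chinese Word Segmentation and Short Text Classification Techniques Based on CRFs]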
滕, 少华
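For segmentation, chi-square feature selection keeps performance intact with only half of the features. The presence or absence of individual characters (such as 的, 和, 了) can either help or hurt the correctness of the whole sentence's segmentation. Uses the CRF's confidence output; low confidence goes with a high error rate. Post-processes low-confidence cases with rules and with statistics from the document context.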
A Simple and Efficient Model Pruning Method for Conditional Random Fields
Zhao, H. and Kit, C.
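After CRF training, most features can be removed by parameter value without any loss in performance, which shows empirically that the CRF carries a great deal of redundancy.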
Chinese text segmentation: A hybrid approach using transductive learning and statistical association measures
Tsai, R. T. H.
Expert Systems with Applications
Several ways of adding various features to improve CRF performance.
Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling
Mochihashi, Daichi and Yamada, Takeshi and Ueda, Naonori
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
Uses Pitman-Yor processes to build a two-level language model: one level for words and one for sentences.
Punctuation as Implicit Annotations for Chinese Word Segmentation
Li, Zhongguo and Sun, Maosong
Computational Linguistics
An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Proc. of ACL-IJCNLP 2009
Treats dictionary words and unknown words separately.
2008

Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
Jiang, Wenbin and Mi, Haitao and Liu, Qun
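Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
Uses reranking. Unlike top-n reranking, it reranks over a word lattice of exponential size. Judged by the oracle at least, the lattice is better than the top-n list. Problems addressed: how to build the lattice, how to compute the oracle, which features to use, and cube pruning during reranking.
Joint Word Segmentation and POS Tagging Using a Single Perceptron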
Zhang, Yue and Clark, Stephen
Proceedings of ACL-08: HLT
A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Jiang, Wenbin and Huang, Liang and Liu, Qun and Lü, Yajuan
Proceedings of ACL-08: HLT
Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition
Zhao, Hai and Kit, Chunyu
The Sixth SIGHAN Workshop on Chinese Language Processing
Discretizes accessor variety (AV) scores, distributes them onto individual characters, and feeds them to the CRF as input, which improves segmentation.
An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework
Zhao, Hai and Kit, Chunyu
The Third International Joint Conference on Natural Language Processing (IJCNLP-2008), Hyderabad, India
Describes four goodness measures for unsupervised Chinese word segmentation: Frequency of Substring with Reduction; Description Length Gain (DLG); Accessor Variety (AV); Boundary Entropy (Branching Entropy, BE).
Bayesian semi-supervised chinese word segmentation for statistical machine translation
Xu, J. and Gao, J. and Toutanova, K. and Ney, H.
Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1
Statistical Properties of Overlapping Ambiguities in Chinese Word Segmentation and a Strategy for Their Disambiguation
Qiao, W. and Sun, M. and Menzel, W.
Text, Speech and Dialogue
Information retrieval oriented word segmentation based on character associative strength ranking
Liu, Y. and Wang, B. and Ding, F. and Xu, S.
Proceedings of the Conference on Empirical Methods in Natural Language Processing
Uses a Ranking SVM approach to segmentation, aimed at IR.
2007

Chinese Segmentation with a Word-Based Perceptron Algorithm
Zhang, Yue and Clark, Stephen
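Uses the averaged perceptron together with a lazy update method. Because the features are word-based, decoding uses beam search rather than greedy search or dynamic programming.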
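The lazy update mentioned in the note can be sketched as follows. This is a minimal illustration of the general trick, not the paper's code: instead of adding every weight to a running sum after every training step, each feature remembers the step at which it was last touched and is credited for the skipped steps only when it changes or when the average is read. The class and method names are assumptions.

```python
from collections import defaultdict


class AveragedPerceptron:
    def __init__(self):
        self.w = defaultdict(float)      # current weights
        self.total = defaultdict(float)  # sum of each weight over the steps credited so far
        self.stamp = defaultdict(int)    # last step at which total[f] was brought up to date
        self.step = 0                    # number of training steps seen

    def update(self, gold_feats, pred_feats):
        """One training step: reward gold features, penalize predicted ones."""
        self.step += 1
        delta = defaultdict(float)
        for f in gold_feats:
            delta[f] += 1.0
        for f in pred_feats:
            delta[f] -= 1.0
        for f, d in delta.items():
            if d == 0.0:
                continue
            # Lazily credit the old weight for the steps since f was last touched.
            self.total[f] += (self.step - 1 - self.stamp[f]) * self.w[f]
            self.w[f] += d
            self.total[f] += self.w[f]   # credit the new weight for this step
            self.stamp[f] = self.step

    def averaged(self):
        """Return the weights averaged over all steps seen so far."""
        if self.step == 0:
            return {}
        return {
            f: (self.total[f] + (self.step - self.stamp[f]) * w) / self.step
            for f, w in self.w.items()
        }
```

A step in which the prediction equals the gold answer costs nothing beyond incrementing the counter, which is where the saving comes from.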
基于有效子串标注的中文分词 [Chinese Word Segmentation Based on Effective Substring Tagging]
赵, 海 and 揭, 春雨
中文信息学报 [Journal of Chinese Information Processing]
中文分词十年回顾 [Chinese Word Segmentation: A Decade Review]
黄, 昌宁 and 赵, 海
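中文信息学报 [Journal of Chinese Information Processing]
How far people agree on what counts as a Chinese word. From the 863 and 973 programs to the SIGHAN evaluations. Corpus quality control (including drawing up rules for "psychological words"). Grammar-based and rule-based methods lost out to word-based methods, which were in turn displaced by character-based ones. In large-scale real text, the accuracy lost to out-of-vocabulary words is at least 5 times the accuracy lost to segmentation ambiguity. Character-based methods: maximum entropy, SVM, CRF, and so on. Word-position transitions; 2-tag, 4-tag, and Microsoft's 6-tag sets. A 5-character window is enough.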
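To make the character-tagging formulation concrete, here is a small sketch of the conversion between a segmented sentence and a 4-tag labeling. The tag names B, M, E, S (begin, middle, end, single) are a common convention assumed here; the note itself only says "4-tag".

```python
def words_to_tags(words):
    """Map a list of words to one position tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their position tags."""
    words, cur = [], ""
    for ch, t in zip(chars, tags):
        cur += ch
        if t in ("E", "S"):      # a word ends at this character
            words.append(cur)
            cur = ""
    if cur:                      # tolerate a tag sequence that ends mid-word
        words.append(cur)
    return words
```

A 2-tag set collapses this to "starts a word" versus "does not"; the 6-tag set adds finer distinctions for positions inside longer words.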
A dual-layer CRFs based joint decoding method for cascaded segmentation and labeling tasks
Shi, Y. and Wang, M.
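Proceedings of IJCAI
A two-layer CRF for segmentation and POS tagging; solid but unremarkable. The first layer segments using character information; the second layer tags POS using word and character information. The two CRF layers are trained separately and tested jointly: the first layer produces an N-best list, which is then reranked by combining the scores of both layers.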
A hybrid approach to word segmentation and pos tagging
Nakagawa, Tetsuji and Uchimoto, Kiyotaka
Annual Meeting of the Association for Computational Linguistics
A lattice that combines characters and words, followed by joint segmentation and tagging. Still uses a Markov model.
Rethinking Chinese word segmentation: tokenization, character classification, or wordbreak identification
Huang, Chu-Ren and Simon, Petr and Hsieh, Shu-Kai and Prévot, L.
Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Does not tag characters; it looks directly at the gaps between characters (break or no break) and decides each one with a sliding window.
2006

Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation
Zhang, Ruiqiang and Kikui, Genichiro and Sumita, Eiichiro
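Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
Subword-based tagging: for example, 北京市 is tagged as 北京/l 市/r, although the tag set is still a three-tag system. Uses CRF confidence to combine the tagger with a dictionary-based method. The CRF tends toward a higher F1 on OOV words and a lower F1 on IV words.
汉语词典的快速查询算法研究 [Research on Fast Lookup Algorithms for Chinese Dictionaries]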
李, 江波 and 周, 强 and 陈, 祖舜
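中文信息学报 [Journal of Chinese Information Processing]
The double-array trie is a very efficient dictionary lookup structure and suits Chinese word segmentation well. Put simply, it hashes character by character with the trivial hash function f(x)=x and never has collisions, so it is fast. Maintaining the double array, however, is hard.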
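The lookup side of a double-array trie can be sketched in a few lines. The version below is a toy illustration under stated assumptions: character codes are simply `ord(ch)`, the arrays are built by a naive first-fit search from an ordinary nested-dict trie, and a separate `final` list marks word ends. Production implementations compress the code space and build far more carefully; the class and attribute names here are hypothetical.

```python
from collections import deque


class DoubleArrayTrie:
    def __init__(self, words):
        # Build a plain nested-dict trie first; the key None marks "a word ends here".
        root = {}
        for w in words:
            node = root
            for ch in w:
                node = node.setdefault(ch, {})
            node[None] = True
        # State 0 is the root. check[t] holds the parent state of t, or -1 if slot t is free.
        self.base, self.check, self.final = [1], [0], [None in root]
        queue = deque([(0, root)])
        while queue:
            state, node = queue.popleft()
            codes = sorted(ord(ch) for ch in node if ch is not None)
            if not codes:
                continue
            b = 1  # naive first-fit: smallest base whose child slots are all free
            while not all(self._is_free(b + c) for c in codes):
                b += 1
            self.base[state] = b
            for c in codes:
                t = b + c
                self._grow(t)
                self.check[t] = state
                child = node[chr(c)]
                self.final[t] = None in child
                queue.append((t, child))

    def _is_free(self, t):
        return t >= len(self.check) or self.check[t] == -1

    def _grow(self, t):
        while len(self.check) <= t:
            self.base.append(0)
            self.check.append(-1)
            self.final.append(False)

    def contains(self, word):
        s = 0
        for ch in word:
            t = self.base[s] + ord(ch)  # the "hash" is just an offset plus the code
            if t >= len(self.check) or self.check[t] != s:
                return False            # slot is free or belongs to another state
            s = t
        return self.final[s]
```

Each character costs one addition and one comparison at lookup time. The difficulty the note mentions shows up in construction: inserting a word later can force a state's children to be relocated to a new base.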
An improved Chinese word segmentation system with conditional random field
Zhao, H. and Huang, C. N. and Li, M.
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing
6-tag set; tone feature; assistant segmenters.
Discriminative pruning of language models for Chinese word segmentation
Li, J. and Wang, H. and Ren, D. and Li, G.
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Contextual Dependencies in Unsupervised Word Segmentation
Goldwater, Sharon and Griffiths, Thomas L. and Johnson, Mark
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A language model and a lexical model based on the Dirichlet process; Gibbs sampling over two words at a time.
2005

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Gao, Jianfeng and Li, Mu and Huang, Chang-Ning and Wu, Andi
Computational Linguistics
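Uses a perceptron to learn a linear model. Unlike character tagging, a word lattice is built before decoding, which in effect shrinks the set of possible character-tagging results in advance. Words are divided into several classes, and for each class probability values are computed and used as the perceptron's features. None of these features is binary. Only word-class trigram probabilities are used; no specific characters are involved.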
A conditional random field word segmenter for sighan bakeoff 2005
Tseng, H. and Chang, P. and Andrew, G. and Jurafsky, D. and Manning, C.
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
One of the strongest systems in SIGHAN bakeoff 2005. Adds simple affix and reduplicated-character features to the CRF.
Perceptron Learning for Chinese Word Segmentation
Li, Y. and Miao, C. and Bontcheva, K. and Cunningham, H.
Proceedings of Fourth SIGHAN Workshop on Chinese Language Processing (Sighan-05)
The second international chinese word segmentation bakeoff
Emerson, Thomas
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
A Statistic Study of Three-character Unknown Words in Chinese
Duan, {ZWXZH}
Journal of Chinese Language and Computing
2004

Chinese Segmentation and New Word Detection using Conditional Random Fields
Peng, Fuchun and Feng, Fangfang and McCallum, Andrew
Proceedings of Coling 2004
Introduced CRFs to Chinese word segmentation.
Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?
Ng, Hwee Tou and Low, Jin Kiat
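Proceedings of EMNLP 2004
Tries three setups with maximum entropy models, varying whether segmentation and tagging are done separately or jointly and whether POS tagging uses character-based or word-based features. Joint and character-based is the best but much slower. Separate and character-based is slightly worse but much faster. Separate and word-based has the same segmentation performance as the character-based setup, as expected, but much worse POS tagging, and is slightly faster overall; tagging suffers because the characters inside a word matter a great deal for determining its POS. There is no joint word-based setup, presumably because it was too expensive to run. There is also no experiment with word-based features at the segmentation stage.
基于无指导学习策略的无词表条件下的汉语自动分词 [Chinese Word Segmentation without a Lexicon Based on an Unsupervised Learning Strategy]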
孙, 茂松 and 肖, 明 and 邹, 嘉彦
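计算机学报 [Chinese Journal of Computers]
Uses mutual information and the difference of t-test as two criteria for unsupervised segmentation at the character level. Character-level labeling accuracy reaches about 85%.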
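As a rough illustration of the first criterion only, the sketch below computes pointwise mutual information for adjacent character pairs from raw text and breaks a sentence wherever the score falls below a threshold. It is a simplification: the difference-of-t-test criterion is left out, and the maximum-likelihood estimates, the fixed threshold, and the function names are all assumptions rather than the paper's procedure.

```python
import math
from collections import Counter


def adjacent_pmi(corpus):
    """Return {(a, b): PMI} for adjacent character pairs in a list of strings."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    return {
        (a, b): math.log((c / n_bi) / ((uni[a] / n_uni) * (uni[b] / n_uni)))
        for (a, b), c in bi.items()
    }


def segment_by_threshold(sent, pmi, threshold):
    """Join adjacent characters whose PMI reaches the threshold; break otherwise."""
    words, cur = [], sent[:1]
    for a, b in zip(sent, sent[1:]):
        if pmi.get((a, b), float("-inf")) >= threshold:
            cur += b
        else:
            words.append(cur)
            cur = b
    if cur:
        words.append(cur)
    return words
```

Pairs never seen in the corpus get a score of negative infinity here, so they are always split.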
Applying conditional random fields to Japanese morphological analysis
Kudo, T. and Yamamoto, K. and Matsumoto, Y.
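Proc. of EMNLP
Uses a modified CRF model for Japanese word segmentation. The model works on words as units, so the length of y need not equal the length of x.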
Adaptive Chinese word segmentation
Gao, J. and Wu, A. and Li, M. and Huang, C. N. and Li, H. and Xia, X. and Qin, H.
Proceedings of ACL-2004
Unsupervised segmentation of Chinese corpus using accessor variety
Feng, Haodi and Chen, Kang and Kit, Chunyu and Deng, Xiaotie
Natural Language Processing IJCNLP 2004
How to build a segmenter from accessor variety, and how to design the objective function.
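The accessor variety score itself is simple to compute. The sketch below assumes the usual reading of the definition: the AV of a substring is the smaller of its number of distinct left neighbors and its number of distinct right neighbors, with every sentence boundary counted as a distinct neighbor. The `max_len` cutoff and the function name are illustrative assumptions, and the segmenter and objective function the note refers to are not reproduced here.

```python
from collections import defaultdict


def accessor_variety(corpus, max_len=4):
    """Return {substring: AV} for all substrings of up to max_len characters."""
    left, right = defaultdict(set), defaultdict(set)
    for sid, sent in enumerate(corpus):
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                s = sent[i:j]
                # A sentence boundary gets a token unique to this occurrence,
                # so each boundary counts as its own distinct neighbor.
                left[s].add(sent[i - 1] if i > 0 else ("<s>", sid, i))
                right[s].add(sent[j] if j < n else ("</s>", sid, j))
    return {s: min(len(left[s]), len(right[s])) for s in left}
```

Substrings that appear in many different contexts get a high score and are more likely to be words; a segmenter can then look for the segmentation that maximizes some function of these scores.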
Accessor variety criteria for Chinese word extraction
Feng, Haodi and Chen, Kang and Deng, Xiaotie and Zheng, Weimin
Computational Linguistics
2003

HHMM-based Chinese lexical analyzer ICTCLAS
Zhang, H. P. and Yu, H. K. and Xiong, D. Y. and Liu, Q.
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
An introductory paper on ICTCLAS, a segmentation toolkit built for practical use.
Chinese lexical analysis using hierarchical hidden markov model
Zhang, H. P. and Liu, Q. and Cheng, X. Q. and Zhang, H. and Yu, H. K.
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
Chinese Word Segmentation as LMR Tagging
Xue, Nianwen and Shen, Libin
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
Chinese Word Segmentation as Character Tagging
Xue, Nianwen
Computational Linguistics and Chinese Language Processing
The first international Chinese word segmentation bakeoff
Sproat, R. and Emerson, T.
Proceedings of the second SIGHAN workshop on Chinese language processing
A maximum entropy Chinese character-based parser
Luo, X.
Improved source-channel models for Chinese word segmentation
Gao, J. and Li, M. and Huang, C. N.
Proceedings of the 41st Annual Meeting on Association for Computational Linguistics
Chinese word segmentation using minimal linguistic knowledge
Chen, A.
Proceedings of the second SIGHAN workshop on Chinese language processing
Combining segmenter and chunker for Chinese word segmentation
Asahara, M. and Goh, C. L. and Wang, X. and Matsumoto, Y.
Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing
2002

Combining classifiers for Chinese word segmentation
Xue, Nianwen and Converse, Susan
Proceedings of the 1st SIGHAN Workshop on Chinese Language Processing
A milestone: the first proposal of a character-tagging model for segmentation.
Corpus-based methods in Chinese morphology
Sproat, R. and Shih, C.
Tutorial at the 19th COLING
Corpus-based methods in Chinese morphology and phonology
Sproat, R. and Shih, C.
COLING 2002
2001

汉语自动分词研究评述 [A Critical Review of Research on Automatic Chinese Word Segmentation]
孙, 茂松 and 邹, 嘉彦
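当代语言学 [Contemporary Linguistics]
A good review of and commentary on Chinese word segmentation research in the last century. Covers ambiguity (overlapping ambiguity and covering ambiguity) and OOV words.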
Defining and automatically identifying words in Chinese
Xue, Nianwen
Self-supervised Chinese word segmentation
Peng, F. and Schuurmans, D.
Advances in Intelligent Data Analysis
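Purely unsupervised segmentation with the EM algorithm. Self-supervised, using two dictionaries. Prunes the dictionary with mutual information (MI).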
2000

A compression-based algorithm for Chinese word segmentation
Teahan, W. J. and McNab, Rodger and Wen, Yingying and Witten, Ian H.
Computational Linguistics
1999

Discovering Chinese words from unsegmented text (poster abstract)
Ge, X. and Pratt, W. and Smyth, P.
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Purely unsupervised segmentation with the EM algorithm and a zeroth-order hidden Markov chain.
1998

串频统计和词形匹配相结合的汉语自动分词系统 [A Chinese Word Segmentation System Combining String Frequency Statistics and Word-Form Matching]
刘, 挺 and 吴, 岩
中文信息学报 [Journal of Chinese Information Processing]
Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data
Sun, Maosong and Shen, Dayang and Tsou, Benjamin K
Proceedings of the 17th international conference on Computational linguistics-Volume 2
A hybrid approach to word segmentation
Kazakov, D. and Manandhar, S.
Lecture notes in computer science
1997

中文信息处理中的分词问题 [The Word Segmentation Problem in Chinese Information Processing]
黄, 昌宁
Applied Linguistics
An unsupervised iterative method for Chinese new lexicon extraction
Chang, J. S and Su, K. Y
International Journal of Computational Linguistics & Chinese Language Processing
1996

A stochastic finite-state word-segmentation algorithm for Chinese
Sproat, R. and Gale, W. and Shih, C. and Chang, N.
Computational Linguistics
Useg: A retargetable word segmentation procedure for information retrieval
Ponte, J. M. and Croft, W. B.
Symposium on Document Analysis and Information Retrieval
1992

An efficient implementation of trie structures
Aoe, Jun-Ichi and Morimoto, Katsushi and Sato, Takashi
Software: Practice and Experience
The double-array trie.
Word identification for Mandarin Chinese sentences
Chen, K. J and Liu, S. H
Proceedings of the 14th conference on Computational linguistics-Volume 1
Posted on 2013-03-18 17:05 by lexus

Reposted from: https://www.cnblogs.com/lexus/archive/2013/03/18/2966391.html
