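1. ICTCLAS: the Java, Linux C, and Windows C versions are all available for download at http://www.ictclas.org/index.html.
2. imdict-chinese-analyzer is the intelligent Chinese word segmentation module of the imdict intelligent dictionary (imdict智能词典), written by Gao Xiaoping (高小平). Its algorithm is based on the hidden Markov model (HMM). It is a Java reimplementation of ICTCLAS, the Chinese word segmentation program from the Institute of Computing Technology, Chinese Academy of Sciences, and it can provide Chinese word segmentation support for the Lucene search engine directly. It can also be downloaded from http://www.ictclas.org/index.html.
3. LingPipe is a suite of Java libraries for the linguistic analysis of human language: http://alias-i.com/lingpipe/index.html. For its word segmentation component, you can either train a model yourself or download one from the website.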
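
Item 2 mentions that the segmenter is based on a hidden Markov model. As a rough illustration of HMM-based segmentation in general, here is a minimal sketch of a character-tagging HMM (tags B/M/E/S) decoded with Viterbi. It is not the ICTCLAS or imdict-chinese-analyzer implementation; the function names, the add-alpha smoothing, and the tiny hand-made corpus are all assumptions made for this example.

```python
import math
from collections import defaultdict

# Toy character-tagging HMM. Tags: B(egin), M(iddle), E(nd) of a
# multi-character word, and S(ingle-character word).
STATES = "BMES"
ALLOWED = {"B": "ME", "M": "ME", "E": "BS", "S": "BS"}  # legal tag transitions
START_STATES = "BS"   # a sentence can only start a word
NEG_INF = float("-inf")
UNK = "<unk>"         # placeholder symbol for characters unseen in training


def tag_word(word):
    """Map a word to its per-character BMES tags."""
    if len(word) == 1:
        return "S"
    return "B" + "M" * (len(word) - 2) + "E"


def log_dist(counts, keys, alpha):
    """Turn counts into add-alpha smoothed log-probabilities over `keys`."""
    total = sum(counts[k] for k in keys) + alpha * len(keys)
    return {k: math.log((counts[k] + alpha) / total) for k in keys}


def train(segmented_sentences, alpha=1.0):
    """Estimate HMM parameters by counting over pre-segmented sentences."""
    start = defaultdict(float)
    trans = {s: defaultdict(float) for s in STATES}
    emit = {s: defaultdict(float) for s in STATES}
    vocab = set()
    for words in segmented_sentences:
        chars = "".join(words)
        tags = "".join(tag_word(w) for w in words)
        vocab.update(chars)
        start[tags[0]] += 1
        for prev, cur in zip(tags, tags[1:]):
            trans[prev][cur] += 1
        for tag, ch in zip(tags, chars):
            emit[tag][ch] += 1
    symbols = sorted(vocab) + [UNK]
    return {
        "start": log_dist(start, START_STATES, alpha),
        "trans": {s: log_dist(trans[s], ALLOWED[s], alpha) for s in STATES},
        "emit": {s: log_dist(emit[s], symbols, alpha) for s in STATES},
    }


def viterbi(model, text):
    """Return the most probable legal BMES tag string for a non-empty text."""
    def emission(state, ch):
        table = model["emit"][state]
        return table.get(ch, table[UNK])

    score = {s: model["start"].get(s, NEG_INF) + emission(s, text[0])
             for s in STATES}
    backpointers = []
    for ch in text[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            prev = max(STATES,
                       key=lambda p: score[p] + model["trans"][p].get(cur, NEG_INF))
            new_score[cur] = (score[prev]
                              + model["trans"][prev].get(cur, NEG_INF)
                              + emission(cur, ch))
            pointers[cur] = prev
        backpointers.append(pointers)
        score = new_score
    # A sentence must end on a word boundary, so the last tag is E or S.
    tags = [max("ES", key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        tags.append(pointers[tags[-1]])
    return "".join(reversed(tags))


def segment(model, text):
    """Split text into words by cutting after every E or S tag."""
    if not text:
        return []
    words, current = [], ""
    for ch, tag in zip(text, viterbi(model, text)):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    return words


# Hypothetical three-sentence training corpus, for illustration only.
TOY_CORPUS = [["我们", "喜欢", "中文"], ["我", "喜欢", "分词"], ["中文", "分词", "很", "难"]]
model = train(TOY_CORPUS)
print(segment(model, "我们喜欢分词"))
```

Restricting transitions to the legal BMES pairs guarantees that the decoded tags always form a valid segmentation, even for characters the model has never seen. A real system would train on a large annotated corpus and, as many of the papers listed below do, use richer features or discriminative models.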
2012
Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Joint Chinese Word Segmentation, POS Tagging and Parsing
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2011
Improving Chinese Word Segmentation and POS Tagging with Semi-supervised Methods Using Large Auto-Analyzed Data
Proceedings of 5th International Joint Conference on Natural Language Processing
Enhancing Chinese Word Segmentation Using Unlabeled Data
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Syntactic Processing using the Generalized Perceptron and Beam Search
Computational Linguistics
A New Unsupervised Approach to Word Segmentation
Computational Linguistics
2010
A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Joint Tokenization and Translation
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
A Character-Based Joint Model for Chinese Word Segmentation
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
A Local Generative Model for Chinese Word Segmentation
Information Retrieval Technology
Joint training and decoding using virtual nodes for cascaded segmentation and tagging tasks
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
2009
Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study
Proceedings of the 47th ACL
Character-Level Dependencies in Chinese: Usefulness and Learning
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)
基于字依存树的中文词法-句法一体化分析
中国计算机语言学研究前沿进展 (2007-2009)
基于 CRFs 的中文分词和短文本分类技术
A Simple and Efficient Model Pruning Method for Conditional Random Fields
Chinese text segmentation: A hybrid approach using transductive learning and statistical association measures
Expert Systems with Applications
Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
Punctuation as Implicit Annotations for Chinese Word Segmentation
Computational Linguistics
An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging
Proc. of ACL-IJCNLP 2009
2008
Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
Joint Word Segmentation and POS Tagging Using a Single Perceptron
Proceedings of ACL-08: HLT
A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Proceedings of ACL-08: HLT
Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition
The Sixth SIGHAN Workshop on Chinese Language Processing
An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework
The Third International Joint Conference on Natural Language Processing (IJCNLP-2008), Hyderabad, India
Bayesian semi-supervised Chinese word segmentation for statistical machine translation
Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1
Statistical Properties of Overlapping Ambiguities in Chinese Word Segmentation and a Strategy for Their Disambiguation
Text, Speech and Dialogue
Information retrieval oriented word segmentation based on character associative strength ranking
Proceedings of the Conference on Empirical Methods in Natural Language Processing
2007
Chinese Segmentation with a Word-Based Perceptron Algorithm
基于有效子串标注的中文分词
中文信息学报
中文分词十年回顾
中文信息学报
A dual-layer CRFs based joint decoding method for cascaded segmentation and labeling tasks
Proceedings of IJCAI
A hybrid approach to word segmentation and POS tagging
Annual Meeting of the Association for Computational Linguistics
Rethinking Chinese word segmentation: tokenization, character classification, or wordbreak identification
Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
2006
Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
汉语词典的快速查询算法研究
中文信息学报
An improved Chinese word segmentation system with conditional random field
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing
Discriminative pruning of language models for Chinese word segmentation
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Contextual Dependencies in Unsupervised Word Segmentation
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
2005
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Computational Linguistics
A conditional random field word segmenter for SIGHAN bakeoff 2005
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
Perceptron Learning for Chinese Word Segmentation
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing (Sighan-05)
The second international Chinese word segmentation bakeoff
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
A Statistic Study of Three-character Unknown Words in Chinese
Journal of Chinese Language and Computing
2004
Chinese Segmentation and New Word Detection using Conditional Random Fields
Proceedings of Coling 2004
Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?
Proceedings of EMNLP 2004
基于无指导学习策略的无词表条件下的汉语自动分词
计算机学报
Applying conditional random fields to Japanese morphological analysis
Proc. of EMNLP
Adaptive Chinese word segmentation
Proceedings of ACL-2004
Unsupervised segmentation of Chinese corpus using accessor variety
Natural Language Processing IJCNLP 2004
Accessor variety criteria for Chinese word extraction
Computational Linguistics
2003
HHMM-based Chinese lexical analyzer ICTCLAS
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
Chinese lexical analysis using hierarchical hidden markov model
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
Chinese Word Segmentation as LMR Tagging
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
Chinese Word Segmentation as Character Tagging
Computational Linguistics and Chinese Language Processing
The first international Chinese word segmentation bakeoff
Proceedings of the second SIGHAN workshop on Chinese language processing
A maximum entropy Chinese character-based parser
Improved source-channel models for Chinese word segmentation
Proceedings of the 41st Annual Meeting on Association for Computational Linguistics
Chinese word segmentation using minimal linguistic knowledge
Proceedings of the second SIGHAN workshop on Chinese language processing
Combining segmenter and chunker for Chinese word segmentation
Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing
2002
Combining classifiers for Chinese word segmentation
Proceedings of the 1st SIGHAN Workshop on Chinese Language Processing
Corpus-based methods in Chinese morphology
Tutorial at the 19th COLING
Corpus-based methods in Chinese morphology and phonology
COLING 2002
2001
汉语自动分词研究评述
当代语言学
Defining and automatically identifying words in Chinese
Self-supervised Chinese word segmentation
Advances in Intelligent Data Analysis
2000
A compression-based algorithm for Chinese word segmentation
Computational Linguistics
1999
Discovering Chinese words from unsegmented text (poster abstract)
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
1998
串频统计和词形匹配相结合的汉语自动分词系统
中文信息学报
Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data
Proceedings of the 17th international conference on Computational linguistics-Volume 2
A hybrid approach to word segmentation
Lecture notes in computer science
1997
中文信息处理中的分词问题
Applied Linguistics
An unsupervised iterative method for Chinese new lexicon extraction
International Journal of Computational Linguistics & Chinese Language Processing
1996
A stochastic finite-state word-segmentation algorithm for Chinese
Computational Linguistics
Useg: A retargetable word segmentation procedure for information retrieval
Symposium on Document Analysis and Information Retrieval
1992
An efficient implementation of trie structures
Software: Practice and Experience
Word identification for Mandarin Chinese sentences
Proceedings of the 14th conference on Computational linguistics-Volume 1