Chinese Word Segmentation Resources

1,ICTCLAS: the Java, Linux C, and Windows C versions are all available for download at http://www.ictclas.org/index.html.

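2,imdict-chinese-analyzer is the Chinese word segmentation module of the imdict intelligent dictionary (imdict智能词典), written by Gao Xiaoping (高小平). Its algorithm is based on the Hidden Markov Model (HMM), and it is a Java reimplementation of ICTCLAS, the Chinese word segmenter from the Institute of Computing Technology, Chinese Academy of Sciences. It can provide Chinese word segmentation directly to the Lucene search engine. It can also be downloaded from http://www.ictclas.org/index.html.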

3,LingPipe is a suite of Java libraries for the linguistic analysis of human language: http://alias-i.com/lingpipe/index.html. For its word segmentation component, you can either train a model yourself or download a model from the website.

4,MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm

5,Lucene Chinese word segmentation

6,Chinese word segmentation projects on the OSChina (开源中国) open-source community

7,Microsoft Research S-MSRSeg
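
As a rough illustration of the maximum matching idea named in item 4, here is a minimal Python sketch of plain forward maximum matching. It is not MMSEG itself: it leaves out MMSEG's variants and disambiguation rules, and the function name, the toy lexicon, and the `max_len` limit are all assumptions made for this example.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position, take the
    longest lexicon entry; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon, hypothetical and for illustration only.
lexicon = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", lexicon))
# → ['研究生', '命', '起源']
```

The output shows the known weakness of purely greedy matching: the intended reading is 研究 / 生命 / 起源, but the longest first match 研究生 wins. Ambiguities like this are what the statistical tools in the list above are meant to resolve.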


2012

Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
Sun, Xu and Wang, Houfeng and Li, Wenjie
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
Sun, Weiwei and Wan, Xiaojun
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Sun, Weiwei and Uszkoreit, Hans
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Joint Chinese Word Segmentation, POS Tagging and Parsing
Qian, Xian and Liu, Yang
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
Li, Zhongguo and Zhou, Guodong
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation
Jiang, Wenbin and Meng, Fandong and Liu, Qun and Lü, Yajuan
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2011

Improving Chinese Word Segmentation and POS Tagging with Semi-supervised Methods Using Large Auto-Analyzed Data
Wang, Yiou and Kazama, Jun'ichi and Tsuruoka, Yoshimasa and Chen, Wenliang and Zhang, Yujie and Torisawa, Kentaro
Proceedings of 5th International Joint Conference on Natural Language Processing
Enhancing Chinese Word Segmentation Using Unlabeled Data
Sun, Weiwei and Xu, Jia
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Sun, Weiwei
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
Li, Zhongguo
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Syntactic Processing using the Generalized Perceptron and Beam Search
Zhang, Y. and Clark, S.
Computational Linguistics
A New Unsupervised Approach to Word Segmentation
Wang, H. and Zhu, J. and Tang, S. and Fan, X.
Computational Linguistics
2010

A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model
Zhang, Yue and Clark, Stephen
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Joint Tokenization and Translation
Xiao, Xinyan and Liu, Yang and Hwang, YoungSook and Liu, Qun and Lin, Shouxun
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
A Character-Based Joint Model for Chinese Word Segmentation
Wang, Kun and Zong, Chengqing and Su, Keh-Yih
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
A Local Generative Model for Chinese Word Segmentation
Zhang, K. and Sun, M. and Xue, P.
Information Retrieval Technology
Joint training and decoding using virtual nodes for cascaded segmentation and tagging tasks
Qian, X. and Zhang, Q. and Zhou, Y. and Huang, X. and Wu, L.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
2009

Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study
Jiang, Wenbin and Huang, Liang and Liu, Qun
Proceedings of the 47th ACL
Character-Level Dependencies in Chinese: Usefulness and Learning
Zhao, Hai
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)
基于字依存树的中文词法-句法一体化分析
赵, 海 and 揭, 春雨 and 宋, 彦
中国计算机语言学研究前沿进展 (2007-2009)
基于 CRFs 的中文分词和短文本分类技术
滕, 少华
A Simple and Efficient Model Pruning Method for Conditional Random Fields
Zhao, H. and Kit, C.
Chinese text segmentation: A hybrid approach using transductive learning and statistical association measures
Tsai, R. T. H.
Expert Systems with Applications
Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling
Mochihashi, Daichi and Yamada, Takeshi and Ueda, Naonori
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
Punctuation as Implicit Annotations for Chinese Word Segmentation
Li, Zhongguo and Sun, Maosong
Computational Linguistics
An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Proc. of ACL-IJCNLP 2009
2008

Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
Jiang, Wenbin and Mi, Haitao and Liu, Qun
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
Joint Word Segmentation and POS Tagging Using a Single Perceptron
Zhang, Yue and Clark, Stephen
Proceedings of ACL-08: HLT
A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Jiang, Wenbin and Huang, Liang and Liu, Qun and Lü, Yajuan
Proceedings of ACL-08: HLT
Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition
Zhao, Hai and Kit, Chunyu
The Sixth SIGHAN Workshop on Chinese Language Processing
An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework
Zhao, Hai and Kit, Chunyu
The Third International Joint Conference on Natural Language Processing (IJCNLP-2008), Hyderabad, India
Bayesian semi-supervised Chinese word segmentation for statistical machine translation
Xu, J. and Gao, J. and Toutanova, K. and Ney, H.
Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1
Statistical Properties of Overlapping Ambiguities in Chinese Word Segmentation and a Strategy for Their Disambiguation
Qiao, W. and Sun, M. and Menzel, W.
Text, Speech and Dialogue
Information retrieval oriented word segmentation based on character associative strength ranking
Liu, Y. and Wang, B. and Ding, F. and Xu, S.
Proceedings of the Conference on Empirical Methods in Natural Language Processing
2007

Chinese Segmentation with a Word-Based Perceptron Algorithm
Zhang, Yue and Clark, Stephen
基于有效子串标注的中文分词
赵, 海 and 揭, 春雨
中文信息学报
中文分词十年回顾
黄, 昌宁 and 赵, 海
中文信息学报
A dual-layer CRFs based joint decoding method for cascaded segmentation and labeling tasks
Shi, Y. and Wang, M.
Proceedings of IJCAI
A hybrid approach to word segmentation and POS tagging
Nakagawa, Tetsuji and Uchimoto, Kiyotaka
Annual Meeting of the Association for Computational Linguistics
Rethinking Chinese word segmentation: tokenization, character classification, or wordbreak identification
Huang, Chu-Ren and Simon, Petr and Hsieh, Shu-Kai and Prévot, L.
Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
2006

Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation
Zhang, Ruiqiang and Kikui, Genichiro and Sumita, Eiichiro
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
汉语词典的快速查询算法研究
李, 江波 and 周, 强 and 陈, 祖舜
中文信息学报
An improved Chinese word segmentation system with conditional random field
Zhao, H. and Huang, C. N. and Li, M.
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing
Discriminative pruning of language models for Chinese word segmentation
Li, J. and Wang, H. and Ren, D. and Li, G.
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Contextual Dependencies in Unsupervised Word Segmentation
Goldwater, Sharon and Griffiths, Thomas L. and Johnson, Mark
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
2005

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Gao, Jianfeng and Li, Mu and Huang, Chang-Ning and Wu, Andi
Computational Linguistics
A conditional random field word segmenter for SIGHAN bakeoff 2005
Tseng, H. and Chang, P. and Andrew, G. and Jurafsky, D. and Manning, C.
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
Perceptron Learning for Chinese Word Segmentation
Li, Y. and Miao, C. and Bontcheva, K. and Cunningham, H.
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing (Sighan-05)
The second international Chinese word segmentation bakeoff
Emerson, Thomas
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
A Statistic Study of Three-character Unknown Words in Chinese
Duan, {ZWXZH}
Journal of Chinese Language and Computing
2004

Chinese Segmentation and New Word Detection using Conditional Random Fields
Peng, Fuchun and Feng, Fangfang and McCallum, Andrew
Proceedings of Coling 2004
Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?
Ng, Hwee Tou and Low, Jin Kiat
Proceedings of EMNLP 2004
基于无指导学习策略的无词表条件下的汉语自动分词
孙, 茂松 and 肖, 明 and 邹, 嘉彦
计算机学报
Applying conditional random fields to Japanese morphological analysis
Kudo, T. and Yamamoto, K. and Matsumoto, Y.
Proc. of EMNLP
Adaptive Chinese word segmentation
Gao, J. and Wu, A. and Li, M. and Huang, C. N. and Li, H. and Xia, X. and Qin, H.
Proceedings of ACL-2004
Unsupervised segmentation of Chinese corpus using accessor variety
Feng, Haodi and Chen, Kang and Kit, Chunyu and Deng, Xiaotie
Natural Language Processing IJCNLP 2004
Accessor variety criteria for Chinese word extraction
Feng, Haodi and Chen, Kang and Deng, Xiaotie and Zheng, Weimin
Computational Linguistics
2003

HHMM-based Chinese lexical analyzer ICTCLAS
Zhang, H. P. and Yu, H. K. and Xiong, D. Y. and Liu, Q.
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
Chinese lexical analysis using hierarchical hidden Markov model
Zhang, H. P. and Liu, Q. and Cheng, X. Q. and Zhang, H. and Yu, H. K.
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
Chinese Word Segmentation as LMR Tagging
Xue, Nianwen and Shen, Libin
Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17
Chinese Word Segmentation as Character Tagging
Xue, Nianwen
Computational Linguistics and Chinese Language Processing
The first international Chinese word segmentation bakeoff
Sproat, R. and Emerson, T.
Proceedings of the second SIGHAN workshop on Chinese language processing
A maximum entropy Chinese character-based parser
Luo, X.
Improved source-channel models for Chinese word segmentation
Gao, J. and Li, M. and Huang, C. N.
Proceedings of the 41st Annual Meeting on Association for Computational Linguistics
Chinese word segmentation using minimal linguistic knowledge
Chen, A.
Proceedings of the second SIGHAN workshop on Chinese language processing
Combining segmenter and chunker for Chinese word segmentation
Asahara, M. and Goh, C. L. and Wang, X. and Matsumoto, Y.
Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing
2002

Combining classifiers for Chinese word segmentation
Xue, Nianwen and Converse, Susan
Proceedings of the 1st SIGHAN Workshop on Chinese Language Processing
Corpus-based methods in Chinese morphology
Sproat, R. and Shih, C.
Tutorial at the 19th COLING
Corpus-based methods in Chinese morphology and phonology
Sproat, R. and Shih, C.
COLING 2002
2001

汉语自动分词研究评述
孙, 茂松 and 邹, 嘉彦
当代语言学
Defining and automatically identifying words in Chinese
Xue, Nianwen
Self-supervised Chinese word segmentation
Peng, F. and Schuurmans, D.
Advances in Intelligent Data Analysis
2000

A compression-based algorithm for Chinese word segmentation
Teahan, W. J. and McNab, Rodger and Wen, Yingying and Witten, Ian H.
Computational Linguistics
1999

Discovering Chinese words from unsegmented text (poster abstract)
Ge, X. and Pratt, W. and Smyth, P.
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
1998

串频统计和词形匹配相结合的汉语自动分词系统
刘, 挺 and 吴, 岩
中文信息学报
Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data
Sun, Maosong and Shen, Dayang and Tsou, Benjamin K
Proceedings of the 17th international conference on Computational linguistics-Volume 2
A hybrid approach to word segmentation
Kazakov, D. and Manandhar, S.
Lecture notes in computer science
1997

中文信息处理中的分词问题
黄, 昌宁
Applied Linguistics
An unsupervised iterative method for Chinese new lexicon extraction
Chang, J. S and Su, K. Y
International Journal of Computational Linguistics & Chinese Language Processing
1996

A stochastic finite-state word-segmentation algorithm for Chinese
Sproat, R. and Gale, W. and Shih, C. and Chang, N.
Computational Linguistics
Useg: A retargetable word segmentation procedure for information retrieval
Ponte, J. M. and Croft, W. B.
Symposium on Document Analysis and Information Retrieval
1992

An efficient implementation of trie structures
Aoe, Jun-Ichi and Morimoto, Katsushi and Sato, Takashi
Software: Practice and Experience
Word identification for Mandarin Chinese sentences
Chen, K. J and Liu, S. H
Proceedings of the 14th conference on Computational linguistics-Volume 1

