Machine Learning
Support Vector Machine
Decision Tree
Maximum Entropy
Conditional Random Field
Natural Language Processing
General
- track mentions of entities (e.g. people or proteins);
- link entity mentions to database entries;
- uncover relations between entities and actions;
- classify text passages by language, character encoding, genre, topic, or sentiment;
- correct spelling with respect to a text collection;
- cluster documents by implicit topic and discover significant trends over time; and
- provide part-of-speech tagging and phrase chunking.
- Advanced Natural Language Object-oriented Processing Environment. Includes a set of tools (notably a C# version of the Stanford Parser).
Word Segmentation
Part-of-Speech Tagging
- SVMTool, a POS tagger based on SVMs
- QTAG, a part-of-speech tagger
Named Entity Recognition
Stemming
Syntactic Parsing
Text Mining
Summarization
- Rouge, and a guide to configuring Rouge on Windows
Other
Encryption
Compression
Logging
Unicode
XML
Multi-String Matching
- AC in C#: Aho-Corasick string matching in C#
HTML Parser
- Html Agility Pack, an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
- Majestic-12, an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. Fast, but does not build a DOM tree.
External Links
- An annotated list of resources by Stanford NLP Group
- KDnuggets, which lists KDD-related software and other resources