Common Machine Learning Tools


Support Vector Machine

An implementation of Vapnik's Support Vector Machine
A Library for Support Vector Machines

Decision Tree

The "classic" decision-tree tool, developed by J. R. Quinlan  Tutorial

Maximum Entropy

Yet Another Small MaxEnt Toolkit

Conditional Random Field

A simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data

Natural Language Processing

General

An organizational center for open source projects related to natural language processing
A suite of UNIX software tools to facilitate the construction and testing of statistical language models
A Java-based development package for academic use in information retrieval (IR) and text mining. Includes many NLP tools.
A suite of Java libraries for the linguistic analysis of human language, which can be used to:
  • track mentions of entities (e.g. people or proteins);
  • link entity mentions to database entries;
  • uncover relations between entities and actions;
  • classify text passages by language, character encoding, genre, topic, or sentiment;
  • correct spelling with respect to a text collection;
  • cluster documents by implicit topic and discover significant trends over time; and
  • provide part-of-speech tagging and phrase chunking.
Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OS X and Linux.
  • Advanced Natural Language Object-oriented Processing Environment. Includes a range of tools (notably a C# version of the Stanford Parser).

Word Segmentation

The Chinese word segmentation system from the Chinese Academy of Sciences
A Java implementation of a CRF-based Chinese Word Segmenter

Part-of-Speech Tagging

An error-driven, transformation-based tagger implemented by Eric Brill
A Java implementation of the log-linear part-of-speech taggers described by Kristina Toutanova et al.
A decision tree based tagger from the University of Stuttgart.
An HMM-based Java POS tagger from Birmingham U.

Named Entity Recognition

A Java implementation of a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition
Tools include statistical named-entity recognition, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Java. GPL. By Bob Carpenter, Breck Baldwin and co.
SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)

Stemming

A process for removing the commoner morphological and inflexional endings from words in English by Martin Porter
A small string processing language designed for creating stemming algorithms for use in Information Retrieval.

Syntactic Parsing

Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser.

Text Mining

Summarization

Miscellaneous

Encryption

Includes many cryptographic algorithms, such as RSA, DES, MD5 and SHA.  Win32 installer edition
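
As a small illustration of two of the algorithm families named above, the sketch below computes MD5 and SHA digests with Python's standard `hashlib` module. It does not use the library listed here; the helper name `digests` is hypothetical. Note that MD5 and SHA are hash functions rather than ciphers, so they produce a fixed-size fingerprint and cannot be "decrypted".

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return hex digests of the same input under three hash functions."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    for name, value in digests(b"hello world").items():
        print(name, value)
```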

Compression

A Massively Spiffy Yet Delicately Unobtrusive Compression Library
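
The tagline above is the one zlib uses for itself. Assuming that is the library meant, a minimal round-trip sketch through Python's standard `zlib` module (a binding to that library) looks like this; the helper name `roundtrip` and the sample data are illustrative only.

```python
import zlib


def roundtrip(data: bytes, level: int = 6) -> tuple:
    """Compress data with DEFLATE, then decompress it again.

    Returns (compressed_bytes, restored_bytes).
    """
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


if __name__ == "__main__":
    sample = b"to be or not to be, " * 50  # repetitive input compresses well
    packed, unpacked = roundtrip(sample)
    print(len(sample), "bytes ->", len(packed), "bytes")
    print("lossless:", unpacked == sample)
```

The compression level runs from 0 (no compression) to 9 (smallest output, slowest); 6 is a common middle ground.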

Logging

Creates and maintains open-source software related to the logging of application behavior and released at no charge to the public, including
Note: the official release of log4cxx has a memory leak problem.

Unicode

A mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications

XML

A validating XML parser, available in C and Java editions

Multi-String Matching

  • AC in C# : Aho-Corasick string matching in C#
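
For readers unfamiliar with the algorithm, here is a minimal Python sketch of Aho-Corasick matching. It is not the C# implementation listed above; the function names `build_automaton` and `find_all` are hypothetical, and patterns are assumed to be non-empty strings. The idea is to build a trie of all patterns, add failure links that point to the longest proper suffix that is also a trie prefix, and then scan the text once.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns that end at each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return sorted (start_index, pattern) pairs for every match in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return sorted(matches)


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned in a single pass regardless of how many patterns there are, which is what makes the approach attractive for dictionary-style matching.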

HTML Parser

  • Html Agility Pack , an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
  • Majestic-12 , an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. Fast, but does not build a DOM tree.
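
The difference between the two styles above (building a DOM versus streaming through the markup) can be sketched with Python's standard `html.parser` module, which is event-based and builds no tree. This is only an illustration of the streaming approach and is not the API of either .NET library; the names `LinkCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags while streaming through the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from HTMLParser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets in document order, without building a DOM."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<p><a href="/a">A</a> <A HREF="http://example.com/b">B</a></p>'
    print(extract_links(page))
```

Because nothing is kept beyond the collected links, memory use stays small, but queries such as XPath are not possible without a tree.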

External Links
