



Bibliography of constructive induction - feature engineering Bibliography on Automated Text Categorization Bibliography - Text Categorization Automatic Text Processing related short bibliography Feature Subset Selection Bibliography Bibliography of NLP in Biomedicine Lifelong learning, meta-learning Spam Bibliography Machine Learning Bibliographies Machine Learning Applied to Text Feature Selection Computer Science Bibliographies TDT Publications Bibliography on Transformation-Based Learning

Common Sense


Open Mind OpenCyc ThoughtTreasure home page Cycorp


Companies and organizations


Electronic pocket talking dictionaries and translators ARDA Home Page Web Intelligence Consortium AvaQuest, Inc. Resources - Categorization Vendors


Open source projects


Senga The OpenNLP Homepage Worldwide Lexicon NLP Toolkit POPFile Automatic Email Sorting using Naive Bayes linguana Morphix-NLP -- The most NLP application on one CD! ZSoft platform-independent solutions for Data Mining


Spam mail


Email spam A Plan for Spam POPFile Automatic Email Sorting using Naive Bayes Spammunition Internet Content Filtering Group


Machine Learning Laboratory Snowfox Home Research proposal Welcome to Cross Language Evaluation Forum Text REtrieval Conference (TREC) WebBase Project Data Mining on the Web (mentions OpenDir) WebKB search.cpan.org Ken Williams - AI-Categorizer WebKB@CMU Interspace Center for Automated Learning and Discovery Columbia Newsblaster Google Web APIs - Home The Lemur Toolkit for Language Modeling and Information Retrieval UNLP General Information Text categorization using lexical chains Kernel Methods for Image and Text Natural Language Processing (NLP) at Cornell The CAPTCHA Project Demo of semantic word orientation

LIBSVM MATLAB Support Vector Machine Toolbox SVM-Light Support Vector Machine SvmFu Documentation mySVM


Language Identification


Language Identification Tools Stochastic Language Identifier Language Identification XRCE CA Language Identifier Welcome to Inxight Software, Inc. OEM Products Language & Character Encoding Identification Automatic Language Identification Bibliography RALI -- S I L C Identification of Language and Character Encoding Basis Technology's Products Rosette Language Identifier TextCat Language Guesser




Porter in Perl Lovins Snowball Porter Stemming Algorithm


Part of Speech Tagging


MULTEXT TnT - Statistical Part-of-Speech Tagging QTag Eric Brill's tagger ePost - C++ wrapper of Brill's tagger


Text categorization


The Bow Toolkit UDC in brief Kea - automatic keyphrase extraction BoosTexter SNoW LTG software LT TCR S-EM download page Learning with Positive and Unlabeled Data LPU download page


Machine Learning


C4.5 - C5.0


See5 An Informal Tutorial RuleQuest Research Data Mining Tools Ross Quinlan - AI Group, CSE


Weka 3 The SLIPPER Rule Learning System The WHIRL System DTREG -- Decision Tree Analysis Program NLREG -- Nonlinear Regression Analysis Program SGI - MLC++ Home Page YALE - Yet Another Learning Environment




EuroWordNet The Global WordNet Association WordNet WordNet 1.6 Vocabulary Helper WordNet in RDF Wordnet Domains Richard Lexicon Home Demos


Roget's Thesaurus


Roget's Thesaurus as an Electronic Lexical Knowledge Base




LSI - Latent Semantic Indexing Web Site Psycholinguistics and Computational Cognition Lab Telcordia Latent Semantic Indexing (LSI) Demo Machine LSA @ CU Boulder Introduction to LSI




CMU AI Repository - NLP NL Software Registry @ DFKI Resources Software Tools for NLP Speech and Language Web Resources The Data Warehousing Information Center - Text Mining Tools Welcome to Cognitive Computation


Sentence boundary detection


SATZ - Sentence boundary detector MXTERMINATOR search.cpan.org Tony G. Rose - HTML-Summary-0.017 LTG software LT TTT Adwait Ratnaparkhi Stat NLP Automatic English Sentence Segmenter Lingua::EN::Sentence - Splitting text into sentences. Sentencizers


XML parsers


expat Xerces C++ Parser


Open directory




About Yahoo


Open Directory - Use of ODP Data Web Directory Sizes ODP and Yahoo Size Projection Charts


Semantic metrics


Dekang Lin - semantic metrics search.cpan.org Siddharth Patwardhan - WordNet-Similarity-0.03


NL parsing


Minipar Link Grammar Parser Apple Pie Parser Conexor Analyzers


Misc text analysis tools


LT Group - Edinburgh Infogistics Text Analysis tools Senga fnTBL Toolkit - Home WordStat SRI Language Modeling Toolkit Textomy - tools for text dissection


Text summarization


Copernic Summarizer - Product Overview search.cpan.org HTML::Summary - module for generating a summary from a web page.


HTML parsers


Clean up your Web pages with HTML TIDY HTML Tidy Project Page


Named Entity Recognition


Language-Independent Named Entity Recognition


AI Search


Local++ Project Home Page AI C++ Search Class Library




Netlib TNT Home Page GAMS - Guide to Available Mathematical Software Critical t Values Peter Hellekalek pLab Software Pseudo random number generators




STL Guide at SGI STLport Boost STL Error Decryptor




Rob van der Woude's Scripting Pages Batch Files Sample Win9x Batch Programs


GSview Introduction to GnuPlot

Search engines


Notess.com: The Greg Notess Web Site Search Engine Watch Search Tools - Information, Guides and News Finding Information on the Internet A TUTORIAL Search tools Web Search @ About.com The Internet Archive Wayback Machine Searchengines.Ru Search Engine Showdown Teoma Search -- Search with Authority KartOO On Search, the Series


Speech Processing


Speech Recognition Update Speech Technology Magazine online Speechtechnology Network Compaq.com - SpeechBot Biometric Consortium


Book publishers


MIT Press Addison-Wesley Prentice Hall W.H. Freeman and Company Cambridge University Press Academic Press Kluwer Academic Publishers Oxford University Press The University of Chicago Press Elsevier John Wiley and Sons O'Reilly and Associates McGraw-Hill Book Company Macmillan Computer Reference


Mailing lists


TREC filtering Corpora Colibri Elsnet list Linguist Search Engine Report Connectionists WebIR


Corpora and lexicons




SIGLEX Resources Corpus Linguistics English language corpora Linguistic Data Resources on the Internet The ACL NLP-CL Universe W3-Corpora List of Corpora BNC English Language Corpora and Corpus resources David Lee's Bookmarks for Corpus-based Linguists


Online books and texts


Project Gutenberg Electronic Text Center -- University of Virginia The Online Books Page




Reuters Research and Standards Group - Corpus RCV1




Reuters-21578 Text Categorization Collection Reuters-21578 Text Categorization Test Collection Tools for Reuters-21578 Text Categorization Dataset




Files Available to Download or View Medical Subject Headings (MeSH) OHSUMED (FTP)


American National Corpus Novelty and Redundancy Detection for Adaptive Filtering DataSet Glasgow IDOM - Test collections ICAME The BNC Handbook LDC - Linguistic Data Consortium The ELRA home page The Oxford Text Archive WIPO automated categorization datasets Web Term Document Frequency Form OPUS - an open source parallel corpus Collocational Dictionary (ARCS) The Moby Project The TREC-AP Text Categorization Test Collection Words and Phrases from the British National Corpus Free Association Norms Longman Dictionaries for Research (LDOCE) Movie Review Data

Scientific search


NCSTRL Home Page Computer Science e-Print Archive Cora Research Paper Search IEEE Xplore ResearchIndex (NEC) Welcome to the ACM Digital Library Welcome to IEEE Transactions & Journals Scirus - Searching for Science Unified Computer Science TR Index (UCSTRI) search4science Computation and Language - ISRAEL Mirror Other Lists of Bibliographies Computer Science Bibliography Glimpse Server Cornell Computer Science Technical Reports NASA Technical Report Server (NTRS) Papers database main page Technical Reports - NASA LaRC Technical Library

Online publications




Journal of Artificial Intelligence Research Journal of Machine Learning Research Journal of Intelligent Information Systems TAL journal - Association pour le Traitement Automatique des LAngues




VLDB Endowment Inc.


Books and reports


Foundations of Statistical Natural Language Processing Survey of the State of the Art in Human Language Technology Pattern Classification - Duda, Hart, Stork Generalized Information Measures and their Applications Managing Gigabytes Numerical Recipes Data-Intensive Linguistics


ACL Anthology

Hubs on NLP, IR, ML etc


ELSnet Homepage fabulousness - linguistics and stuff Information Retrieval Links Fieldmethods.net Linguistic Resources on the Internet Speech and Language Web Resources Boosting Research Site Boosting.org Survey of Information Retrieval The Association for Computational Linguistics The LINGUIST List COLT Computational Learning Theory Pattern Recognition on the Web Statistical NLP - corpus-based resources The ELRA home page KDnuggets Data Mining, Web Mining, and Knowledge Discovery Guide Information Filtering Resources MLnet OiS - Machine Learning, Knowledge Discovery, Data Mining, Case-based Reasoning, and Knowledge Acquisition Glasgow IDOM - IR resources Weblog of computational linguistics WebIR ACL SIG on Natural Language Learning (SIGNLL) COLE sites about Computational Linguistics EACL HLT Home

Advanced LaTeX LaTeX- from quick and dirty to style and finesse




LaTeX2e Help file Help on LaTeX commands The LaTeX Encyclopedia Math Symbols in LaTeX LATEX maths and graphics The Technion Guide to LATEX2e




CTAN LaTeX Archive The TeX Catalog Online TeX Users Group Home Page







