Google global site URLs
https://github.com/justjavac/Google-IPs
JCSEG
http://www.oschina.net/p/jcseg
MMSEG
http://technology.chtsai.org/mmseg/
//convert Maven project to Eclipse project
#mvn eclipse:eclipse -DskipTests
//convert text docs to seq docs
#mahout seqdirectory -c UTF-8 -i mahout/topics/textdocs -o mahout/topics/seqdocs
//dump tokenized docs (seq format) to text format
mahout seqdumper -i mahout/topics/docsvectors2/tokenized-documents -o ./tokenized-docs2
//recompile jcseg
#mvn clean package -DskipTests
Lucene Analyzer
http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/analysis/Analyzer.html
http://www.cnblogs.com/forfuture1978/archive/2010/06/06/1752837.html
http://www.360doc.com/content/12/0512/21/1542811_210601163.shtml
MongoDB + Lucene/Solr; MongoDB + Sphinx for full-text search; Coreseek; MongoDB 2.6 text search is now ready for production use
http://www.open-open.com/lib/view/1343210299443
http://www.gasimzade.org/2012/11/under-hood-architectural-overview-of.html
http://www.jayway.com/2010/11/14/full-text-search-with-mongodb-and-lucene-analyzers/
http://docs.mongodb.org/manual/tutorial/model-data-for-keyword-search/
http://lumongo.org/
http://baike.sogou.com/v54377490.htm