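For example, suppose you want to find out how to use readdb.
Open the bin/nutch launcher script (a shell script) in a text editor,
then look for the branch that handles the command: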
elif [ "$COMMAND" = "readdb" ] ; then
CLASS=org.apache.nutch.crawl.CrawlDbReader
This tells you which class implements the command.
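
The lookup above can also be done without reading the script by eye. The following is a minimal sketch, assuming the script keeps the layout shown in the excerpt (a test on "$COMMAND" followed by a CLASS= line). The function name nutch_class_for is made up for this note and is not part of Nutch.

```shell
# Hypothetical helper (not part of Nutch): print the Java class that a
# launcher script maps to a given subcommand.
# Usage: nutch_class_for <script> <command>
nutch_class_for() {
  script="$1"
  cmd="$2"
  # Find the line that tests "$COMMAND" = "<cmd>", then print whatever
  # follows CLASS= on the first CLASS= line after it.
  awk -v cmd="$cmd" '
    index($0, "\"$COMMAND\" = \"" cmd "\"") { found = 1; next }
    found && /CLASS=/ { sub(/^.*CLASS=/, ""); print; exit }
  ' "$script"
}
```

Called as nutch_class_for bin/nutch readdb, it should print the class name from the branch quoted above. If the command is not found, it prints nothing.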
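Next, import the whole Nutch project, including its source, into Eclipse. Locate the class you need and run it as a Java application.
Run without arguments, it prints its usage to the console: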
Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
    <crawldb>                               directory name where crawldb is located
    -stats [-sort]                          print overall statistics to System.out
        [-sort]                             list status sorted by host
    -dump <out_dir> [-format normal|csv ]   dump the whole db to a text file in <out_dir>
        [-format csv]                       dump in Csv format
        [-format normal]                    dump in standard format (default option)
    -url <url>                              print information on <url> to System.out
    -topN <nnnn> <out_dir> [<min>]          dump top <nnnn> urls sorted by score to <out_dir>
        [<min>]                             skip records with scores below this value.
                                            This can significantly improve performance.
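
Since the launcher script maps readdb to CrawlDbReader, the same options should work from the command line. The sketch below wraps a few of them in small functions. The paths bin/nutch and crawl/crawldb are assumptions about your layout, and the function names are invented for this note.

```shell
# Assumed locations; override NUTCH and CRAWLDB to match your installation.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"

# Print overall statistics for the crawldb.
readdb_stats() { "$NUTCH" readdb "$CRAWLDB" -stats; }

# Dump the whole db in CSV format into the directory given as $1.
readdb_dump_csv() { "$NUTCH" readdb "$CRAWLDB" -dump "$1" -format csv; }

# Dump the top $1 urls, sorted by score, into the directory given as $2.
readdb_top() { "$NUTCH" readdb "$CRAWLDB" -topN "$1" "$2"; }

# Print the stored information for the single url given as $1.
readdb_url() { "$NUTCH" readdb "$CRAWLDB" -url "$1"; }
```

For instance, readdb_top 10 out_top would ask for the ten highest-scoring urls to be written to a directory named out_top (a made-up name).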