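Repost: Analysis of the Nutch 2.2.1 Script

To customize Nutch, you need to be able to read its source code.

Version: 2.2.1 (the latest release at the time of writing)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~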

As we know, Nutch is run by typing
./bin/nutch
Looking at the contents of nutch shows that it is a shell script.

 

cat nutch|wc -l
244
root@idc200:/usr/local/nutch-2.2.1/runtime/local/bin#

Let's start by going through these 244 lines.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 

#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The Nutch command script
#
# Environment Variables
#
#   NUTCH_JAVA_HOME The java implementation to use.  Overrides JAVA_HOME.
#
#   NUTCH_HEAPSIZE  The maximum amount of heap to use, in MB.
#                   Default is 1000.
#
#   NUTCH_OPTS      Extra Java runtime options.
#


Lines 1-28 above are comments and need no explanation.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next come lines 29-33:

cygwin=false
case "`uname`" in
CYGWIN*) cygwin=true;;
esac

Adding the following check

if $cygwin; then
        echo "cygwin is true"
else
        echo "cygwin is false"
fi

shows that cygwin is false, because I am running directly on Linux rather than on Windows.
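
The case pattern can also be tried in isolation. The sketch below wraps it in a hypothetical helper, is_cygwin, which takes the uname string as a parameter. The real script calls uname inline; passing the value in is an assumption made here so that both branches can be exercised on any machine.

```shell
# Hypothetical helper: report whether a uname string denotes Cygwin.
# The real script runs `uname` inline; taking the value as an argument
# is an assumption made so the check can be tried with sample values.
is_cygwin() {
  case "$1" in
    CYGWIN*) echo true ;;
    *)       echo false ;;
  esac
}

is_cygwin "`uname`"          # value for the current machine
is_cygwin "CYGWIN_NT-10.0"   # sample Cygwin-style string, prints true
```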

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next come lines 34-45:

# resolve links - $0 may be a softlink
THIS="$0"
while [ -h "$THIS" ]; do
  ls=`ls -ld "$THIS"`
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '.*/.*' > /dev/null; then
    THIS="$link"
  else
    THIS=`dirname "$THIS"`/"$link"
  fi
done

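What does this block do?

THIS holds the value of $0, the path with which the script was invoked; normally that is "./bin/nutch".

What does [ -h "$THIS" ] mean?

-h FILE
FILE exists and is a symbolic link (same as -L)

-h tests whether the file $THIS exists and is a symbolic link.
The commands between do and done run only while $THIS exists and is a symbolic link.

In my case it is not a symbolic link, so the loop body never runs.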
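
To see the loop do some work, it can be tried on a throwaway symlink. This is a minimal sketch that reuses the same ls/expr technique; the function name resolve_links and the temporary files are made up for illustration.

```shell
# Hypothetical helper: follow a chain of symlinks the way bin/nutch does.
resolve_links() {
  target="$1"
  while [ -h "$target" ]; do
    listing=`ls -ld "$target"`
    link=`expr "$listing" : '.*-> \(.*\)$'`      # text after "-> "
    if expr "$link" : '.*/.*' > /dev/null; then
      target="$link"                             # contains a slash: used as-is
    else
      target=`dirname "$target"`/"$link"         # bare name: same directory as the link
    fi
  done
  echo "$target"
}

demo=`mktemp -d`                 # scratch directory for the demonstration
touch "$demo/real"
ln -s real "$demo/alias"         # alias -> real
resolve_links "$demo/alias"      # prints the path of $demo/real
rm -rf "$demo"
```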

~~~~~~~~~~~~

Next come lines 46-73:

# if no args specified, show usage
if [ $# = 0 ]; then
  echo "Usage: nutch COMMAND"
  echo "where COMMAND is one of:"
# echo " crawl one-step crawler for intranets"
  echo " inject         inject new urls into the database"
  echo " hostinject     creates or updates an existing host table from a text file"
  echo " generate       generate new batches to fetch from crawl db"
  echo " fetch          fetch URLs marked during generate"
  echo " parse          parse URLs marked during fetch"
  echo " updatedb       update web table after parsing"
  echo " updatehostdb   update host table after parsing"
  echo " readdb         read/dump records from page database"
  echo " readhostdb     display entries from the hostDB"
  echo " elasticindex   run the elasticsearch indexer"
  echo " solrindex      run the solr indexer on parsed batches"
  echo " solrdedup      remove duplicates from solr"
  echo " parsechecker   check the parser for a given url"
  echo " indexchecker   check the indexing filters for a given url"
  echo " plugin         load a plugin and run one of its classes main()"
  echo " nutchserver    run a (local) Nutch server on a user defined port"
  echo " junit          runs the given JUnit test"
  echo " or"
  echo " CLASSNAME      run the class named CLASSNAME"
  exit 1
fi

This one is simple: if no arguments are given at all ($# is 0), the usage text is printed.
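
The $# test can be tried without the rest of the script. In this sketch check_args is a hypothetical stand-in that returns a status instead of calling exit, so it does not close an interactive shell.

```shell
# Hypothetical stand-in for the argument check at the top of bin/nutch.
check_args() {
  if [ $# = 0 ]; then          # $# is the number of arguments received
    echo "Usage: nutch COMMAND"
    return 1
  fi
  echo "command: $1"
}

check_args || echo "status: $?"   # no arguments: usage line, then "status: 1"
check_args inject                 # prints "command: inject"
```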

~~~~~~~~~~~~~~~~~~

Next come lines 74-77:

# get arguments
COMMAND=$1
shift


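What does this do?

The first argument ($1) is assigned to COMMAND. shift then moves the remaining arguments one position to the left: argument 2 becomes argument 1, argument 3 becomes argument 2, and so on.

Note: $0 is left unchanged.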
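
A short sketch makes the effect of shift visible. The function split_args and the sample arguments are made up for illustration; inside a function, shift acts on the function's own arguments in the same way it acts on the script's.

```shell
# Hypothetical demo: split the first argument off from the rest.
split_args() {
  COMMAND=$1     # the first argument is the command name
  shift          # drop it: $2 becomes $1, $3 becomes $2, ...
  echo "COMMAND=$COMMAND remaining=$# first_remaining=$1"
}

split_args inject urls extra
# prints: COMMAND=inject remaining=2 first_remaining=urls
```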

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next come lines 78-81:

# some directories
THIS_DIR=`dirname "$THIS"`
NUTCH_HOME=`cd "$THIS_DIR/.." ; pwd`

This one is simple. In my environment THIS_DIR and NUTCH_HOME come out as:

./bin
/usr/local/nutch-2.2.1/runtime/local
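
The dirname and cd/pwd combination can be checked against a scratch directory. This is a sketch under assumptions: home_of is a hypothetical helper and the directory layout is created only for the demonstration.

```shell
# Hypothetical helper: derive the install root from a script path,
# the same way bin/nutch derives NUTCH_HOME from THIS.
home_of() {
  this_dir=`dirname "$1"`
  ( cd "$this_dir/.." && pwd )   # subshell: the caller's directory is unchanged
}

demo=`mktemp -d`                 # stands in for an install directory
mkdir "$demo/bin"
home_of "$demo/bin/nutch"        # prints the absolute path of $demo
rm -rf "$demo"
```

Because pwd always prints an absolute path, NUTCH_HOME is absolute even when the script is started through a relative path such as ./bin/nutch.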

~~~~~~~~~~~~~~~~~~~~

Next come lines 82-93:

# some Java parameters
if [ "$NUTCH_JAVA_HOME" != "" ]; then
   #echo "run java in $NUTCH_JAVA_HOME"
   JAVA_HOME=$NUTCH_JAVA_HOME
fi
 
if [ "$JAVA_HOME" = "" ]; then
   echo "Error: JAVA_HOME is not set."
   exit 1
fi

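The first test checks whether NUTCH_JAVA_HOME is non-empty. My Linux environment does not define this variable,

so NUTCH_JAVA_HOME stays empty and JAVA_HOME is left as it was.

The second test checks that JAVA_HOME is set; if it is not, the script prints an error and exits. Nothing surprising here.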
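
The precedence between the two variables can be sketched as a small function. pick_java_home is a hypothetical name, it takes the two values as parameters rather than reading the environment, and /opt/other-jdk is a made-up path.

```shell
# Hypothetical helper mirroring the two checks above.
# $1 plays the role of NUTCH_JAVA_HOME, $2 the role of JAVA_HOME.
pick_java_home() {
  nutch_java_home=$1
  java_home=$2
  if [ "$nutch_java_home" != "" ]; then
    java_home=$nutch_java_home       # NUTCH_JAVA_HOME overrides JAVA_HOME
  fi
  if [ "$java_home" = "" ]; then
    echo "Error: JAVA_HOME is not set."
    return 1                         # the real script calls exit 1 here
  fi
  echo "$java_home"
}

pick_java_home "" /usr/lib/jvm/jdk1.7.0_21               # only JAVA_HOME set
pick_java_home /opt/other-jdk /usr/lib/jvm/jdk1.7.0_21   # the override wins
```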

~~~~~~~~~~~~~~~~~~~~~~

Next come lines 94-103:

# NUTCH_JOB
if [ -f ${NUTCH_HOME}/*nutch*.job ]; then
  local=false
  for f in $NUTCH_HOME/*nutch*.job; do
    NUTCH_JOB=$f;
  done
else
  local=true
fi

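Let's look at what this block means.

It decides between deploy mode and local mode by checking whether a *nutch*.job file exists under NUTCH_HOME.

I am running from the local folder, which contains no such file, so local is true.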
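
The mode detection can be tried on an empty scratch directory. This is a sketch, not the script's own code: detect_mode is a hypothetical helper, it uses a loop instead of the one-line [ -f ... ] test, and it reports the first matching file, whereas the script keeps the last one.

```shell
# Hypothetical helper: report "deploy <file>" if a *nutch*.job file exists
# in the given directory, otherwise "local".
detect_mode() {
  for f in "$1"/*nutch*.job; do
    if [ -f "$f" ]; then         # false when the pattern matched nothing
      echo "deploy $f"
      return 0
    fi
  done
  echo "local"
}

demo=`mktemp -d`
detect_mode "$demo"              # prints "local"
touch "$demo/demo-nutch.job"     # made-up file name for the demonstration
detect_mode "$demo"              # prints "deploy" followed by the file path
rm -rf "$demo"
```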

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next come lines 104-108:

# cygwin path translation
if $cygwin; then
  NUTCH_JOB=`cygpath -p -w "$NUTCH_JOB"`
fi

This only applies under Cygwin and can be ignored on Linux.

~~~~~~~~~~~~~~~~

Next come lines 109-111:

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m

In my environment the two values are:

/usr/lib/jvm/jdk1.7.0_21/bin/java
-Xmx1000m

~~~~~~~~

Next come lines 112-118:

# check envvars which might override default args
if [ "$NUTCH_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $NUTCH_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$NUTCH_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi

NUTCH_HEAPSIZE is not set in my environment, so this block has no effect and the default of 1000 MB stays in place.
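
How the heap flag is assembled can be sketched as follows. heap_flag is a hypothetical helper that takes the heap size as a parameter instead of reading NUTCH_HEAPSIZE from the environment.

```shell
# Hypothetical helper: build the -Xmx flag from an optional heap size in MB.
heap_flag() {
  flag=-Xmx1000m               # default used by the script
  if [ "$1" != "" ]; then
    flag="-Xmx""$1""m"         # adjacent quoted strings are concatenated
  fi
  echo "$flag"
}

heap_flag ""       # prints -Xmx1000m
heap_flag 2048     # prints -Xmx2048m
```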

~~~~~~~~

Next come lines 119-125:

# CLASSPATH initially contains $NUTCH_CONF_DIR, or defaults to $NUTCH_HOME/conf
CLASSPATH=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar

# so that filenames w/ spaces are handled correctly in loops below
IFS=

This is straightforward: the classpath starts with the configuration directory, followed by the JDK's tools.jar.

In my environment CLASSPATH at this point is:

/usr/local/nutch-2.2.1/runtime/local/conf:/usr/lib/jvm/jdk1.7.0_21/lib/tools.jar
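
The ${NAME:=default} expansion is the least obvious part of these lines: it substitutes the default when the variable is unset or empty, and also assigns the default to the variable. A minimal sketch follows; the variable names first, assigned and second, and the path /etc/nutch/conf, are made up for illustration.

```shell
NUTCH_HOME=/usr/local/nutch-2.2.1/runtime/local

unset NUTCH_CONF_DIR
first=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}    # unset: the default is used
assigned=$NUTCH_CONF_DIR                     # := also set the variable itself
echo "$first"       # /usr/local/nutch-2.2.1/runtime/local/conf
echo "$assigned"    # /usr/local/nutch-2.2.1/runtime/local/conf

NUTCH_CONF_DIR=/etc/nutch/conf               # made-up override
second=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}   # already set: the default is ignored
echo "$second"      # /etc/nutch/conf
```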

 

~~~~~~~~~~~

Next come lines 126-142:

# add libs to CLASSPATH
if $local; then
  for f in $NUTCH_HOME/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  # local runtime
  # add plugins to classpath
  if [ -d "$NUTCH_HOME/plugins" ]; then
    CLASSPATH=${NUTCH_HOME}:${CLASSPATH}
  fi
fi

# cygwin path translation
if $cygwin; then
  CLASSPATH=`cygpath -p -w "$CLASSPATH"`
fi

In local mode this simply appends every jar under $NUTCH_HOME/lib to CLASSPATH, and puts NUTCH_HOME itself at the front if a plugins directory exists.

As a result, the printed CLASSPATH contains a long list of jars.
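
The jar loop can be tried on a scratch directory. In this sketch append_jars is a hypothetical helper and the jar names are made up; unlike the script, it quotes the directory instead of relying on the empty IFS, and it skips the pattern when no jar exists.

```shell
# Hypothetical helper: append every jar in directory $2 to the classpath $1.
append_jars() {
  cp=$1
  for f in "$2"/*.jar; do
    [ -f "$f" ] || continue    # no jar present: the pattern stays unexpanded
    cp=${cp}:$f
  done
  echo "$cp"
}

demo=`mktemp -d`
touch "$demo/a.jar" "$demo/b.jar"
append_jars /conf "$demo"      # /conf, then a.jar and b.jar, joined by colons
rm -rf "$demo"
```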

~~~~~~~~~~~~~~~

Next come lines 143-164:

# setup 'java.library.path' for native-hadoop code if necessary
# used only in local mode
JAVA_LIBRARY_PATH=''
if [ -d "${NUTCH_HOME}/lib/native" ]; then
  JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} org.apache.hadoop.util.PlatformName | sed -e 's/ /_/g'`

  if [ -d "${NUTCH_HOME}/lib/native" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${NUTCH_HOME}/lib/native/${JAVA_PLATFORM}
    else
      JAVA_LIBRARY_PATH=${NUTCH_HOME}/lib/native/${JAVA_PLATFORM}
    fi
  fi
fi

if [ $cygwin = true -a "X${JAVA_LIBRARY_PATH}" != "X" ]; then
  JAVA_LIBRARY_PATH=`cygpath -p -w "$JAVA_LIBRARY_PATH"`
fi

# restore ordinary behaviour
unset IFS

To see what these variables hold, just add echo statements for them to the script. It is straightforward.
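
Two small idioms from this block can be tried separately: the sed call that replaces spaces in the platform name, and the "x$VAR" comparison that tests for an empty string. The sketch below is an illustration under assumptions: append_path is a hypothetical helper, and the platform string and paths are sample values, not output of PlatformName.

```shell
# Hypothetical helper: add directory $2 to a colon-separated path $1
# that may still be empty.
append_path() {
  if [ "x$1" != "x" ]; then    # true when $1 is non-empty
    echo "$1:$2"
  else
    echo "$2"
  fi
}

platform=`echo "Mac OS X-x86_64-64" | sed -e 's/ /_/g'`   # sample input
echo "$platform"                          # Mac_OS_X-x86_64-64
append_path "" "/opt/demo/lib/native"         # /opt/demo/lib/native
append_path "/usr/lib" "/opt/demo/lib/native" # /usr/lib:/opt/demo/lib/native
```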

~~~~~~~~~~~~~~~

Lines 165-184:

# default log directory & file
if [ "$NUTCH_LOG_DIR" = "" ]; then
  NUTCH_LOG_DIR="$NUTCH_HOME/logs"
fi
if [ "$NUTCH_LOGFILE" = "" ]; then
  NUTCH_LOGFILE='hadoop.log'
fi

#Fix log path under cygwin
if $cygwin; then
  NUTCH_LOG_DIR=`cygpath -p -w "$NUTCH_LOG_DIR"`
fi

NUTCH_OPTS="$NUTCH_OPTS -Dhadoop.log.dir=$NUTCH_LOG_DIR"
NUTCH_OPTS="$NUTCH_OPTS -Dhadoop.log.file=$NUTCH_LOGFILE"

if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
  NUTCH_OPTS="$NUTCH_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi

Printing the results gives:

NUTCH_LOGFILE
hadoop.log

NUTCH_OPTS
-Dhadoop.log.dir=/usr/local/nutch-2.2.1/runtime/local/logs -Dhadoop.log.file=hadoop.log -Djava.library.path=/usr/local/nutch-2.2.1/runtime/local/lib/native/Linux-amd64-64
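
The defaulting of the log settings can be condensed into a sketch. Note that log_opts is a hypothetical helper and that it uses the ${N:-default} expansion in place of the script's explicit if tests; the effect for empty values is the same.

```shell
# Hypothetical helper: apply the defaults and build the two log options.
# $1: NUTCH_LOG_DIR or empty, $2: NUTCH_LOGFILE or empty, $3: NUTCH_HOME
log_opts() {
  log_dir=${1:-$3/logs}
  log_file=${2:-hadoop.log}
  echo "-Dhadoop.log.dir=$log_dir -Dhadoop.log.file=$log_file"
}

log_opts "" "" /usr/local/nutch-2.2.1/runtime/local
# prints: -Dhadoop.log.dir=/usr/local/nutch-2.2.1/runtime/local/logs -Dhadoop.log.file=hadoop.log
```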

~~~~~~~~~~~

Lines 185-227:

# figure out which class to run
if [ "$COMMAND" = "crawl" ] ; then
  CLASS=org.apache.nutch.crawl.Crawler
elif [ "$COMMAND" = "inject" ] ; then
  CLASS=org.apache.nutch.crawl.InjectorJob
elif [ "$COMMAND" = "hostinject" ] ; then
  CLASS=org.apache.nutch.host.HostInjectorJob
elif [ "$COMMAND" = "generate" ] ; then
  CLASS=org.apache.nutch.crawl.GeneratorJob
elif [ "$COMMAND" = "fetch" ] ; then
  CLASS=org.apache.nutch.fetcher.FetcherJob
elif [ "$COMMAND" = "parse" ] ; then
  CLASS=org.apache.nutch.parse.ParserJob
elif [ "$COMMAND" = "updatedb" ] ; then
  CLASS=org.apache.nutch.crawl.DbUpdaterJob
elif [ "$COMMAND" = "updatehostdb" ] ; then
  CLASS=org.apache.nutch.host.HostDbUpdateJob
elif [ "$COMMAND" = "readdb" ] ; then
  CLASS=org.apache.nutch.crawl.WebTableReader
elif [ "$COMMAND" = "readhostdb" ] ; then
  CLASS=org.apache.nutch.host.HostDbReader
elif [ "$COMMAND" = "elasticindex" ] ; then
  CLASS=org.apache.nutch.indexer.elastic.ElasticIndexerJob
elif [ "$COMMAND" = "solrindex" ] ; then
  CLASS=org.apache.nutch.indexer.solr.SolrIndexerJob
elif [ "$COMMAND" = "solrdedup" ] ; then
  CLASS=org.apache.nutch.indexer.solr.SolrDeleteDuplicates
elif [ "$COMMAND" = "parsechecker" ] ; then
  CLASS=org.apache.nutch.parse.ParserChecker
elif [ "$COMMAND" = "indexchecker" ] ; then
  CLASS=org.apache.nutch.indexer.IndexingFiltersChecker
elif [ "$COMMAND" = "plugin" ] ; then
  CLASS=org.apache.nutch.plugin.PluginRepository
elif [ "$COMMAND" = "nutchserver" ] ; then
  CLASS=org.apache.nutch.api.NutchServer
elif [ "$COMMAND" = "junit" ] ; then
  CLASSPATH=$CLASSPATH:$NUTCH_HOME/test/classes/
  CLASS=junit.textui.TestRunner
else
  CLASS=$COMMAND
fi
 
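This simply selects the Java class that corresponds to the command you typed; anything that is not a known command is treated as a class name.

Here crawl maps to

org.apache.nutch.crawl.Crawler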
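
The same lookup reads more compactly as a case statement. The sketch below is a hypothetical rewrite of a few branches, not the script's own code; the class names are copied from the if/elif chain above, and class_for is a made-up function name.

```shell
# Hypothetical rewrite of part of the command-to-class lookup.
class_for() {
  case "$1" in
    inject)   echo org.apache.nutch.crawl.InjectorJob ;;
    generate) echo org.apache.nutch.crawl.GeneratorJob ;;
    fetch)    echo org.apache.nutch.fetcher.FetcherJob ;;
    parse)    echo org.apache.nutch.parse.ParserJob ;;
    updatedb) echo org.apache.nutch.crawl.DbUpdaterJob ;;
    *)        echo "$1" ;;     # unknown command: treat it as a class name
  esac
}

class_for fetch              # org.apache.nutch.fetcher.FetcherJob
class_for com.example.Main   # com.example.Main (made-up class name)
```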

~~~~~~~~~~~

Lines 228-244:

if $local; then
  # fix for the external Xerces lib issue with SAXParserFactory
  NUTCH_OPTS="-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl $NUTCH_OPTS"
  EXEC_CALL="$JAVA $JAVA_HEAP_MAX $NUTCH_OPTS -classpath $CLASSPATH"
else
  # check that hadoop can be found on the path
  if [ $(which hadoop | wc -l ) -eq 0 ]; then
    echo "Can't find Hadoop executable. Add HADOOP_HOME/bin to the path or run in local mode."
    exit -1;
  fi
  # distributed mode
  EXEC_CALL="hadoop jar $NUTCH_JOB"
fi

 

Together with the last line of the script,

exec $EXEC_CALL $CLASS "$@"

this amounts to running

=========================================================

/usr/lib/jvm/jdk1.7.0_21/bin/java
-Xmx1000m
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
-Dhadoop.log.dir=/usr/local/nutch-2.2.1/runtime/local/logs
-Dhadoop.log.file=hadoop.log
-Djava.library.path=/usr/local/nutch-2.2.1/runtime/local/lib/native/Linux-amd64-64
-classpath
/usr/local/nutch-2.2.1/runtime/local:
/usr/local/nutch-2.2.1/runtime/local/conf:
/usr/lib/jvm/jdk1.7.0_21/lib/tools.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/activation-1.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/aopalliance-1.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/apache-nutch-2.2.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/asm-3.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/avro-1.3.3.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-beanutils-1.7.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-beanutils-core-1.8.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-cli-1.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-codec-1.4.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-collections-3.2.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-configuration-1.6.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-digester-1.8.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-el-1.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-httpclient-3.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-io-2.4.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-lang-2.6.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-logging-1.1.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-math-2.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/commons-net-1.4.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/crawler-commons-0.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/cxf-api-2.5.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/cxf-common-utilities-2.5.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/cxf-rt-bindings-xml-2.5.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/cxf-rt-core-2.5.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/cxf-rt-frontend-jaxrs-2.5.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/cxf-rt-transports-common-2.5.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/cxf-rt-transports-http-2.5.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/elasticsearch-0.19.4.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/geronimo-javamail_1.4_spec-1.7.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/geronimo-stax-api_1.0_spec-1.0.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/gora-core-0.3.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/guava-11.0.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/hadoop-core-1.2.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/hamcrest-core-1.3.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/hsqldb-2.2.8.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/httpclient-4.1.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/httpcore-4.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/icu4j-4.0.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jackson-core-asl-1.8.8.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jackson-jaxrs-1.7.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jackson-mapper-asl-1.8.8.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jackson-xc-1.7.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jaxb-api-2.2.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jaxb-impl-2.2.3-1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jdom-1.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jersey-core-1.8.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jersey-json-1.8.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jersey-server-1.8.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jettison-1.3.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jetty-6.1.26.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jetty-client-6.1.26.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jetty-sslengine-6.1.26.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jetty-util5-6.1.26.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jetty-util-6.1.26.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jline-0.9.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jsr305-1.3.9.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/jsr311-api-1.1.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/junit-4.11.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/juniversalchardet-1.0.3.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/log4j-1.2.16.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/lucene-analyzers-3.6.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/lucene-core-3.6.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/lucene-highlighter-3.6.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/lucene-memory-3.6.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/lucene-queries-3.6.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/neethi-3.0.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/org.osgi.core-4.0.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/org.restlet-2.0.5.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/org.restlet.ext.jackson-2.0.5.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/oro-2.0.8.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/paranamer-2.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/paranamer-ant-2.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/paranamer-generator-2.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/qdox-1.10.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/serializer-2.7.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/servlet-api-2.5-20081211.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/servlet-api-2.5-6.1.14.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/slf4j-api-1.6.6.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/slf4j-log4j12-1.6.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/solr-solrj-3.4.0.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/spring-aop-3.0.6.RELEASE.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/spring-asm-3.0.6.RELEASE.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/spring-beans-3.0.6.RELEASE.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/spring-context-3.0.6.RELEASE.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/spring-core-3.0.6.RELEASE.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/spring-expression-3.0.6.RELEASE.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/spring-web-3.0.6.RELEASE.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/stax2-api-3.1.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/stax-api-1.0.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/stax-api-1.0-2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/tika-core-1.3.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/woodstox-core-asl-4.1.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/wsdl4j-1.6.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/wstx-asl-3.2.7.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/xercesImpl-2.9.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/xml-apis-1.3.04.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/xmlenc-0.52.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/xmlParserAPIs-2.6.2.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/xmlschema-core-2.0.1.jar:
/usr/local/nutch-2.2.1/runtime/local/lib/zookeeper-3.3.1.jar
org.apache.nutch.crawl.Crawler
followed by whatever arguments remain after the command name

 =========================================================
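
How the final line splits into separate words can be observed with a stand-in for exec. In the sketch below, show_args and launch are hypothetical helpers, EXEC_CALL is shortened to a sample value, and the arguments are made up. The unquoted $EXEC_CALL is split on whitespace into separate words, while "$@" passes each remaining argument through unchanged, even when it contains a space.

```shell
# Hypothetical helper: print each received argument on its own line.
show_args() {
  for a in "$@"; do
    echo "[$a]"
  done
}

# Hypothetical stand-in for the last line of bin/nutch, with show_args
# in place of exec so that the current shell keeps running.
launch() {
  EXEC_CALL="java -Xmx1000m -classpath /conf"   # shortened sample value
  CLASS=org.apache.nutch.crawl.InjectorJob
  show_args $EXEC_CALL $CLASS "$@"
}

launch "my urls" second
# [java] [-Xmx1000m] [-classpath] [/conf] [org.apache.nutch.crawl.InjectorJob]
# [my urls] [second], one bracketed item per line
```

The splitting of $EXEC_CALL relies on the default IFS, which is why the script runs unset IFS before this point.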

That completes the analysis.

 
