Installing Nutch 2.x on Ubuntu

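1. Install Ant (search online for instructions)

The official 2.x releases are currently source-only; prebuilt binaries are no longer provided, so you have to compile Nutch yourself.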

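2. Download Nutch 2.2.1

Compiling Nutch 2.3.1 kept hanging at the network check, so I switched to building version 2.2.1 instead.

Download: http://archive.apache.org/dist/nutch/2.2.1/

Extract it into a directory of your choice: tar -xzvf apache-nutch-2.2.1-src.tar.gz -C /usr/local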
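
The extraction step can be wrapped in a small helper. This is a minimal sketch, not part of the original article: the function name extract_nutch is hypothetical, and the archive file name in the usage comment is assumed from the download page above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a gzip-compressed source tarball into a target directory.
extract_nutch() {
  local tarball="$1"   # path to the downloaded .tar.gz
  local dest="$2"      # directory to unpack into, e.g. /usr/local
  mkdir -p "$dest" || return 1
  # -x extract, -z gunzip, -f archive file, -C switch to $dest before unpacking
  tar -xzf "$tarball" -C "$dest"
}

# Example usage (assumed file name; needs network access and write permission on /usr/local):
#   wget http://archive.apache.org/dist/nutch/2.2.1/apache-nutch-2.2.1-src.tar.gz
#   extract_nutch apache-nutch-2.2.1-src.tar.gz /usr/local
```

The -C flag matters here: without it, tar treats a trailing /usr/local as the name of an archive member to extract rather than as the destination.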

3. Use MySQL as the Nutch storage backend

Edit ${NUTCH_HOME}/ivy/ivy.xml and uncomment these two dependencies:

<dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default"/>
<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating" conf="*->default" />

Then downgrade gora-core, because gora-sql 0.1.1-incubating is not compatible with gora-core 0.3. Change:

<dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>

to:

<dependency org="org.apache.gora" name="gora-core" rev="0.2.1" conf="*->default"/>
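
The gora-core downgrade is a one-line substitution, so it can be scripted. The sketch below is an illustration only: set_gora_core_rev is a hypothetical name, and it assumes the gora-core dependency sits on a single line as shown above. Uncommenting the two MySQL-related dependencies is left as a manual edit, since the layout of the surrounding comment markers may differ between releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the rev attribute of the gora-core dependency in ivy.xml.
# Assumes the whole <dependency .../> element is on one line.
set_gora_core_rev() {
  local ivy_xml="$1"   # e.g. "$NUTCH_HOME/ivy/ivy.xml"
  local new_rev="$2"   # e.g. 0.2.1
  # Only lines naming gora-core are touched; the original file is kept as ivy.xml.bak.
  sed -i.bak "/name=\"gora-core\"/ s/rev=\"[^\"]*\"/rev=\"${new_rev}\"/" "$ivy_xml"
}

# Example usage:
#   set_gora_core_rev "$NUTCH_HOME/ivy/ivy.xml" 0.2.1
```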

4. Configure the database connection

Edit ${NUTCH_HOME}/conf/gora.properties: comment out the default database connection settings and add the following, replacing both xxxx placeholders with your MySQL username and password:

###############################
# Default MySQL properties    #
###############################
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=xxxx
gora.sqlstore.jdbc.password=xxxx
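
Before building, it can help to confirm that all four properties are present and not commented out. The following is a rough sketch; check_gora_props is a hypothetical helper, not something Nutch or Gora provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the four gora.sqlstore.jdbc.* keys are active
# (present at the start of a line, not behind a '#') in a properties file.
check_gora_props() {
  local props="$1"   # e.g. "$NUTCH_HOME/conf/gora.properties"
  local key missing=0
  for key in gora.sqlstore.jdbc.driver gora.sqlstore.jdbc.url \
             gora.sqlstore.jdbc.user gora.sqlstore.jdbc.password; do
    # The dots in the key act as regex wildcards here, which is harmless for this check.
    if ! grep -q "^${key}=" "$props"; then
      echo "missing or commented out: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   check_gora_props "$NUTCH_HOME/conf/gora.properties" && echo "gora.properties looks complete"
```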

5. Edit the ${NUTCH_HOME}/conf/nutch-site.xml configuration file

Replace the contents of nutch-site.xml with the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>  
	<name>http.agent.name</name>  
	<value>YourNutchSpider</value>  
</property>  


<property>  
	<name>http.accept.language</name>  
	<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>  
	<description>Value of the Accept-Language request header field.  
		This allows selecting non-English language as default one to retrieve.  
		It is a useful setting for search engines build for certain national group.  
	</description>  
</property>

<property>  
	<name>storage.data.store.class</name>  
	<value>org.apache.gora.sql.store.SqlStore</value>  
	<description>The Gora DataStore class for storing and retrieving data.  
		Currently the following stores are available:.  
	</description>  
</property>
   
<property>  
	<name>parser.character.encoding.default</name>  
	<value>utf-8</value>  
	<description>The character encoding to fall back to when no other information  
	is available</description>  
</property>
  
<property>  
	<name>generate.batch.id</name>  
	<value>*</value>  
</property>  
</configuration>

6. Build with Ant

Change to the apache-nutch-2.2.1 root directory and run the ant command.

Problems encountered

  • The build reports:

Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found. 

In that case, download sonar-ant-task-2.2.jar from http://repo2.maven.org/maven2/org/codehaus/sonar-plugins/sonar-ant-task/2.2/sonar-ant-task-2.2.jar
and copy it into the ${NUTCH_HOME}/lib directory. Then edit ${NUTCH_HOME}/build.xml and, below the line

<taskdef uri="antlib:org.sonar.ant" resource="org/sonar/ant/antlib.xml">
add:
<classpath><fileset dir="./lib" includes="sonar*.jar" /></classpath>

  • The build ends with BUILD FAILED

If BUILD FAILED is caused by some other dependency problem, changing the Maven central repository URL can resolve it.

In ${NUTCH_HOME}/ivy/ivysettings.xml, find:

<property name="repo.maven.org"  
    value="http://repo1.maven.org/maven2/"  
    override="false"/>

and change the value attribute to another central repository URL:

http://repo2.maven.org/maven2/ (this one is reliable)

http://repository.sonatype.org/content/groups/public/

http://central.maven.org/maven2/
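
The repository switch can be scripted as a hedged sketch. The helper name set_maven_repo_url is hypothetical. Note also that Maven Central stopped serving plain HTTP in January 2020, so the http:// URLs listed above may no longer respond; the usage comment therefore assumes https://repo1.maven.org/maven2/, which is not a URL from the original article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the value of the repo.maven.org property in ivysettings.xml.
# Handles the property written on one line or spread over several lines.
set_maven_repo_url() {
  local settings="$1"   # e.g. "$NUTCH_HOME/ivy/ivysettings.xml"
  local new_url="$2"    # must not contain '|' or '&' (sed delimiter / replacement metachar)
  # From the line naming the property, keep appending lines until the closing "/>"
  # is in the buffer, then replace the value="..." attribute. Original kept as .bak.
  sed -i.bak '/name="repo\.maven\.org"/{
:a
/\/>/!{
N
ba
}
s|value="[^"]*"|value="'"${new_url}"'"|
}' "$settings"
}

# Example usage (URL is an assumption, see the note above):
#   set_maven_repo_url "$NUTCH_HOME/ivy/ivysettings.xml" "https://repo1.maven.org/maven2/"
```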

  • The build hangs

If the output stays stuck at the following:

resolve-default:  
[ivy:resolve] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::  
[ivy:resolve] :: loading settings :: file = /opt/apache-nutch-2.3.1/ivy/ivysettings.xml 

Wait patiently for two minutes. If it still has not moved, interrupt it and run ant again, preferably on a stable network connection.
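
Since several of the fixes in this section come down to running ant again, a small retry wrapper can save some typing. This is a sketch under stated assumptions: retry and the RETRY_DELAY variable are hypothetical names, and the wrapper only re-runs a command that exits with an error. A run that hangs still has to be interrupted by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times until it succeeds.
# RETRY_DELAY (seconds between attempts) is an assumed variable, default 5.
retry() {
  local max_attempts="$1"; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying: $*" >&2
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Example usage:
#   cd "$NUTCH_HOME" && retry 3 ant
```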

  • The build reports the following:

You probably access the destination server through a proxy server that is not well configured.
Run ant again.

If you see:

[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	::          UNRESOLVED DEPENDENCIES         ::
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	:: commons-httpclient#commons-httpclient;3.1: configuration not found in commons-httpclient#commons-httpclient;3.1: 'master'. It was required from org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default
[ivy:resolve] WARN: 	:: log4j#log4j;1.2.15: configuration not found in log4j#log4j;1.2.15: 'master'. It was required from org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	::              FAILED DOWNLOADS            ::
[ivy:resolve] WARN: 	:: ^ see resolution messages for details  ^ ::
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	:: org.mortbay.jetty#jetty;6.1.26!jetty.zip
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] 	report for org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default produced in /root/.ivy2/cache/org.apache.nutch-nutch-default.xml
[ivy:resolve] 	resolve done (2940ms resolve - 4576ms download)
[ivy:resolve] 
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve] 		[FAILED     ] org.mortbay.jetty#jetty;6.1.26!jetty.zip:  (0ms)
[ivy:resolve] 	==== local: tried
[ivy:resolve] 	  /root/.ivy2/local/org.mortbay.jetty/jetty/6.1.26/zips/jetty.zip
[ivy:resolve] 	==== maven2: tried
[ivy:resolve] 	  http://central.maven.org/maven2/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve] 	==== sonatype: tried
[ivy:resolve] 	  http://oss.sonatype.org/content/repositories/releases/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip

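If you hit FAILED DOWNLOADS as above, running ant again is usually enough.

If the package really is missing from the Maven central repository, download it manually and place it at
/root/.ivy2/local/org.mortbay.jetty/jetty/6.1.26/zips/jetty.zip (take the exact path from the ==== local: tried part of the error output above).

If you hit UNRESOLVED DEPENDENCIES as above, first check whether the package is already in the downloaded cache, under /root/.ivy2/cache or /home/<username>/.ivy2/cache.

If the package is there, delete it and run ant again.

If it is not there, edit ${NUTCH_HOME}/ivy/ivy.xml. Searching for commons-httpclient shows that the dependency's conf attribute is set to master: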

<dependency org="commons-httpclient" name="commons-httpclient"
      rev="3.1" conf="*->master" />
Change the conf attribute to default, that is, conf="*->default".
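
The two remedies for UNRESOLVED DEPENDENCIES can be sketched as shell helpers. Both function names are hypothetical. purge_ivy_module assumes Ivy's default cache layout of <cache>/<organisation>/<module>, and set_dep_conf_default takes the module name as a parameter so that the same edit could be tried for log4j, which shows the same warning in the log above (the article itself only describes the commons-httpclient change).

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete one module from the Ivy cache so it is downloaded again.
# Assumes the default layout <cache_dir>/<organisation>/<module>.
purge_ivy_module() {
  local cache_dir="$1" org="$2" module="$3"
  # Refuse empty arguments so rm can never target the cache root or "/".
  [ -n "$cache_dir" ] && [ -n "$org" ] && [ -n "$module" ] || return 1
  rm -rf "${cache_dir:?}/${org}/${module}"
}

# Hypothetical helper: change conf="*->master" to conf="*->default" for one dependency.
# Works whether the <dependency .../> element is on one line or wrapped over several.
set_dep_conf_default() {
  local ivy_xml="$1" module="$2"
  # From the line naming the module, append lines until "/>" is in the buffer,
  # then rewrite the conf attribute. The original file is kept as .bak.
  sed -i.bak '/name="'"${module}"'"/{
:a
/\/>/!{
N
ba
}
s/conf="\*->master"/conf="*->default"/
}' "$ivy_xml"
}

# Example usage:
#   purge_ivy_module "$HOME/.ivy2/cache" commons-httpclient commons-httpclient
#   set_dep_conf_default "$NUTCH_HOME/ivy/ivy.xml" commons-httpclient
```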


Reference: http://blog.csdn.net/u010317005/article/details/51090175



