nutch2.3 mysql教程_在Eclipse中运行Nutch2.3

参考http://wiki.apache.org/nutch/RunNutchInEclipse 一、环境准备 1、下载nutch2.3源代码 wget http://mirror.bit.edu.cn/apache/nutch/2.3/apache-nutch-2.3-src.tar.gz 或者下载正在开发中的最新版本 svn co https://svn.apache.org/repos/asf/nutch/bra

参考http://wiki.apache.org/nutch/RunNutchInEclipse

一、环境准备

1、下载nutch2.3源代码

wget http://mirror.bit.edu.cn/apache/nutch/2.3/apache-nutch-2.3-src.tar.gz或者下载正在开发中的最新版本

svn co https://svn.apache.org/repos/asf/nutch/branches/2.x

2、选择使用的数据库类型,以hbase为例

在conf/nutch-site.xml中增加以下属性:

storage.data.store.class

org.apache.gora.hbase.store.HBaseStore

Default class for storing data

3、在ivy/ivy.xml中增加与hbase相关的依赖项,此项本已存在,但被注释掉,将注释去掉即可

http.agent.name

My Nutch Spider

http.robots.agents

none

plugin.folders

/Users/liaoliuqing/0_Search/1_Nutch/1_Official/apache-nutch-2.3/build/plugins其中plugin.folders的值为$NUTCH_HOME/build/plugins

5、执行ant eclipse

二、导入project

1、导入project

test.jsp?url=http%3A%2F%2Fimg.blog.csdn.net%2F20150128164608750%3Fwatermark%2F2%2Ftext%2FaHR0cDovL2Jsb2cuY3Nkbi5uZXQvamVkaWFlbF9sdQ%3D%3D%2Ffont%2F5a6L5L2T%2Ffontsize%2F400%2Ffill%2FI0JBQkFCMA%3D%3D%2Fdissolve%2F70%2Fgravity%2FCenter&refer=http%3A%2F%2Fblog.csdn.net%2Fjediael_lu%2Farticle%2Fdetails%2F43232449

2、在build path中,将apche-nutch-2.3/conf放到最上面,即点击top按键

test.jsp?url=http%3A%2F%2Fimg.blog.csdn.net%2F20150128164653807%3Fwatermark%2F2%2Ftext%2FaHR0cDovL2Jsb2cuY3Nkbi5uZXQvamVkaWFlbF9sdQ%3D%3D%2Ffont%2F5a6L5L2T%2Ffontsize%2F400%2Ffill%2FI0JBQkFCMA%3D%3D%2Fdissolve%2F70%2Fgravity%2FCenter&refer=http%3A%2F%2Fblog.csdn.net%2Fjediael_lu%2Farticle%2Fdetails%2F43232449

三、运行程序

1、Run as ----> Run configuration,选择project与主类

test.jsp?url=http%3A%2F%2Fimg.blog.csdn.net%2F20150128164757455%3Fwatermark%2F2%2Ftext%2FaHR0cDovL2Jsb2cuY3Nkbi5uZXQvamVkaWFlbF9sdQ%3D%3D%2Ffont%2F5a6L5L2T%2Ffontsize%2F400%2Ffill%2FI0JBQkFCMA%3D%3D%2Fdissolve%2F70%2Fgravity%2FCenter&refer=http%3A%2F%2Fblog.csdn.net%2Fjediael_lu%2Farticle%2Fdetails%2F43232449

2、填写参数

/Users/liaoliuqing/Downloads/seed.txt

-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log

test.jsp?url=http%3A%2F%2Fimg.blog.csdn.net%2F20150128164704789%3Fwatermark%2F2%2Ftext%2FaHR0cDovL2Jsb2cuY3Nkbi5uZXQvamVkaWFlbF9sdQ%3D%3D%2Ffont%2F5a6L5L2T%2Ffontsize%2F400%2Ffill%2FI0JBQkFCMA%3D%3D%2Fdissolve%2F70%2Fgravity%2FCenter&refer=http%3A%2F%2Fblog.csdn.net%2Fjediael_lu%2Farticle%2Fdetails%2F43232449

3、点击run,输出结果如下:

InjectorJob: starting at 2015-01-28 16:27:43

InjectorJob: Injecting urlDir: /Users/liaoliuqing/Downloads/seed.txt

InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.

InjectorJob: total number of urls rejected by filters: 0

InjectorJob: total number of urls injected after normalization and filtering: 1

Injector: finished at 2015-01-28 16:27:47, elapsed: 00:00:04

注意,在运行程序前,本机需要先启动hbase。

4、查看hbase中的数据

hbase(main):003:0> scan 'webpage'

ROW COLUMN+CELL

com.163.www:http/ column=f:fi, timestamp=1422433667377, value=\x00'\x8D\x00

com.163.www:http/ column=f:ts, timestamp=1422433667377, value=\x00\x00\x01K/\xA7:\x14

com.163.www:http/ column=mk:_injmrk_, timestamp=1422433667377, value=y

com.163.www:http/ column=mk:dist, timestamp=1422433667377, value=0

com.163.www:http/ column=mtdt:_csh_, timestamp=1422433667377, value=?\x80\x00\x00

com.163.www:http/ column=s:s, timestamp=1422433667377, value=?\x80\x00\x00

1 row(s) in 0.2970 seconds

f68f2add0b68e4f9810432fce46917b7.png

本文原创发布php中文网,转载请注明出处,感谢您的尊重!

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值