Nutch 2.x + Hadoop 2.5.2 + HBase 0.94.26 (Continued, Part 2)

1. Running bin/nutch generate -topN 5 -crawlId tieba fails with the following error:

java.lang.Exception: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.gora.persistency.Persistent
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.gora.persistency.Persistent
        at org.apache.gora.mapreduce.PersistentDeserializer.deserialize(PersistentDeserializer.java:71)
        at org.apache.gora.mapreduce.PersistentDeserializer.deserialize(PersistentDeserializer.java:35)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
        at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

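My first guess was an Avro version mismatch, but downgrading Avro from 1.7.7 to 1.7.6 did not help. I then noticed that when Nutch runs, the jars on its classpath all come from Hadoop 2.5.2, and that hadoop-2.5.2/share/hadoop/common/lib/ ships Avro 1.7.4. Replacing that jar with the 1.7.7 version fixed the problem.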
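
The jar swap described above could be scripted roughly as follows. This is only a sketch: swap_avro_jar is a hypothetical helper name, the path to the new jar is a placeholder, and it assumes the bundled jar is named avro-<version>.jar. The old jar is renamed to .bak rather than deleted so the change can be reverted.

```shell
# Hypothetical helper (not part of Hadoop or Nutch): replace the Avro jar
# bundled in a Hadoop lib directory with another version.
# Usage: swap_avro_jar <hadoop_lib_dir> <new_avro_jar>
swap_avro_jar() {
  lib_dir=$1
  new_jar=$2
  new_name=$(basename "$new_jar")

  if [ ! -d "$lib_dir" ]; then
    echo "no such directory: $lib_dir" >&2
    return 1
  fi
  if [ ! -f "$new_jar" ]; then
    echo "no such file: $new_jar" >&2
    return 1
  fi

  # Move every bundled avro-<version>.jar aside, except one that already
  # has the name of the jar we are installing.
  for old in "$lib_dir"/avro-[0-9]*.jar; do
    if [ -e "$old" ] && [ "$(basename "$old")" != "$new_name" ]; then
      mv "$old" "$old.bak"
    fi
  done

  # Copy the new jar in unless a jar with that name is already there.
  if [ ! -e "$lib_dir/$new_name" ]; then
    cp "$new_jar" "$lib_dir"/
  fi
}

# Example (the source path of the 1.7.7 jar is a placeholder):
# swap_avro_jar hadoop-2.5.2/share/hadoop/common/lib /path/to/avro-1.7.7.jar
```

Listing the directory afterwards (ls hadoop-2.5.2/share/hadoop/common/lib | grep avro) is a quick way to confirm which version Hadoop will now load.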

2. Next, run bin/nutch fetch 1421804965-1372033824 -crawlId tieba -threads 50. Here 1421804965-1372033824 is the batch ID, read from the f:bid column returned by running get 'tieba_webpage','com.baidu.tieba:http/' in the HBase shell: f:bid timestamp=1421804970851, value=1421804965-1372033824
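
Copying the batch ID by hand can be avoided with a small filter. The sketch below assumes the HBase shell prints the column in the form shown above (f:bid timestamp=..., value=...); extract_batch_id is a hypothetical helper name, and piping a command into hbase shell on stdin is an assumption about how the shell is invoked in your setup.

```shell
# Hypothetical helper: read HBase shell output on stdin and print the value
# of the f:bid column (the batch ID that bin/nutch fetch expects).
extract_batch_id() {
  sed -n 's/^[[:space:]]*f:bid[[:space:]].*value=\([^[:space:]]*\).*/\1/p'
}

# Example (assumes the hbase command is on PATH):
# batch_id=$(echo "get 'tieba_webpage','com.baidu.tieba:http/'" | hbase shell | extract_batch_id)
# bin/nutch fetch "$batch_id" -crawlId tieba -threads 50
```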

The fetch command then fails with the error: No agents listed in 'http.agent.name' property

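Fix: give the http.agent.name property a non-empty value in Nutch's configuration, that is, the <name>http.agent.name</name> entry in conf/nutch-default.xml (or, preferably, an override in conf/nutch-site.xml). Any string will do.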
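
For reference, the property element has the shape printed by the sketch below. agent_name_property is a hypothetical helper, the agent name MyNutchSpider is a made-up placeholder, and placing the override in conf/nutch-site.xml (inside its <configuration> element) is an assumption based on Nutch's usual default/site configuration split.

```shell
# Hypothetical helper: print a <property> element for http.agent.name,
# ready to paste inside the <configuration> element of the config file.
agent_name_property() {
  cat <<EOF
<property>
  <name>http.agent.name</name>
  <value>$1</value>
</property>
EOF
}

# Example (placeholder agent name):
# agent_name_property "MyNutchSpider"
```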

Reposted from: https://www.cnblogs.com/mactech/p/4239163.html
