First, set up the Hadoop cluster. See my earlier article:
https://note.youdao.com/share/?token=6B7AD80F6F904C1982B92E03C61B637C&gid=30499526
Next, copy the following files from /hadoop/etc/hadoop to /nutch-2.3.1/conf:
core-site.xml
hadoop-env.sh
hbase-site.xml
hdfs-site.xml
masters (if it doesn't exist, create it and put the HMaster address in it)
slaves
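The copy step above can be scripted. This is a minimal sketch, assuming the source and destination directories are passed as arguments (the defaults mirror the paths in this note). The function name copy_hadoop_conf_to_nutch and its third argument (the HMaster address, used only when no masters file exists) are hypothetical.

```bash
# Hypothetical helper: copy the Hadoop/HBase config files listed above into Nutch's conf dir.
copy_hadoop_conf_to_nutch() {
  local src="${1:-/hadoop/etc/hadoop}"   # Hadoop config directory
  local dst="${2:-/nutch-2.3.1/conf}"    # Nutch config directory
  local master="${3:-}"                  # HMaster address, only used if masters is missing
  local f
  for f in core-site.xml hadoop-env.sh hbase-site.xml hdfs-site.xml slaves; do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dst/"
    else
      echo "warning: $src/$f not found, skipped" >&2
    fi
  done
  # masters: copy it if present, otherwise create it from the given HMaster address
  if [ -f "$src/masters" ]; then
    cp "$src/masters" "$dst/"
  elif [ -n "$master" ]; then
    printf '%s\n' "$master" > "$dst/masters"
  else
    echo "error: no masters file in $src and no HMaster address given" >&2
    return 1
  fi
}
```

Usage (the hostname is a placeholder): copy_hadoop_conf_to_nutch /hadoop/etc/hadoop /nutch-2.3.1/conf hmaster-host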
Then copy all the *.jar files under Hbase/lib to hadoop/share/hadoop/mapreduce.
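A sketch of the jar copy, assuming both directories are given explicitly (copy_hbase_jars is a hypothetical name). It copies only files ending in .jar and reports how many were copied; like a plain cp, it overwrites jars of the same name already in the target directory.

```bash
# Hypothetical helper: copy every *.jar from the HBase lib dir into Hadoop's mapreduce dir.
copy_hbase_jars() {
  local hbase_lib="${1:?usage: copy_hbase_jars HBASE_LIB_DIR HADOOP_MAPREDUCE_DIR}"
  local mr_dir="${2:?usage: copy_hbase_jars HBASE_LIB_DIR HADOOP_MAPREDUCE_DIR}"
  local jar n=0
  for jar in "$hbase_lib"/*.jar; do
    [ -e "$jar" ] || continue   # glob matched nothing
    cp "$jar" "$mr_dir/"
    n=$((n + 1))
  done
  echo "copied $n jar(s)"
}
```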
vim /nutch-2.3.1/conf/nutch-site.xml
and add the following property inside <configuration>:
<property>
<name>plugin.folders</name>
<value>/opt/apache-nutch-2.3.1/build/plugins</value>
</property>
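Since plugin.folders is an absolute path here (/opt/apache-nutch-2.3.1/build/plugins, which is not under the /nutch-2.3.1 directory used elsewhere in this note), it may be worth checking that the directory really exists on each machine; if it does not, the plugins cannot be loaded from it. A minimal sketch with a hypothetical helper, check_plugin_folders; the parsing is naive and assumes <name> and <value> sit on consecutive lines, as in the snippet above.

```bash
# Hypothetical helper: read plugin.folders from nutch-site.xml and check the directory exists.
# Naive parsing: expects <name>plugin.folders</name> immediately followed by a <value> line.
check_plugin_folders() {
  local site="${1:?usage: check_plugin_folders path/to/nutch-site.xml}"
  local dir
  dir=$(grep -A1 '<name>plugin.folders</name>' "$site" \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
  if [ -z "$dir" ]; then
    echo "plugin.folders not set in $site" >&2
    return 1
  fi
  if [ -d "$dir" ]; then
    echo "ok: $dir"
  else
    echo "missing: $dir" >&2
    return 1
  fi
}
```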
Then copy the Nutch directory to the other machines.
Possible problems:
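One way to push Nutch to every node is to loop over the hosts in the slaves file. This sketch assumes passwordless SSH between the nodes (normally already in place for a Hadoop cluster) and rsync installed on every machine; the function name sync_nutch_to_slaves and the DRY_RUN switch are hypothetical. DRY_RUN=1 only prints the commands.

```bash
# Hypothetical helper: rsync the Nutch directory to every host listed in a slaves file.
sync_nutch_to_slaves() {
  local nutch_dir="${1:?usage: sync_nutch_to_slaves NUTCH_DIR SLAVES_FILE}"
  local slaves_file="${2:?usage: sync_nutch_to_slaves NUTCH_DIR SLAVES_FILE}"
  local host
  # "|| [ -n "$host" ]" also handles a last line without a trailing newline
  while IFS= read -r host || [ -n "$host" ]; do
    host="${host%%#*}"                          # drop comments
    host="$(printf '%s' "$host" | tr -d '[:space:]')"  # trim whitespace
    [ -n "$host" ] || continue                  # skip blank lines
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "rsync -a $nutch_dir/ $host:$nutch_dir/"
    else
      # </dev/null keeps ssh (used by rsync) from swallowing the rest of the slaves list
      rsync -a "$nutch_dir/" "$host:$nutch_dir/" < /dev/null
    fi
  done < "$slaves_file"
}
```

Usage: DRY_RUN=1 sync_nutch_to_slaves /nutch-2.3.1 /nutch-2.3.1/conf/slaves to preview, then run it again without DRY_RUN.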
Container killed on request. Exit code is 143
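followed by a message along the lines of "memory 2.7g in 2.1g used" (2.7 GB used against a 2.1 GB limit).
This means the container used more memory than it was allowed, so raise the memory for map and reduce containers:
vim hadoop/etc/hadoop/mapred-site.xml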
<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
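Where the 2.1 GB figure probably comes from, assuming Hadoop 2.x defaults: YARN's virtual-memory check allows each container its memory size (mapreduce.map.memory.mb, default 1024 MB) multiplied by yarn.nodemanager.vmem-pmem-ratio (default 2.1), which gives 2.1 GB. A small sketch to compute the limit for other settings; vmem_limit_gb is a hypothetical name.

```bash
# Hypothetical helper: virtual memory limit in GB for a container of the given size.
# limit = container memory (MB) * yarn.nodemanager.vmem-pmem-ratio (default 2.1) / 1024
vmem_limit_gb() {
  local mb="${1:?usage: vmem_limit_gb CONTAINER_MB [VMEM_PMEM_RATIO]}"
  local ratio="${2:-2.1}"
  awk -v mb="$mb" -v ratio="$ratio" 'BEGIN { printf "%.1f\n", mb * ratio / 1024 }'
}
```

For example, vmem_limit_gb 1024 prints 2.1 and vmem_limit_gb 4096 prints 8.4, so with the 4096 MB setting above each container gets roughly 8.4 GB of virtual memory. Note that the container size must also fit within yarn.scheduler.maximum-allocation-mb (8192 MB by default), otherwise YARN rejects the request.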