Articles:
Single node:
http://shaurong.blogspot.com/2013/11/hadoop-220-single-cluster-centos-64-x64_7.html
Using YARN (Hadoop 2.3.0):
https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-on-ubuntu-13-10
~Official single-node setup documentation for Hadoop 2.4.1:
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html
Testing:
http://8liang.cn/hadoop-distributed-install-and-question-2/
http://www.36dsj.com/archives/6118
Other:
IBM series of three articles ---the second one covers clusters:
http://www.ibm.com/developerworks/cn/linux/l-hadoop-1/
Note:
for later work, switch to Hadoop version 2.4.1;
//
Single node:
Installing Apache Hadoop on Ubuntu Linux:
Ubuntu VM named ubuntu5; (Ubuntu 12.04, 64-bit, installed under VMware)
Preliminary step, change the hostname:
sudo gedit /etc/hostname
Replace the existing "ubuntu"
with the name this machine uses, e.g. ubuntu5 (this machine's username);
~
sudo gedit /etc/hosts
Remove the existing entries,
# 127.0.0.1 localhost
# 127.0.1.1 ubuntu
and add:
192.168.132.132 ubuntu5 (192... = this machine's IP)
~
Reboot the system;
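After the reboot, a quick check that the new name resolves (hostname and IP are the ones configured above):
ping -c 1 ubuntu5 //should resolve to 192.168.132.132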
1, JDK installation: already installed; see the "ubuntu, required tools installation" document;
1.7, installed manually;
2, Install Hadoop:
Download: hadoop-2.2.0.tar.gz
Copy it from my Win7 machine into Ubuntu under Home folder - Downloads; (plain copy and paste)
1), Unpack the Hadoop archive:
Copy:
sudo cp /home/ubuntu5/Downloads/hadoop-2.2.0.tar.gz /usr/local/
Unpack:
cd /usr/local //enter the directory;
sudo tar xzf hadoop-2.2.0.tar.gz //untar into the current directory;
Rename:
sudo mv hadoop-2.2.0 hadoop
sudo rm -rf hadoop-2.2.0.tar.gz //remove the original archive
2), Edit the configuration files:
Add Hadoop's binary directory to the PATH variable (append at the end of the file):
sudo gedit /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
source /etc/profile
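A quick sanity check that the new PATH is in effect:
hadoop version //should print Hadoop 2.2.0 and the jar it is running from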
Open the hadoop/etc/hadoop/hadoop-env.sh file;
cd /usr/local
sudo gedit hadoop/etc/hadoop/hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME}
to export JAVA_HOME=/usr/lib/jdk1.7.0_45
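To confirm that path points at a working JDK (the jdk1.7.0_45 location is the one from this guide's earlier JDK install):
/usr/lib/jdk1.7.0_45/bin/java -version //should print java version "1.7.0_45"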
3), Configure pseudo-distributed mode:
Edit the following four files.
Open core-site.xml;
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
Create mapred-site.xml from its template, then open it; (Hadoop reads mapred-site.xml, not the .template file)
sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Create the folders that will hold the HDFS data; (/usr/local is root-owned, so sudo is needed)
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
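These folders end up owned by root, which is fine here because this guide starts Hadoop as root; if you ever run Hadoop as a normal user instead, you would also need something like:
sudo chown -R $USER:$USER /usr/local/hadoop_store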
hdfs-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
//dfs.replication is the number of replicas kept per block; the default is 3. On a single machine it must be set to 1; a value larger than the number of machines in the cluster causes errors.
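Once the files are saved, Hadoop itself can read the values back, which catches typos in property names (run from /usr/local/hadoop; hdfs getconf -confKey is available in Hadoop 2.x):
bin/hdfs getconf -confKey dfs.replication //should print 1
bin/hdfs getconf -confKey fs.defaultFS //should print hdfs://localhost:9000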
3, Install and configure SSH:
1),
Check whether SSH is installed on this machine;
in a terminal, run: ssh localhost
If it prints ssh: connect to host localhost port 22: Connection refused
the machine has no SSH server installed;
2),
Download and install the SSH server: (answer y when asked whether to continue)
sudo apt-get install openssh-server
Check the SSH installation:
ps -e|grep ssh
Output like the following means it succeeded:
2433 ? 00:00:00 ssh-agent
3578 ? 00:00:00 sshd
3),
Configure SSH so the user can log in to this machine without a password:
Switch to the root user for the following steps;
sudo su -
Generate the private key id_rsa and public key id_rsa.pub:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
~
Append the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
~
Restrict the file so only the owning user can read and write it:
chmod 600 ~/.ssh/authorized_keys
~
On the first ssh localhost, type yes at the yes/no prompt:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 9a:1e:39:7b:41:2a:b6:db:5d:4b:e0:e1:6c:5b:c0:66.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-29-generic x86_64)
* Documentation: https://help.ubuntu.com/
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
~
exit //log out;
Run ssh localhost again;
it shows:
Last login: Wed Jul 9 04:49:49 2014 from localhost
so passwordless login is configured;
exit //log out;
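An alternative check that fails fast instead of falling back to a password prompt:
ssh -o BatchMode=yes localhost echo ok //prints ok only if key-based login works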
4, Run Hadoop:
Operate as the root user;
before starting Hadoop for the first time, HDFS must be formatted:
cd /usr/local/hadoop/
sudo bin/hadoop namenode -format //deprecated alias in 2.x; bin/hdfs namenode -format is the current form
Typical output:
13/10/29 03:19:52 common.Storage: Storage directory ...... has been successfully formatted.
......
SHUTDOWN_MSG: Shutting down NameNode at ubuntu5/192.168.132.132
Start Hadoop:
sudo sbin/start-dfs.sh
Check whether Hadoop started successfully:
jps //3 daemons;
sudo sbin/start-yarn.sh
jps //5 daemons;
~
Correct;
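Roughly what jps should list at this point (PIDs will differ):
NameNode, DataNode, SecondaryNameNode //started by start-dfs.sh
ResourceManager, NodeManager //added by start-yarn.sh
plus the Jps process itself;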
Shut down:
sudo sbin/stop-yarn.sh
sudo sbin/stop-dfs.sh
Or use (deprecated in 2.x, but still works):
sudo sbin/start-all.sh
sudo sbin/stop-all.sh
jps
View in a browser:
http://localhost:50070/ (the NameNode web UI)
·shows the NameNode 'localhost:9000' (active) page;
·http://localhost:8088/ shows the All Applications page (the YARN ResourceManager web UI);
·visiting http://localhost:9000 itself (the IPC port from fs.defaultFS) only returns this text:
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
http://localhost:50090/ (the secondary namenode UI)
Error 1:
ssh: Could not resolve hostname now.: Name or service not known
Caused by 32-bit Hadoop native libraries running on 64-bit Ubuntu;
·it has nothing to do with the username;
·worked around by setting environment variables;
Fix:
shut down the Hadoop instance started above and close the terminal; open a new terminal and proceed;
add the following two lines to hadoop/etc/hadoop/hadoop-env.sh (at the end of the file):
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib"
~
Switch to the root account,
re-format HDFS and start Hadoop again,
~
success;
Error 2:
jps shows that the DataNode did not start;
the log contains the error:
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_store/hdfs/datanode: namenode clusterID = CID-6b93ec07-d44b-46c0-8ade-5845db555f91; datanode clusterID = CID-59abc85f-8ef5-494a-94bf-920993eb252f
The log shows the cause: the DataNode's clusterID does not match the NameNode's (re-formatting the NameNode generated a new clusterID, while the DataNode kept the old one).
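Both IDs live in the VERSION files under each current/ directory, so they can be compared directly before deleting anything:
cat /usr/local/hadoop_store/hdfs/namenode/current/VERSION //check the clusterID line
cat /usr/local/hadoop_store/hdfs/datanode/current/VERSION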
Delete the current folders saved by the namenode and datanode:
root@ubuntu5:~# sudo rm -r /usr/local/hadoop_store/hdfs/namenode/current
root@ubuntu5:~# sudo rm -r /usr/local/hadoop_store/hdfs/datanode/current
~
Then re-format HDFS and start Hadoop again; solved;
-----------------
Logging in again later:
Switch to the root account;
enter the directory: cd /usr/local/hadoop/
Start Hadoop:
sudo sbin/start-dfs.sh
sudo sbin/start-yarn.sh
jps
~
Or, as needed, also repeat the SSH configuration and HDFS formatting steps;
-----------------------------------------------------
Hadoop usage example: word count;
with Hadoop started, cd /usr/local/hadoop/
bin/hadoop dfs -ls /
Error:
running bin/hadoop dfs -ls / reports:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Again the 32-bit vs 64-bit problem;
Fix: compile the Hadoop source code; see the following articles:
http://www.ercoppa.org/Linux-Compile-Hadoop-220-fix-Unable-to-load-native-hadoop-library.htm
http://cn.soulmachine.me/blog/20140214/
http://blog.csdn.net/sophistcxf/article/details/23559277
Compile the Hadoop source:
download the source release of Hadoop and unpack it,
cd /usr/local
sudo tar xzf hadoop-2.2.0-src.tar.gz
Install the following packages:
sudo apt-get install zlib1g-dev
sudo apt-get install libssl-dev
~Install the autotools
sudo apt-get install autoconf automake libtool
~
sudo apt-get install gcc ---already present;
sudo apt-get install libglib2.0-dev
~
sudo apt-get install build-essential
sudo apt-get install pkg-config ---already present;
Install Maven ---see the "maven installation" document;
3.0.5, installed manually;
Install Ant:
sudo tar xzf apache-ant-1.9.4-bin.tar.gz
sudo rm -rf apache-ant-1.9.4-bin.tar.gz
sudo gedit /etc/profile
export ANT_HOME=/usr/local/apache-ant-1.9.4
export PATH=$ANT_HOME/bin:$PATH
source /etc/profile
Install protobuf:
http://code.google.com/p/protobuf/
sudo tar xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
sudo ./configure
sudo make
sudo make install
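If protoc later complains that libprotobuf.so cannot be found, refresh the dynamic linker cache; make install puts the libraries under /usr/local/lib:
sudo ldconfig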
~
Verify:
protoc --version (still reports the old 2.4.1); fix it by putting the build tree first on the PATH; its src/ directory holds the newly built protoc:
sudo gedit /etc/profile
export PROTOC_HOME=/usr/local/protobuf-2.5.0
export PATH=$PROTOC_HOME/src:$PATH
source /etc/profile
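Re-check:
protoc --version //should now report libprotoc 2.5.0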
Install cmake:
sudo apt-get install cmake
Apply a patch to the source, otherwise compiling Apache Hadoop Auth fails with:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:
[ERROR] /usr/local/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[88,11] error: cannot access AbstractLifeCycle
~
gedit /usr/local/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/pom.xml
Inside <dependencies></dependencies>, above the jetty entry, add the following block:
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty-util</artifactId>
<scope>test</scope>
</dependency>
Change Maven's mirror configuration ---otherwise many artifacts fail to download, or are too slow;
sudo gedit /usr/local/maven/conf/settings.xml
Add the following inside the <mirrors> tag:
<mirror>
<id>nexus-osc</id>
<mirrorOf>*</mirrorOf>
<name>Nexus osc</name>
<url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
~
Add the following inside the <profiles> tag:
<profile>
<id>jdk-1.7</id>
<activation>
<jdk>1.7</jdk>
</activation>
<repositories>
<repository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
Compile Hadoop:
sudo su - //switch to the root user,
cd /usr/local/hadoop-2.2.0-src
mvn package -Pdist,native -DskipTests -Dtar
~this takes quite a while;
BUILD SUCCESS ---took 20 minutes!
Replace the 32-bit native libraries:
Back up the 32-bit ones ---into a folder created for the purpose; (-r because native is a directory)
sudo cp -r /usr/local/hadoop/lib/native /home/ubuntu5/Downloads/hadoop-native-32-64/32/
Back up the 64-bit ones:
cp -r /usr/local/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native /home/ubuntu5/Downloads/hadoop-native-32-64/64/
~
Delete the 32-bit native directory under the Hadoop binary install:
sudo rm -rf /usr/local/hadoop/lib/native
Copy in the native directory from the source build:
cp -r /usr/local/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native /usr/local/hadoop/lib/
Remove the two lines added to hadoop-env.sh in the earlier workaround:
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib"
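To confirm the swap, check the ELF class of the new library (the libhadoop.so.1.0.0 file name is what a stock 2.2.0 build produces):
file /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 //should now report ELF 64-bit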
~
Switch to the root account,
·then reconfigure SSH,
·re-format HDFS and start Hadoop again;
Running bin/hadoop dfs -ls / now prints no error; solved;
Continue testing:
Hadoop usage example: word count;
with Hadoop started, cd /usr/local/hadoop/
~
Create the folders:
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -mkdir /user
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -mkdir /user/ubuntu5
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -mkdir /user/ubuntu5/input
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -ls /user/ubuntu5
Found 1 items
drwxr-xr-x - root supergroup 0 2014-07-15 03:45 /user/ubuntu5/input
~
In a new terminal: ---in any directory, create a txt file to copy from later;
ubuntu5@ubuntu5:~$ cd /usr/local/hadoop/bin
ubuntu5@ubuntu5:/usr/local/hadoop/bin$ sudo mkdir input
root@ubuntu5:/usr/local/hadoop/bin/input# echo "This is a test" >> test.txt
root@ubuntu5:/usr/local/hadoop/bin/input# cat test.txt
This is a test
~
In the main terminal:
bin/hdfs dfs -put /usr/local/hadoop/bin/input/test.txt /user/ubuntu5/input
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -ls /user/ubuntu5/input
Found 1 items
-rw-r--r-- 1 root supergroup 15 2014-07-15 09:19 /user/ubuntu5/input/test.txt
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -cat /user/ubuntu5/input/test.txt
This is a test
(The file can also be viewed under Browse the filesystem at http://localhost:50070;)
//Word count; the output folder must not already exist;
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/ubuntu5/input /user/ubuntu5/output
bin/hdfs dfs -ls /user/ubuntu5/output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2014-07-15 09:32 /user/ubuntu5/output/_SUCCESS
-rw-r--r-- 1 root supergroup 23 2014-07-15 09:32 /user/ubuntu5/output/part-r-00000
bin/hdfs dfs -cat /user/ubuntu5/output/part-r-00000
This 1
a 1
is 1
test 1
When the example is done, the created files can be deleted:
bin/hdfs dfs -rm -r /user //removes the whole /user folder; (the older -rmr form is deprecated)
Note:
after the machine was rebooted, the /user directory no longer existed;