Articles:
Single node:
http://shaurong.blogspot.com/2013/11/hadoop-220-single-cluster-centos-64-x64_7.html
Using YARN (Hadoop 2.3.0):
https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-on-ubuntu-13-10
~Official single-node setup documentation for Hadoop 2.4.1:
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html
Testing:
http://8liang.cn/hadoop-distributed-install-and-question-2/
http://www.36dsj.com/archives/6118
Other:
IBM series of three articles ---the second one covers clusters:
http://www.ibm.com/developerworks/cn/linux/l-hadoop-1/
Note:
for later work, switch to Hadoop version 2.4.1;
//
Single node:
Installing Apache Hadoop on Ubuntu Linux:
Ubuntu VM named ubuntu5; (Ubuntu 12.04, 64-bit, installed under VMware)
Preliminary step, change the hostname:
sudo gedit /etc/hostname
Replace the existing "ubuntu"
with the name this machine uses, e.g. ubuntu5 (this machine's username);
~
sudo gedit /etc/hosts
Remove the existing entries,
# 127.0.0.1 localhost
# 127.0.1.1 ubuntu
and add:
192.168.132.132 ubuntu5 (192... = this machine's IP)
~
Reboot the system;
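After the reboot, a quick check that the new name resolves (hostname and IP are the ones configured above):
ping -c 1 ubuntu5 //should resolve to 192.168.132.132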
1, JDK installation: already installed; see the "ubuntu, required tools installation" document;
1.7, installed manually;
2, Install Hadoop:
Download: hadoop-2.2.0.tar.gz
Copy it from my Win7 machine into Ubuntu under Home folder - Downloads; (plain copy and paste)
1), Unpack the Hadoop archive:
Copy:
sudo cp /home/ubuntu5/Downloads/hadoop-2.2.0.tar.gz /usr/local/
Unpack:
cd /usr/local //enter the directory;
sudo tar xzf hadoop-2.2.0.tar.gz //untar into the current directory;
Rename:
sudo mv hadoop-2.2.0 hadoop
sudo rm -rf hadoop-2.2.0.tar.gz //remove the original archive
2), Edit the configuration files:
Add Hadoop's binary directory to the PATH variable (append at the end of the file):
sudo gedit /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
source /etc/profile
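A quick sanity check that the new PATH is in effect:
hadoop version //should print Hadoop 2.2.0 and the jar it is running from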
Open the hadoop/etc/hadoop/hadoop-env.sh file;
cd /usr/local
sudo gedit hadoop/etc/hadoop/hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME}
to export JAVA_HOME=/usr/lib/jdk1.7.0_45
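To confirm that path points at a working JDK (the jdk1.7.0_45 location is the one from this guide's earlier JDK install):
/usr/lib/jdk1.7.0_45/bin/java -version //should print java version "1.7.0_45"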
3), Configure pseudo-distributed mode:
Edit the following four files.
Open core-site.xml;
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
Create mapred-site.xml from its template, then open it; (Hadoop reads mapred-site.xml, not the .template file)
sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Create the folders that will hold the HDFS data; (/usr/local is root-owned, so sudo is needed)
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
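These folders end up owned by root, which is fine here because this guide starts Hadoop as root; if you ever run Hadoop as a normal user instead, you would also need something like:
sudo chown -R $USER:$USER /usr/local/hadoop_store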
hdfs-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
//dfs.replication is the number of replicas kept per block; the default is 3. On a single machine it must be set to 1; a value larger than the number of machines in the cluster causes errors.
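Once the files are saved, Hadoop itself can read the values back, which catches typos in property names (run from /usr/local/hadoop; hdfs getconf -confKey is available in Hadoop 2.x):
bin/hdfs getconf -confKey dfs.replication //should print 1
bin/hdfs getconf -confKey fs.defaultFS //should print hdfs://localhost:9000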
3, Install and configure SSH:
1),
Check whether SSH is installed on this machine;
in a terminal, run: ssh localhost
If it prints ssh: connect to host localhost port 22: Connection refused
the machine has no SSH server installed;
2),
Download and install the SSH server: (answer y when asked whether to continue)
sudo apt-get install openssh-server
Check the SSH installation:
ps -e|grep ssh
Output like the following means it succeeded:
2433 ? 00:00:00 ssh-agent
3578 ? 00:00:00 sshd
3),
Configure SSH so the user can log in to this machine without a password:
Switch to the root user for the following steps;
sudo su -
Generate the private key id_rsa and public key id_rsa.pub:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
~
Append the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
~
Restrict the file so only the owning user can read and write it:
chmod 600 ~/.ssh/authorized_keys
~
On the first ssh localhost, type yes at the yes/no prompt:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 9a:1e:39:7b:41:2a:b6:db:5d:4b:e0:e1:6c:5b:c0:66.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-29-generic x86_64)
* Documentation: https://help.ubuntu.com/
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
~
exit //log out;
Run ssh localhost again;
it shows:
Last login: Wed Jul 9 04:49:49 2014 from localhost
so passwordless login is configured;
exit //log out;
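An alternative check that fails fast instead of falling back to a password prompt:
ssh -o BatchMode=yes localhost echo ok //prints ok only if key-based login works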
4, Run Hadoop:
Operate as the root user;
before starting Hadoop for the first time, HDFS must be formatted:
cd /usr/local/hadoop/
sudo bin/hadoop namenode -format //deprecated alias in 2.x; bin/hdfs namenode -format is the current form
Typical output:
13/10/29 03:19:52 common.Storage: Storage directory ...... has been successfully formatted.
......
SHUTDOWN_MSG: Shutting down NameNode at ubuntu5/192.168.132.132
Start Hadoop:
sudo sbin/start-dfs.sh
Check whether Hadoop started successfully:
jps //3 daemons;
sudo sbin/start-yarn.sh
jps //5 daemons;
~
Correct;
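Roughly what jps should list at this point (PIDs will differ):
NameNode, DataNode, SecondaryNameNode //started by start-dfs.sh
ResourceManager, NodeManager //added by start-yarn.sh
plus the Jps process itself;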
Shut down:
sudo sbin/stop-yarn.sh
sudo sbin/stop-dfs.sh
Or use (deprecated in 2.x, but still works):
sudo sbin/start-all.sh
sudo sbin/stop-all.sh
jps
View in a browser:
http://localhost:50070/ (the NameNode web UI)
·shows the NameNode 'localhost:9000' (active) page;
·http://localhost:8088/ shows the All Applications page (the YARN ResourceManager web UI);
·visiting http://localhost:9000 itself (the IPC port from fs.defaultFS) only returns this text:
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
http://localhost:50090/ (the secondary namenode UI)
Error 1:
ssh: Could not resolve hostname now.: Name or service not known
Caused by 32-bit Hadoop native libraries running on 64-bit Ubuntu;
·it has nothing to do with the username;
·worked around by setting environment variables;
Fix:
shut down the Hadoop instance started above and close the terminal; open a new terminal and proceed;
add the following two lines to hadoop/etc/hadoop/hadoop-env.sh (at the end of the file):
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib"
~
Switch to the root account,
re-format HDFS and start Hadoop again,
~
success;
Error 2:
jps shows that the DataNode did not start;
the log contains the error:
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_store/hdfs/datanode: namenode clusterID = CID-6b93ec07-d44b-46c0-8ade-5845db555f91; datanode clusterID = CID-59abc85f-8ef5-494a-94bf-920993eb252f
The log shows the cause: the DataNode's clusterID does not match the NameNode's (re-formatting the NameNode generated a new clusterID, while the DataNode kept the old one).
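Both IDs live in the VERSION files under each current/ directory, so they can be compared directly before deleting anything:
cat /usr/local/hadoop_store/hdfs/namenode/current/VERSION //check the clusterID line
cat /usr/local/hadoop_store/hdfs/datanode/current/VERSION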
Delete the current folders saved by the namenode and datanode:
root@ubuntu5:~# sudo rm -r /usr/local/hadoop_store/hdfs/namenode/current
root@ubuntu5:~# sudo rm -r /usr/local/hadoop_store/hdfs/datanode/current
~
Then re-format HDFS and start Hadoop again; solved;
-----------------
Logging in again later:
Switch to the root account;
enter the directory: cd /usr/local/hadoop/
Start Hadoop:
sudo sbin/start-dfs.sh
sudo sbin/start-yarn.sh
jps
~
Or, as needed, also repeat the SSH configuration and HDFS formatting steps;
-----------------------------------------------------
Hadoop usage example: word count;
with Hadoop started, cd /usr/local/hadoop/
bin/hadoop dfs -ls /
Error:
running bin/hadoop dfs -ls / reports:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Again the 32-bit vs 64-bit problem;
Fix: compile the Hadoop source code; see the following articles:
http://www.ercoppa.org/Linux-Compile-Hadoop-220-fix-Unable-to-load-native-hadoop-library.htm
http://cn.soulmachine.me/blog/20140214/
http://blog.csdn.net/sophistcxf/article/details/23559277
Compile the Hadoop source:
download the source release of Hadoop and unpack it,
cd /usr/local
sudo tar xzf hadoop-2.2.0-src.tar.gz
Install the following packages:
sudo apt-get install zlib1g-dev
sudo apt-get install libssl-dev
~Install the autotools
sudo apt-get install autoconf automake libtool
~
sudo apt-get install gcc ---already present;
sudo apt-get install libglib2.0-dev
~
sudo apt-get install build-essential
sudo apt-get install pkg-config ---already present;
Install Maven ---see the "maven installation" document;
3.0.5, installed manually;
Install Ant:
sudo tar xzf apache-ant-1.9.4-bin.tar.gz
sudo rm -rf apache-ant-1.9.4-bin.tar.gz
sudo gedit /etc/profile
export ANT_HOME=/usr/local/apache-ant-1.9.4
export PATH=$ANT_HOME/bin:$PATH
source /etc/profile
Install protobuf:
http://code.google.com/p/protobuf/
sudo tar xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
sudo ./configure
sudo make
sudo make install
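If protoc later complains that libprotobuf.so cannot be found, refresh the dynamic linker cache; make install puts the libraries under /usr/local/lib:
sudo ldconfig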
~
Verify:
protoc --version (still reports the old 2.4.1); fix it by putting the build tree first on the PATH; its src/ directory holds the newly built protoc:
sudo gedit /etc/profile
export PROTOC_HOME=/usr/local/protobuf-2.5.0
export PATH=$PROTOC_HOME/src:$PATH
source /etc/profile
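Re-check:
protoc --version //should now report libprotoc 2.5.0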
Install cmake:
sudo apt-get install cmake
Apply a patch to the source, otherwise compiling Apache Hadoop Auth fails with:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:
[ERROR] /usr/local/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[88,11] error: cannot access AbstractLifeCycle
~
gedit /usr/local/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/pom.xml
Inside <dependencies></dependencies>, above the jetty entry, add the following block:
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty-util</artifactId>
<scope>test</scope>
</dependency>
Change Maven's mirror configuration ---otherwise many artifacts fail to download, or are too slow;
sudo gedit /usr/local/maven/conf/settings.xml
Add the following inside the <mirrors> tag:
<mirror>
<id>nexus-osc</id>
<mirrorOf>*</mirrorOf>
<name>Nexus osc</name>
<url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
~
Add the following inside the <profiles> tag:
<profile>
<id>jdk-1.7</id>
<activation>
<jdk>1.7</jdk>
</activation>
<repositories>
<repository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
Compile Hadoop:
sudo su - //switch to the root user,
cd /usr/local/hadoop-2.2.0-src
mvn package -Pdist,native -DskipTests -Dtar
~this takes quite a while;
BUILD SUCCESS ---took 20 minutes!
Replace the 32-bit native libraries:
Back up the 32-bit ones ---into a folder created for the purpose; (-r because native is a directory)
sudo cp -r /usr/local/hadoop/lib/native /home/ubuntu5/Downloads/hadoop-native-32-64/32/
Back up the 64-bit ones:
cp -r /usr/local/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native /home/ubuntu5/Downloads/hadoop-native-32-64/64/
~
Delete the 32-bit native directory under the Hadoop binary install:
sudo rm -rf /usr/local/hadoop/lib/native
Copy in the native directory from the source build:
cp -r /usr/local/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native /usr/local/hadoop/lib/
Remove the two lines added to hadoop-env.sh in the earlier workaround:
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib"
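To confirm the swap, check the ELF class of the new library (the libhadoop.so.1.0.0 file name is what a stock 2.2.0 build produces):
file /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 //should now report ELF 64-bit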
~
Switch to the root account,
·then reconfigure SSH,
·re-format HDFS and start Hadoop again;
Running bin/hadoop dfs -ls / now prints no error; solved;
Continue testing:
Hadoop usage example: word count;
with Hadoop started, cd /usr/local/hadoop/
~
Create the folders:
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -mkdir /user
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -mkdir /user/ubuntu5
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -mkdir /user/ubuntu5/input
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -ls /user/ubuntu5
Found 1 items
drwxr-xr-x - root supergroup 0 2014-07-15 03:45 /user/ubuntu5/input
~
In a new terminal: ---in any directory, create a txt file to copy from later;
ubuntu5@ubuntu5:~$ cd /usr/local/hadoop/bin
ubuntu5@ubuntu5:/usr/local/hadoop/bin$ sudo mkdir input
root@ubuntu5:/usr/local/hadoop/bin/input# echo "This is a test" >> test.txt
root@ubuntu5:/usr/local/hadoop/bin/input# cat test.txt
This is a test
~
In the main terminal:
bin/hdfs dfs -put /usr/local/hadoop/bin/input/test.txt /user/ubuntu5/input
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -ls /user/ubuntu5/input
Found 1 items
-rw-r--r-- 1 root supergroup 15 2014-07-15 09:19 /user/ubuntu5/input/test.txt
root@ubuntu5:/usr/local/hadoop# bin/hdfs dfs -cat /user/ubuntu5/input/test.txt
This is a test
(The file can also be viewed under Browse the filesystem at http://localhost:50070;)
//Word count; the output folder must not already exist;
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/ubuntu5/input /user/ubuntu5/output
bin/hdfs dfs -ls /user/ubuntu5/output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2014-07-15 09:32 /user/ubuntu5/output/_SUCCESS
-rw-r--r-- 1 root supergroup 23 2014-07-15 09:32 /user/ubuntu5/output/part-r-00000
bin/hdfs dfs -cat /user/ubuntu5/output/part-r-00000
This 1
a 1
is 1
test 1
When the example is done, the created files can be deleted:
bin/hdfs dfs -rm -r /user //removes the whole /user folder; (the older -rmr form is deprecated)
Note:
after the machine was rebooted, the /user directory no longer existed;