Hadoop Notes: Installing and Configuring Hadoop 2.3.0 on Mac OS X


Environment

  • Operating system: OS X 10.9.2
  • Hadoop version: 2.3.0

Preparation

Installing Java

I used version 1.7, which can be downloaded here.

After downloading, unpacking, and installing it, you need to configure the Java environment variables (I edit ~/.bash_profile directly). Configuration on OS X is a bit of a pain; the approach most commonly recommended online (see "Setting JAVA_HOME on Mac OS") is:

export JAVA_HOME=`/usr/libexec/java_home` 
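
To confirm the setting took effect, a quick sanity check (the exact version string will depend on your JDK):

# Reload the profile and verify that JAVA_HOME points at a real JDK
source ~/.bash_profile
echo $JAVA_HOME
java -version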

Setting Up SSH Keys

A single-node pseudo-distributed deployment needs the machine to be able to SSH into itself. Generate a key pair with:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

If the SSH service was not already enabled on the machine, check the "System Preferences -> Sharing -> Remote Login" option.
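
With the key in place, logging in to localhost should no longer prompt for a password; a quick check:

# The first connection may ask you to confirm the host fingerprint;
# after that it should log in without a password
ssh localhost
exit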

Downloading Hadoop 2.3.0

Download it from the official site. A common suggestion online is to create a dedicated user for configuring and managing the Hadoop environment; the lazy option is to just unpack it somewhere under your current user's home directory.

Here is the environment variable configuration; from here on, $HADOOP_HOME denotes the Hadoop root directory.

# hadoop
export HADOOP_HOME=~/Environment/hadoop-2.3.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

At this point the hadoop command should work; test it with:

[ 502 ~ ]$hadoop version
Hadoop 2.3.0
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
Compiled by jenkins on 2014-02-11T13:40Z
Compiled with protoc 2.5.0
From source with checksum dfe46336fbc6a044bc124392ec06b85
This command was run using /Users/chenshijiang/Environment/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0.jar

Hadoop Configuration

Before using Hadoop, a few configuration files need to be modified. In Hadoop 2.3.0 they all live under $HADOOP_HOME/etc/hadoop. The changes to each file are listed below.

core-site.xml

<configuration> 
	<property> 
		<name>fs.default.name</name> 
		<value>hdfs://localhost:9000</value> 
	</property> 
	<property> 
		<name>hadoop.tmp.dir</name> 
		<value>/Users/username/Environment/hadoop-2.3.0/tmp</value> 
		<description>A base for other temporary directories.</description> 
	</property> 
</configuration>

Note the hadoop.tmp.dir setting: giving it an explicit location avoids the problem of the Hadoop NameNode failing to start (see References). Incidentally, fs.default.name is the deprecated 1.x name for this property; in 2.x it is officially fs.defaultFS, although the old name still works.
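
The directory itself has to exist and be writable before the daemons start; a minimal sketch, assuming the path from the config above:

# Create the directory referenced by hadoop.tmp.dir
mkdir -p ~/Environment/hadoop-2.3.0/tmp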

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/Users/username/Environment/hadoop-2.3.0/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/Users/username/Environment/hadoop-2.3.0/hdfs/datanode</value>
    </property>
</configuration>

Note that the corresponding namenode and datanode directories need to be created before applying this configuration.
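
For example, assuming the paths from the config above:

# Create the local storage directories for the NameNode and DataNode
mkdir -p ~/Environment/hadoop-2.3.0/hdfs/namenode
mkdir -p ~/Environment/hadoop-2.3.0/hdfs/datanode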

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
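
Note that the Hadoop 2.3.0 distribution ships only a template for this file; if etc/hadoop/mapred-site.xml is missing, copy it from the template before editing:

cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml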

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>40960</value>
    </property>
</configuration>

At this point, all the preparation is essentially done.
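
One step worth doing before the very first start: format the NameNode, or HDFS will refuse to come up (this also appears under "Other Issues" below):

# Run once before the first start; this initializes the namenode directory
hdfs namenode -format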

Trying Out Hadoop

Starting Hadoop

In earlier Hadoop versions you could start everything with start-all.sh; that approach is now deprecated.

[ 503 ~ ]$start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

As the output suggests, we start HDFS and YARN in turn; after each start, run jps to see which daemons are up:

[ 505 ~ ]$start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /log/path
localhost: starting datanode, logging to /log/path
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /log/path
[ 506 ~ ]$jps
27592 Jps
27310 NameNode
27519 SecondaryNameNode
27405 DataNode

[ 507 ~ ]$start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /log/path
localhost: starting nodemanager, logging to /log/path
[ 508 ~ ]$jps
27737 NodeManager
27777 Jps
27640 ResourceManager
27310 NameNode
27519 SecondaryNameNode
27405 DataNode

At this point Hadoop is running. Open localhost:50070 and localhost:8088 in a browser to see the HDFS and YARN admin pages, respectively.

We can now try running a job.

Running a Test Job

Run the following command to execute a simple Hadoop job:

[ 510 ~ ]$cd $HADOOP_HOME
[ 511 ~/Environment/hadoop-2.3.0 ]$hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 10 5
# log omitted; too long to paste

Or, first upload a file to HDFS with the following commands:

[513 ~/Environment/hadoop-2.3.0 ]$hadoop fs -mkdir hdfs://localhost:9000/user/
[514 ~/Environment/hadoop-2.3.0 ]$hadoop fs -mkdir hdfs://localhost:9000/user/username
[517 ~/Environment/hadoop-2.3.0 ]$hadoop fs -copyFromLocal README.txt hdfs://localhost:9000/user/username/readme.txt
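
To confirm the upload succeeded (the user directory matches the one created above):

hadoop fs -ls hdfs://localhost:9000/user/username/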

Then run a job that uses the file:

[ 518 ~/Environment/hadoop-2.3.0 ]$hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount readme.txt out
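
Once the job completes, its results land in the out directory on HDFS; wordcount writes them to part-r-* files, which can be listed and printed:

hadoop fs -ls hdfs://localhost:9000/user/username/out
hadoop fs -cat hdfs://localhost:9000/user/username/out/part-r-00000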

*For more usage, run hadoop without arguments to see the help, or Google it.

After running the two jobs above, you can review them in the YARN admin page. [Figure: job results]

Stopping Hadoop

Stopping Hadoop mirrors starting it: just replace start-*.sh with stop-*.sh.

[ 521 ~/Environment/hadoop-2.3.0 ]$stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
localhost: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
[ 522 ~/Environment/hadoop-2.3.0 ]$stop-dfs.sh
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
Problems You May Hit When Running Jobs

14/03/21 13:49:11 INFO mapreduce.Job: Job job_1395379328591_0005 failed with state FAILED due to: Application application_1395379328591_0005 failed 2 times due to AM Container for appattempt_1395379328591_0005_000002 exited with  exitCode: 127 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

This happens because the JAVA_HOME that YARN uses is not the same as the system's. One fix:

[ 583 ~/Environment/hadoop-2.3.0 ]$sudo ln -s /usr/bin/java /bin/java 
Password: 
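
An alternative that avoids symlinking into /bin (not from the original post, but a common approach): set JAVA_HOME explicitly in $HADOOP_HOME/etc/hadoop/hadoop-env.sh so every Hadoop daemon inherits it:

# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=$(/usr/libexec/java_home)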

Other Issues

1. The NameNode has not been initialized (formatted).

Initialize the NameNode by formatting it (in Hadoop 2.x, hdfs namenode -format is preferred over the deprecated hadoop namenode -format):

hdfs namenode -format

References

Official tutorial: http://hadoop.apache.org/docs/r1.0.4/cn/quickstart.html

Hadoop namenode fails to start
http://stackoverflow.com/questions/20390217/mapreduce-job-in-headless-environment-fails-n-times-due-to-am-container-exceptio
