Hadoop Environment Setup
Note: three important URLs
Hadoop download: https://archive.apache.org/dist/hadoop/common/hadoop-2.5.0/
Hadoop official site: hadoop.apache.org
Hadoop setup guide (official single-cluster docs): https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/SingleCluster.html
Hadoop 2.x Deployment
* Local Mode
* Distributed Mode
* Pseudo-distributed
One machine runs all of the daemons,
including the slave daemons DataNode and NodeManager
* Fully distributed
There are multiple slave nodes:
DataNodes
NodeManagers
Configuration file:
$HADOOP_HOME/etc/hadoop/slaves
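For reference, the slaves file simply lists one worker hostname per line; a minimal sketch, assuming the three hostnames configured later in this guide:

```bash
# $HADOOP_HOME/etc/hadoop/slaves names the machines that run the
# slave daemons (DataNode / NodeManager), one hostname per line
cat $HADOOP_HOME/etc/hadoop/slaves
# CentOS1
# CentOS2
# CentOS3
```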
1. Download the software: JDK 1.8 and Hadoop 2.5.0 (both Linux builds; choose the 32-bit or 64-bit packages to match your Linux).
2. Upload the packages to /apps on the Linux machines with FileZilla.
3. Install the JDK and configure the environment variables: run vi /etc/profile and append at the end

        export JAVA_HOME=/apps/java/jdk
        export PATH=$PATH:$JAVA_HOME/bin

   then exit and run source /etc/profile.
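To confirm the profile change took effect, a quick check (assuming the paths above):

```bash
# Both should reflect the JDK installed under /apps/java
echo $JAVA_HOME   # expect /apps/java/jdk
java -version     # expect a 1.8.x version string
```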
4. Install Hadoop 2.5.0; it requires JDK 1.7 or later.
5. The three machines:

   | Host | Hostname | Memory | CPUs |
   |---|---|---|---|
   | 192.168.0.129 | CentOS1 | 1.5 GB | 1 |
   | 192.168.0.130 | CentOS2 | 1 GB | 1 |
   | 192.168.0.131 | CentOS3 | 1 GB | 1 |
6. Configure the hostname mappings in /etc/hosts (on Windows the hosts file is at C:\Windows\System32\drivers\etc):

        192.168.0.129 www.shy1.com CentOS1
        192.168.0.130 www.shy2.com CentOS2
        192.168.0.131 www.shy3.com CentOS3
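To verify the mappings, every alias should resolve from each machine; for example:

```bash
# Each name should answer from the IP listed in /etc/hosts
ping -c 1 www.shy1.com
ping -c 1 CentOS2
ping -c 1 CentOS3
```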
7. Layout of the distributed cluster across the three machines:

   |  | CentOS1 | CentOS2 | CentOS3 |
   |---|---|---|---|
   | HDFS | NameNode, DataNode | DataNode | DataNode, SecondaryNameNode |
   | YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
   | MapReduce | JobHistoryServer |  |  |
8. Configuration files
- hdfs
  - hadoop-env.sh

        export JAVA_HOME=/apps/java/jdk1.8
  - core-site.xml

        <configuration>
            <property>
                <name>fs.defaultFS</name>
                <value>hdfs://www.shy1.com:8020</value>
            </property>
            <property>
                <name>hadoop.tmp.dir</name>
                <value>/apps/hadoop/hadoop-2.5.0/data/tmp</value>
            </property>
            <property>
                <name>fs.trash.interval</name>
                <value>420</value>
            </property>
        </configuration>
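With fs.trash.interval set to 420, a deleted file is kept in the user's HDFS trash for 420 minutes (7 hours) before being purged. A sketch of the behavior (the file path here is hypothetical):

```bash
# -rm moves the file into the trash rather than deleting it outright
bin/hdfs dfs -rm /user/beifeng/tmp/conf/core-site.xml
# The file reappears under the trash checkpoint, mirroring its original path
bin/hdfs dfs -ls /user/beifeng/.Trash/Current/user/beifeng/tmp/conf
```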
  - hdfs-site.xml

        <configuration>
            <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>www.shy3.com:50090</value>
            </property>
        </configuration>
  - slaves

        CentOS1
        CentOS2
        CentOS3
- yarn
  - yarn-env.sh

        export JAVA_HOME=/apps/java/jdk1.8
  - yarn-site.xml

        <configuration>
            <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>CentOS2</value>
            </property>
            <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
            </property>
            <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>4096</value>
            </property>
            <property>
                <name>yarn.nodemanager.resource.cpu-vcores</name>
                <value>4</value>
            </property>
            <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
            </property>
            <property>
                <name>yarn.log-aggregation.retain-seconds</name>
                <value>640800</value>
            </property>
        </configuration>
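Because log aggregation is enabled, the container logs of a finished application can be fetched from HDFS with the yarn CLI; the application ID below is a placeholder:

```bash
# Print the aggregated container logs of one completed application
bin/yarn logs -applicationId application_1400000000000_0001
```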
  - slaves

        CentOS1
        CentOS2
        CentOS3
- mapreduce
  - mapred-env.sh

        export JAVA_HOME=/apps/java/jdk1.8
  - mapred-site.xml

        <configuration>
            <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
            </property>
            <property>
                <name>mapreduce.jobhistory.address</name>
                <value>www.shy1.com:10020</value>
            </property>
            <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>www.shy1.com:19888</value>
            </property>
            <property>
                <name>mapreduce.jobhistory.done-dir</name>
                <value>/history/done</value>
            </property>
            <property>
                <name>mapreduce.jobhistory.intermediate-done-dir</name>
                <value>/history/done/done_intermediate</value>
            </property>
        </configuration>
9. Copy this Hadoop directory to the other two machines (a sketch below); the environment setup is then complete.
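A minimal sketch of the copy with scp, assuming the same /apps layout and passwordless SSH between the machines:

```bash
# Push the fully configured Hadoop directory from CentOS1 to the other nodes
scp -r /apps/hadoop/hadoop-2.5.0 CentOS2:/apps/hadoop/
scp -r /apps/hadoop/hadoop-2.5.0 CentOS3:/apps/hadoop/
```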
10. Test the cluster
* Basic tests
Start the services (run the commands from the Hadoop home directory).
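Before the very first start (and only then), the HDFS file system must be formatted on the NameNode machine, CentOS1:

```bash
# Run once on CentOS1 before the first start-dfs.sh;
# reformatting later would wipe all HDFS data
bin/hdfs namenode -format
```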
First start the HDFS file system:
sbin/start-dfs.sh
Then start YARN (run this on the second machine, the one where the ResourceManager lives, CentOS2):
sbin/start-yarn.sh
Finally, start the JobHistoryServer:
sbin/mr-jobhistory-daemon.sh start historyserver
Run jps to check whether all the processes have started; if any is missing, start it by hand.
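Per the layout in step 7, roughly the following daemons should show up on each machine:

```bash
jps
# CentOS1: NameNode, DataNode, NodeManager, JobHistoryServer
# CentOS2: ResourceManager, DataNode, NodeManager
# CentOS3: DataNode, NodeManager, SecondaryNameNode
```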
Open a browser and visit the NameNode's address on port 50070 (http://www.shy1.com:50070 here). If the page does not load, turn off the firewall on every virtual machine.
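A sketch for switching the firewall off, assuming CentOS 6 with iptables (typical for the Hadoop 2.5.0 era); adjust for firewalld on newer systems:

```bash
# Run on every VM: stop the firewall now and keep it off after reboots
service iptables stop
chkconfig iptables off
```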
With the services started, check that they are actually usable via a few simple applications:
* hdfs
Read/write operations:
bin/hdfs dfs -mkdir -p /user/beifeng/tmp/conf
bin/hdfs dfs -put etc/hadoop/*-site.xml /user/beifeng/tmp/conf
bin/hdfs dfs -text /user/beifeng/tmp/conf/core-site.xml
* yarn
run jar
* mapreduce
bin/yarn jar share/hadoop/mapreduce/hadoop*example*.jar wordcount /user/beifeng/mapreduce/wordcount/input /user/beifeng/mapreduce/wordcount/output
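The input directory must exist and hold some text before the job is submitted, and the output directory must not yet exist. A full round trip, using the concrete examples jar name from the 2.5.0 distribution:

```bash
# Prepare input, run wordcount, then read the reducer output
bin/hdfs dfs -mkdir -p /user/beifeng/mapreduce/wordcount/input
bin/hdfs dfs -put etc/hadoop/slaves /user/beifeng/mapreduce/wordcount/input
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount \
  /user/beifeng/mapreduce/wordcount/input /user/beifeng/mapreduce/wordcount/output
bin/hdfs dfs -cat /user/beifeng/mapreduce/wordcount/output/part-r-00000
```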
Startup methods
* Start each service component individually
* hdfs
hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
* yarn
yarn-daemon.sh start|stop resourcemanager|nodemanager
* mapreduce
mr-jobhistory-daemon.sh start|stop historyserver
* Start each module separately
* hdfs
start-dfs.sh
stop-dfs.sh
* yarn
start-yarn.sh
stop-yarn.sh
* Start or stop everything at once
* start-all.sh
* stop-all.sh