Big Data Study Notes 5: HDFS and YARN
=======================
1. Walking through the HDFS startup process
2. HDFS configuration parameters
3. YARN resource scheduling configuration
4. Using HDFS and checking YARN jobs
5. Troubleshooting configuration errors
========================
1. Walking through the HDFS startup process
In a pseudo-distributed Hadoop deployment, start HDFS:
[hadoop@hadoop001 sbin]$ ./start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop001.out
localhost: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
Notice that three different addresses appear in this output: hadoop001, localhost, and 0.0.0.0. Since this is a pseudo-distributed deployment, all three should be the same address. Where do they come from, and how can they be adjusted?
[hadoop@hadoop001 sbin]$ vi start-dfs.sh
#---------------------------------------------------------
# namenodes
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Starting namenodes on [$NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$NAMENODES" \
  --script "$bin/hdfs" start namenode $nameStartOpt
#---------------------------------------------------------
# datanodes (using default slaves file)
if [ -n "$HADOOP_SECURE_DN_USER" ]; then
  echo \
    "Attempting to start secure cluster, skipping datanodes. " \
    "Run start-secure-dns.sh as root to complete startup."
else
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --script "$bin/hdfs" start datanode $dataStartOpt
fi
#---------------------------------------------------------
# secondary namenodes (if any)
SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)
if [ -n "$SECONDARY_NAMENODES" ]; then
  echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --hostnames "$SECONDARY_NAMENODES" \
    --script "$bin/hdfs" start secondarynamenode
fi
#---------------------------------------------------------
As the script shows, the addresses printed at startup are obtained as follows:
namenode: $HADOOP_PREFIX/bin/hdfs getconf -namenodes
datanode: the slaves file, /opt/software/hadoop/etc/hadoop/slaves
secondary namenode: the default configuration. To change it, you must set it by hand in hdfs-site.xml. The defaults can be found on hadoop.apache.org -> Documentation -> stable, in the *-default.xml lists linked at the bottom of the navigation menu:
dfs.namenode.secondary.http-address   0.0.0.0:50090   The secondary namenode http server address and port.
dfs.namenode.secondary.https-address  0.0.0.0:50091   The secondary namenode HTTPS server address and port.
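The lookup chain can be sketched as follows. The getconf calls need a working Hadoop install, so they are shown as comments; the datanode part reads a stand-in slaves file (the real one is etc/hadoop/slaves) and works with no daemon running:

```shell
# How start-dfs.sh resolves the three address sets
# (assumes HADOOP_PREFIX=/opt/software/hadoop-2.8.1):
#   $HADOOP_PREFIX/bin/hdfs getconf -namenodes            # namenode host(s)
#   $HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes   # secondary NN host(s)
# The datanode list is simply the slaves file, one host per line:
printf 'localhost\n' > /tmp/slaves-demo    # stand-in for etc/hadoop/slaves
while read -r host; do echo "datanode host: $host"; done < /tmp/slaves-demo
```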
Here we configure hdfs-site.xml by hand:
[hadoop@hadoop001 hadoop]$ vi hdfs-site.xml, adding the following inside <configuration>:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.137.11:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>192.168.137.11:50091</value>
</property>
</configuration>
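To confirm what a -site.xml file actually sets, `hdfs getconf -confKey dfs.namenode.secondary.http-address` is the authoritative check on a live cluster. A minimal local sketch is to pull the value out with sed against a scratch copy of the file; this assumes the <name>/<value> pair sits on adjacent lines, as above:

```shell
# Scratch copy of the property we just configured
cat > /tmp/hdfs-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.137.11:50090</value>
  </property>
</configuration>
EOF
# Print the <value> on the line following the matching <name> line
sed -n '/dfs.namenode.secondary.http-address/{n;s:.*<value>\(.*\)</value>.*:\1:p;}' /tmp/hdfs-site-demo.xml
```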
After a restart (stop-dfs.sh, then start-dfs.sh) the output becomes:
[hadoop@hadoop001 sbin]$ ./start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop001.out
hadoop001: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [hadoop001]
hadoop001: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
Browse to http://192.168.137.11:50070/ to check the state of HDFS.
2. HDFS configuration parameters
On hadoop.apache.org -> Documentation -> stable, the *-default.xml lists linked at the bottom of the navigation menu document every parameter and its default value.
Copy any parameter you want to change into the corresponding *-site.xml file, in this format:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.137.11:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>192.168.137.11:50091</value>
</property>
</configuration>
3. YARN resource scheduling configuration
On hadoop.apache.org, find the YARN part of the setup guide and follow it.
(Note: the configuration directory may contain only mapred-site.xml.template and yarn-site.xml.template; copy each file and drop the .template suffix, then add the configuration below.)
Edit etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Start YARN:
$ sbin/start-yarn.sh
Check it in the web UI:
ResourceManager - http://192.168.137.11:8088/
If this reports errors, see Section 5.
4. Using HDFS and checking YARN jobs
(1) Submit a job:
The example jar ships with Hadoop; change into its directory and run the pi estimator:
[hadoop@hadoop001 hadoop]$ cd /opt/software/hadoop-2.8.1/share/hadoop/mapreduce
[hadoop@hadoop001 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.8.1.jar pi 10 10
This estimates pi; the two arguments are the number of map tasks and the number of samples per map.
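For intuition, here is what the example computes, in miniature: the fraction of random points in the unit square that land inside the quarter circle approaches pi/4. (The real job distributes the sampling across map tasks and uses a quasi-Monte Carlo sequence; this local awk one-liner is just a sketch of the idea.)

```shell
# Sample n random points in the unit square; count those with x^2 + y^2 <= 1.
awk 'BEGIN {
  srand(1); n = 200000; inside = 0
  for (i = 0; i < n; i++) { x = rand(); y = rand(); if (x*x + y*y <= 1) inside++ }
  printf "pi is approximately %.3f\n", 4 * inside / n
}'
```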
(2) Check the job:
In the ResourceManager web UI the application first appears as RUNNING, then as FINISHED.
Click the application link to see its details, including the logs.
Note: clicking the log link may fail to load a page. This is because the client machine has no DNS entry for the hostnames used inside Hadoop; on Windows 7, add the mapping to the hosts file.
In C:\Windows\System32\drivers\etc\hosts add:
192.168.137.11 hadoop001
(3) Basic HDFS usage
Upload a file to the HDFS root directory: hadoop fs -put hadoop.log /
List the root directory: hadoop fs -ls /
View a file: hadoop fs -cat /hadoop.log
You can also browse files from the Browse Directory page of the web UI, and even upload and download them there.
5. Troubleshooting configuration errors
[root@hadoop001 ~]# cat /opt/software/hadoop-2.8.1/logs/yarn-hadoop-resourcemanager-hadoop001.out
[Fatal Error] yarn-site.xml:20:6: The markup in the document following the root element must be well-formed.
This means yarn-site.xml has a syntax error at line 20, column 6: open the file and fix it there.
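A quick pre-flight check can catch this class of error before the daemons do. If libxml2 is installed, `xmllint --noout yarn-site.xml` reports the exact line and column. The plain-shell sketch below is my own crude substitute: it assumes one tag per line and only verifies that <property> tags are balanced, which covers the most common hand-editing mistake:

```shell
f=/tmp/yarn-site-demo.xml
cat > "$f" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
# Count opening and closing property tags line by line
open=$(grep -c '<property>' "$f")
close=$(grep -c '</property>' "$f")
if [ "$open" -eq "$close" ]; then
  echo "property tags balanced"
else
  echo "unbalanced: $open open, $close close"
fi
```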