Hadoop Notes (Part 1)

Java environment setup:

Check the Java version installed on this machine:
[xiangkun@hadoop-senior01 ~]$ rpm -qa|grep java

Uninstall the existing Java version:
[xiangkun@hadoop-senior01 ~]$ rpm -e --nodeps java-1.6.0-

Grant execute permission to the installation packages:
[xiangkun@hadoop-senior01 softwares]$ chmod u+x ./*

Extract to the modules directory:
[xiangkun@hadoop-senior01 softwares]$ tar -zxf jdk-1.8.0  -C /opt/modules/

Configure environment variables:

###JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.8.0_131
export PATH=$PATH:$JAVA_HOME/bin
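
After editing the profile, a quick sanity check (a minimal sketch, assuming the JDK was extracted to the path above):

  $ source /etc/profile        # reload the environment in the current shell (use the file you actually edited)
  $ echo $JAVA_HOME            # should print /opt/modules/jdk1.8.0_131
  $ java -version              # should report version 1.8.0_131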

========================================================

[root@bigdata-cdh00 bin]# vim /etc/profile

fi

HOSTNAME=`/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
export JAVA_HOME=/opt/modules/jdk1.7.0_79

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$JAVA_HOME/bin:$PATH
# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file

Hadoop installation:

Archive of all Apache project releases (historical versions): http://archive.apache.org/dist/
Archive of all Hadoop versions: http://archive.apache.org/dist/hadoop/common

Extract:
[xiangkun@hadoop-senior01 softwares]$ tar -zxf hadoop-2.5.0.tar.gz -C /opt/modules/

hadoop-env.sh
export JAVA_HOME=/opt/modules/jdk1.8.0_131
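
Once JAVA_HOME is set in hadoop-env.sh, a quick way to confirm Hadoop picks up the JDK (a minimal sketch, run from the Hadoop installation directory):

  $ bin/hadoop version        # should print "Hadoop 2.5.0" plus build information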



Hadoop MapReduce has three run modes:

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

Mode 1: Local (Standalone) Mode

cd into the Hadoop installation directory:
  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep input output 'dfs[a-z.]+'
  $ cat output/*


Classic example: using MapReduce to count words

xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0$ sudo mkdir wcinput

xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0$ cd wcinput

xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0/wcinput$ sudo touch wc.input    # create a file

xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0/wcinput$ vim wc.input           # edit the file, adding the lines below

hadoop hdfs
hadoop yarn
hadoop mapreduce

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount wcinput wcoutput

$ cat wcoutput/*
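
For the three-line wc.input above, the wordcount output should look roughly like this (one tab-separated word/count pair per line):

  hadoop	3
  hdfs	1
  mapreduce	1
  yarn	1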

Mode 2: Pseudo-Distributed Mode

 
 etc/hadoop/core-site.xml:

Configuration 1:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- this hostname determines which machine the NameNode runs on -->
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
Configuration 2:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>
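
The hadoop.tmp.dir directory does not have to exist in advance, but creating it up front makes permission problems easier to spot (a minimal sketch, using the path configured above):

  $ mkdir -p /opt/modules/hadoop-2.5.0/data/tmp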

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- specifies which machine the SecondaryNameNode runs on -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior01.xiangkun:50090</value>
    </property>
</configuration>

Format the NameNode:

  $ bin/hdfs namenode -format
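
If formatting succeeds, the output includes a line saying the storage directory under hadoop.tmp.dir "has been successfully formatted". One way to check, as a sketch (note that re-running format on an already-formatted directory asks for confirmation first):

  $ bin/hdfs namenode -format 2>&1 | grep -i "successfully formatted"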

Start the daemons:

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sudo sbin/hadoop-daemon.sh start namenode
[sudo] password for xiangkun: 
starting namenode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-root-namenode-hadoop-senior01.xiangkun.out
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4001 Jps
3878 NameNode
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-xiangkun-datanode-hadoop-senior01.xiangkun.out
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4032 DataNode
3878 NameNode
4103 Jps

HDFS: after startup, the web UI can be accessed on port 50070
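
A quick way to confirm the web UI is up (a sketch; replace the hostname with the machine running the NameNode):

  $ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop-senior01.xiangkun:50070    # 200 means the NameNode web UI is reachable

or simply open http://hadoop-senior01.xiangkun:50070 in a browser.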

Create a directory in the HDFS file system:
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /usr/xiangkun/mapreduce/wordcount/input

Upload a file to this directory:
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put wcinput/wc.input   /usr/xiangkun/mapreduce/wordcount/input

Run the MapReduce job:
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar  wordcount /usr/xiangkun/mapreduce/wordcount/input /usr/xiangkun/mapreduce/wordcount/output

View the results:
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -cat /usr/xiangkun/mapreduce/wordcount/output/*

Configure YARN so that MapReduce jobs run on YARN

yarn-env.sh

JAVA_HOME=/opt/modules/jdk1.8.0_131

slaves   (determines which machines run the DataNode and NodeManager)

hadoop-senior01.xiangkun

yarn-site.xml
    
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-senior01.xiangkun</value>
    </property>

Start YARN:


[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4032 DataNode
5299 Jps
5268 NodeManager
3878 NameNode
5021 ResourceManager

mapred-env.sh

export JAVA_HOME=/opt/modules/jdk1.8.0_131
mapred-site.xml

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

YARN: after startup, the web UI can be accessed on port 8088
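
As with the HDFS UI, the ResourceManager web UI can be checked with a browser or curl (a sketch, assuming the ResourceManager hostname configured above):

  $ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop-senior01.xiangkun:8088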
 
Run the MapReduce job on YARN:

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -rm -R /usr/xiangkun/mapreduce/wordcount/output/

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar  wordcount /usr/xiangkun/mapreduce/wordcount/input /usr/xiangkun/mapreduce/wordcount/output

tips:

  1. Data flows only through the DataNodes, never through the NameNode; the NameNode stores only metadata.

  2. The default configuration files (*-default.xml) are packaged inside the corresponding jar files.

  3. Ways to view logs (see the example after this list):

    • more: page through a file
    • tail: view the end of a file (man tail shows usage)
    • tail -100f: show the last 100 lines of a file and keep following it
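
For example, following a NameNode log under the Hadoop installation directory (a sketch; the exact file name depends on the user and hostname, as shown in the startup output earlier):

  $ tail -100f /opt/modules/hadoop-2.5.0/logs/hadoop-xiangkun-namenode-hadoop-senior01.xiangkun.log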

Start the YARN job history server:

Configure log aggregation properties:

yarn-site.xml

<property>
	 <name>yarn.log-aggregation-enable</name>
	 <value>true</value>
</property>
<!-- this property sets how long aggregated logs are retained, in seconds -->
<property>
	 <name>yarn.log-aggregation-retain-seconds</name>
	 <value>640800</value>
</property>

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/mr-jobhistory-daemon.sh start historyserver
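
After the history server starts, jps should show an additional JobHistoryServer process; its web UI listens on port 19888 (see the mapreduce.jobhistory.webapp.address setting later in these notes):

  $ jps        # expect JobHistoryServer in the list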

Configure how long deleted files are kept in the trash:

core-site.xml
<property>
	<name>fs.trash.interval</name>
	<value>640800</value>
</property>
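
With fs.trash.interval greater than zero, files removed with hdfs dfs -rm are moved to the user's trash directory instead of being deleted immediately. A sketch reusing the wc.input file uploaded earlier (the trash path shown assumes Hadoop's default /user/<username>/.Trash layout):

  $ bin/hdfs dfs -rm /usr/xiangkun/mapreduce/wordcount/input/wc.input
  $ bin/hdfs dfs -ls /user/xiangkun/.Trash/Current/usr/xiangkun/mapreduce/wordcount/input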

Summary of startup methods:

1. Start each service component individually:
     1. HDFS:
        hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
     2. YARN:
        yarn-daemon.sh start|stop resourcemanager|nodemanager
     3. MapReduce:
        mr-jobhistory-daemon.sh start|stop historyserver
2. Start each module separately:
     1. HDFS:
        start-dfs.sh
        stop-dfs.sh
        The NameNode first connects to itself over SSH, then connects to the DataNodes on the other nodes,
        so configure passwordless SSH login (a quick check follows after this list):
        [xiangkun@hadoop-senior01 .ssh]$ pwd
        /home/xiangkun/.ssh
        [xiangkun@hadoop-senior01 .ssh]$ ssh-keygen -t rsa
        # copy the public key to the other machines (same username)
        [xiangkun@hadoop-senior01 .ssh]$ ssh-copy-id hostname
     2. YARN:
        start-yarn.sh
        stop-yarn.sh
3. Start everything at once (not recommended: start-all.sh has to be run from a single node, but in a real distributed cluster the NameNode and ResourceManager run on different nodes):
     1. start-all.sh
     2. stop-all.sh
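
A quick check that passwordless SSH works after ssh-copy-id (a sketch; replace the hostname with the target machine):

  $ ssh hadoop-senior01.xiangkun        # should log in without asking for a password
  $ exit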

Which machine the NameNode runs on:

 etc/hadoop/core-site.xml:

Configuration 1:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- this hostname determines which machine the NameNode runs on -->
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Which machine the DataNode runs on:

slaves   (determines which machines run the DataNode and NodeManager)

hadoop-senior01.xiangkun

Which machine the SecondaryNameNode runs on:

etc/hadoop/hdfs-site.xml:

<configuration>
    <!-- specifies which machine the SecondaryNameNode runs on -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior01.xiangkun:50090</value>
    </property>
</configuration>

Which machine the ResourceManager/NodeManager runs on:

yarn-site.xml

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-senior01.xiangkun</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

Which machine the MapReduce HistoryServer runs on:

mapred-site.xml

<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-senior01.xiangkun:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-senior01.xiangkun:19888</value>
</property>

Startup order with sbin/start-all.sh: namenode -> datanode -> secondarynamenode -> resourcemanager -> nodemanager.
Then start the MapReduce history server: sbin/mr-jobhistory-daemon.sh start historyserver
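
Put together, a typical pseudo-distributed startup using the per-module scripts might look like this (a sketch; run from the Hadoop installation directory, assuming the configuration above):

  $ sbin/start-dfs.sh                                      # NameNode, DataNode, SecondaryNameNode
  $ sbin/start-yarn.sh                                     # ResourceManager, NodeManager
  $ sbin/mr-jobhistory-daemon.sh start historyserver       # MapReduce JobHistoryServer
  $ jps                                                    # verify all six daemons are running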
