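1. Introduction to Hadoop
1.1 Official website
Hadoop official site: hadoop.apache.org
Other Apache projects generally follow the same XXX.apache.org pattern, for example spark.apache.org and kafka.apache.org.
Learn to read the official documentation and look up parameters there.
In the broad sense, Hadoop refers to the ecosystem built around the Apache Hadoop software, including but not limited to Hive, Sqoop, Flume, Spark, Flink, and HBase. In the narrow sense, Hadoop refers to the Apache Hadoop software itself, which is open source.
1.2 Usage by version
Apache Hadoop releases: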
Hadoop version | Usage | Corresponding CDH version |
---|---|---|
1.x | Rarely used | |
2.x | Mainstream in enterprises | CDH 5.x series |
3.x | Being trialed | CDH 6.x series |
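1.3 Choosing a software version
This guide uses the CDH build of Hadoop, hadoop-2.6.0-cdh5.16.2.tar.gz. The corresponding official guide (from the r2.10.0 documentation) is https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
The advantage of choosing CDH is that version compatibility is taken care of. For example, the HBase package is found under http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.16.2 and the Flume package under http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.16.2 . As long as every component carries the same cdh5.16.2 suffix, compatibility between them is not a concern. If a bug found later forces an upgrade, look at the changes.log file of the later releases, check whether the bug has been fixed there, and then pick the release to upgrade to. Example: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2-changes.log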
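The naming rule above (every component carries the same cdh5.16.2 suffix) can be sketched as a small shell helper. The function names `cdh_component_url` and `cdh_changes_url` are hypothetical; the base URL and the name pattern are copied from the examples in this section, and the sketch does not check whether a given component or version actually exists in the archive.

```shell
# Base URL and CDH suffix as they appear in the examples above.
CDH_BASE="http://archive.cloudera.com/cdh5/cdh/5"
CDH_VERSION="cdh5.16.2"

# Hypothetical helper: print the archive URL for a component,
# e.g. cdh_component_url hbase 1.2.0
cdh_component_url() {
    printf '%s/%s-%s-%s\n' "$CDH_BASE" "$1" "$2" "$CDH_VERSION"
}

# Hypothetical helper: print the URL of the matching changes.log file.
cdh_changes_url() {
    printf '%s-changes.log\n' "$(cdh_component_url "$1" "$2")"
}

cdh_component_url hbase 1.2.0
cdh_component_url flume-ng 1.6.0
cdh_changes_url hadoop 2.6.0
```

Keeping the suffix in one variable means a later move to another CDH release only changes one line.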
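1.4 Hadoop framework overview
Name | Responsibility | Notes |
---|---|---|
HDFS | Storage | |
MapReduce | Computation | Hard to develop, verbose, hard to maintain, and slow, so MR is rarely written directly; people use HQL, Spark, or Flink instead |
YARN | Resource and job scheduling | Main resources: memory and vcores |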
2. HDFS deployment
Now you are ready to start your Hadoop cluster in one of the three supported modes:
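Mode | Name | Usage |
---|---|---|
Local (Standalone) Mode | Local mode | Not used |
Pseudo-Distributed Mode | Pseudo-distributed mode | Learning and testing, one machine |
Fully-Distributed Mode | Fully distributed (cluster) mode | Production |
This guide installs the pseudo-distributed mode.
2.0 Change the hostname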
[root@JD ~]# hostnamectl set-hostname rzdata001
[root@JD ~]# reboot # restart the machine
2.1 Create the user and directories
[root@rzdata001 ~]# useradd ruoze
[root@rzdata001 ~]# su - ruoze
[ruoze@rzdata001 ~]$ mkdir app software sourcecode log tmp data lib
[ruoze@rzdata001 ~]$ ll
total 0
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:32 app # extracted packages and symlinks
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:32 data # data
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:32 lib # third-party jars
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:32 log # log files
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:32 software # tarballs
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:32 sourcecode # source code builds
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:32 tmp # temporary files ???/tmp
[ruoze@rzdata001 ~]$
2.2 Upload the tarball
[ruoze@rzdata001 ~]$ cd software
Upload the tarball with rz.
2.3 Extract
[ruoze@rzdata001 software]$ tar -zxvf hadoop-2.6.0-cdh5.16.2.tar.gz
[ruoze@rzdata001 software]$ ll
total 424180
drwxr-xr-x 14 ruoze ruoze 4096 Jun 3 19:11 hadoop-2.6.0-cdh5.16.2
-rw-r--r-- 1 ruoze ruoze 434354462 Nov 28 12:47 hadoop-2.6.0-cdh5.16.2.tar.gz
[ruoze@rzdata001 software]$
[ruoze@rzdata001 software]$
[ruoze@rzdata001 software]$ mv hadoop-2.6.0-cdh5.16.2 ../app/
[ruoze@rzdata001 software]$ cd ../app/
[ruoze@rzdata001 app]$ ll
total 4
drwxr-xr-x 14 ruoze ruoze 4096 Jun 3 19:11 hadoop-2.6.0-cdh5.16.2
[ruoze@rzdata001 app]$ ln -s hadoop-2.6.0-cdh5.16.2 hadoop
[ruoze@rzdata001 app]$ ll
total 4
lrwxrwxrwx 1 ruoze ruoze 22 Nov 28 21:24 hadoop -> hadoop-2.6.0-cdh5.16.2
drwxr-xr-x 14 ruoze ruoze 4096 Jun 3 19:11 hadoop-2.6.0-cdh5.16.2
[ruoze@rzdata001 app]$
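The symlink keeps every later path (`~/app/hadoop`) independent of the version number, so an upgrade only needs to repoint the link. The following is a minimal sketch in a scratch directory; the second directory name is made up purely for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Stand-ins for two extracted releases (the second name is hypothetical).
mkdir hadoop-2.6.0-cdh5.16.2 hadoop-next-version

# Initial link, as in the transcript above.
ln -s hadoop-2.6.0-cdh5.16.2 hadoop
readlink hadoop

# Repoint the link in place: -f replaces the existing link,
# -n stops ln from descending into the directory the link points to.
ln -sfn hadoop-next-version hadoop
readlink hadoop
```

Anything that refers to `~/app/hadoop`, such as HADOOP_HOME, keeps working after the switch.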
2.4 Prerequisites
2.4.1 Java (already installed here)
[ruoze@rzdata001 app]$ which java
/usr/java/jdk1.8.0_121/bin/java
[ruoze@rzdata001 app]$
2.4.2 ssh (already installed here)
2.5 Set JAVA_HOME explicitly
[ruoze@rzdata001 app]$ cd hadoop/etc/hadoop
[ruoze@rzdata001 hadoop]$ pwd
/home/ruoze/app/hadoop/etc/hadoop
[ruoze@rzdata001 hadoop]$ ll
total 156
-rw-r--r-- 1 ruoze ruoze 4436 Jun 3 19:04 capacity-scheduler.xml
-rw-r--r-- 1 ruoze ruoze 1335 Jun 3 19:04 configuration.xsl
-rw-r--r-- 1 ruoze ruoze 318 Jun 3 19:04 container-executor.cfg
-rw-r--r-- 1 ruoze ruoze 774 Jun 3 19:04 core-site.xml
-rw-r--r-- 1 ruoze ruoze 3670 Jun 3 19:04 hadoop-env.cmd
-rw-r--r-- 1 ruoze ruoze 4224 Jun 3 19:04 hadoop-env.sh
-rw-r--r-- 1 ruoze ruoze 2598 Jun 3 19:04 hadoop-metrics2.properties
-rw-r--r-- 1 ruoze ruoze 2490 Jun 3 19:04 hadoop-metrics.properties
-rw-r--r-- 1 ruoze ruoze 9683 Jun 3 19:04 hadoop-policy.xml
-rw-r--r-- 1 ruoze ruoze 775 Jun 3 19:04 hdfs-site.xml
-rw-r--r-- 1 ruoze ruoze 2230 Jun 3 19:04 httpfs-env.sh
-rw-r--r-- 1 ruoze ruoze 1657 Jun 3 19:04 httpfs-log4j.properties
-rw-r--r-- 1 ruoze ruoze 21 Jun 3 19:04 httpfs-signature.secret
-rw-r--r-- 1 ruoze ruoze 620 Jun 3 19:04 httpfs-site.xml
-rw-r--r-- 1 ruoze ruoze 3523 Jun 3 19:04 kms-acls.xml
-rw-r--r-- 1 ruoze ruoze 3139 Jun 3 19:04 kms-env.sh
-rw-r--r-- 1 ruoze ruoze 1788 Jun 3 19:04 kms-log4j.properties
-rw-r--r-- 1 ruoze ruoze 5933 Jun 3 19:04 kms-site.xml
-rw-r--r-- 1 ruoze ruoze 12601 Jun 3 19:04 log4j.properties
-rw-r--r-- 1 ruoze ruoze 938 Jun 3 19:04 mapred-env.cmd
-rw-r--r-- 1 ruoze ruoze 1383 Jun 3 19:04 mapred-env.sh
-rw-r--r-- 1 ruoze ruoze 4113 Jun 3 19:04 mapred-queues.xml.template
-rw-r--r-- 1 ruoze ruoze 758 Jun 3 19:04 mapred-site.xml.template
-rw-r--r-- 1 ruoze ruoze 10 Jun 3 19:04 slaves
-rw-r--r-- 1 ruoze ruoze 2316 Jun 3 19:04 ssl-client.xml.example
-rw-r--r-- 1 ruoze ruoze 2697 Jun 3 19:04 ssl-server.xml.example
-rw-r--r-- 1 ruoze ruoze 2237 Jun 3 19:04 yarn-env.cmd
-rw-r--r-- 1 ruoze ruoze 4567 Jun 3 19:04 yarn-env.sh
-rw-r--r-- 1 ruoze ruoze 690 Jun 3 19:04 yarn-site.xml
[ruoze@rzdata001 hadoop]$
[ruoze@rzdata001 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_121 # known issue: must be set explicitly here
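One way to find the value to put in hadoop-env.sh is to drop the trailing /bin/java from the path of the java binary. A minimal sketch follows; `java_home_from_bin` is a hypothetical helper, and on systems where `which java` returns a symlink (for example /usr/bin/java) the path would first need to be resolved, e.g. with `readlink -f`.

```shell
# Hypothetical helper: given the full path of the java binary,
# print the JDK root by dropping the trailing /bin/java.
java_home_from_bin() {
    dirname "$(dirname "$1")"
}

# Using the path reported by `which java` in this walkthrough:
java_home_from_bin /usr/java/jdk1.8.0_121/bin/java
```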
2.6 Configuration files
[ruoze@rzdata001 hadoop]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.3 rzdata001
[ruoze@rzdata001 hadoop]$
[ruoze@rzdata001 hadoop]$ vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://rzdata001:9000</value>
</property>
</configuration>
[ruoze@rzdata001 hadoop]$
[ruoze@rzdata001 hadoop]$ vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
[ruoze@rzdata001 hadoop]$
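To double-check what was written, a property can be read back from the file. This is a sketch that assumes each <name> and <value> tag sits on its own line, as in the snippets above; `get_prop` is a hypothetical helper, not a Hadoop tool.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop-style XML file (one tag per line, as in this guide).
get_prop() {
    awk -v key="$2" '
        /<name>/  { name = $0; gsub(/.*<name>|<\/name>.*/, "", name) }
        /<value>/ && name == key {
            value = $0; gsub(/.*<value>|<\/value>.*/, "", value)
            print value; exit
        }
    ' "$1"
}

# Demo on a scratch copy of the core-site.xml shown above.
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://rzdata001:9000</value>
    </property>
</configuration>
EOF

get_prop "$conf_file" fs.defaultFS
```

With Hadoop on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolves, which is the more authoritative check.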
2.7 Passwordless SSH trust
[ruoze@rzdata001 hadoop]$ cd
[ruoze@rzdata001 ~]$ pwd
/home/ruoze
[ruoze@rzdata001 ~]$ ls -la
total 24
drwx------ 10 ruoze ruoze 4096 Nov 28 21:34 .
drwxr-xr-x. 6 root root 71 Nov 24 19:36 ..
drwxrwxr-x 3 ruoze ruoze 48 Nov 28 21:24 app
-rw------- 1 ruoze ruoze 396 Nov 28 14:22 .bash_history
-rw-r--r-- 1 ruoze ruoze 18 Nov 17 16:35 .bash_logout
-rw-r--r-- 1 ruoze ruoze 193 Nov 17 16:35 .bash_profile
-rw-r--r-- 1 ruoze ruoze 231 Nov 17 16:35 .bashrc
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:34 data
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:34 lib
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:34 log
drwxrwxr-x 2 ruoze ruoze 42 Nov 28 21:23 software
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:34 sourcecode
drwx------ 2 ruoze ruoze 24 Nov 28 21:26 .ssh
drwxrwxr-x 2 ruoze ruoze 6 Nov 27 21:34 tmp
-rw------- 1 ruoze ruoze 2231 Nov 28 21:34 .viminfo
[ruoze@rzdata001 ~]$ cd .ssh
[ruoze@rzdata001 .ssh]$ ll
total 4
-rw-r--r-- 1 ruoze ruoze 171 Nov 28 21:26 known_hosts
[ruoze@rzdata001 .ssh]$ cd ..
[ruoze@rzdata001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/ruoze/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ruoze/.ssh/id_rsa.
Your public key has been saved in /home/ruoze/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:mjU3RrWZ8YMEp64oeIMSSGJEqyYBreXFslzerByuFfY ruoze@rzdata001
The key's randomart image is:
+---[RSA 2048]----+
|o+ . ..= |
|o = + = B |
|oO * o o = o |
|B.+ = o o . |
|=. + = S = |
|o. o= E= = . |
|. oo+ + . |
| ... o |
| |
+----[SHA256]-----+
[ruoze@rzdata001 ~]$ ll
[ruoze@rzdata001 ~]$ cd .ssh/
[ruoze@rzdata001 .ssh]$ ll
total 12
-rw------- 1 ruoze ruoze 1675 Nov 28 21:36 id_rsa
-rw-r--r-- 1 ruoze ruoze 397 Nov 28 21:36 id_rsa.pub
-rw-r--r-- 1 ruoze ruoze 171 Nov 28 21:26 known_hosts
[ruoze@rzdata001 .ssh]$ cat ./id_rsa.pub >> authorized_keys
[ruoze@rzdata001 .ssh]$ ll
total 16
-rw-rw-r-- 1 ruoze ruoze 397 Nov 28 21:38 authorized_keys
-rw------- 1 ruoze ruoze 1675 Nov 28 21:36 id_rsa
-rw-r--r-- 1 ruoze ruoze 397 Nov 28 21:36 id_rsa.pub
-rw-r--r-- 1 ruoze ruoze 171 Nov 28 21:26 known_hosts
[ruoze@rzdata001 .ssh]$ chmod 0600 authorized_keys
[ruoze@rzdata001 .ssh]$ ll
total 16
-rw------- 1 ruoze ruoze 397 Nov 28 21:38 authorized_keys
-rw------- 1 ruoze ruoze 1675 Nov 28 21:36 id_rsa
-rw-r--r-- 1 ruoze ruoze 397 Nov 28 21:36 id_rsa.pub
-rw-r--r-- 1 ruoze ruoze 171 Nov 28 21:26 known_hosts
[ruoze@rzdata001 .ssh]$
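The chmod step matters: with sshd's default StrictModes setting, key login is refused when authorized_keys is writable by group or others, which is the case for the 664 mode seen right after the `cat >>`. Below is a sketch of the same permission fix as a reusable function, run on a scratch directory; `fix_ssh_perms` is a hypothetical name.

```shell
# Hypothetical helper: apply the permissions sshd expects
# to an .ssh directory and its authorized_keys file.
fix_ssh_perms() {
    chmod 700 "$1" && chmod 600 "$1/authorized_keys"
}

# Demo on a scratch directory instead of the real ~/.ssh.
scratch_ssh=$(mktemp -d)
touch "$scratch_ssh/authorized_keys"
chmod 664 "$scratch_ssh/authorized_keys"   # mode seen right after `cat >>`

fix_ssh_perms "$scratch_ssh"
ls -ld "$scratch_ssh" "$scratch_ssh/authorized_keys"
```

Once the permissions are right, `ssh rzdata001 date` should print the date without asking for a password.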
2.8 Hadoop environment variables
[ruoze@rzdata001 .ssh]$ cd
[ruoze@rzdata001 ~]$ vim .bashrc
# env
export HADOOP_HOME=/home/ruoze/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
[ruoze@rzdata001 ~]$
[ruoze@rzdata001 ~]$ source .bashrc
[ruoze@rzdata001 ~]$
[ruoze@rzdata001 ~]$ which hadoop
~/app/hadoop/bin/hadoop
[ruoze@rzdata001 ~]$
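Because `.bashrc` can be sourced many times in one session, the plain `export PATH=...` line above adds the same two directories again on every `source`. A sketch of an idempotent variant follows; `path_prepend` is a hypothetical helper, and the HADOOP_HOME value is the one used in this guide.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

HADOOP_HOME=/home/ruoze/app/hadoop
path_prepend "$HADOOP_HOME/sbin"     # added first so that bin ends up in front
path_prepend "$HADOOP_HOME/bin"
path_prepend "$HADOOP_HOME/bin"      # second call is a no-op
export HADOOP_HOME PATH
```

The resulting order (bin, then sbin, then the old PATH) matches the export line in the transcript.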
2.9 Format the NameNode
[ruoze@rzdata001 ~]$ hdfs namenode -format
·····
19/11/28 21:44:25 INFO common.Storage: Storage directory /tmp/hadoop-ruoze/dfs/name has been successfully formatted.
······
[ruoze@rzdata001 ~]$
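Formatting initializes the NameNode metadata, so it is meant to run once. Below is a minimal sketch of a guard, assuming the storage directory is the one reported in the log line above and relying on the `current/VERSION` file that a formatted NameNode directory contains; `namenode_is_formatted` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the given NameNode storage
# directory already holds formatted metadata.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Directory taken from the "successfully formatted" log line above.
name_dir=/tmp/hadoop-ruoze/dfs/name

if namenode_is_formatted "$name_dir"; then
    echo "already formatted: $name_dir (skipping format)"
else
    echo "not formatted yet: run 'hdfs namenode -format' once"
fi
```

The metadata lands under /tmp because hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}. Files under /tmp can be cleaned up by the operating system, so a check like this is also useful after a reboot.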
2.10 First start
[ruoze@rzdata001 ~]$ start-dfs.sh
19/11/28 21:45:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [rzdata001]
The authenticity of host 'rzdata001 (192.168.0.3)' can't be established.
ECDSA key fingerprint is SHA256:OLqoaMxlGFbCq4sC9pYgF+FdbcXHbEbtSrnMiGGFbVw.
ECDSA key fingerprint is MD5:d3:5b:4a:ef:8e:00:41:a0:5e:80:ef:75:76:8a:a3:49.
Are you sure you want to continue connecting (yes/no)? yes
rzdata001: Warning: Permanently added 'rzdata001,192.168.0.3' (ECDSA) to the list of known hosts.
rzdata001: starting namenode, logging to /home/ruoze/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-ruoze-namenode-rzdata001.out
localhost: starting datanode, logging to /home/ruoze/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-ruoze-datanode-rzdata001.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:OLqoaMxlGFbCq4sC9pYgF+FdbcXHbEbtSrnMiGGFbVw.
ECDSA key fingerprint is MD5:d3:5b:4a:ef:8e:00:41:a0:5e:80:ef:75:76:8a:a3:49.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/ruoze/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-ruoze-secondarynamenode-rzdata001.out
19/11/28 21:46:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[ruoze@rzdata001 ~]$ jps
5905 NameNode
6230 SecondaryNameNode
6030 DataNode
6350 Jps
[ruoze@rzdata001 ~]$
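The `jps` check can be scripted so that a missing daemon is reported by name. This sketch reads `jps` output on stdin; `missing_hdfs_daemons` is a hypothetical helper, and the demo input is the output captured above.

```shell
# Hypothetical helper: read `jps` output on stdin and print any of the
# three HDFS daemons that are missing; returns 1 if one is missing.
missing_hdfs_daemons() {
    jps_output=$(cat)
    rc=0
    for daemon in NameNode DataNode SecondaryNameNode; do
        # Match the daemon name as the whole second column.
        if ! printf '%s\n' "$jps_output" |
             awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            rc=1
        fi
    done
    return $rc
}

# Demo with the output captured above.
missing_hdfs_daemons <<'EOF' && echo "all HDFS daemons are up"
5905 NameNode
6230 SecondaryNameNode
6030 DataNode
6350 Jps
EOF
```

On a live machine the call would be `jps | missing_hdfs_daemons`. Comparing the whole second column keeps "NameNode" from matching the "SecondaryNameNode" line.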
2.11 Make the DN and SNN start on rzdata001
[ruoze@rzdata001 ~]$ netstat -nltp | grep 5905
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.0.3:9000 0.0.0.0:* LISTEN 5905/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 5905/java
[ruoze@rzdata001 ~]$ netstat -nltp | grep 6230
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 6230/java
[ruoze@rzdata001 ~]$ netstat -nltp | grep 6030
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 6030/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 6030/java
tcp 0 0 127.0.0.1:45570 0.0.0.0:* LISTEN 6030/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 6030/java
[ruoze@rzdata001 ~]$
The DN and SNN are both listening on 0.0.0.0. Change them to bind to rzdata001:
[ruoze@rzdata001 hadoop]$ pwd
/home/ruoze/app/hadoop/etc/hadoop
[ruoze@rzdata001 hadoop]$ vim hdfs-site.xml
Add the following inside <configuration>:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>rzdata001:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>rzdata001:50091</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>rzdata001:50010</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>rzdata001:50075</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>rzdata001:50020</value>
</property>
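The new addresses only take effect after HDFS is restarted (stop-dfs.sh, then start-dfs.sh). Below is a sketch of a check that lists the sockets still bound to the wildcard address; `wildcard_listeners` is a hypothetical helper, and the demo input is the DataNode output captured before the change.

```shell
# Hypothetical helper: read `netstat -nltp` output on stdin and print
# the local addresses that are still bound to the wildcard 0.0.0.0.
wildcard_listeners() {
    awk '$1 == "tcp" && $4 ~ /^0\.0\.0\.0:/ { print $4 }'
}

# Demo with the DataNode lines captured before the change.
wildcard_listeners <<'EOF'
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 6030/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 6030/java
tcp 0 0 127.0.0.1:45570 0.0.0.0:* LISTEN 6030/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 6030/java
EOF
```

After the restart, piping `netstat -nltp 2>/dev/null | grep <pid>` for the new DN and SNN process IDs into the function shows which of the configured ports, if any, are still on 0.0.0.0.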
2.12 Where to find the default parameter files on the official site
For this version, open: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
At the bottom left of the page:
Configuration
core-default.xml
hdfs-default.xml
hdfs-rbf-default.xml
mapred-default.xml
yarn-default.xml
Deprecated Properties
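2.13 Key concepts
Concept | Name | Nickname | Function | Role |
---|---|---|---|---|
namenode | NameNode | The boss | Read and write requests go through it first | Master node |
datanode | DataNode | The worker | Stores and retrieves data | Worker node |
secondary namenode | Secondary NameNode | The second-in-command | Checkpoints the NameNode metadata hourly (h+1) | Backup node for the master |
Most big data components use a master/worker architecture, but in HBase read and write requests do not go through the boss, the master process.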
NameNode web UI: http://rzdata001:50070
The overview page should show: Safemode is off.