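1. Hadoop
1.1 Narrow sense: the open-source Apache Hadoop software itself.
1.2 Broad sense: the ecosystem built around Apache Hadoop (Hive, Sqoop, Flume, Spark, Flink, HBase, ...).
Apache big data projects are usually hosted at <component>.apache.org, for example:
hadoop.apache.org
spark.apache.org
kafka.apache.org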
2. The Hadoop software
Apache Hadoop release lines:
1.x  essentially no longer used
2.x  the enterprise mainstream ==> CDH 5.x series
3.x  being trialed ==> CDH 6.x series
http://archive.cloudera.com/cdh5/cdh/5/
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2.tar.gz
hadoop-2.6.0-cdh5.16.2.tar.gz
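The CDH build is Apache Hadoop 2.6.0 plus patches backported from later releases, so in practice it is roughly comparable to Apache Hadoop 2.9.
Different CDH releases can share the same upstream base version and differ only in the patches applied:
CDH5.14.0 hadoop-2.6.0
CDH5.16.2 hadoop-2.6.0
The change logs list the patches in each component build: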
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2-changes.log
http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.16.2-changes.log
Cloudera
Benefit of choosing CDH: version compatibility between components is already taken care of.
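The naming convention above (component, upstream Apache version, CDH release) can be read straight off the tarball name. A minimal sketch using plain shell parameter expansion; the variable names are illustrative only:

```shell
# Split a CDH tarball name into component, upstream version and CDH release.
pkg="hadoop-2.6.0-cdh5.16.2.tar.gz"

base="${pkg%.tar.gz}"        # strip the archive extension
component="${base%%-*}"      # text before the first dash
rest="${base#*-}"            # text after the first dash
upstream="${rest%%-cdh*}"    # upstream Apache version
cdh="${rest#*-cdh}"          # CDH release

echo "component=$component upstream=$upstream cdh=$cdh"
```

The same split applies to the other component builds in the archive, such as the hbase-1.2.0-cdh5.16.2 build whose change log is linked above.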
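3. Components of Hadoop
hdfs       storage
mapreduce  computation (jobs): extracting value from data, e.g. data mining ==>
           MR is hard to develop, verbose, hard to maintain, and slow,
           so almost nobody writes MR directly; people use Hive SQL, Spark, or Flink instead
yarn       resources (memory, vcores) + job scheduling
Target scale: massive data volumes, clusters of around 1,000 machines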
hadoop-2.6.0-cdh5.16.2.tar.gz
wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2.tar.gz
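4. Deployment
Three deployment modes
Now you are ready to start your Hadoop cluster in one of the three supported modes:
Local (Standalone) Mode     local mode; not used
Pseudo-Distributed Mode     pseudo-distributed mode; for learning and testing on 1 machine
Fully-Distributed Mode      fully distributed (cluster) mode; for production
Environment requirements
Required Software:
a. Java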
mkdir /usr/java
cd /usr/java
[root@pxj31 /root]#cd /usr/java/
[root@pxj31 /usr/java]#ll
总用量 0
drwxr-xr-x. 8 root root 255 11月 16 15:42 jdk1.8.0_121
Create the user and the working directories
[root@pxj31 /root]#useradd pxj
[root@pxj31 /root]#id pxj
uid=1000(pxj) gid=1000(pxj) 组=1000(pxj)
[root@pxj31 /root]#su - pxj
[pxj@pxj31 /home/pxj]$mkdir app software sourcecode log tmp data lib
[pxj@pxj31 /home/pxj]$ll
总用量 0
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 app         # extracted packages and symlinks
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 data        # data
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 lib         # third-party jars
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 log         # log files
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 software    # downloaded tarballs
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 sourcecode  # source code to compile
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 tmp         # temporary files (open question: why not /tmp?)
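The directory layout above can be recreated in one go. A minimal sketch; `BASE` is a hypothetical variable that defaults to a scratch directory so the snippet can be tried without touching a real home directory (set `BASE=/home/pxj` to mirror these notes):

```shell
# Create the working directories used throughout these notes.
BASE="${BASE:-$(mktemp -d)}"

for d in app software sourcecode log tmp data lib; do
    mkdir -p "$BASE/$d"    # -p: no error if the directory already exists
done

ls -1 "$BASE"
```

Using `mkdir -p` makes the step safe to rerun, which helps when the same setup is repeated on several machines.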
Upload the tarball
[pxj@pxj31 /home/pxj/software]$ll
总用量 424176
-rw-r--r--. 1 root root 434354462 11月 30 23:44 hadoop-2.6.0-cdh5.16.2.tar.gz
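The tarball was uploaded as root, so its owner has to be changed by root (a regular user cannot chown a root-owned file):
[root@pxj31 /root]#chown -R pxj:pxj /home/pxj/software/*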
Extract
[pxj@pxj31 /home/pxj/software]$tar -zxvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ../app/
hadoop-2.6.0-cdh5.16.2/
hadoop-2.6.0-cdh5.16.2/share/
hadoop-2.6.0-cdh5.16.2/share/hadoop/
hadoop-2.6.0-cdh5.16.2/share/hadoop/common/
hadoop-2.6.0-cdh5.16.2/share/hadoop/common/sources/
hadoop-2.6.0-cdh5.16.2/share/hadoop/common/sources/hadoop-common-2.6.0-cdh5.16.2-sources.jar
[pxj@pxj31 /home/pxj/app]$ll
总用量 0
drwxr-xr-x. 3 pxj pxj 19 6月 3 19:11 hadoop-2.6.0-cdh5.16.2
Create a symlink
[pxj@pxj31 /home/pxj/app]$ln -s hadoop-2.6.0-cdh5.16.2 hadoop
[pxj@pxj31 /home/pxj/app]$ll
总用量 0
lrwxrwxrwx. 1 pxj pxj 22 12月 1 00:22 hadoop -> hadoop-2.6.0-cdh5.16.2
drwxr-xr-x. 3 pxj pxj 19 6月 3 19:11 hadoop-2.6.0-cdh5.16.2
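One reason to go through a version-neutral `hadoop` symlink is that environment variables and scripts can point at the link, and a later upgrade only has to repoint it. A minimal sketch in a scratch directory; `hadoop-x.y.z` is a hypothetical second version used purely for illustration:

```shell
# Demonstrate repointing a version-neutral symlink in a scratch directory.
WORK="$(mktemp -d)"
mkdir -p "$WORK/app/hadoop-2.6.0-cdh5.16.2" "$WORK/app/hadoop-x.y.z"
cd "$WORK/app"

ln -sfn hadoop-2.6.0-cdh5.16.2 hadoop   # initial link, as in the notes
readlink hadoop

ln -sfn hadoop-x.y.z hadoop             # "upgrade": only the link changes
readlink hadoop
```

The `-n` flag stops `ln` from following the existing link and creating the new link inside the old target directory.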
Official reference docs:
https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2/hadoop-project-dist/hadoop-common/SingleCluster.html
Set JAVA_HOME explicitly
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_121
Map the hostname to the machine's IP address
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.25.31 pxj31
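Before going further it can be worth confirming that the hostname resolves to the LAN address rather than to a loopback address. A minimal sketch that looks a name up in a hosts-format file; `lookup_host` is a hypothetical helper, and the snippet works on a scratch copy of the entries above rather than the real /etc/hosts:

```shell
# Print the address that a hosts-format file maps a hostname to.
lookup_host() {
    # $1 = hostname, $2 = path to a hosts-format file
    awk -v h="$1" '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++)
                if ($i == h) { print $1; exit }
        }' "$2"
}

HOSTS_FILE="$(mktemp)"
cat > "$HOSTS_FILE" <<'EOF'
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.25.31 pxj31
EOF

lookup_host pxj31 "$HOSTS_FILE"
```

On the real machine the same function can be pointed at /etc/hosts.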
Passwordless SSH trust
1. Generate the public/private key pair
[pxj@pxj31 /home/pxj]$ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pxj/.ssh/id_rsa):
Created directory '/home/pxj/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pxj/.ssh/id_rsa.
Your public key has been saved in /home/pxj/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Y6ePUjG4QGO+wPvlT7Sp2+cD7poMsbPOeaPWppYVuzQ pxj@pxj31
The key's randomart image is:
+---[RSA 2048]----+
| |
| + |
| . + . . |
| o o o o |
| o.o +So. |
| . .oEoo* |
| .+B ==. |
| .=*O=.oo |
| +B=*B=oo. |
+----[SHA256]-----+
2. Append the public key to the authorized_keys file
[pxj@pxj31 /home/pxj]$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Fix the permissions
[pxj@pxj31 /home/pxj]$chmod 600 ~/.ssh/authorized_keys
4. Test
[pxj@pxj31 /home/pxj]$ssh pxj31 date
2019年 12月 01日 星期日 01:07:55 CST
Further reading:
http://blog.itpub.net/30089851/viewspace-2127102/ troubleshooting
http://blog.itpub.net/30089851/viewspace-1992210/ pitfalls with SSH across multiple hosts
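The permission step is easy to get wrong: as far as I know, sshd in its default strict mode checks the modes and ownership of the user's SSH files and may ignore keys when they are too permissive. A minimal sketch of the intended end state, using a scratch directory as a stand-in for ~/.ssh and a placeholder string instead of a real key (both are assumptions for safe experimentation); `mode_of` is a hypothetical helper:

```shell
# Reproduce the expected ~/.ssh permissions in a scratch directory.
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"                      # directory: owner only

# Stand-in for: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
echo "ssh-rsa PLACEHOLDER-NOT-A-REAL-KEY pxj@pxj31" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"      # file: owner read/write only

# Print the permission string (first 10 characters of ls -ld output).
mode_of() { ls -ld "$1" | cut -c1-10; }

mode_of "$SSH_DIR"
mode_of "$SSH_DIR/authorized_keys"
```

Running `mode_of ~/.ssh` and `mode_of ~/.ssh/authorized_keys` on the real machine is a quick first check when `ssh pxj31 date` still prompts for a password.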
## Configure the user's environment variables
```shell
[pxj@pxj31 /home/pxj/app/hadoop]$vim ~/.bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
export HADOOP_HOME=/home/pxj/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
[pxj@pxj31 /home/pxj/app/hadoop]$source ~/.bashrc
[pxj@pxj31 /home/pxj]$which hadoop
~/app/hadoop/bin/hadoop
```
Edit the configuration files
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://pxj31:9000</value>
</property>
</configuration>
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Format the NameNode
[pxj@pxj31 /home/pxj]$hdfs namenode -format
has been successfully formatted.
First start
[pxj@pxj31 /home/pxj]$start-dfs.sh
19/12/01 01:29:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [pxj31]
pxj31: starting namenode, logging to /home/pxj/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-pxj-namenode-pxj31.out
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:wBnpszXvvVt7NH/NuxDRgLHkCXU1CStTXPflPsQw1AI.
ECDSA key fingerprint is MD5:7e:92:4e:a6:a7:65:93:43:b6:b2:53:a3:48:14:0a:ae.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: starting datanode, logging to /home/pxj/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-pxj-datanode-pxj31.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:wBnpszXvvVt7NH/NuxDRgLHkCXU1CStTXPflPsQw1AI.
ECDSA key fingerprint is MD5:7e:92:4e:a6:a7:65:93:43:b6:b2:53:a3:48:14:0a:ae.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/pxj/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-pxj-secondarynamenode-pxj31.out
19/12/01 01:30:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[pxj@pxj31 /home/pxj]$jps
13955 SecondaryNameNode
14071 Jps
13641 NameNode
13773 DataNode
Pitfall:
[pxj@pxj31 /home/pxj/.ssh]$cat known_hosts
pxj31,192.168.25.31 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCuE4mM++/m5+KufqPqfoulxSKCvNQu5obqsULglJD5aGgDapf/61g16DqiHdlqUYFjiey7dRTFrO+qkT+IXMA0=
localhost ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCuE4mM++/m5+KufqPqfoulxSKCvNQu5obqsULglJD5aGgDapf/61g16DqiHdlqUYFjiey7dRTFrO+qkT+IXMA0=
0.0.0.0 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCuE4mM++/m5+KufqPqfoulxSKCvNQu5obqsULglJD5aGgDapf/61g16DqiHdlqUYFjiey7dRTFrO+qkT+IXMA0=
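The NN starts as pxj31, but the DN starts as localhost and the SNN as 0.0.0.0. The goal is for the DN and SNN to start as pxj31 too: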
pxj31: starting namenode, logging to /home/pxj/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-pxj-namenode-pxj31.out
localhost: starting datanode, logging to /home/pxj/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-pxj-datanode-pxj31.out
0.0.0.0: starting secondarynamenode, logging to /home/pxj/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-pxj-secondarynamenode-pxj31.out
NN: pxj31, controlled by fs.defaultFS
DN: controlled by the slaves file
SNN: controlled by dfs.namenode.secondary.http-address / https-address
Fix
core-site.xml stays as configured earlier (fs.defaultFS = hdfs://pxj31:9000). The SecondaryNameNode addresses are HDFS properties, so they belong in hdfs-site.xml:
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>pxj31:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>pxj31:50091</value>
</property>
</configuration>
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$vim slaves
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$cat slaves
pxj31
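namenode            NameNode: the master ("boss"); read and write requests go through it first
datanode            DataNode: the worker; stores and retrieves data
secondarynamenode   SecondaryNameNode: the "second in command"; checkpoints the NameNode metadata about once an hour (h+1)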
Common hadoop commands
Create a directory
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$hadoop fs -mkdir /a
19/12/01 01:47:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
List
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$hadoop fs -ls /
19/12/01 01:47:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - pxj supergroup 0 2019-12-01 01:47 /a
Upload a file
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$hadoop fs -put slaves /a
19/12/01 01:48:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$hadoop fs -ls /a
19/12/01 01:48:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 pxj supergroup 6 2019-12-01 01:48 /a/slaves
Download a file
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$hadoop fs -get /a/slaves /home/pxj/slaves1
19/12/01 01:49:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[pxj@pxj31 /home/pxj/app/hadoop/etc/hadoop]$
[pxj@pxj31 /home/pxj]$ll
总用量 4
drwxrwxr-x. 3 pxj pxj 50 12月 1 00:40 app
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 data
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 lib
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 log
-rw-r--r--. 1 pxj pxj 6 12月 1 01:49 slaves1
drwxrwxr-x. 2 pxj pxj 43 12月 1 00:37 software
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 sourcecode
drwxrwxr-x. 2 pxj pxj 6 12月 1 00:05 tmp
Delete a file
[pxj@pxj31 /home/pxj]$hadoop fs -rm /a/slaves
19/12/01 01:51:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleted /a/slaves
Delete a directory (it must be empty)
[pxj@pxj31 /home/pxj]$hadoop fs -rmdir /a
19/12/01 01:54:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[pxj@pxj31 /home/pxj]$