1. What "Hadoop" refers to
- In the broad sense: the ecosystem built around the Hadoop software
- In the narrow sense: the Hadoop software itself
2. Apache
The Apache Software Foundation hosts many open-source projects, including Hadoop, Spark, Hive, and Flink. Their project sites follow the pattern xxx.apache.org, for example hadoop.apache.org.
3. The Hadoop software
There are three release lines: 1.x, 2.x, and 3.x. The 2.x line is the most widely used; 3.x is new, and because it is not yet clear what pitfalls it may hide, companies do not consider it for production and prefer 2.x.
The three core Hadoop modules:
- HDFS: a distributed file system for storing big data
- MapReduce: a distributed computing framework, split into a map phase and a reduce phase; its usage is covered in detail later
- YARN: the resource scheduler, responsible for managing resources and scheduling jobs
We use the CDH distribution: Cloudera takes the Apache Hadoop 2.6.0 source code, fixes bugs and adds features, and compiles it into a ready-to-use release. The two are used in exactly the same way; it is simply that production here runs the CDH build.
4. HDFS deployment
4.1 Create a password-less hadoop user
Create a dedicated hadoop user for administration with the command useradd hadoop, then create the directories app, source, and software under its home directory.
[root@hadoop001 ~]# useradd hadoop
[root@hadoop001 ~]# vi /etc/sudoers
Add the following line:
hadoop ALL=(ALL) NOPASSWD:ALL
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ ll
total 0
[hadoop@hadoop001 ~]$ mkdir app
[hadoop@hadoop001 ~]$ mkdir software
[hadoop@hadoop001 ~]$ mkdir source
[hadoop@hadoop001 ~]$ ll
total 12
drwxrwxr-x. 2 hadoop hadoop 4096 Sep 17 14:53 app
drwxrwxr-x. 2 hadoop hadoop 4096 Sep 17 14:54 software
drwxrwxr-x. 2 hadoop hadoop 4096 Sep 17 14:54 source
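The user-and-directory setup above can be scripted. A minimal sketch of the directory part (the useradd and sudoers steps still need root, as shown above); the target directory here is a temporary placeholder standing in for /home/hadoop:

```shell
# Placeholder for the hadoop user's home directory (in practice: /home/hadoop)
HADOOP_HOME_DIR="${HADOOP_HOME_DIR:-$(mktemp -d)}"

# mkdir -p is idempotent: safe to rerun if the directories already exist
for d in app software source; do
    mkdir -p "$HADOOP_HOME_DIR/$d"
done

ls -l "$HADOOP_HOME_DIR"
```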
4.2 Download the Hadoop tarball
Switch to the app directory and download the Hadoop binary package with wget; once the download finishes, unpack it with tar -xzvf <tarball>.
wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
Unpack:
[hadoop@hadoop001 app]$ tar -xzvf hadoop-2.6.0-cdh5.7.0.tar.gz
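The download-and-extract pattern above can be rehearsed safely. The Cloudera archive URL may no longer be reachable, so this sketch simulates the tarball locally; the useful habit it shows is listing the archive before extracting to confirm the top-level directory name:

```shell
WORK="$(mktemp -d)"
cd "$WORK"

# Simulate what wget would have fetched: a tarball with one top-level directory
mkdir -p hadoop-2.6.0-cdh5.7.0/bin
echo demo > hadoop-2.6.0-cdh5.7.0/bin/hadoop
tar -czf hadoop-2.6.0-cdh5.7.0.tar.gz hadoop-2.6.0-cdh5.7.0
rm -r hadoop-2.6.0-cdh5.7.0

tar -tzf hadoop-2.6.0-cdh5.7.0.tar.gz | head -n 1   # inspect before extracting
tar -xzf hadoop-2.6.0-cdh5.7.0.tar.gz               # then extract
```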
4.3 Java 1.7 setup
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ ll /usr/java/
total 319160
drwxr-xr-x 8 root root 4096 Apr 11 2015 jdk1.7.0_80
drwxr-xr-x 8 root root 4096 Apr 11 2015 jdk1.8.0_45
-rw-r--r-- 1 root root 153530841 Jul 8 2015 jdk-7u80-linux-x64.tar.gz
-rw-r--r-- 1 root root 173271626 Sep 19 11:49 jdk-8u45-linux-x64.gz
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ echo $JAVA_HOME
/usr/java/jdk1.7.0_80
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ which java
/usr/java/jdk1.7.0_80/bin/java
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$
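The version check above can also be scripted. A sketch of parsing the version string; it is hard-coded here for illustration, whereas on a real host you would capture it with `ver="$(java -version 2>&1 | head -n 1)"`:

```shell
ver='java version "1.7.0_80"'

# Strip down to the quoted version number, then pull out the minor digit
num="${ver#*\"}"; num="${num%\"*}"      # 1.7.0_80
minor="$(echo "$num" | cut -d. -f2)"    # 7

# Hadoop 2.6.0-cdh5.7.0 needs Java 7 or later
if [ "$minor" -ge 7 ]; then
    echo "Java ${num}: OK"
fi
```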
4.4 Preparation
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cd etc/hadoop
[hadoop@hadoop001 hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_80
export HADOOP_PREFIX=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
Hadoop can run in three modes:
- Local (Standalone) Mode: single machine, no daemons; not used
- Pseudo-Distributed Mode: one machine running the daemons; used for learning
- Fully-Distributed Mode: multiple machines running the daemons; used in production
4.5 Configuration files
[hadoop@hadoop001 hadoop]$ vi core-site.xml
Add the following:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop001:9000</value>
</property>
</configuration>
[hadoop@hadoop001 hadoop]$ vi hdfs-site.xml
Add the following:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
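The two files above can also be generated from a script, which makes them easy to regenerate on a fresh machine. A sketch, writing into a placeholder directory (in practice this would be etc/hadoop under the Hadoop install); the hostname and replication factor match this guide's single-node setup:

```shell
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
NN_HOST=hadoop001    # the NameNode host: this machine's own hostname
REPLICATION=1        # one DataNode, so replication must be 1

cat > "$CONF_DIR/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${NN_HOST}:9000</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>${REPLICATION}</value>
  </property>
</configuration>
EOF
```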
4.6 Password-less SSH
Pay close attention to which user and which directory each command is run under.
[hadoop@hadoop001 hadoop]$ cd
[hadoop@hadoop001 ~]$ rm -rf .ssh
[hadoop@hadoop001 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
10:b2:0c:fa:24:e7:81:b9:94:0f:f9:53:51:c9:7d:9e hadoop@hadoop001
The key's randomart image is:
+--[ DSA 1024]----+
| . ..+.o |
| +oo o.+ . . |
|==+ o.. o . |
|.B+.. . E |
|. o+ S |
| . |
| |
| |
| |
+-----------------+
[hadoop@hadoop001 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop001 ~]$ cd .ssh
[hadoop@hadoop001 .ssh]$ ll
total 12
-rw-rw-r--. 1 hadoop hadoop 606 Sep 17 17:06 authorized_keys
-rw-------. 1 hadoop hadoop 668 Sep 17 17:06 id_dsa
-rw-r--r--. 1 hadoop hadoop 606 Sep 17 17:06 id_dsa.pub
[hadoop@hadoop001 .ssh]$ chmod 600 authorized_keys
[hadoop@hadoop001 .ssh]$ ssh hadoop001
The authenticity of host 'hadoop001 (192.168.137.190)' can't be established.
RSA key fingerprint is 09:9f:45:4e:60:17:91:57:95:f7:a4:1e:3b:2a:a9:bd.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,192.168.137.190' (RSA) to the list of known hosts.
Last login: Mon Sep 17 20:31:59 2018 from hadoop001
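The key setup above can be rehearsed in a throwaway directory before touching the real ~/.ssh. Note that newer OpenSSH releases disable DSA keys, so this sketch uses RSA; the flow (generate key, append public key to authorized_keys, chmod 600) is the same:

```shell
SSH_DIR="$(mktemp -d)"   # stand-in for ~/.ssh

# -N '' means an empty passphrase, matching -P '' above
ssh-keygen -t rsa -b 2048 -N '' -f "$SSH_DIR/id_rsa" >/dev/null

cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"   # sshd rejects group/world-readable files

ls -l "$SSH_DIR"
```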
4.7 Environment configuration
[hadoop@hadoop001 ~]$ vi .bash_profile
Add the following to the per-user environment file .bash_profile:
export JAVA_HOME=/usr/java/jdk1.7.0_80
export HADOOP_PREFIX=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_PREFIX/bin:$JAVA_HOME/bin:$PATH
Save and exit, then reload the file so the variables take effect:
[hadoop@hadoop001 ~]$ source .bash_profile
Check that password-less SSH now works:
[hadoop@hadoop001 ~]$ ssh hadoop001
Last login: Tue Sep 18 06:38:19 2018 from hadoop001
[hadoop@hadoop001 ~]$ which hdfs
~/app/hadoop-2.6.0-cdh5.7.0/bin/hdfs
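The `which hdfs` check above works because of the PATH ordering in .bash_profile. A sketch of verifying that wiring in a script; the paths are the ones assumed throughout this guide:

```shell
export JAVA_HOME=/usr/java/jdk1.7.0_80
export HADOOP_PREFIX=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
# Hadoop's bin comes first, so its hdfs/hadoop wrappers are found before anything else
export PATH="$HADOOP_PREFIX/bin:$JAVA_HOME/bin:$PATH"

# Check membership by wrapping PATH in colons so components match exactly
case ":$PATH:" in
  *":$HADOOP_PREFIX/bin:"*) echo "HADOOP_PREFIX/bin is on PATH" ;;
  *)                        echo "HADOOP_PREFIX/bin is missing" ;;
esac
```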
Configure slaves
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cd etc/hadoop
[hadoop@hadoop001 hadoop]$ vi slaves
Change localhost to hadoop001.
Format the NameNode
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ bin/hdfs namenode -format
Seeing Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted
means the format succeeded. Note that because hadoop.tmp.dir was left at its default, the metadata lands under /tmp, which the OS may clear on reboot.
Start HDFS
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ sbin/start-dfs.sh
Check whether startup succeeded:
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ jps
24398 Jps
24110 DataNode
24300 SecondaryNameNode
24013 NameNode
If you see output like the above, the NameNode, DataNode, and SecondaryNameNode processes are all running.
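This final check is easy to script as well. A sketch that loops over the three expected daemons; the jps output is simulated here with the transcript above, whereas a real check would capture it with `out="$(jps)"`:

```shell
out='24398 Jps
24110 DataNode
24300 SecondaryNameNode
24013 NameNode'

missing=0
for proc in NameNode DataNode SecondaryNameNode; do
    # each jps line is "<pid> <ClassName>", so anchor the name at end of line
    echo "$out" | grep -q " ${proc}\$" || { echo "missing: $proc"; missing=1; }
done

if [ "$missing" -eq 0 ]; then
    echo "all HDFS daemons are up"
fi
```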