I. Hadoop
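Hadoop is a distributed system infrastructure developed by the Apache Software Foundation.
Users can develop distributed programs without understanding the low-level details of distribution, and can use the full power of a cluster for high-speed computation and storage.
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be read as a stream (streaming access).
The core of the Hadoop framework is HDFS plus MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides computation over that data.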
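To make the MapReduce idea concrete, here is a small local analogy written as a shell pipeline. It does not use Hadoop at all; it only mirrors the three phases (map, shuffle, reduce) on a single machine. The function name wordcount is chosen for illustration.

```shell
# Local analogy only: this pipeline does not run on Hadoop.
wordcount() {
  tr -s '[:space:]' '\n' |    # "map": emit one word per line
    grep -v '^$' |            # drop any empty line left by leading whitespace
    sort |                    # "shuffle": bring identical words together
    uniq -c |                 # "reduce": count each group of identical words
    awk '{print $2 "\t" $1}'  # print as word<TAB>count
}

printf 'hdfs stores data\nmapreduce processes data\n' | wordcount
```

In the sample input the word "data" appears twice, so it is the only word reported with a count of 2. On a real cluster, MapReduce runs the same kind of phases in parallel over blocks stored in HDFS.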
II. Configuring Hadoop in standalone mode
1. Download the installation packages
[root@server1 ~]# ls
hadoop-3.0.3.tar.gz jdk-8u181-linux-x64.tar.gz
2. Create the hadoop user
[root@server1 ~]# useradd hadoop
[root@server1 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
3. Move the installation packages into the hadoop user's home directory, /home/hadoop
[root@server1 ~]# mv * /home/hadoop/
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz jdk-8u181-linux-x64.tar.gz
4. Extract the JDK and create a java symbolic link
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz java jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
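The symbolic link gives the JDK a version-independent path, so later settings can refer to java instead of jdk1.8.0_181. The sketch below shows the idea in a scratch directory; the directory jdk1.8.0_999 is a made-up stand-in for a newer release, and empty directories stand in for the real extracted archives.

```shell
# Work in a scratch directory so the real home directory is not touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

mkdir jdk1.8.0_181               # stands in for the extracted JDK
ln -s jdk1.8.0_181 java          # stable name, as in step 4
link_before=$(readlink java)

mkdir jdk1.8.0_999               # hypothetical newer release
ln -sfn jdk1.8.0_999 java        # repoint the link in place
link_after=$(readlink java)

echo "before: $link_before, after: $link_after"
```

Anything that refers to the java path keeps working after the link is repointed, which is the reason for creating the link rather than using the versioned directory name directly.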
5. Extract Hadoop and create a hadoop symbolic link
[hadoop@server1 ~]$ tar zxf hadoop-3.0.3.tar.gz
[hadoop@server1 ~]$ ln -s hadoop-3.0.3 hadoop
[hadoop@server1 ~]$ ls
hadoop hadoop-3.0.3.tar.gz jdk1.8.0_181
hadoop-3.0.3 java jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ ls
bin include libexec NOTICE.txt sbin
etc lib LICENSE.txt README.txt share
6. Edit the hadoop-env.sh script
[hadoop@server1 hadoop]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim hadoop-env.sh
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
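The transcript does not show what was changed in hadoop-env.sh. In a setup like this one, the usual edit is to tell Hadoop where the JDK lives. The line below is an assumption based on the java symbolic link created in step 4, not the author's recorded edit.

```shell
# Assumed addition to /home/hadoop/hadoop/etc/hadoop/hadoop-env.sh
# (the original transcript does not show the edit).
# Points Hadoop at the JDK through the version-independent symlink from step 4.
export JAVA_HOME=/home/hadoop/java
```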
7. Load the environment variables
[hadoop@server1 ~]$ vim .bash_profile
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HOME/java/bin
[hadoop@server1 ~]$ source .bash_profile
[hadoop@server1 ~]$ jps
2442 Jps
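Running jps successfully, as above, already shows that the JDK tools are on the PATH. A slightly more explicit check, sketched here, prints where each tool resolves. It assumes the same $HOME/java/bin directory that step 7 appends in .bash_profile.

```shell
# Same directory that step 7 appends in .bash_profile.
PATH=$PATH:$HOME/java/bin
export PATH

# Report where each JDK tool resolves, or note that it is missing.
for tool in java jps; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool -> $(command -v "$tool")"
  else
    echo "$tool not found on PATH"
  fi
done
```

If either tool is reported as missing, the PATH line in .bash_profile or the java symbolic link from step 4 is the first thing to recheck.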
8. Create the input directory
[hadoop@server1 ~]$ ls
hadoop hadoop-3.0.3.tar.gz jdk1.8.0_181
hadoop-3.0.3 java jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ ls
bin include libexec NOTICE.txt sbin
etc lib LICENSE.txt README.txt share
[hadoop@server1 hadoop]$ mkdir input/
[hadoop@server1 hadoop]$ ls
bin include lib LICENSE.txt README.txt share
etc input libexec NOTICE.txt sbin
10. Copy data into the input directory
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input
[hadoop@server1 hadoop]$ ls input/
capacity-scheduler.xml hdfs-site.xml kms-site.xml
core-site.xml httpfs-site.xml