Understanding Hadoop:
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to run on low-cost hardware; it provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements to allow streaming access to file system data.
The two core components of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive data sets, while MapReduce provides computation over them.
Hadoop|docker|openstack|elk
1. HDFS2
2. ZooKeeper: distributed lock manager
Required environment: Java | ssh
[hadoop@hadoop1 ~]$ ls
hadoop-2.7.3.tar.gz jdk-7u79-linux-x64.tar.gz
[hadoop@hadoop1 ~]$ tar zxf jdk-7u79-linux-x64.tar.gz
[hadoop@hadoop1 ~]$ ln -s jdk1.7.0_79/ java
[hadoop@hadoop1 hadoop-2.7.3]$ mkdir input
[hadoop@hadoop1 hadoop-2.7.3]$ which java
~/java/bin/java
[hadoop@hadoop1 hadoop-2.7.3]$ which javac
~/java/bin/javac
[hadoop@hadoop1 hadoop-2.7.3]$ vim ~/.bash_profile
PATH=$PATH:$HOME/bin:/home/hadoop/java/bin
[hadoop@hadoop1 hadoop-2.7.3]$ source ~/.bash_profile
[hadoop@hadoop1 hadoop-2.7.3]$ vim etc/hadoop/hadoop-env.sh
......
export JAVA_HOME=/home/hadoop/java
[hadoop@hadoop1 hadoop-2.7.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
[hadoop@hadoop1 hadoop-2.7.3]$ cp etc/hadoop/*.xml input/
[hadoop@hadoop1 hadoop-2.7.3]$ ls input/
[hadoop@hadoop1 hadoop-2.7.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output #the hostname must resolve at this point, otherwise the job fails
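For intuition about what the wordcount job computes, a purely local shell equivalent can help (the sample file and its contents are made up here, not the HDFS input above):

```shell
# Build a tiny sample input (hypothetical path, not the HDFS "input" directory above)
printf 'hadoop hdfs hadoop\nmapreduce hdfs\n' > /tmp/wc_sample.txt

# Split on whitespace, sort, and count occurrences of each word --
# the same per-word counts the MapReduce wordcount example produces
tr -s ' ' '\n' < /tmp/wc_sample.txt | sort | uniq -c
# prints counts: 2 hadoop, 2 hdfs, 1 mapreduce
```

The MapReduce job does the same thing, but the "split" runs in parallel map tasks and the "sort | count" happens in the shuffle and reduce phases.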
[hadoop@hadoop1 hadoop-2.7.3]$ cd etc/hadoop/
[hadoop@hadoop1 hadoop]$ vim slaves
[hadoop@hadoop1 hadoop]$ vim core-site.xml
[hadoop@hadoop1 hadoop]$ vim hdfs-site.xml
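The contents of core-site.xml and hdfs-site.xml are not shown in the transcript; for a single-node setup like this one, typical minimal contents would look as follows (the NameNode address matches the 172.25.30.1:9000 endpoint used later, but the exact values here are assumptions):

```xml
<!-- core-site.xml: where clients find the NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.25.30.1:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: one replica while there is only a single DataNode -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```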
[hadoop@hadoop1 hadoop]$ ssh 172.25.30.1
[hadoop@hadoop1 hadoop]$ su -
Password:
[root@hadoop1 ~]# passwd hadoop
[hadoop@hadoop1 hadoop]$ ssh 172.25.30.1
hadoop@172.25.30.1's password:
[hadoop@hadoop1 ~]$ ssh-keygen
[hadoop@hadoop1 ~]$ logout
Connection to 172.25.30.1 closed.
[hadoop@hadoop1 hadoop]$ ssh-copy-id 172.25.30.1
[hadoop@hadoop1 hadoop]$ ssh 172.25.30.1
Last login: Sat Nov 11 10:42:19 2017 from 172.25.30.1
[hadoop@hadoop1 ~]$ logout
Connection to 172.25.30.1 closed.
[hadoop@hadoop1 tmp]$ cd /home/hadoop/hadoop-2.7.3
[hadoop@hadoop1 hadoop-2.7.3]$ bin/hdfs namenode -format #format the filesystem
[hadoop@hadoop1 hadoop-2.7.3]$ which jps
~/java/bin/jps
[hadoop@hadoop1 hadoop-2.7.3]$ jps #list the Java processes; if nothing shows, check with ps ax whether the NameNode and DataNode actually started
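The transcript jumps from formatting straight to jps; in between, the HDFS daemons have to be started. With the Hadoop 2.7.3 layout used here, the usual commands are (a sketch of the standard scripts, not lines from the original session):

```shell
# from /home/hadoop/hadoop-2.7.3
sbin/start-dfs.sh   # starts NameNode, DataNode(s), and SecondaryNameNode
jps                 # should now list NameNode and SecondaryNameNode (plus DataNode on data nodes)
```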
&At this point the browser can reach port 9000 (http://172.25.30.1:9000/)
and port 50070 (http://172.25.30.1:50070/dfshealth.html#tab-overview)
&&Test: check the data storage status at http://172.25.30.1:50070/dfshealth.html#tab-overview
&&Going distributed
Install rpcbind and start it
Share the files
Start the nfs service
Install nfs-utils
Create the user and mount
slaves: defines the data nodes
Install rpcbind|nfs-utils on hosts 1|2|3
[root@hadoop2 ~]# yum install rpcbind -y
[root@hadoop2 ~]# /etc/init.d/rpcbind start
Starting rpcbind: [ OK ]
[root@hadoop1 ~]# yum install nfs-utils -y
[root@hadoop1 ~]# vim /etc/exports
/home/hadoop 172.25.30.0/255.255.255.0(rw,anonuid=900,anongid=900)
[root@hadoop1 ~]# /etc/init.d/nfs start
[root@hadoop1 ~]# exportfs -rv
exporting 172.25.30.0/255.255.255.0:/home/hadoop
[root@hadoop2 ~]# useradd -u 900 hadoop
[root@hadoop2 ~]# /etc/init.d/nfs start
[root@hadoop2 ~]# showmount -e 172.25.30.1
Export list for 172.25.30.1:
/home/hadoop 172.25.30.0/255.255.255.0
[root@hadoop2 ~]# mount 172.25.30.1:/home/hadoop/ /home/hadoop
[root@hadoop2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root 19G 915M 17G 6% /
tmpfs 499M 0 499M 0% /dev/shm
/dev/vda1 485M 33M 427M 8% /boot
172.25.30.1:/home/hadoop/ 19G 2.2G 16G 13% /home/hadoop
[root@hadoop2 ~]#
[hadoop@hadoop1 ~]$ ssh 172.25.30.2
The authenticity of host '172.25.30.2 (172.25.30.2)' can't be established.
RSA key fingerprint is 80:1f:91:52:83:29:a2:19:79:56:16:91:b1:78:b1:48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.25.30.2' (RSA) to the list of known hosts.
[hadoop@hadoop2 ~]$ ls
hadoop-2.7.3 hadoop-2.7.3.tar.gz java jdk1.7.0_79 jdk-7u79-linux-x64.tar.gz
[hadoop@hadoop2 ~]$ mkdir hadoop2 #create a directory to verify that data syncs over the NFS mount
[hadoop@hadoop2 ~]$ logout
[hadoop@hadoop1 ~]$ ls
hadoop2 hadoop-2.7.3.tar.gz jdk1.7.0_79
hadoop-2.7.3 java jdk-7u79-linux-x64.tar.gz
[root@hadoop2 ~]# ls
anaconda-ks.cfg install.log install.log.syslog
[root@hadoop2 ~]# su - hadoop
[hadoop@hadoop2 ~]$ ls
hadoop2 hadoop-2.7.3.tar.gz jdk1.7.0_79
hadoop-2.7.3 java jdk-7u79-linux-x64.tar.gz
[hadoop@hadoop2 ~]$
[root@hadoop3 ~]# su - hadoop
[hadoop@hadoop3 ~]$ ls
hadoop2 hadoop-2.7.3.tar.gz jdk1.7.0_79
hadoop-2.7.3 java jdk-7u79-linux-x64.tar.gz
[hadoop@hadoop1 hadoop]$ vim slaves
172.25.30.2
172.25.30.3
[hadoop@hadoop1 hadoop]$ vim hdfs-site.xml
<value>2</value> #replica count changed to 2
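For context, the property being edited in hdfs-site.xml is the standard replication setting (the surrounding tags are implied by the single `<value>` line above):

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>   <!-- two DataNodes (172.25.30.2 and 172.25.30.3), so two replicas -->
</property>
```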
[hadoop@hadoop1 hadoop-2.7.3]$ bin/hdfs namenode -format
Adding a node online:
hostname resolution and time synchronization across the hosts
add the new virtual machine
[root@hadoop4 ~]# /etc/init.d/rpcbind start
[root@hadoop4 ~]# /etc/init.d/nfs start
[root@hadoop4 ~]# showmount -e 172.25.30.1
Export list for 172.25.30.1:
/home/hadoop 172.25.30.0/255.255.255.0
[root@hadoop4 ~]# useradd -u 900 hadoop #the hadoop user must exist before the mount will succeed
[root@hadoop4 ~]# mount 172.25.30.1:/home/hadoop/ /home/hadoop
[hadoop@hadoop4 hadoop]$ vim hdfs-site.xml #adjust the replica count for the new number of nodes
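After mounting the shared install and adjusting the configuration, the new node's DataNode is usually started by hand rather than by rerunning start-dfs.sh; a sketch with the standard 2.x scripts:

```shell
# on hadoop4, from /home/hadoop/hadoop-2.7.3
sbin/hadoop-daemon.sh start datanode   # join the running cluster
bin/hdfs dfsadmin -report              # run on the master: the new node should appear as live
```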
5. Deleting nodes
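The notes do not show the removal procedure; the standard HDFS decommission flow uses an exclude file (the file path chosen here is a conventional example, not from the notes):

```shell
# 1. In hdfs-site.xml on the NameNode, point dfs.hosts.exclude at an exclude file:
#    <property>
#      <name>dfs.hosts.exclude</name>
#      <value>/home/hadoop/hadoop-2.7.3/etc/hadoop/excludes</value>
#    </property>
# 2. List the node to remove, then tell the NameNode to re-read its host lists:
echo 172.25.30.4 >> etc/hadoop/excludes
bin/hdfs dfsadmin -refreshNodes   # node goes "Decommission in progress", then "Decommissioned"
```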
Reading and writing Hadoop data streams
Writes: done block by block
mapreduce()
storm (stream processing)
YARN resource manager (ResourceManager)
1|5:nn