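Hadoop technical overview:
Main HDFS modules: NameNode, DataNode
Main YARN modules: ResourceManager, NodeManager
Common commands:
1) Use hadoop fs to operate on the HDFS file system; paths are given in URI format
2) Use start-dfs.sh to start HDFS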
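As a rough illustration of the URI-style access mentioned above, the sketch below builds HDFS URIs and passes them to a few standard hadoop fs subcommands. The NameNode address hdfs://server1:9000 and the /user/hadoop/input path are assumptions made for this example, not values taken from this setup; the run helper is also hypothetical and only prints the command when hadoop is not on PATH.

```shell
#!/bin/sh
# Hypothetical NameNode address; replace it with the fs.defaultFS of your cluster.
NAMENODE_URI="hdfs://server1:9000"

# Build a full HDFS URI from an absolute path.
hdfs_uri() {
    printf '%s%s\n' "$NAMENODE_URI" "$1"
}

# Run the command if hadoop is on PATH; otherwise only print it (dry run).
run() {
    if command -v hadoop >/dev/null 2>&1; then
        "$@"
    else
        echo "(dry run) $*"
    fi
}

run hadoop fs -mkdir -p "$(hdfs_uri /user/hadoop/input)"
run hadoop fs -put etc/hadoop/core-site.xml "$(hdfs_uri /user/hadoop/input)"
run hadoop fs -ls "$(hdfs_uri /user/hadoop/input)"
run hadoop fs -cat "$(hdfs_uri /user/hadoop/input/core-site.xml)"
```

These commands only succeed against a running HDFS, which is what sbin/start-dfs.sh starts once the cluster has been configured.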
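Main HDFS modules and how they work:
1) NameNode:
Function: the management node of the whole file system. It maintains the file system's directory tree, the metadata of files and directories, and the list of data blocks belonging to each file. It also receives client requests.
2) DataNode:
Function: stores the actual data blocks of files and serves read and write requests for those blocks.
3) SecondaryNameNode:
Function: a simple availability measure for the NameNode. It keeps a backup image of the metadata, but it does not support hot standby.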
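To make the per-file block list kept by the NameNode more concrete, here is a small arithmetic sketch. It assumes the Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); the 300 MB file is a made-up example.

```shell
#!/bin/sh
# Assumed defaults for Hadoop 2.x; both can be changed in hdfs-site.xml.
BLOCK_SIZE_MB=128
REPLICATION=3

# Number of blocks = ceil(file size / block size), using integer arithmetic.
block_count() {
    echo $(( ($1 + BLOCK_SIZE_MB - 1) / BLOCK_SIZE_MB ))
}

# Hypothetical file size in MB.
FILE_SIZE_MB=300
BLOCKS=$(block_count "$FILE_SIZE_MB")
echo "blocks tracked by the NameNode: $BLOCKS"
echo "block replicas stored on DataNodes: $(( BLOCKS * REPLICATION ))"
```

Under these assumptions a 300 MB file is split into 3 blocks, and the cluster stores 9 block replicas in total.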
System environment:
RHEL 6.5, with SELinux and iptables disabled
Hadoop, the JDK, and ZooKeeper are shared over NFS so that configuration files stay in sync
Software versions:
hadoop-2.7.3.tar.gz zookeeper-3.4.9.tar.gz jdk-7u79-linux-x64.tar.gz
hbase-1.2.4-bin.tar.gz
Hadoop standalone (single-node) test
[root@server1 ~]# useradd -u 800 hadoop    # create the user
[root@server1 ~]# passwd hadoop    # set the password
[root@server1 ~]# su hadoop
[hadoop@server1 ~]$ pwd
/home/hadoop
[hadoop@server1 ~]$ ls
hadoop-2.7.3.tar.gz jdk-7u79-linux-x64.tar.gz
Installing and configuring Hadoop:
[hadoop@server1 ~]$ tar zxf jdk-7u79-linux-x64.tar.gz    # extract the JDK
[hadoop@server1 ~]$ ls
hadoop-2.7.3.tar.gz jdk1.7.0_79 jdk-7u79-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.7.0_79/ java    # create a symlink
[hadoop@server1 ~]$ pwd
/home/hadoop
[hadoop@server1 ~]$ ls
hadoop-2.7.3.tar.gz java jdk1.7.0_79 jdk-7u79-linux-x64.tar.gz
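One likely reason for linking java to the versioned JDK directory is that later configuration can point at a stable path while the JDK underneath is swapped. The sketch below demonstrates this in a scratch directory; the directory jdk1.8.0_demo is a hypothetical stand-in for a newer JDK and is not part of this setup.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in $HOME is touched.
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# Stand-ins for two unpacked JDK directories; jdk1.8.0_demo is hypothetical.
mkdir jdk1.7.0_79 jdk1.8.0_demo

# Same idea as the ln -s command in the transcript.
ln -s jdk1.7.0_79 java
FIRST=$(readlink java)

# -f replaces the existing link, -n keeps ln from descending into the old target.
ln -sfn jdk1.8.0_demo java
SECOND=$(readlink java)

echo "java -> $FIRST, then java -> $SECOND"
```

Anything that refers to the java path keeps working after the link is repointed.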
[hadoop@server1 ~]$ tar zxf hadoop-2.7.3.tar.gz    # extract Hadoop
[hadoop@server1 ~]$ ls
hadoop-2.7.3 java jdk-7u79-linux-x64.tar.gz
hadoop-2.7.3.tar.gz jdk1.7.0_79
[hadoop@server1 ~]$ cd hadoop-2.7.3
[hadoop@server1 hadoop-2.7.3]$ ls
bin include libexec NOTICE.txt sbin
etc lib LICENSE.txt README.txt share
[hadoop@server1 hadoop-2.7.3]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ ls
capacity-scheduler.xml kms-env.sh
configuration.xsl kms-log4j.properties
container-executor.cfg kms-site.xml
core-site.xml log4j.properties
hadoop-env.cmd mapred-env.cmd
hadoop-env.sh mapred-env.sh
hadoop-metrics2.properties mapred-queues.xml.template
hadoop-metrics.properties mapred-site.xml.template
hadoop-policy.xml slaves
hdfs-site.xml ssl-client.xml.example
httpfs-env.sh ssl-server.xml.example
httpfs-log4j.properties yarn-env.cmd
httpfs-signature.secret yarn-env.sh
httpfs-site.xml yarn-site.xml
kms-acls.xml
Configure environment variables:
[hadoop@server1 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/home/hadoop/java
[hadoop@server1 hadoop]$ cd
[hadoop@server1 ~]$ vim .bash_profile
[hadoop@server1 ~]$ cat .bash_profile
PATH=$PATH:$HOME/bin:~/java/bin
[hadoop@server1 ~]$ source .bash_profile
If the configuration succeeded, java, javac, and jps can now be invoked:
[hadoop@server1 ~]$ java
[hadoop@server1 ~]$ javac
[hadoop@server1 ~]$ jps
6596 Jps
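The manual checks above can also be scripted. The following is a minimal sketch, assuming JAVA_HOME should resolve to /home/hadoop/java as set in hadoop-env.sh; the helper name check_java_home is made up for this example.

```shell
#!/bin/sh
# Report whether a directory looks like a usable JDK home.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok: $1/bin/java is executable"
        return 0
    else
        echo "missing: $1/bin/java"
        return 1
    fi
}

# /home/hadoop/java is the symlink created earlier in this walkthrough.
check_java_home "${JAVA_HOME:-/home/hadoop/java}" || echo "fix JAVA_HOME in hadoop-env.sh"

# Confirm that the shell resolves java through PATH.
command -v java || echo "java is not on PATH; re-check .bash_profile"
```

The script only reports problems; it does not modify hadoop-env.sh or .bash_profile.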
[hadoop@server1 ~]$ ls
hadoop-2.7.3 java jdk-7u79-linux-x64.tar.gz
hadoop-2.7.3.tar.gz jdk1.7.0_79
[hadoop@server1 ~]$ cd hadoop-2.7.3
[hadoop@server1 hadoop-2.7.3]$ ls
bin include libexec NOTICE.txt sbin
etc lib LICENSE.txt README.txt share
[hadoop@server1 hadoop-2.7.3]$ mkdir input
[hadoop@server1 hadoop-2.7.3]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop-2.7.3]$ ls input/
capacity-scheduler.xml hdfs-site.xml kms-site.xml
core-site.xml httpfs-site.xml yarn-site.xml
hadoop-policy.xml kms-acls.xml
[hadoop@server1 hadoop-2.7.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'    # run the example jar
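The grep example job counts how often each string matching the regular expression occurs in the input files. As a rough local approximation (not the job's actual implementation), the same counts can be produced with standard tools; the function name local_grep_count is made up for this sketch.

```shell
#!/bin/sh
# Count regex matches across the files in a directory, most frequent first.
local_grep_count() {
    grep -ohE "$2" "$1"/* | sort | uniq -c | sort -rn
}

# Only run against real data when the input directory from the transcript exists.
if [ -d input ]; then
    local_grep_count input 'dfs[a-z.]+'
fi

# The Hadoop job itself writes its results into the output directory.
if [ -d output ]; then
    cat output/*
fi
```

In standalone mode the job writes to the local output directory, so cat output/* shows its result. The job refuses to start if output already exists, so remove that directory before re-running.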