Hadoop
I. Installing HDFS
1) Prepare the virtual machine
Change the IP address:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.40.20   # must sit on the VM's NAT-mode subnet (first three octets match the NAT network)
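Assembled in full, the interface file might look like this sketch; the GATEWAY and DNS1 values are assumptions for a typical VMware NAT setup (the NAT gateway is usually host .2), so adjust them to your own network:

```properties
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes               # bring the interface up at boot
BOOTPROTO=static         # static address instead of DHCP
IPADDR=192.168.40.20
NETMASK=255.255.255.0
GATEWAY=192.168.40.2     # assumed NAT gateway; check your VM network settings
DNS1=192.168.40.2        # assumed; often the same host as the gateway under NAT
```

After editing, restart the network service (e.g. `service network restart`) so the new address takes effect.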
2) Install JDK 1.8
Copy the Linux JDK package into the root user's home directory.
> rpm -ivh jdk-8u71.....rpm   # the Oracle JDK RPM installs itself under /usr/java
3) Configure the JAVA environment variables
Add the following to /etc/profile (or ~/.bashrc), then re-read it with `source`:
export JAVA_HOME=/usr/java/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
4) Map the hostname to the IP address
Configure the IP-to-hostname mapping:
vi /etc/hosts
192.168.40.20 HadoopNode00
Set the hostname:
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=HadoopNode00   # hostname
5) Disable the firewall and its autostart
service iptables stop    # stop the firewall
chkconfig iptables off   # disable firewall autostart at boot
6) Passwordless SSH login
ssh-keygen -t rsa            # generate an RSA key pair
ssh-copy-id HadoopNode00     # copy the public key to the target host's authorized_keys
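The key-generation step can be tried safely in isolation. This sketch (assuming OpenSSH's `ssh-keygen` is installed) writes a throwaway key pair into a temp directory instead of ~/.ssh; `ssh-copy-id` then simply appends the `.pub` file to the remote host's `~/.ssh/authorized_keys`:

```shell
# Generate a throwaway RSA key pair non-interactively
TMPD=$(mktemp -d)
ssh-keygen -t rsa -N "" -q -f "$TMPD/id_rsa"   # -N "" = empty passphrase, -q = quiet
ls -1 "$TMPD"                                  # id_rsa (private key), id_rsa.pub (public key)
```

With an empty passphrase, `ssh HadoopNode00` needs no password once the public key is installed, which is what `start-dfs.sh` relies on to reach each node.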
7) Unpack Hadoop
Unpack Hadoop into the target directory:
mkdir /usr/hadoop/
tar -zxvf /home/hadoop/hadoop-2.6.0.tar.gz -C /usr/hadoop
8) Configure the Hadoop environment variables
export HADOOP_HOME=/usr/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
9) Configure etc/hadoop/core-site.xml and etc/hadoop/hdfs-site.xml (under $HADOOP_HOME)
<!-- core-site.xml -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://HadoopNode00:9000</value> <!-- HDFS NameNode address -->
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/hadoop-2.6.0/hadoop-${user.name}</value>
</property>
<!-- enable the trash mechanism (interval in minutes) -->
<property>
    <name>fs.trash.interval</name>
    <value>1</value>
</property>
<!-- hdfs-site.xml -->
<property>
    <name>dfs.replication</name> <!-- replication factor -->
    <value>1</value>
</property>
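Note that Hadoop only reads properties placed inside the top-level `<configuration>` element of each file. Assembled in full, the two files from this step would look like:

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/core-site.xml -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://HadoopNode00:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/hadoop-2.6.0/hadoop-${user.name}</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1</value>
    </property>
</configuration>
```

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```

A replication factor of 1 suits this single-node setup; on a real cluster the default is 3.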
10) Format the NameNode
Note: the NameNode must be formatted before HDFS is started for the first time.
### format the NameNode
[root@HadoopNode00 ~]# hdfs namenode -format
### inspect the resulting directory tree
[root@HadoopNode00 ~]# tree /usr/hadoop/hadoop-2.6.0/hadoop-root
11) Start HDFS
[root@HadoopNode00 ~]# start-dfs.sh   # start HDFS
[root@HadoopNode00 ~]# stop-dfs.sh    # stop HDFS
Open the web UI:
http://<hostname>:50070
On Windows, map the hostname to the IP in C:\Windows\System32\drivers\etc\hosts, e.g.:
192.168.40.20 HadoopNode00
II. HDFS Shell Operations
1) hdfs shell
[root@HadoopNode00 ~]# hadoop fs   # list all available HDFS shell commands
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are:
-conf <configuration file> specify an application configuration file
-D <property=value> define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port> specify a ResourceManager
-files <file1,...> specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...> specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...> specify a comma-separated list of archives to be unarchived on the compute machines
The general command line syntax is:
command [genericOptions] [commandOptions]
2) Basic Hadoop shell commands
# (1) Upload a file
# Upload test.txt from root's home directory to the HDFS root. (If a target file name is given after the destination path, the file is stored under that name.)
[root@HadoopNode00 ~]# hadoop fs -put /root/test.txt /
# (2) List files (ls)
## List all files and directories in the HDFS root
[root@HadoopNode00 ~]# hadoop fs -ls /
# (3) Download a file
[root@HadoopNode00 ~]# hadoop fs -get /test.txt /root/test1.txt   ## download test.txt from the HDFS root to the local root home directory, renaming it test1.txt
# (4) Delete a file
[root@HadoopNode00 ~]# hadoop fs -rm /test.txt   # delete test.txt from the HDFS root
# (5) View a file
[root@HadoopNode00 ~]# hadoop fs -cat /test.txt   # print the contents of test.txt in the HDFS root
# (6) Create a directory
[root@HadoopNode00 ~]# hadoop fs -mkdir /Merle   # create a directory named Merle under the HDFS root
# (7) Copy a file
[root@HadoopNode00 ~]# hadoop fs -cp /test.txt /Merle/   # copy test.txt from the HDFS root into the Merle directory