1. HDFS
HDFS (the Hadoop Distributed File System) provides storage for massive data sets.
2. Basic HDFS concepts
(1) Blocks
Files in HDFS are split into blocks for storage; the block is the logical unit of storage and processing. The default block size is 128 MB in Hadoop 2.x (it was 64 MB in Hadoop 1.x).
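To make the block splitting concrete, here is a small sketch; the 200 MB file size is a made-up example, not a file from the walkthrough below.

```shell
# A hypothetical 200 MB file, with the Hadoop 2.x default block size of 128 MB,
# is split into ceil(200/128) = 2 blocks (the last block is only partially full).
BLOCK=$((128 * 1024 * 1024))          # 134217728 bytes
FILE=$((200 * 1024 * 1024))
echo $(( (FILE + BLOCK - 1) / BLOCK ))   # prints 2
# On a live cluster the configured value can be read with:
#   hdfs getconf -confKey dfs.blocksize
```

Note that, unlike blocks in an ordinary file system, a file smaller than one HDFS block does not occupy a full block's worth of disk space.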
(2) Nodes
An HDFS cluster contains two types of nodes operating in a master-worker pattern: one NameNode (the master, or management node, which holds the file system metadata) and multiple DataNodes (the workers, which hold the data blocks). The NameNode manages the HDFS namespace; the DataNodes store the actual file data.
3. Command-line operations
(1) Create a directory: hadoop fs -mkdir /test
[root@localhost sbin]# hadoop fs -mkdir /test
(2) List files: hadoop fs -ls /
[root@localhost sbin]# hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2016-11-25 18:46 /test
(3) Copy a file from the local file system to HDFS: hadoop fs -put hadoop-env.sh /test/
[root@localhost hadoop]# hadoop fs -put hadoop-env.sh /test/
(4) Display a file's contents: hadoop fs -cat /test/hadoop-env.sh
[root@localhost hadoop]# hadoop fs -cat /test/hadoop-env.sh
(5) Copy a file from HDFS to the local file system: hadoop fs -get /test/hadoop-env.sh hadoop-env2.sh
[root@localhost hadoop]# hadoop fs -get /test/hadoop-env.sh hadoop-env2.sh
16/11/25 18:57:02 WARN hdfs.DFSClient: DFSInputStream has been closed already
[root@localhost hadoop]# ls
capacity-scheduler.xml httpfs-env.sh mapred-queues.xml.template
configuration.xsl httpfs-log4j.properties mapred-site.xml
container-executor.cfg httpfs-signature.secret mapred-site.xml.template
core-site.xml httpfs-site.xml slaves
hadoop-env2.sh kms-acls.xml ssl-client.xml.example
hadoop-env.cmd kms-env.sh ssl-server.xml.example
hadoop-env.sh kms-log4j.properties yarn-env.cmd
hadoop-metrics2.properties kms-site.xml yarn-env.sh
hadoop-metrics.properties log4j.properties yarn-site.xml
hadoop-policy.xml mapred-env.cmd
hdfs-site.xml mapred-env.sh
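A quick way to confirm the get round trip is to diff the original file against the fetched copy (here, hadoop-env.sh against hadoop-env2.sh). The sketch below simulates this with throwaway local files (the .demo names and file content are made up), since the real hadoop-env.sh contents depend on the installation.

```shell
# Simulate checking that a fetched copy is byte-identical to the original.
printf 'export JAVA_HOME=/usr/java/latest\n' > hadoop-env.sh.demo
cp hadoop-env.sh.demo hadoop-env2.sh.demo        # stands in for hadoop fs -get
diff -q hadoop-env.sh.demo hadoop-env2.sh.demo && echo identical   # prints "identical"
rm -f hadoop-env.sh.demo hadoop-env2.sh.demo
```

On the real files above, `diff hadoop-env.sh hadoop-env2.sh` producing no output confirms the copy succeeded.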
(6) Report basic file system statistics: hadoop dfsadmin -report (in Hadoop 2.x, hdfs dfsadmin -report is the preferred form, as the DEPRECATED warning in the output notes)
[root@localhost hadoop]# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Configured Capacity: 18746441728 (17.46 GB)
Present Capacity: 12141334709 (11.31 GB)
DFS Remaining: 12141318144 (11.31 GB)
DFS Used: 16565 (16.18 KB)
DFS Used%: 0.00%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 16565 (16.18 KB)
Non DFS Used: 6605107019 (6.15 GB)
DFS Remaining: 12141318144 (11.31 GB)
DFS Used%: 0.00%
DFS Remaining%: 64.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Nov 25 18:49:47 CST 2016
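Because the report is plain text, its figures can be scripted against. A minimal sketch, using the "Live datanodes" line from the report above as sample input; on a live cluster you would pipe `hdfs dfsadmin -report` into the same sed command.

```shell
# Extract the live-datanode count from a dfsadmin report line.
line='Live datanodes (1):'      # sample line copied from the report above
count=$(printf '%s\n' "$line" | sed -n 's/^Live datanodes (\([0-9][0-9]*\)).*/\1/p')
echo "$count"                   # prints 1
```

This kind of parsing is handy for simple monitoring scripts that alert when the live-datanode count drops.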
(7) Delete a file: hadoop fs -rm /test/hadoop-env.sh
[root@localhost sbin]# hadoop fs -rm /test/hadoop-env.sh
16/11/26 12:02:00 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /test/hadoop-env.sh
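The trash configuration in the delete log deserves a note. The "Deletion interval = 0 minutes" line means fs.trash.interval is 0, so trash is disabled and the delete is immediate and permanent. A small sketch (the 1440-minute value is a commonly used setting, not taken from this cluster):

```shell
# With fs.trash.interval = 1440, deleted files linger in the trash for:
echo $((1440 / 60)) hours       # prints "24 hours"
# Where trash is enabled, -skipTrash bypasses it for an immediate delete:
#   hadoop fs -rm -skipTrash /test/hadoop-env.sh
```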