HADOOP - QUICK GUIDE-[2]-HDFS OVERVIEW
Original article
https://www.tutorialspoint.com/hadoop/hadoop_quick_guide.htm
HDFS OVERVIEW
Features of HDFS
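- Suited to distributed storage and processing.
- Hadoop provides a command interface for interacting with HDFS.
- The built-in namenode and datanode servers make it easy to check the status of the cluster.
- Streaming access to file system data.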
- HDFS provides file permissions and authentication.
HDFS Architecture
HDFS follows a master-slave architecture with the following elements.
Namenode
The namenode is software that runs on commodity hardware with the GNU/Linux operating system. The machine running the namenode acts as the master server and performs the following tasks:
- Manages the file system namespace.
- Regulates client access to files.
- Executes file system operations such as renaming, closing, and opening files and directories.
Datanode
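The datanode is commodity hardware running the GNU/Linux operating system and the datanode software.
Datanodes mainly manage data storage:
- Datanodes perform read and write operations on the file system.
- They also carry out block creation, deletion, and replication according to instructions from the namenode.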
Block
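The minimum amount of data that HDFS can read or write is called a block. The default block size is 64 MB in the Hadoop 1.x releases this guide describes (128 MB from Hadoop 2.x onward), and it can be changed in the HDFS configuration.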
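To make the block arithmetic concrete, here is a minimal sketch of how many blocks a file occupies at a given block size: the file size divided by the block size, rounded up. The function name `blocks_needed` and the use of megabytes as the unit are illustrative assumptions, not part of Hadoop.

```shell
# Number of blocks needed to store a file: ceil(file_size / block_size).
# Sizes are given in MB purely for readability (an assumption of this sketch).
blocks_needed() {
  local file_mb="$1" block_mb="${2:-64}"
  # Integer ceiling division: add (divisor - 1) before dividing.
  echo $(( (file_mb + block_mb - 1) / block_mb ))
}

blocks_needed 200        # 200 MB file, 64 MB blocks  -> prints 4
blocks_needed 200 128    # same file, 128 MB blocks   -> prints 2
```

The last block holds only the remainder of the file; it does not take up a full block's worth of disk space.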
Goals of HDFS
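- Fault detection and recovery: because HDFS runs on a large amount of commodity hardware, component failures are frequent. HDFS should detect faults quickly and recover from them automatically.
- Huge datasets: a cluster has hundreds of nodes, and data is replicated across them, so HDFS must manage huge datasets.
- Hardware at data: a requested task completes efficiently when the computation takes place near the data. Especially where huge datasets are involved, this reduces network traffic and increases throughput.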
HDFS OPERATIONS
Starting HDFS
Initially you have to format the configured HDFS file system. Open the namenode (HDFS server) and execute the following command:
$ hadoop namenode -format
Then start the distributed file system. The following command starts the namenode and the datanodes as a cluster:
$ start-dfs.sh
Listing Files in HDFS
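Use ls to list the files in a directory or check the status of a file. It takes a directory or a filename as its argument:
$ $HADOOP_HOME/bin/hadoop fs -ls <args>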
Inserting Data into HDFS
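Assume the local file system has a file named file.txt that you want to save in HDFS. Follow the steps below to insert it:
Step 1
Create an input directory:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input
Step 2
Use the put command to copy the local file into HDFS:
$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /us
Step 3
Verify the file with the ls command:
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input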
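The three steps above could be wrapped in a single shell function, sketched below. This assumes the `hadoop` command is on the PATH; `hdfs_upload` is a hypothetical helper name, and `-mkdir -p` is the Hadoop 2.x+ form that also creates missing parent directories.

```shell
# Sketch only: create the target directory, upload a file, then list it.
# Assumes `hadoop` is on PATH; hdfs_upload is a hypothetical helper name.
hdfs_upload() {
  local local_file="$1" hdfs_dir="$2"
  if [ -z "$local_file" ] || [ -z "$hdfs_dir" ]; then
    echo "usage: hdfs_upload <local_file> <hdfs_dir>" >&2
    return 2
  fi
  # Step 1: create the target directory (-p: Hadoop 2.x+, creates missing
  # parents and does not fail if the directory already exists).
  hadoop fs -mkdir -p "$hdfs_dir" || return 1
  # Step 2: copy the local file into HDFS.
  hadoop fs -put "$local_file" "$hdfs_dir" || return 1
  # Step 3: list the directory to verify the upload.
  hadoop fs -ls "$hdfs_dir"
}

# Example call, using the file and directory from the steps above:
#   hdfs_upload /home/file.txt /user/input
```

Stopping at the first failed step avoids listing a directory that was never populated.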
Retrieving Data from HDFS
Assume HDFS contains a file named outfile. The following steps show how to retrieve the required data from HDFS to the local file system:
Step 1
View the data with the cat command:
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile
Step 2
Use the get command to copy the data from HDFS to the local file system:
$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/
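As a rough sketch, the two retrieval steps can also be combined in one function that previews a file and then copies it locally. It assumes `hadoop` is on the PATH and that the source is a single file; `hdfs_fetch` and the 10-line preview are illustrative choices.

```shell
# Sketch only: preview a file stored in HDFS, then copy it locally.
# Assumes `hadoop` is on PATH; hdfs_fetch is a hypothetical helper name.
hdfs_fetch() {
  local src="$1" local_dest="$2"
  if [ -z "$src" ] || [ -z "$local_dest" ]; then
    echo "usage: hdfs_fetch <hdfs_file> <local_dest>" >&2
    return 2
  fi
  # Step 1: show only the first lines so a large file does not flood the terminal.
  hadoop fs -cat "$src" | head -n 10
  # Step 2: copy the file from HDFS to the local file system.
  hadoop fs -get "$src" "$local_dest"
}

# Example call, using the paths from the steps above:
#   hdfs_fetch /user/output/outfile /home/hadoop_tp/
```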
Shutting Down the HDFS
$ stop-dfs.sh
COMMAND REFERENCE
HDFS Command Reference
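Running ./bin/hadoop dfs with no arguments lists all the available commands, and $HADOOP_HOME/bin/hadoop fs -help commandName prints a short usage summary for that command.
Some of the commands are described below, using the following conventions: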
"<path>" means any file or directory name.
"<path>..." means one or more file or directory names.
"<file>" means any filename.
"<src>" and "<dest>" are path names in a directed operation.
"<localSrc>" and "<localDest>" are paths as above, but on the local file system.
Command | Description |
---|---|
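-ls <path> | Lists the contents of path, showing names, permissions, owners, sizes, and modification dates. |
-lsr <path> | Like -ls, but recursively lists all entries under path. |
-du <path> | Shows disk usage, in bytes, for the files under path. |
-dus <path> | Like -du, but prints a summary of disk usage. |
-mv <src> <dest> | Moves the file or directory from src to dest within HDFS. |
-cp <src> <dest> | Copies the file or directory from src to dest within HDFS. |
-rm <path> | Removes the file or empty directory at path. Add -r to delete recursively, e.g. hdfs dfs -rm -r <directory> to remove a directory and its contents. |
-rmr <path> | Recursively removes the file or directory at path. |
-put <localSrc> <dest> | Copies a local file or directory into the DFS. |
-copyFromLocal <localSrc> <dest> | Same as -put. |
-moveFromLocal <localSrc> <dest> | Copies a local file or directory into the DFS, then deletes the local copy on success. |
-get [-crc] <src> <localDest> | Copies the DFS file or directory to the local file system. |
-getmerge <src> <localDest> | Retrieves all files matching src in the DFS and merges them into a single local file. |
-cat <filename> | Prints the contents of the file to stdout. |
-copyToLocal <src> <localDest> | Same as -get. |
-moveToLocal <src> <localDest> | Like -get, but deletes the HDFS copy on success. |
-mkdir <path> | Creates the directory path in HDFS, including any missing parent directories. |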
-setrep [-R] [-w] rep <path> | Sets the target replication factor for files identified by path to rep. |
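-touchz <path> | Creates a zero-length file at path, timestamped with the current time. Fails if a file already exists at path, unless that file has size 0. For example, to create a _SUCCESS marker file: hadoop dfs -touchz /hdp/ddd/xxx.db/dddd/dt=20180222/_SUCCESS |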
-test -[ezd] <path> | Exits with status 0 if path exists (-e), has zero length (-z), or is a directory (-d); exits with a non-zero status otherwise. |
-stat [format] <path> | Prints information about path. Format is a string which accepts file size in bytes (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y). |
-tail [-f] <filename> | Shows the last 1KB of file on stdout. |
-chmod [-R] mode,mode,… <path>… | Changes the file permissions associated with one or more objects identified by path…. Performs changes recursively with -R. mode is a 3-digit octal mode, or {augo}+/-{rwxX}. Assumes a (all) if no scope is specified and does not apply an umask. |
-chown [-R] [owner][:[group]] <path>… | Sets the owning user and/or group for files or directories identified by path…. Sets owner recursively if -R is specified. |
-chgrp [-R] group <path>… | Sets the owning group for files or directories identified by path…. Sets group recursively if -R is specified. |
-help <cmdname> | Returns usage information for one of the commands listed above. You must omit the leading ‘-’ character in cmd. |
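-cat <file> \| grep <pattern> | Searches file contents. The FS shell has no -grep command; pipe the output of -cat into grep instead, e.g. hadoop dfs -cat <file> \| grep <pattern>. |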
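
The `-test` checks are mostly useful in scripts, where the exit status drives a branch. On current Hadoop releases a zero exit status means the check succeeded. The sketch below classifies a path that way; it assumes `hadoop` is on the PATH, and `hdfs_kind` is a hypothetical helper name.

```shell
# Sketch only: classify an HDFS path using the exit status of `hadoop fs -test`.
# Assumes `hadoop` is on PATH; hdfs_kind is a hypothetical helper name.
hdfs_kind() {
  local path="$1"
  if ! hadoop fs -test -e "$path"; then
    echo "missing"          # -e failed: nothing exists at path
  elif hadoop fs -test -d "$path"; then
    echo "directory"        # -d succeeded: path is a directory
  elif hadoop fs -test -z "$path"; then
    echo "empty file"       # -z succeeded: file has zero length
  else
    echo "file"
  fi
}

# Example call:
#   hdfs_kind /user/input
```

Each check is a separate invocation of the `hadoop` command, so this is convenient for occasional checks rather than for loops over many paths.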