HADOOP - QUICK GUIDE-[2]-HDFS OVERVIEW

Source: https://www.tutorialspoint.com/hadoop/hadoop_quick_guide.htm

HDFS OVERVIEW

Features of HDFS

  • It is suitable for distributed storage and processing.
  • Hadoop provides a command interface for interacting with HDFS.
  • The built-in namenode and datanode servers make it easy to check the status of the cluster.
  • Streaming access to file system data.
  • HDFS provides data consistency and permission management.

HDFS Architecture

HDFS follows a master-slave architecture.

(Figure: HDFS architecture diagram)

Namenode

The namenode is commodity hardware running the GNU/Linux operating system and the namenode software; the software itself can run on any ordinary commodity machine. The system hosting the namenode acts as the master server and performs the following tasks:

  • Manages the file system namespace.
  • Regulates clients' access to files.
  • Executes file system operations such as renaming, closing, and opening files and directories.

Datanode

The datanode is commodity hardware running the GNU/Linux operating system and the datanode software. Its main responsibility is managing data storage:

  • Datanodes perform read and write operations on the file system.
  • They also carry out operations such as block creation, deletion, and replication, according to instructions from the namenode.

Block

The smallest unit of data that HDFS can read or write is called a Block. The default block size is 64 MB (in the Hadoop 1.x line this guide describes; later versions default to 128 MB), and it can be changed in the HDFS configuration as needed.
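The split of a file into blocks is simple arithmetic; a minimal sketch, using made-up file sizes in MB for readability:

```python
# How a file is split into HDFS blocks (all sizes in MB for readability).
BLOCK_SIZE = 64  # default block size discussed in this guide

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` MB occupies."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

blocks = split_into_blocks(200)  # a hypothetical 200 MB file
print(blocks)       # [64, 64, 64, 8]
print(len(blocks))  # 4 blocks
```

Note that the last block holds only the remaining 8 MB; HDFS does not pad it out to the full block size.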

Goals of HDFS

  • Fault detection and recovery: since HDFS runs on a large amount of commodity hardware, component failures are frequent; HDFS is responsible for detecting faults promptly and recovering from them.
  • Huge datasets: a cluster can span hundreds of nodes and must replicate its data, so it manages applications with huge datasets.
  • Hardware at data: a requested task can be done efficiently when the computation takes place near the data. Especially where huge datasets are involved, this reduces network traffic and increases throughput.
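The "hardware at data" goal can be illustrated with a toy scheduler: given which nodes hold replicas of a block, prefer running the task on one of them. This is a sketch of the idea only, not Hadoop's actual scheduler; all node and block names are hypothetical:

```python
# Toy data-locality scheduler: prefer a node that already holds a replica
# of the block, falling back to any free node (remote read) otherwise.
replicas = {
    "block_1": {"node_a", "node_b", "node_c"},  # 3 replicas per block
    "block_2": {"node_b", "node_d", "node_e"},
}

def pick_node(block, free_nodes):
    local = replicas[block] & free_nodes  # free nodes with a local copy
    # A local read avoids moving the whole block over the network.
    return sorted(local)[0] if local else sorted(free_nodes)[0]

print(pick_node("block_1", {"node_b", "node_e"}))  # node_b has a local replica
print(pick_node("block_2", {"node_a", "node_f"}))  # no local copy: remote read
```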


HDFS OPERATIONS

Starting HDFS

Initially, format the configured HDFS file system on the namenode (HDFS server) by executing the following command:

$ hadoop namenode -format

Then start the distributed file system. The following command starts the namenode as well as the data nodes as a cluster:

$ start-dfs.sh

Listing Files in HDFS

To list the files in a directory, pass a directory or file name as an argument to the ls command:

$HADOOP_HOME/bin/hadoop fs -ls

Inserting Data into HDFS

Assume there is a file named file.txt in the local file system. The following steps insert it into the HDFS file system:

Step 1

Create an input directory:

$HADOOP_HOME/bin/hadoop fs -mkdir /user/input

Step 2

Transfer the local file into the HDFS file system with the put command:

$HADOOP_HOME/bin/hadoop fs -put /home/file.txt /us

Step 3

Verify the file with the ls command:

$HADOOP_HOME/bin/hadoop fs -ls /user/input

Retrieving Data from HDFS

Assume there is a file named outfile in HDFS. The following shows how to retrieve the required data from HDFS to the local file system:

Step 1

View the data with the cat command:

$HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile

Step 2

Fetch the file to the local file system with the get command:

$HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/

Shutting Down the HDFS

$ stop-dfs.sh
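The same operations are also exposed over HTTP via the WebHDFS REST API, whose URLs follow the pattern http://<namenode>:<port>/webhdfs/v1/<path>?op=OP. A minimal sketch that only constructs the request URLs; the host and port are assumptions (50070 is the classic namenode HTTP port), so adjust them to your cluster:

```python
# Map the shell workflow above onto WebHDFS REST URLs (URL construction only;
# no cluster is contacted here).
NAMENODE = "http://localhost:50070"  # assumed namenode address

def webhdfs_url(path, op, **params):
    """Build a WebHDFS v1 URL for an operation on an HDFS path."""
    query = "&".join([f"op={op}"] + [f"{k}={v}" for k, v in params.items()])
    return f"{NAMENODE}/webhdfs/v1{path}?{query}"

# Equivalents of the steps shown above:
print(webhdfs_url("/user/input", "MKDIRS"))           # hadoop fs -mkdir
print(webhdfs_url("/user/input/file.txt", "CREATE"))  # hadoop fs -put
print(webhdfs_url("/user/input", "LISTSTATUS"))       # hadoop fs -ls
print(webhdfs_url("/user/output/outfile", "OPEN"))    # hadoop fs -cat / -get
```

MKDIRS and CREATE are sent as HTTP PUT requests and OPEN and LISTSTATUS as GET; CREATE additionally involves a redirect to a datanode for the actual data transfer.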

COMMAND REFERENCE

HDFS Command Reference

Running ./bin/hadoop dfs with no additional arguments lists all the available commands, and $HADOOP_HOME/bin/hadoop fs -help commandName displays a short usage summary for the command in question.
Some of the commands are described below:

"<path>" means any file or directory name.  
"<path>..." means one or more file or directory names.  
"<file>" means any filename.  
"<src>" and "<dest>" are path names in a directed operation.  
"<localSrc>" and "<localDest>" are paths as above, but on the local file system.
  • -ls <path> : Lists the contents of path, including names, permissions, owner, size, and modification date.
  • -lsr <path> : Like -ls, but recurses through all subdirectories of path.
  • -du <path> : Shows disk usage, in bytes.
  • -dus <path> : Like -du, but prints a summary.
  • -mv <src> <dest> : Moves the file or directory src to dest, within HDFS.
  • -cp <src> <dest> : Copies the file or directory src to dest, within HDFS.
  • -rm <path> : Removes the file or empty directory at path; -r removes recursively, e.g. hdfs dfs -rm -r <dir>.
  • -rmr <path> : Recursively removes the file or directory at path.
  • -put <localSrc> <dest> : Copies a file or directory from the local file system into DFS.
  • -copyFromLocal <localSrc> <dest> : Identical to -put.
  • -moveFromLocal <localSrc> <dest> : Like -put, but deletes the local copy on success.
  • -get [-crc] <src> <localDest> : Copies a file or directory from DFS to the local file system.
  • -getmerge <src> <localDest> : Retrieves the files under src from DFS and merges them into a single file at localDest.
  • -cat <filename> : Displays the contents of filename on stdout.
  • -copyToLocal <src> <localDest> : Identical to -get.
  • -moveToLocal <src> <localDest> : Like -get, but deletes the HDFS copy on success.
  • -mkdir <path> : Creates the directory path, including any missing parent directories.
  • -setrep [-R] [-w] rep <path> : Sets the target replication factor for files identified by path to rep.
  • -touchz <path> : Creates a file at path containing the current time as a timestamp; fails if a file already exists at path, unless that file has size 0. A common use is creating an empty marker file, e.g. hadoop dfs -touchz /hdp/ddd/xxx.db/dddd/dt=20180222/_SUCCESS.
  • -test -[ezd] <path> : Tests whether path exists (-e), has zero length (-z), or is a directory (-d); the exit status is 0 if the test succeeds and nonzero otherwise.
  • -stat [format] <path> : Prints information about path. format is a string that accepts file size in blocks (%b), filename (%n), block size (%o), replication (%r), and modification date (%y).
  • -tail [-f] <filename> : Shows the last 1KB of the file on stdout.
  • -chmod [-R] mode,mode,... <path>... : Changes the file permissions of one or more objects identified by path. Performs changes recursively with -R. mode is a 3-digit octal mode, or {augo}+/-{rwxX}. Assumes a (all) if no scope is specified, and does not apply an umask.
  • -chown [-R] [owner][:[group]] <path>... : Sets the owning user and/or group for files or directories identified by path. Sets the owner recursively if -R is specified.
  • -chgrp [-R] group <path>... : Sets the owning group for files or directories identified by path. Sets the group recursively if -R is specified.
  • -help <cmdname> : Returns usage information for one of the commands listed above; omit the leading '-' character in cmdname.
  • There is no fs -grep subcommand; to search within a file, pipe -cat output through grep, e.g. hadoop dfs -cat <file> | grep <pattern>.
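Since -setrep changes how many copies of each block exist, it directly multiplies raw storage use; a back-of-the-envelope sketch (the file size is a made-up example, and 3 is the usual HDFS default replication factor):

```python
# Raw storage consumed = logical file size x replication factor.
def raw_storage_mb(file_size_mb, replication=3):  # HDFS default replication is 3
    # Every block of the file is stored `replication` times across datanodes.
    return file_size_mb * replication

print(raw_storage_mb(200))     # 600 MB with the default factor
print(raw_storage_mb(200, 2))  # 400 MB after `-setrep 2`
```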