HDFS Workflow Overview and Command Operations

1. HDFS Workflow Overview


1.1 Write (storage)

1. The client contacts the NameNode to obtain a list of available DataNodes, and the NameNode records the file-to-DataNode block mapping.
2. The client splits the file into 128 MB blocks and writes them to the DataNodes.
3. The DataNodes replicate the blocks among themselves.
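The 128 MB figure in step 2 is the default block size (`dfs.blocksize`) in Hadoop 2.x and later. A minimal sketch of how a file size maps to block sizes:

```python
# Sketch of how HDFS splits a file into fixed-size blocks.
# 128 MB is the default dfs.blocksize in Hadoop 2.x and later.
BLOCK_SIZE = 128 * 1024 * 1024  # bytes

def split_into_blocks(file_size: int) -> list[int]:
    """Return the block sizes a file of file_size bytes is split into."""
    full, rest = divmod(file_size, BLOCK_SIZE)
    sizes = [BLOCK_SIZE] * full
    if rest:
        sizes.append(rest)  # the last block may be smaller than 128 MB
    return sizes

# A 300 MB file becomes three blocks: 128 MB, 128 MB, 44 MB.
print([s // (1024 * 1024) for s in split_into_blocks(300 * 1024 * 1024)])
```

Note that the last block only occupies as much space as it actually contains; a 44 MB tail does not consume a full 128 MB on disk.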

1.2 Read

1. The client asks the NameNode for the file's block-location mapping.
2. The client reads the blocks directly from the corresponding DataNodes.
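The two steps above can be sketched with a toy block map; all paths, block IDs, and DataNode hostnames below are invented for illustration:

```python
# Toy model of the NameNode's file -> block -> DataNode mapping.
# File names, block IDs, and DataNode hosts are all invented.
block_map = {
    "/test": [
        {"block": "blk_001", "datanodes": ["dn1", "dn2", "dn3"]},
        {"block": "blk_002", "datanodes": ["dn2", "dn3", "dn4"]},
    ],
}

def locate_blocks(path: str) -> list[tuple[str, str]]:
    """Step 1: ask the 'NameNode' for each block and pick one replica."""
    return [(b["block"], b["datanodes"][0]) for b in block_map[path]]

# Step 2: the client fetches each block from the chosen DataNode.
for block_id, dn in locate_blocks("/test"):
    print(f"read {block_id} from {dn}")
```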

2. Command Operations

Running `hadoop fs` with no arguments lists all available commands, as shown below:

root@hecs-x-large-2-linux-20200618145835:~# hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

1. appendToFile: append content to a file

-appendToFile <localsrc> ... <dst>

# 1. Create testfile with vim and write the content "hello world"
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# vim testfile
# 2. Upload testfile to HDFS as /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -put testfile /test
# 3. Create testfile2 with vim and write the content "append hello world"
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# vim testfile2
# 4. Append testfile2 to the HDFS file /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -appendToFile testfile2 /test
# 5. Download /test from HDFS
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -get /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# ll
total 20
drwxr-xr-x 2 root root 4096 Jan 17 09:52 ./
drwxr-xr-x 3 root root 4096 Jan 17 09:49 ../
-rw-r--r-- 1 root root   31 Jan 17 09:52 test
-rw-r--r-- 1 root root   12 Jan 17 09:49 testfile
-rw-r--r-- 1 root root   19 Jan 17 09:51 testfile2
# 6. View the file content
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# cat test
hello world
append hello world
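The put-then-append steps above behave like opening a local file in append mode. A local-filesystem analogy (no Hadoop cluster required):

```python
# Local analogy of `hadoop fs -put` followed by `-appendToFile`:
# put creates the file, appendToFile adds to its end.
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
target = tmp / "test"
target.write_text("hello world\n")   # like: hadoop fs -put testfile /test
with target.open("a") as f:          # like: hadoop fs -appendToFile testfile2 /test
    f.write("append hello world\n")
print(target.read_text(), end="")
```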

2. View HDFS file contents

-cat [-ignoreCrc] <src> ...
This command works like the Linux cat command.

# View the contents of /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -cat /test
hello world
append hello world

3. Change group, owner, and permissions

[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]

# List the HDFS root directory
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 3 items
-rw-r--r--   1 root supergroup         31 2021-01-17 09:52 /test
drwx------   - root supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root supergroup          0 2021-01-16 11:03 /user

# Change the file's group
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -chgrp hadoop /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 3 items
-rw-r--r--   1 root hadoop             31 2021-01-17 09:52 /test
drwx------   - root supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root supergroup          0 2021-01-16 11:03 /user

# Change the file's owner
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -chown haha /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 3 items
-rw-r--r--   1 haha hadoop             31 2021-01-17 09:52 /test
drwx------   - root supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root supergroup          0 2021-01-16 11:03 /user

# Change the file's owner and group at the same time
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -chown haha2:hadoop2 /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 3 items
-rw-r--r--   1 haha2 hadoop2            31 2021-01-17 09:52 /test
drwx------   - root  supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root  supergroup          0 2021-01-16 11:03 /user

# Change the file's permissions
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -chmod 777 /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 3 items
-rwxrwxrwx   1 haha2 hadoop2            31 2021-01-17 09:52 /test
drwx------   - root  supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root  supergroup          0 2021-01-16 11:03 /user
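The octal mode passed to -chmod encodes the same rwx bits that -ls displays. A small sketch of the conversion:

```python
# Map an octal mode (as passed to `hadoop fs -chmod`) to the
# rwx string that `hadoop fs -ls` displays.
def mode_to_string(octal_mode: str) -> str:
    # Index i holds the rwx pattern for octal digit i (0-7).
    bits = ["---", "--x", "-w-", "-wx", "r--", "r-x", "rw-", "rwx"]
    return "".join(bits[int(d)] for d in octal_mode)

print(mode_to_string("777"))  # rwxrwxrwx, matching /test above
print(mode_to_string("644"))  # rw-r--r--
```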

4. Copy files from local to HDFS, and from HDFS to local

[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

# Create the file copytest
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# touch copytest
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# ll
total 20
drwxr-xr-x 2 root root 4096 Jan 17 10:17 ./
drwxr-xr-x 3 root root 4096 Jan 17 09:49 ../
-rw-r--r-- 1 root root    0 Jan 17 10:17 copytest
-rw-r--r-- 1 root root   31 Jan 17 09:52 test
-rw-r--r-- 1 root root   12 Jan 17 09:49 testfile
-rw-r--r-- 1 root root   19 Jan 17 09:51 testfile2
# Copy the local file to HDFS
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -copyFromLocal copytest /copyfile
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 4 items
-rw-r--r--   1 root  supergroup          0 2021-01-17 10:19 /copyfile
-rwxrwxrwx   1 haha2 hadoop2            31 2021-01-17 09:52 /test
drwx------   - root  supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root  supergroup          0 2021-01-16 11:03 /user

# Copy from HDFS to local
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -copyToLocal /copyfile copyfile2 
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# ll
total 20
drwxr-xr-x 2 root root 4096 Jan 17 10:22 ./
drwxr-xr-x 3 root root 4096 Jan 17 09:49 ../
-rw-r--r-- 1 root root    0 Jan 17 10:22 copyfile2
-rw-r--r-- 1 root root    0 Jan 17 10:17 copytest
-rw-r--r-- 1 root root   31 Jan 17 09:52 test
-rw-r--r-- 1 root root   12 Jan 17 09:49 testfile
-rw-r--r-- 1 root root   19 Jan 17 09:51 testfile2

5. Copy an HDFS file from one location to another

[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]

# Create the directory /ha
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -mkdir /ha/
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 5 items
-rw-r--r--   1 root  supergroup          0 2021-01-17 10:19 /copyfile
drwxr-xr-x   - root  supergroup          0 2021-01-17 10:26 /ha
-rwxrwxrwx   1 haha2 hadoop2            31 2021-01-17 09:52 /test
drwx------   - root  supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root  supergroup          0 2021-01-16 11:03 /user
# Copy /copyfile into the /ha directory
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -cp /copyfile /ha
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /ha
Found 1 items
-rw-r--r--   1 root supergroup          0 2021-01-17 10:27 /ha/copyfile

6. File search

[-find <path> ... <expression> ...]

root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -find /copy*
/copyfile

7. File download

[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -get /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# ll
total 20
drwxr-xr-x 2 root root 4096 Jan 17 09:52 ./
drwxr-xr-x 3 root root 4096 Jan 17 09:49 ../
-rw-r--r-- 1 root root   31 Jan 17 09:52 test
-rw-r--r-- 1 root root   12 Jan 17 09:49 testfile
-rw-r--r-- 1 root root   19 Jan 17 09:51 testfile2

8. Create and delete directories

[-mkdir [-p] <path> ...]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]

root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -mkdir /ha/
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 5 items
-rw-r--r--   1 root  supergroup          0 2021-01-17 10:19 /copyfile
drwxr-xr-x   - root  supergroup          0 2021-01-17 10:26 /ha
-rwxrwxrwx   1 haha2 hadoop2            31 2021-01-17 09:52 /test
drwx------   - root  supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root  supergroup          0 2021-01-16 11:03 /user

# rmdir can only delete empty directories
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -rmdir /ha
rmdir: `/ha': Directory is not empty
# rm -r deletes a directory recursively
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -rm -r /ha
Deleted /ha
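The empty-versus-non-empty distinction mirrors the local filesystem, where `os.rmdir` refuses a non-empty directory and a recursive delete is needed instead. A local analogy:

```python
# Local analogy: os.rmdir refuses a non-empty directory, just like
# `hadoop fs -rmdir`; shutil.rmtree is the `rm -r` equivalent.
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
ha = os.path.join(base, "ha")
os.makedirs(ha)
open(os.path.join(ha, "copyfile"), "w").close()

try:
    os.rmdir(ha)                    # fails: directory is not empty
except OSError:
    print("rmdir failed: not empty")

shutil.rmtree(ha)                   # recursive delete, like rm -r
print("removed:", not os.path.exists(ha))
shutil.rmtree(base)
```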

9. Move files or directories

[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]

# Move a file within HDFS
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -mv /copyfile /ha
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 4 items
drwxr-xr-x   - root  supergroup          0 2021-01-17 10:40 /ha
-rwxrwxrwx   1 haha2 hadoop2            31 2021-01-17 09:52 /test
drwx------   - root  supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root  supergroup          0 2021-01-16 11:03 /user
# Move a local file to HDFS (the local copy is removed)
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -moveFromLocal copyfile2 /
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# ll
total 20
drwxr-xr-x 2 root root 4096 Jan 17 10:41 ./
drwxr-xr-x 3 root root 4096 Jan 17 09:49 ../
-rw-r--r-- 1 root root    0 Jan 17 10:17 copytest
-rw-r--r-- 1 root root   31 Jan 17 09:52 test
-rw-r--r-- 1 root root   12 Jan 17 09:49 testfile
-rw-r--r-- 1 root root   19 Jan 17 09:51 testfile2
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 5 items
-rw-r--r--   1 root  supergroup          0 2021-01-17 10:41 /copyfile2
drwxr-xr-x   - root  supergroup          0 2021-01-17 10:40 /ha
-rwxrwxrwx   1 haha2 hadoop2            31 2021-01-17 09:52 /test
drwx------   - root  supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root  supergroup          0 2021-01-16 11:03 /user
# Move from HDFS to local; the syntax is correct, but this command is not implemented in the current version
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -moveToLocal /ha .
moveToLocal: Option '-moveToLocal' is not implemented yet.

10. Upload files

[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]

# Upload a file to HDFS
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -put testfile /test
root@hecs-x-large-2-linux-20200618145835:~/dong/test/hadooptest# hadoop fs -ls /
Found 3 items
-rw-r--r--   1 haha hadoop             31 2021-01-17 09:52 /test
drwx------   - root supergroup          0 2021-01-16 11:03 /tmp
drwxr-xr-x   - root supergroup          0 2021-01-16 11:03 /user