Hadoop Read/Write Flows and Common Commands
1. Read Flow

- The client sends a read request to the distributed file system object, `DistributedFileSystem`.
- `DistributedFileSystem` communicates with the NameNode over RPC.
- The NameNode checks whether the file exists, whether the user has permission, and so on. If the checks pass, it returns an `FSDataInputStream` object (which knows which blocks make up the file, which nodes each block lives on, and so on).
- The client calls the `read` method of `FSDataInputStream` and starts reading (see the sketch after this list). The main logic is:
  - The client reads the first block from the node closest to itself.
  - If the read fails, it backs off and records the failed node; later reads skip that node and pick another nearby node that holds a replica of the block.
  - If the read succeeds, it starts on the second block, repeating the steps above until every block has been read.
- Close the `FSDataInputStream`.
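Where this surfaces in code: a minimal read sketch against the HDFS Java client, assuming a cluster reachable through the configuration on the classpath and that `/test/hello.log` (the file used in the command examples below) exists.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class HdfsReadDemo {
    public static void main(String[] args) throws IOException {
        // Configuration picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             // open() does the NameNode RPC and hands back an FSDataInputStream
             FSDataInputStream in = fs.open(new Path("/test/hello.log"))) {
            byte[] buf = new byte[4096];
            int n;
            // each read() streams bytes from a nearby DataNode holding the current block
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);
            }
            System.out.flush();
        } // close() releases the stream, as in the last step of the list above
    }
}
```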
2. Write Flow

- The client sends a write request to the distributed file system object, `DistributedFileSystem`.
- `DistributedFileSystem` communicates with the NameNode over RPC.
- The NameNode checks whether the file already exists, whether the user has permission, and so on. If the checks pass, it returns an `FSDataOutputStream`. Note that at this point the NameNode stores no block information for this file yet (nothing has been written; the block mappings are recorded only after the DataNodes report successful writes):
  - If the file already exists, an exception is thrown.
  - If the user lacks permission, an exception is thrown.
- The client calls the `write` method of `FSDataOutputStream` and starts writing (see the sketch after this list). The main logic is:
  - The first block is written to a nearby DataNode whose CPU is idle.
  - Once the first DataNode has written the first block, it copies that block to a second DataNode; once the second DataNode has written it, the second DataNode copies its block to a third DataNode, and so on. When the number of successfully written replicas equals the configured replication factor, the first block is done.
  - The second block is written next, repeating the steps above until the whole file is written.
- Close the `FSDataOutputStream`.
- `DistributedFileSystem` reports the outcome of the write to the NameNode.
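A matching write sketch, again a minimal illustration rather than production code; the overwrite flag is set to false so that an existing file triggers the exception described above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class HdfsWriteDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             // create() asks the NameNode to run the existence/permission checks;
             // overwrite=false means an existing file raises an exception
             FSDataOutputStream out = fs.create(new Path("/test/hello.log"), false)) {
            // bytes are pipelined DataNode -> DataNode, block by block,
            // until the configured replication factor is satisfied
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        } // close() flushes the last packet and completes the file on the NameNode
    }
}
```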
Thought experiments:
- 1 DN, 1 replica: if the DN dies, can you still read/write?
- 3 DNs, 3 replicas: if one DN dies, can you still read content and write?
- 10 DNs, 3 replicas: if one DN dies, can you still read content and write?
Answers:
- 1 DN, 1 replica, DN down: writing is definitely impossible. Reading metadata such as directory listings still works, but reading file content does not.
- 3 DNs, 3 replicas, one DN down: writing fails because the replication factor of 3 can no longer be satisfied; content can still be read because the remaining two DNs each hold a replica of every block.
- 10 DNs, 3 replicas, one DN down: both reading and writing work; there are resources to spare.
Summary:
Writes succeed only while the number of live DNs is >= the replication factor; otherwise they fail. For reads it depends on whether you read the directory or the content: directories are always readable, and content is readable as long as at least one DN holding each block survives.
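These scenarios are easy to poke at from the Java API: the sketch below (again assuming `/test/hello.log` exists) prints a file's replication factor and which DataNodes hold each of its blocks.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class ReplicationCheck {
    public static void main(String[] args) throws IOException {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            FileStatus st = fs.getFileStatus(new Path("/test/hello.log"));
            // the replication factor this file was created with
            System.out.println("replication factor = " + st.getReplication());
            // which hosts hold each block; a dead DN shows up as a missing host
            for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println(loc);
            }
        }
    }
}
```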
3. Common HDFS Commands
```shell
# invoke the file system shell
hadoop fs <args>
# the options are almost identical to Linux; a few common ones:
hadoop fs -mkdir /test
hadoop fs -put ./hello.log /test/
hadoop fs -cat /test/hello.log
hadoop fs -rm -f -r /test
# verify that the native libraries loaded successfully
hadoop checknative
# print the classpath
hadoop classpath
# hdfs
# hdfs dfs = hadoop fs
# check filesystem health, starting from /
hdfs fsck /
# format the namenode
hdfs namenode -format
# run the namenode / datanode
hdfs namenode
hdfs datanode
```
Appendix
Hadoop
```
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
```
hdfs
```
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
where COMMAND is one of:
dfs run a filesystem command on the file systems supported in Hadoop.
classpath prints the classpath
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
journalnode run the DFS journalnode
zkfc run the ZK Failover Controller daemon
datanode run a DFS datanode
dfsadmin run a DFS admin client
haadmin run a DFS HA admin client
fsck run a DFS filesystem checking utility
balancer run a cluster balancing utility
jmxget get JMX exported values from NameNode or DataNode.
mover run a utility to move block replicas across
storage types
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to an legacy fsimage
oev apply the offline edits viewer to an edits file
fetchdt fetch a delegation token from the NameNode
getconf get config values from configuration
groups get the groups which users belong to
snapshotDiff diff two snapshots of a directory or diff the
current directory contents with a snapshot
lsSnapshottableDir list all snapshottable dirs owned by the current user
Use -help to see options
portmap run a portmap service
nfs3 run an NFS version 3 gateway
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
storagepolicies list/get/set block storage policies
version print the version
```
hadoop fs / hdfs dfs
```
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
```