Hadoop Read/Write Flows and Common Commands

1. Read Flow

  1. The client sends a read request to the DistributedFileSystem (the HDFS client's filesystem implementation).

  2. The DistributedFileSystem communicates with the NameNode over RPC.

  3. The NameNode verifies that the file exists, that the user has permission, and so on. If the checks pass, the client receives an FSDataInputStream (carrying the file's metadata: which blocks make up the file, which nodes hold each block, etc.).

  4. The client calls the FSDataInputStream's read method and starts reading. The main logic (see the Java sketch after this list):

    • The client first reads the first block from the node closest to it.
      • If a read fails, the client backs off, records the failed node so it is skipped on subsequent reads, and picks another nearby node holding a replica of that block.
    • On success, it reads the second block the same way, repeating until every block has been read.
    • The FSDataInputStream is closed.
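A minimal Java sketch of this flow, assuming a file at /test/hello.log and a default configuration (core-site.xml / hdfs-site.xml) on the classpath; block and replica selection happens inside the client library:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);       // a DistributedFileSystem for hdfs:// URIs
        // open() performs the NameNode RPC and permission checks described above
        try (FSDataInputStream in = fs.open(new Path("/test/hello.log"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {  // read() pulls blocks from nearby DataNodes
                System.out.println(line);
            }
        }  // try-with-resources closes the FSDataInputStream
    }
}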

2. Write Flow

  1. The client sends a write request to the DistributedFileSystem.

  2. The DistributedFileSystem communicates with the NameNode over RPC.

  3. The NameNode verifies that the file does not already exist, that the user has permission, and so on. If the checks pass, the client receives an FSDataOutputStream. Note that at this point the NameNode stores no block mapping for this file yet (nothing has been written; the mappings are recorded only after the DataNodes report successful writes).

    • If the file already exists, an exception is thrown.
    • If the user lacks permission, an exception is thrown.

  4. The client calls the FSDataOutputStream's write method and starts writing. The main logic (see the Java sketch after this list):

    • The first block is written to a nearby, lightly loaded DataNode.
    • Once that DataNode has written the first block, it forwards a copy of the block to the second DataNode; when the second finishes writing it, the second forwards it to the third, and so on down the pipeline. When the number of successfully written replicas equals the configured replication factor, the first block is complete.
    • The second block is written the same way, repeating until the whole file has been written.
    • The FSDataOutputStream is closed.
    • The DistributedFileSystem notifies the NameNode of the write's outcome.
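And the write side, again a sketch with an assumed path and content; the pipeline replication between DataNodes in step 4 happens entirely inside the client library:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // create() performs the NameNode checks from step 3;
        // with overwrite=false it throws if the file already exists
        try (FSDataOutputStream out = fs.create(new Path("/test/hello.log"), false)) {
            out.writeBytes("hello hdfs\n");  // buffered locally, then streamed through the DataNode pipeline
        }  // close() flushes the last block and completes the file on the NameNode
    }
}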

Questions to think about:

  • 1 DN, replication factor 1: if the DN dies, can you read/write?
  • 3 DNs, replication factor 3: if one DN dies, can you read the contents and write?
  • 10 DNs, replication factor 3: if one DN dies, can you read the contents and write?

Answers:

1 DN, replication factor 1: once the DN is down, writing is definitely impossible. Directory and file metadata can still be read (it lives on the NameNode), but the file contents cannot.

3 DNs, replication factor 3: with one DN down, writing fails because the replication factor of 3 can no longer be satisfied. The contents are still readable, because the other two DNs each hold a replica of every block.

10 DNs, replication factor 3: with one DN down, both reads and writes still work; there is capacity to spare.

Summary:

Writes succeed as long as the number of live DNs >= the replication factor, and fail otherwise. For reads, it depends on whether you are reading metadata or contents: the contents are readable as long as at least one DN holding a replica survives, while directory metadata is always readable (it is served by the NameNode).
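To check these numbers on a concrete cluster, here is a small sketch (the path and the LIVE report type are assumptions for illustration) that compares a file's replication factor with the number of live DataNodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        short replication = fs.getFileStatus(new Path("/test/hello.log")).getReplication();
        int liveDns = ((DistributedFileSystem) fs).getDataNodeStats(DatanodeReportType.LIVE).length;
        // Per the rule above: writable needs liveDns >= replication; readable needs >= 1 replica alive
        System.out.println("replication=" + replication
                + ", live DNs=" + liveDns
                + ", writable=" + (liveDns >= replication));
    }
}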


3. Common HDFS Commands

# Invoke the filesystem client
hadoop fs <args>
# The arguments mirror their Linux counterparts almost exactly; a few common ones:
hadoop fs -mkdir /test
hadoop fs -put ./hello.log /test/
hadoop fs -cat /test/hello.log
hadoop fs -rm -f -r /test

# Verify that the native libraries loaded successfully
hadoop checknative
# Print the classpath
hadoop classpath

# hdfs
# hdfs dfs is equivalent to hadoop fs
# Check filesystem health, starting from /
hdfs fsck /
# Format the namenode
hdfs namenode -format
# Run the namenode / datanode daemons
hdfs namenode
hdfs datanode
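Incidentally, these shell commands are backed by the org.apache.hadoop.fs.FsShell class, so the same operations can be driven from Java; a minimal sketch (the class name and arguments are just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class FsShellDemo {
    public static void main(String[] args) throws Exception {
        // Equivalent to: hadoop fs -ls /test
        int exitCode = ToolRunner.run(new FsShell(new Configuration()), new String[] {"-ls", "/test"});
        System.exit(exitCode);
    }
}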

Appendix

Hadoop

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

hdfs

Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  classpath            prints the classpath
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

hadoop fs / hdfs dfs

Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-d] [-h] [-R] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]
