1. Introduction to HDFS
- Basic introduction
- HDFS stands for Hadoop Distributed File System, Hadoop's distributed file system
- It is a file system that allows files to be shared across multiple hosts over a network, letting multiple users on multiple machines share files and storage space
- HDFS is a distributed file system suited to storing large files; it is not suited to storing large numbers of small files
- Design philosophy
2. Basic HDFS Operations
- The HDFS shell
- Command format: bin/hdfs dfs -xxx scheme://authority/path
- Use the hdfs command from Hadoop's bin directory, followed by dfs to indicate an operation on the distributed file system; this part of the format is fixed. [If Hadoop's bin directory is on the PATH, you can invoke hdfs directly.]
- xxx is a placeholder: whatever operation you want to perform on HDFS, you specify the corresponding command here
- The scheme for HDFS is hdfs; the authority is the IP address and port of the cluster node where the NameNode runs (a hostname works just as well as the IP); the path is the file path you want to operate on
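This scheme/authority/path decomposition is ordinary URI syntax, so it can be checked with any URI parser. A minimal sketch in Python's standard library (bigdata01:9000 is a placeholder NameNode address, not a value from this cluster):

```python
from urllib.parse import urlparse

# Parse an HDFS URI of the form scheme://authority/path.
# bigdata01:9000 stands in for your own NameNode host and port.
uri = urlparse("hdfs://bigdata01:9000/README.txt")

print(uri.scheme)   # hdfs
print(uri.netloc)   # bigdata01:9000   <- the authority
print(uri.path)     # /README.txt
```

The same three pieces are exactly what the shell commands below combine: the scheme and authority select the cluster, the path selects the file.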
- This whole prefix is actually the value of the fs.defaultFS property in the core-site.xml configuration file, which is the address of HDFS.
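For reference, such a core-site.xml entry might look like the fragment below (hdfs://bigdata01:9000 is a placeholder value; use your own NameNode host and port):

```xml
<configuration>
  <property>
    <!-- The default filesystem URI; hdfs dfs commands with a bare path
         are resolved against this address. -->
    <name>fs.defaultFS</name>
    <value>hdfs://bigdata01:9000</value>
  </property>
</configuration>
```

When fs.defaultFS is set, `hdfs dfs -ls /` and `hdfs dfs -ls hdfs://bigdata01:9000/` refer to the same location.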
- Basic commands
- hdfs dfs: view the help documentation
[root@bigdata01 ~]# hdfs dfs
Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
        [-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
        [-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] [-v] [-x] <path> ...]
        [-expunge]
        [-find <path> ... <expression> ...]
        [-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
        [-head <file>]
        [-help [cmd ...]]
        [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touch [-a] [-m] [-t TIMESTAMP ] [-c] <path> ...]
        [-touchz <path> ...]
        [-truncate [-w] <length> <path> ...]
        [-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>          specify an application configuration file
-D <property=value>                 define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <file1,...>                  specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>                 specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>            specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]
- hdfs dfs -ls: list the contents of a given path
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -ls /
Found 1 items
-rw-r--r--   2 root supergroup       1361 2022-02-25 18:24 /README.txt
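Each line of -ls output has a fixed column layout: permissions, replication factor, owner, group, size in bytes, modification date, modification time, and path. A small Python sketch splitting one such line (the sample line is copied from the output above):

```python
# Split one line of `hdfs dfs -ls` output into its eight fields.
line = "-rw-r--r--   2 root supergroup       1361 2022-02-25 18:24 /README.txt"

perms, replication, owner, group, size, date, time, path = line.split()

print(owner, size, path)  # root 1361 /README.txt
```

This simple split works because none of the fields in the sample contain spaces; paths with spaces would need more careful parsing.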
- hdfs dfs -ls -R: recursively list all directories
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -ls -R /
-rw-r--r--   2 root supergroup       1361 2022-02-25 18:24 /README.txt
drwxr-xr-x   - root supergroup          0 2022-02-25 18:29 /abc
drwxr-xr-x   - root supergroup          0 2022-02-25 18:29 /abc/xyz
drwxr-xr-x   - root supergroup          0 2022-02-25 18:28 /test
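In the recursive listing, the first character of the permissions column distinguishes directories (d) from regular files (-), and directories show - in place of a replication factor, since replication applies to file blocks, not directories. A quick Python check over the lines above:

```python
# Directories are marked by a leading 'd' in the permissions column.
listing = [
    "-rw-r--r--   2 root supergroup       1361 2022-02-25 18:24 /README.txt",
    "drwxr-xr-x   - root supergroup          0 2022-02-25 18:29 /abc",
    "drwxr-xr-x   - root supergroup          0 2022-02-25 18:29 /abc/xyz",
]

dirs = [entry.split()[-1] for entry in listing if entry.startswith("d")]
print(dirs)  # ['/abc', '/abc/xyz']
```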
- hdfs dfs -put: upload a file
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -put README.txt /
- hdfs dfs -get: download a file
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -get /README.txt .
get: `README.txt': File exists
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -get /README.txt README.txt.bak
[root@bigdata01 hadoop-3.2.0]# ll
total 188
drwxr-xr-x. 2 1001 1002    203 Jan  8  2019 bin
drwxr-xr-x. 3 1001 1002     20 Jan  8  2019 etc
drwxr-xr-x. 2 1001 1002    106 Jan  8  2019 include
drwxr-xr-x. 3 1001 1002     20 Jan  8  2019 lib
drwxr-xr-x. 4 1001 1002   4096 Jan  8  2019 libexec
-rw-rw-r--. 1 1001 1002 150569 Oct 19  2018 LICENSE.txt
-rw-rw-r--. 1 1001 1002  22125 Oct 19  2018 NOTICE.txt
-rw-rw-r--. 1 1001 1002   1361 Oct 19  2018 README.txt
-rw-r--r--. 1 root root   1361 Feb 25 18:25 README.txt.bak
drwxr-xr-x. 3 1001 1002   4096 Feb 25 15:53 sbin
drwxr-xr-x. 4 1001 1002     31 Jan  8  2019 share
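The `File exists` error above is -get refusing to overwrite an existing local file, which is why the second attempt succeeds under the new name README.txt.bak. The same safeguard can be sketched in plain Python (safe_target is a hypothetical helper for illustration, not part of any HDFS API):

```python
import os
import tempfile

# Refuse to reuse a local destination name that is already taken,
# mirroring -get's behavior, instead of silently clobbering it.
def safe_target(dst):
    if os.path.exists(dst):
        raise FileExistsError(f"get: `{dst}': File exists")
    return dst

with tempfile.TemporaryDirectory() as d:
    taken = os.path.join(d, "README.txt")
    open(taken, "w").close()          # simulate an existing local copy

    try:
        safe_target(taken)            # same name -> refused
    except FileExistsError as e:
        print(e)

    free = safe_target(os.path.join(d, "README.txt.bak"))  # new name -> ok
    print(os.path.basename(free))     # README.txt.bak
```

Note that the real -get command also supports an -f flag to force overwriting when that is what you want.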
- hdfs dfs -cat: view a file's contents
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -cat /README.txt
For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   http://wiki.apache.org/hadoop/

This distribution includes cryptographic software.  The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software.  BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted.  See <http://www.wassenaar.org/> for more
information.

The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms.  The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.

The following provides more details on the included cryptographic
software:
  Hadoop Core uses the SSL libraries from the Jetty project written
by mortbay.org.
- hdfs dfs -mkdir: create a directory
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -mkdir /test
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -ls /
Found 2 items
-rw-r--r--   2 root supergroup       1361 2022-02-25 18:24 /README.txt
drwxr-xr-x   - root supergroup          0 2022-02-25 18:28 /test
- hdfs dfs -mkdir -p: recursively create nested directories
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -mkdir -p /abc/xyz
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -ls /
Found 3 items
-rw-r--r--   2 root supergroup       1361 2022-02-25 18:24 /README.txt
drwxr-xr-x   - root supergroup          0 2022-02-25 18:29 /abc
drwxr-xr-x   - root supergroup          0 2022-02-25 18:28 /test
- hdfs dfs -rm: delete a file
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -rm /README.txt
Deleted /README.txt
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2022-02-25 18:29 /abc
drwxr-xr-x   - root supergroup          0 2022-02-25 18:28 /test
- hdfs dfs -rm -r: delete a directory
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -rm -r /test
Deleted /test
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -rm -r /abc
Deleted /abc
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -ls /
[root@bigdata01 hadoop-3.2.0]#
-