Differences between hadoop fs, hadoop dfs, and hdfs dfs:
The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands.
The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands.
In short:
hadoop fs:
FS refers to a generic file system abstraction that can point to any file system: local, HDFS, HFTP, S3, and others. Use it when you are dealing with different file systems, not just HDFS; of the three commands it has the widest scope.
hadoop dfs:
Specific to HDFS; it only works for operations on HDFS. It has been deprecated, and hdfs dfs should be used instead.
hdfs dfs:
Same as hadoop dfs, i.e. it works for all operations on HDFS, and it is the recommended command in place of hadoop dfs: when hadoop dfs is invoked, it is internally translated to hdfs dfs.
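The scheme://authority/path resolution described above can be sketched with a tiny shell helper. build_hdfs_uri is a hypothetical function, not part of Hadoop; it just shows how a bare path such as /parent/child becomes a fully qualified hdfs:// URI once an authority (namenode host and port) is supplied:

```shell
# build_hdfs_uri is a made-up illustration helper: it joins an authority
# with an absolute path, yielding the fully qualified form that a bare
# path resolves to when fs.defaultFS points at that namenode.
build_hdfs_uri() {
  local authority="$1" path="$2"
  printf 'hdfs://%s%s\n' "$authority" "$path"
}

build_hdfs_uri 'namenodehost:8020' '/parent/child'
# hdfs://namenodehost:8020/parent/child
```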
hadoop fs -fs [local | <file system URI>]:
Declares which file system hadoop should use. If not declared, the one set in the current configuration is used, looked up in the following order:
- hadoop-default.xml inside the hadoop jar
- hadoop-default.xml under $HADOOP_CONF_DIR
- hadoop-site.xml under $HADOOP_CONF_DIR
Passing local makes the local file system act as hadoop's DFS; passing a URI selects that specific file system as the DFS.
[hadoop@master ~]$ hadoop fs -fs local -ls /home
19/01/20 23:33:29 WARN fs.FileSystem: "local" is a deprecated filesystem name. Use "file:///" instead.
Found 2 items
drwxr-xr-x - root root 4096 2019-01-20 23:32 /home/a
drwx------ - hadoop hadoop 4096 2019-01-20 23:32 /home/hadoop
hadoop fs -ls <path>:
Equivalent to ls on the local system: lists the contents of the given directory. (Column 1: file permissions; column 2: replication count; column 3: owner; column 4: group; column 5: file size; column 6: last modification date; column 7: file or directory path.)
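Since the -ls output is plain whitespace-separated text, its columns can be picked apart with standard tools. Here a captured sample line (taken from the listing later in these notes) stands in for live cluster output:

```shell
# One entry from `hadoop fs -ls`, captured as a string; awk splits the
# whitespace-separated fields: $1 permissions, $2 replication, $3 owner,
# $4 group, $5 size, $6-$7 modification date and time, $8 path.
line='-rw-r--r--   2 hadoop supergroup    4710400 2019-01-20 23:08 /home/core.23619'
echo "$line" | awk '{print "owner=" $3, "size=" $5, "path=" $8}'
# owner=hadoop size=4710400 path=/home/core.23619
```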
hadoop fs -lsr <path>:
Like -ls, but recursively lists the information of all files and subdirectories matching the pattern.
hadoop fs -du <path>:
Lists the total space, in bytes, used by each file matching the pattern.
hadoop fs -dus <path>:
Like -du with the same output format, but prints an aggregate summary instead of per-file entries, equivalent to Unix du -sb.
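A local sketch of the -du vs. -dus distinction, using wc -c on ordinary files in place of a cluster: the per-file lines correspond to -du, while the total line corresponds to the -dus summary.

```shell
# Two small files in a scratch directory; wc -c prints a byte count per
# file (like hadoop fs -du) plus a "total" line (like the -dus summary).
d=$(mktemp -d)
printf 'aaaa' > "$d/a"   # 4 bytes
printf 'bb'   > "$d/b"   # 2 bytes
wc -c "$d/a" "$d/b"
```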
hadoop fs -mv <src> <dst>:
Moves files matching the given pattern to the target location.
hadoop fs -cp <src> <dst>:
Copies files to the target location.
hadoop fs -rm [-skipTrash] <src>:
Deletes the files matching the pattern.
hadoop fs -rmr [-skipTrash] <src>:
Recursively deletes all matching files and directories.
hadoop fs -put <localsrc> … <dst>:
Copies files from the local file system into the DFS.
[hadoop@slave1 ~]$ hadoop fs -put /home/hadoop/core.23619 /home/
[hadoop@slave1 ~]$ hadoop fs -ls /home/
Found 2 items
-rw-r--r--   2 hadoop supergroup    4710400 2019-01-20 23:08 /home/core.23619
drwx-wx-wx   - hadoop supergroup          0 2018-04-07 02:22 /home/hadoop
hadoop fs -copyFromLocal <localsrc> … <dst>:
Equivalent to -put.
[hadoop@slave1 ~]$ hadoop fs -rm /home/core.23619
19/01/20 23:09:40 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /home/core.23619
[hadoop@slave1 ~]$ hadoop fs -copyFromLocal /home/hadoop/core.23619 /home/
[hadoop@slave1 ~]$ hadoop fs -ls /home/
Found 2 items
-rw-r--r--   2 hadoop supergroup    4710400 2019-01-20 23:10 /home/core.23619
drwx-wx-wx - hadoop supergroup 0 2018-04-07 02:22 /home/hadoop
hadoop fs -moveFromLocal <localsrc> … <dst>:
Same as -put, except the local source file is deleted after the copy.
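The difference between -copyFromLocal and -moveFromLocal mirrors cp vs. mv on the local file system; a sketch needing no cluster:

```shell
# cp keeps the local source (like -put / -copyFromLocal); mv removes it
# (like -moveFromLocal, which deletes the local file after the upload).
src=$(mktemp)
dst=$(mktemp -d)
cp "$src" "$dst/copied"   # $src still exists afterwards
mv "$src" "$dst/moved"    # $src is gone afterwards
ls "$dst"
```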
hadoop fs -get [-ignoreCrc] [-crc] <src> <localdst>:
Copies files matching the pattern from the DFS to the local file system.
hadoop fs -getmerge <src> <localdst>:
Merges multiple files in HDFS into a single file on the local file system.
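What -getmerge does can be sketched locally with cat; part-00000 and part-00001 are hypothetical job output files standing in for real reducer output:

```shell
# Concatenating the parts reproduces what
#   hadoop fs -getmerge /job/output merged.txt
# would do on a cluster: one local file containing the pieces in order.
d=$(mktemp -d)
printf 'first\n'  > "$d/part-00000"
printf 'second\n' > "$d/part-00001"
cat "$d"/part-* > "$d/merged"
cat "$d/merged"
# first
# second
```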
hadoop fs -copyToLocal [-ignoreCrc] [-crc] <src> <localdst>:
Equivalent to -get.
hadoop fs -cat <src>:
Prints the file contents.
hdfs dfs -text <file>
For a text file this is equivalent to cat; for a compressed file it decompresses first and then displays the content.
hdfs dfs -tail <file>
Displays the last kilobyte of the given file.
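The local equivalent of this is tail -c 1024; a quick check on a throwaway 3000-byte file:

```shell
# A 3000-byte file; tail -c 1024 keeps exactly the last kilobyte,
# the same amount that hadoop fs -tail shows.
f=$(mktemp)
head -c 3000 /dev/zero > "$f"
tail -c 1024 "$f" | wc -c
# 1024
```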
hadoop fs -mkdir <path>:
Creates a directory at the specified location.
hadoop fs -setrep [-R] [-w] <rep> <path/file>:
Sets the replication factor of a file; the -R flag controls whether subdirectories and files are set recursively, and -w waits for the replication to complete.
hadoop fs -chmod [-R] <MODE[,MODE]…|OCTALMODE> PATH…:
Changes file permissions; -R applies the change recursively.
hadoop fs -chown [-R] [OWNER][:[GROUP]] PATH…:
Changes the owner and group of files; -R recurses.
hadoop fs -chgrp [-R] GROUP PATH…:
Equivalent to -chown … :GROUP …
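The MODE argument to -chmod accepts the same forms as Unix chmod; a quick local check of the octal form (assumes GNU stat, whose %a format prints the octal permission bits):

```shell
# chmod with an octal mode, then read the bits back with GNU stat.
f=$(mktemp)
chmod 640 "$f"
stat -c '%a' "$f"
# 640
```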
hadoop fs -count [-q] <path>:
Counts the number of files and the space they occupy under the path.
hdfs dfs -expunge
Empties the trash.
hdfs dfsadmin -safemode get
Reports the current safe mode state.
hdfs dfsadmin -safemode enter
Enters safe mode.
hdfs dfsadmin -safemode leave
Leaves safe mode.
hdfs dfsadmin -refreshNodes
Used when decommissioning nodes.
The exclude file must be configured before Hadoop starts:
<property> <name>dfs.hosts.exclude</name> <value>exclude_file</value> </property>
Note: add the nodes to be removed to exclude_file.
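A sketch of preparing the exclude file (slave3 is a made-up hostname for illustration); the refreshNodes call itself must run against a live cluster, so it is left as a comment:

```shell
# The exclude file is plain text, one hostname per line.
exclude_file=$(mktemp)
printf 'slave3\n' > "$exclude_file"
cat "$exclude_file"
# slave3

# Then, on the cluster (not runnable here), tell the NameNode to reread it:
# hdfs dfsadmin -refreshNodes
```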
hdfs dfsadmin -printTopology
Prints the racks and the nodes on each rack.
hdfs dfsadmin -setBalancerBandwidth <bandwidth>
Sets the bandwidth available to the balancer.
hdfs dfs -setrep -w <replication factor> -R <path>
Sets the replication factor of files.
hdfs dfsadmin -report
Quickly shows which nodes are down, the total and used HDFS capacity, and the disk usage of each node.
start-dfs.sh
Starts the namenode and datanodes, i.e. starts the file system.
stop-dfs.sh
Stops the file system.
start-yarn.sh
Starts the resourcemanager and nodemanagers.
stop-yarn.sh
Stops the resourcemanager and nodemanagers.
start-all.sh
Starts both hdfs and yarn.
stop-all.sh
Stops both hdfs and yarn.
hadoop-daemon.sh start datanode
Starts a single datanode on its own.
start-balancer.sh -t 10%
Starts the balancer; -t sets the utilization threshold (here 10%).
hdfs namenode -format
Formats the file system.
- hadoop jar file.jar: runs a jar program
- hadoop job -kill job_201901210937_0052: kills a running jar program
- hadoop job -submit <job-file>: submits a job
- hadoop job -status <job-id>: prints the map and reduce completion percentages and all counters
- hadoop job -counter <job-id> <group-name> <counter-name>: prints the value of a counter
- hadoop job -kill <job-id>: kills the specified job
- hadoop job -history <jobOutputDir>: prints job details, including failures and the reasons jobs were killed. More details about a job, such as successful tasks and the task attempts made, can be viewed by specifying the [all] option.
- hadoop job -list [all]: displays all jobs; -list alone displays only jobs that have yet to complete.
- hadoop job -kill-task <task-id>: kills the task. Killed tasks are not counted against failed attempts.
- hadoop job -fail-task <task-id>: fails the task. Failed tasks are counted against failed attempts.
- hdfs fsck <path> -move: moves corrupted files to /lost+found
- hdfs fsck <path> -delete: deletes corrupted files
- hdfs fsck <path> -openforwrite: prints files that are open for writing
- hdfs fsck <path> -files: prints the files being checked
- hdfs fsck <path> -blocks: prints a block report
- hdfs fsck <path> -locations: prints the location of every block
- hdfs fsck <path> -racks: prints the network topology of the datanodes