Hadoop Big Data in Practice: Deploying an HDFS Distributed Cluster on Linux


The tail of the `hdfs` command's usage listing (continued from the previous part):

```
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.
```


![](https://img-blog.csdnimg.cn/direct/32dc6059dcac43e490efc6abcb9016a1.png)


(14) Format HDFS



```
[root@hadoop hadoop]# ./bin/hdfs namenode -format
```



![](https://img-blog.csdnimg.cn/direct/397957969c0a461480203d1b9c329cac.png)


![](https://img-blog.csdnimg.cn/direct/37263975f4d0468ebae16bbe2416931e.png)![](https://img-blog.csdnimg.cn/direct/9420b9ebce6b41be9e70e40ae286f5dd.png)
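A quick way to confirm the format step worked is to look for the "successfully formatted" line in its output. A minimal sketch, assuming Hadoop 2.x log wording; `fmt_log` here holds a sample line rather than a live capture:

```shell
# Sketch: verify the NameNode format step. "fmt_log" simulates one line of
# the format output; on a real run capture it with:
#   fmt_log="$(./bin/hdfs namenode -format 2>&1)"
fmt_log="INFO common.Storage: Storage directory /var/hadoop/dfs/name has been successfully formatted."

if echo "$fmt_log" | grep -q "successfully formatted"; then
    echo "format OK"
else
    echo "format FAILED - check the log output"
fi
```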


View the directory:



```
[root@hadoop hadoop]# cd /var/hadoop/
[root@hadoop hadoop]# tree .
.
└── dfs
    └── name
        └── current
            ├── fsimage_0000000000000000000
            ├── fsimage_0000000000000000000.md5
            ├── seen_txid
            └── VERSION

3 directories, 4 files
```


![](https://img-blog.csdnimg.cn/direct/56ec329649ac46dd8aa0ab73a0cc21a9.png)


(15) Start the cluster


View the directory:



```
[root@hadoop hadoop]# cd ~
[root@hadoop ~]# cd /usr/local/hadoop/
[root@hadoop hadoop]# ls
```


![](https://img-blog.csdnimg.cn/direct/9e4cece8ee37446d8fd4353e40822b31.png)


Start:



```
[root@hadoop hadoop]# ./sbin/start-dfs.sh
```


![](https://img-blog.csdnimg.cn/direct/9fdcd194b1e84a0e83dc4229990804c1.png)


View the logs (a new `logs` directory has been created):



```
[root@hadoop hadoop]# cd logs/ ; ll
```


![](https://img-blog.csdnimg.cn/direct/962667c8495145dea7dc993674b96c30.png)


Check the Java processes with jps:



```
[root@hadoop hadoop]# jps
```


![](https://img-blog.csdnimg.cn/direct/2187c609d19c4a07988bafc0f98b35ae.png)
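On the management node, `start-dfs.sh` should leave NameNode and SecondaryNameNode running. A small sketch that checks for them in `jps` output; `jps_out` below is sample text with made-up PIDs, not a live capture:

```shell
# Sketch: check for the expected HDFS daemons in jps output.
# On a real node replace the sample with: jps_out="$(jps)"
jps_out="1987 NameNode
2305 SecondaryNameNode
2568 Jps"

for proc in NameNode SecondaryNameNode; do
    if echo "$jps_out" | grep -qw "$proc"; then
        echo "$proc: running"
    else
        echo "$proc: MISSING"
    fi
done
```

On the datanodes (node01–node03 below), the process to look for is `DataNode` instead.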


Check the DataNode (node01):


![](https://img-blog.csdnimg.cn/direct/89087e2867bc4793ac459b6c63c9afe4.png)


![](https://img-blog.csdnimg.cn/direct/60d53597e46747d19025839a1bd3a007.png)


Check the DataNode (node02):


![](https://img-blog.csdnimg.cn/direct/09769bce91aa4c63aa6c17c99fa96ddb.png)


![](https://img-blog.csdnimg.cn/direct/01edf7d1b4004d57bd3c141984f32a8f.png)


Check the DataNode (node03):


![](https://img-blog.csdnimg.cn/direct/a53b57c7ec3a49baad1cfd814cc19c26.png)


![](https://img-blog.csdnimg.cn/direct/1b00c6dd32b946a28ec73a353ead6016.png)


(16) View the admin commands



```
[root@hadoop hadoop]# ./bin/hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
        [-report [-live] [-dead] [-decommissioning]]
        [-safemode <enter | leave | get | wait>]
        [-saveNamespace]
        [-rollEdits]
        [-restoreFailedStorage true|false|check]
        [-refreshNodes]
        [-setQuota <quota> <dirname>...<dirname>]
        [-clrQuota <dirname>...<dirname>]
        [-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
        [-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
        [-finalizeUpgrade]
        [-rollingUpgrade [<query|prepare|finalize>]]
        [-refreshServiceAcl]
        [-refreshUserToGroupsMappings]
        [-refreshSuperUserGroupsConfiguration]
        [-refreshCallQueue]
        [-refresh <host:ipc_port> <key> [arg1..argn]
        [-reconfig <datanode|...> <host:ipc_port> <start|status>]
        [-printTopology]
        [-refreshNamenodes datanode_host:ipc_port]
        [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
        [-setBalancerBandwidth <bandwidth in bytes per second>]
        [-fetchImage <local directory>]
        [-allowSnapshot <snapshotDir>]
        [-disallowSnapshot <snapshotDir>]
        [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
        [-getDatanodeInfo <datanode_host:ipc_port>]
        [-metasave filename]
        [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
        [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
```


![](https://img-blog.csdnimg.cn/direct/8381a265b8e04ef1b1b11cec9ff462bb.png)


(17) Verify the cluster


View the report; it shows three live datanodes:



```
[root@hadoop hadoop]# ./bin/hdfs dfsadmin -report
Configured Capacity: 616594919424 (574.25 GB)
Present Capacity: 598915952640 (557.78 GB)
DFS Remaining: 598915915776 (557.78 GB)
DFS Used: 36864 (36 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

Live datanodes (3):

Name: 192.168.204.53:50010 (node03)
Hostname: node03
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 5620584448 (5.23 GB)
DFS Remaining: 199911043072 (186.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.27%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 10:30:18 CST 2024

Name: 192.168.204.51:50010 (node01)
Hostname: node01
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 6028849152 (5.61 GB)
DFS Remaining: 199502778368 (185.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 10:30:18 CST 2024

Name: 192.168.204.52:50010 (node02)
Hostname: node02
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 6029533184 (5.62 GB)
DFS Remaining: 199502094336 (185.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 10:30:18 CST 2024
```
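A scripted version of this check simply counts the `Hostname:` lines in the report. In the sketch below, `report` holds an excerpt of the output above; on the cluster you would capture it live instead:

```shell
# Sketch: count live datanodes from a dfsadmin report.
# On the cluster, use: report="$(./bin/hdfs dfsadmin -report)"
report="Live datanodes (3):

Hostname: node03
Hostname: node01
Hostname: node02"

live=$(echo "$report" | grep -c '^Hostname:')
echo "live datanodes: $live"
```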


![](https://img-blog.csdnimg.cn/direct/0ee57b3db1744faa9e56cb82bdbb3318.png)


(18) Verify via the web UI



NameNode UI:

```
http://192.168.204.50:50070/
```


![](https://img-blog.csdnimg.cn/direct/a3b4cdae18044643acf2ea766cbb566a.png)



SecondaryNameNode UI:

```
http://192.168.204.50:50090/
```


![](https://img-blog.csdnimg.cn/direct/856f7574d4044199a33bf686685f36a2.png)



DataNode UI (node01):

```
http://192.168.204.51:50075/
```


![](https://img-blog.csdnimg.cn/direct/09fecbed1af94d3ca6fb32076c3fd2ca.png)


(19) Browse the filesystem


![](https://img-blog.csdnimg.cn/direct/a465aa1939b141b99a084ec972e3d517.png)


It is empty for now.


![](https://img-blog.csdnimg.cn/direct/1f33ac1ed80148d6a964d6ba4e3e048d.png)



### 3. Using the HDFS Filesystem from Linux


(1) View the commands



```
[root@hadoop hadoop]# ./bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
  credential           interact with credential providers
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
```


![](https://img-blog.csdnimg.cn/direct/9c4937f5691f4e77b4277f7957df81bb.png)



```
[root@hadoop hadoop]# ./bin/hadoop fs
Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-find <path> ... <expression> ...]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-truncate [-w] <length> <path> ...]
        [-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
```


![](https://img-blog.csdnimg.cn/direct/6501a99a25834ef990911c212d7fb75e.png)


(2) List the directory



```
[root@hadoop hadoop]# ./bin/hadoop fs -ls /
```


![](https://img-blog.csdnimg.cn/direct/8c09717e72a04ad787578a06c093a141.png)


(3) Create a directory



```
[root@hadoop hadoop]# ./bin/hadoop fs -mkdir /devops
```


![](https://img-blog.csdnimg.cn/direct/1cab41bd02de4dbcb22e44a5b7f04074.png)


View:


![](https://img-blog.csdnimg.cn/direct/86fa44e4cdd84e79930b26b435727b74.png)


View in the web UI:


![](https://img-blog.csdnimg.cn/direct/8cc05181fe9e4b1dad1fb76603cebeb5.png)


(4) Upload files



```
[root@hadoop hadoop]# ./bin/hadoop fs -put *.txt /devops/
```


![](https://img-blog.csdnimg.cn/direct/6825eedf7445490f94537b7fc74e1331.png)


View:



```
[root@hadoop hadoop]# ./bin/hadoop fs -ls /devops/
```


![](https://img-blog.csdnimg.cn/direct/ced588f7e08e4a569aab1119d2e347dd.png)
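To script this check, count the regular-file lines (those starting with `-`) in the listing. In the sketch below, `listing` mimics the shape of the output (byte sizes are illustrative); on the cluster you would capture it live:

```shell
# Sketch: count uploaded files in an "fs -ls" listing.
# On the cluster, use: listing="$(./bin/hadoop fs -ls /devops/)"
listing="Found 3 items
-rw-r--r--   2 root supergroup      86426 2024-03-14 11:05 /devops/LICENSE.txt
-rw-r--r--   2 root supergroup      14978 2024-03-14 11:05 /devops/NOTICE.txt
-rw-r--r--   2 root supergroup       1366 2024-03-14 11:05 /devops/README.txt"

n=$(echo "$listing" | grep -c '^-')
echo "uploaded files: $n"
```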


View in the web UI:



| Permission | Owner | Group | Size | Last Modified | Replication | Block Size | Name |
| --- | --- | --- | --- | --- | --- | --- | --- |
| -rw-r--r-- | root | supergroup | 84.4 KB | 2024/3/14 11:05:33 | 2 | 128 MB | LICENSE.txt |
| -rw-r--r-- | root | supergroup | 14.63 KB | 2024/3/14 11:05:34 | 2 | 128 MB | NOTICE.txt |
| -rw-r--r-- | root | supergroup | 1.33 KB | 2024/3/14 11:05:34 | 2 | 128 MB | README.txt |


![](https://img-blog.csdnimg.cn/direct/5e1ff5d4d8964c1191f6f0100def6ea5.png)


Download:


![](https://img-blog.csdnimg.cn/direct/3e91a4e4747545b6a9855bb5c35d715f.png)



(5) Create a file



```
[root@hadoop hadoop]# ./bin/hadoop fs -touchz /tfile
```


![](https://img-blog.csdnimg.cn/direct/a0bb91610dbf46a0b3cee38d6f5003d8.png)


View:



```
[root@hadoop hadoop]# ./bin/hadoop fs -ls /
```


![](https://img-blog.csdnimg.cn/direct/95cf59d3a5a947cdae701215be076307.png)



(6) Download a file



```
[root@hadoop hadoop]# ./bin/hadoop fs -get /tfile /tmp/
```


![](https://img-blog.csdnimg.cn/direct/59527926c5f046668d9efed5255098ca.png)


View:



```
[root@hadoop hadoop]# ls -l /tmp/ | grep tfile
```


![](https://img-blog.csdnimg.cn/direct/0c2539482b334b9a8dfb1a4ef7e653cc.png)
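A stronger check than listing the file is to compare checksums of the two copies. The sketch below uses two local files to stand in for the HDFS and local sides; on the cluster you would compare `/tmp/tfile` against `./bin/hadoop fs -cat /tfile | md5sum`:

```shell
# Sketch: verify a downloaded copy by checksum (local stand-ins for HDFS).
printf 'sample contents\n' > /tmp/ref.tfile
cp /tmp/ref.tfile /tmp/copy.tfile            # stands in for "fs -get"
a=$(md5sum /tmp/ref.tfile  | awk '{print $1}')
b=$(md5sum /tmp/copy.tfile | awk '{print $1}')
if [ "$a" = "$b" ]; then
    echo "checksums match"
else
    echo "MISMATCH"
fi
```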


View in the web UI:


![](https://img-blog.csdnimg.cn/direct/48df2600f0e5442a984f8609fb2c7c85.png)


(7) Command comparison


The earlier setting:


![](https://img-blog.csdnimg.cn/direct/283eec4e074246c1a4f4a0d8ad4b41c8.png)


So the two listings are equivalent:



```
[root@hadoop hadoop]# ./bin/hadoop fs -ls /
[root@hadoop hadoop]# ./bin/hadoop fs -ls hdfs://hadoop:9000/
```


![](https://img-blog.csdnimg.cn/direct/031f343c09f747d6a17d6d5005213400.png)


Also note that the upstream default is `file:///`, which points at the local filesystem:


![](https://img-blog.csdnimg.cn/direct/24040906f8844d369d9156d17ccd593f.png)



```
[root@hadoop hadoop]# ./bin/hadoop fs -ls file:///
```


![](https://img-blog.csdnimg.cn/direct/86868b3025934643b99582932913f339.png)
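The equivalence of `-ls /` and `-ls hdfs://hadoop:9000/`, and the `file:///` default, all come down to `fs.defaultFS` in core-site.xml. A sketch consistent with this deployment; the `hadoop.tmp.dir` value is an assumption inferred from the `/var/hadoop/dfs` tree shown earlier:

```xml
<!-- core-site.xml (sketch consistent with this deployment) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop:9000</value>   <!-- without this, the default is file:/// -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop</value>          <!-- assumed: matches the /var/hadoop/dfs tree above -->
  </property>
</configuration>
```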




## II. Troubleshooting


### 1. ssh-copy-id error


(1) Error



```
/usr/bin/ssh-copy-id: ERROR: ssh: connect to host hadoop port 22: Connection refused
```


![](https://img-blog.csdnimg.cn/direct/41380fdf70654ff28bc4c9539e0f0e01.png)


(2) Cause


Hostname resolution failed.


(3) Solution


Before:


![](https://img-blog.csdnimg.cn/direct/592756a4ec5a4e6fbabec1d96b6d3194.png)


After:


![](https://img-blog.csdnimg.cn/direct/296f980d58cf4c979221744bed185d8e.png)


Success:


![](https://img-blog.csdnimg.cn/direct/51374b5f3ed842918c4116c73ef93ce4.png)



### 2. How to disable SSH host key checking


(1) Edit the configuration file



```
[root@hadoop .ssh]# vim /etc/ssh/ssh_config
```


![](https://img-blog.csdnimg.cn/direct/a44e99a303c842b7a50b82a2eb822fcc.png)


Add the setting:



```
StrictHostKeyChecking no
```


![](https://img-blog.csdnimg.cn/direct/e52b7dc191564a229435a86ec935e281.png)


Success:


![](https://img-blog.csdnimg.cn/direct/e57085a874124de1b7a6f29a5476c2af.png)
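Disabling host key checking globally weakens SSH's protection against man-in-the-middle attacks. A safer variant (a sketch; the hostnames are assumed from this deployment) scopes the relaxed checking to the cluster members only:

```
# /etc/ssh/ssh_config -- limit relaxed host-key checking to the cluster
Host hadoop node01 node02 node03
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```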



### 3. Which configuration files does HDFS use?


(1) Configuration files



1) Environment configuration file: `hadoop-env.sh`

2) Core configuration file: `core-site.xml`

3) HDFS configuration file: `hdfs-site.xml`

4) Worker-node list file: `slaves`
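For reference, a minimal `hdfs-site.xml` sketch consistent with what this cluster shows elsewhere (replication factor 2 per the web UI listing, and the NameNode/SecondaryNameNode web UIs both on host `hadoop`); treat the exact values as assumptions to adapt:

```xml
<!-- hdfs-site.xml (sketch; values inferred from this deployment) -->
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```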


### 4. `hadoop version` reports an error


(1) Error


![](https://img-blog.csdnimg.cn/direct/0713a972b0f74ff890613e722306fdba.png)


(2) Cause


The Java environment (JAVA_HOME) is not declared.


(3) Solution


Declare the Java environment (set JAVA_HOME).


Check:



```
rpm -ql java-1.8.0-openjdk
```


![](https://img-blog.csdnimg.cn/direct/7b7ee4f82e1848059cea57a11f719963.png)


Identify the Java home:



```
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.402.b06-1.el7_9.x86_64/jre
```
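Rather than typing this long path by hand, the home directory can be derived from the `java` binary. A sketch; `java_bin` is set directly here, while on a real host you would resolve it with `readlink`:

```shell
# Sketch: derive JAVA_HOME from the resolved java binary path.
# On a real host: java_bin=$(readlink -f "$(command -v java)")
java_bin=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.402.b06-1.el7_9.x86_64/jre/bin/java

# Strip the trailing /bin/java to get the home directory
JAVA_HOME=$(dirname "$(dirname "$java_bin")")
echo "export JAVA_HOME=$JAVA_HOME"
```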


Locate the configuration directory:



```
/usr/local/hadoop/etc/hadoop
```


Edit the configuration file:



```
[root@hadoop hadoop]# vim hadoop-env.sh
```


![](https://img-blog.csdnimg.cn/direct/0823be14dc2c45bc893df9fffeb81faf.png)



Before:


![](https://img-blog.csdnimg.cn/direct/aeefba2167654fc8b164e1efd8dc7da6.png)


![](https://img-blog.csdnimg.cn/direct/f80af343915c4a2a924a1533966a9229.png)


After:


![](https://img-blog.csdnimg.cn/direct/334ccca052b74bb0af004ac2ba64b4dd.png)


![](https://img-blog.csdnimg.cn/direct/db949b42288b42a0a0b9e66f373a7a33.png)


Success:



```
[root@hadoop hadoop]# ./bin/hadoop version
```


![](https://img-blog.csdnimg.cn/direct/22a78178002f47e3943c35443b79f9aa.png)



### 5. Cluster startup error


(1) Error


![](https://img-blog.csdnimg.cn/direct/36e06ccd9956451f9edd1439a994adb5.png)


(2) Cause


ssh-copy-id was never run against the local host, so passwordless SSH to it fails.


(3) Solution


Run ssh-copy-id against the local host as well.



```
[root@hadoop hadoop]# ssh-copy-id hadoop
```


![](https://img-blog.csdnimg.cn/direct/816a44676be04282876b74c65c736e11.png)
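To avoid missing a host in the first place, the key can be pushed to every cluster member in one pass. A sketch with the hostnames from this deployment; the `echo` makes it a dry run that only prints each command, so remove it to actually copy keys:

```shell
# Sketch: push the SSH key to all cluster members, including the local
# manager host. Dry run -- drop "echo" to execute for real.
for h in hadoop node01 node02 node03; do
    echo ssh-copy-id "$h"
done
```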


If it still fails:


![](https://img-blog.csdnimg.cn/direct/91d1a66826df4d50947509847deaf403.png)


Stop the HDFS daemons (NameNode, SecondaryNameNode, and DataNode):



```
[root@hadoop hadoop]# ./sbin/stop-dfs.sh
```


![](https://img-blog.csdnimg.cn/direct/5beeebd4be1a4dea925ad5fff23020a7.png)


Start again:


![](https://img-blog.csdnimg.cn/direct/e88fac80e808455691c212a3783a6045.png)



### 6. Hadoop start and stop commands


(1) Commands



```
sbin/start-all.sh                                  # start all Hadoop daemons: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager
sbin/stop-all.sh                                   # stop all Hadoop daemons
sbin/start-dfs.sh                                  # start the HDFS daemons: NameNode, SecondaryNameNode, DataNode
sbin/stop-dfs.sh                                   # stop the HDFS daemons
sbin/hadoop-daemons.sh start namenode              # start only the NameNode daemon
sbin/hadoop-daemons.sh stop namenode               # stop only the NameNode daemon
sbin/hadoop-daemons.sh start datanode              # start only the DataNode daemon
sbin/hadoop-daemons.sh stop datanode               # stop only the DataNode daemon
sbin/hadoop-daemons.sh start secondarynamenode     # start only the SecondaryNameNode daemon
sbin/hadoop-daemons.sh stop secondarynamenode      # stop only the SecondaryNameNode daemon
sbin/start-yarn.sh                                 # start ResourceManager and NodeManager
sbin/stop-yarn.sh                                  # stop ResourceManager and NodeManager
sbin/yarn-daemon.sh start resourcemanager          # start only the ResourceManager
sbin/yarn-daemons.sh start nodemanager             # start only the NodeManager
sbin/yarn-daemon.sh stop resourcemanager           # stop only the ResourceManager
sbin/yarn-daemons.sh stop nodemanager              # stop only the NodeManager
sbin/mr-jobhistory-daemon.sh start historyserver   # manually start the JobHistory server
sbin/mr-jobhistory-daemon.sh stop historyserver    # manually stop the JobHistory server
```


### 7. File upload error


(1) Error


![](https://img-blog.csdnimg.cn/direct/81b945f91e72423299480660201d407c.png)


(2) Cause


Wrong command.


(3) Solution


Use the correct command:



```
[root@hadoop hadoop]# ./bin/hadoop fs -put *.txt /devops/
```


![](https://img-blog.csdnimg.cn/direct/6825eedf7445490f94537b7fc74e1331.png)



### 8. HDFS usage commands




