HDFS Read/Write Flows, and Disk Load Balancing Across Nodes and Within a Single Node

Replica Placement Policy

In production, 3 replicas are recommended.

(figure: replica placement)

First replica:

If the uploading node is itself a DN, the replica is placed on that node first; otherwise a node is chosen at random whose disks are not too slow and whose CPU is not too busy.

Second replica:

Placed on a node in a different rack from the first replica.

Third replica:

Placed on a different node in the same rack as the second replica.

Many companies dedicate a separate node as the client node: it runs neither DN nor NN and only carries the cluster's XML configuration files, which is enough to communicate with the cluster and know where the data should be submitted.

CDH comes with a default rack, which can be regarded as one big, virtual rack; this default rack is generally not changed in CDH.
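
To verify where the replicas of a file actually landed, you can ask the NN for each block's locations. A minimal sketch of my own (not from the original notes), assuming the cluster configuration files are on the classpath and using /xxx.log only as an example path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/xxx.log");               // example path only
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation blk : blocks) {
            // Each block lists the DN hosts holding its replicas
            // (3 hosts when the replication factor is 3).
            System.out.println(blk.getOffset() + " -> " + String.join(",", blk.getHosts()));
        }
        fs.close();
    }
}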

File Write Flow

hadoop fs -put xxx.log /
This is transparent to the user, who does not see what happens underneath; under the hood an FSDataOutputStream object is used.

(figure: HDFS write flow)

  1. When hadoop fs -put xxx.log / is executed, the Client calls the distributed file system's FileSystem.create(filePath) method, which communicates with the NN over RPC. The NN checks whether the path already exists and whether the user has permission to create it; if the path already exists or the permission is insufficient, an error is returned.
    If the checks pass, a new file is created that is not yet associated with any blocks, and an FSDataOutputStream object (the core object) is returned.
  2. The Client calls the write() method of the FSDataOutputStream object. The first replica of the first block is written to the first DN; once the first replica is written, the data is passed on to the second DN; once the second replica is written, it is passed on to the third DN. When the third replica is written, an ack packet is returned to the second DN; the second DN, having received the third DN's ack packet and confirmed its own write, returns an ack packet to the first DN; the first DN, having received the second DN's ack packet and confirmed its own write, returns an ack packet to the FSDataOutputStream object, which marks the three replicas of the first block as written.
    The remaining blocks are then written in the same way, one after another.
  3. When all data has been written to the file, the Client calls FSDataOutputStream.close() to close the output stream.
  4. Finally, FileSystem.complete() is called to tell the NN that the file was written successfully. (A minimal Java sketch of this flow follows the list.)
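
The same write flow can be driven directly from the Java API. A minimal sketch of my own (not part of the original notes), assuming the cluster configuration files are on the classpath and using /xxx.log only as an example path:

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        // 1. create(): RPC to the NN, which checks the path and permissions,
        //    creates the file entry (no blocks yet) and returns the stream.
        FSDataOutputStream out = fs.create(new Path("/xxx.log"));
        // 2. write(): data is pushed down the DN pipeline;
        //    acks flow back DN3 -> DN2 -> DN1 -> client.
        out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        // 3. close() flushes the remaining data and closes the stream,
        //    after which the NN is told the file is complete.
        out.close();
        fs.close();
    }
}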

File Read Flow

(figure: HDFS read flow)


Under the hood an FSDataInputStream object is used.

  1. The Client calls the FileSystem.open(filePath) method, which communicates with the NN over RPC; the NN returns part or all of the file's block list (that is, an FSDataInputStream object is returned).
  2. The Client calls the read() method of the FSDataInputStream object (see the sketch after this list):

a. It reads the first block from the nearest DN (preferring the local node); after the block is read, a check is performed.
If the check succeeds, the connection to that DN is closed; if it fails, the failed block + DN are recorded so the corrupt replica is not read again, and the block is read from the DN holding its second replica instead.
b. It then contacts the nearest DN for the second block, reads it, and performs the same check.
If the check succeeds, the connection to that DN is closed; if it fails, the failed block + DN are recorded so the corrupt replica is not read again, and the block is read from the DN holding its third replica instead.
c. When the block list has been fully read but the file has not ended, FileSystem is called again and the next batch of blocks for the file is fetched from the NN. (To the client this feels like one continuous data stream; the whole process is transparent.)

  3. The Client calls FSDataInputStream.close() to close the input stream.
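
A matching read sketch of my own (again assuming the configuration files are on the classpath; /xxx.log is only an example path):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // 1. open(): RPC to the NN, which returns the (partial) block list for the file.
        FSDataInputStream in = fs.open(new Path("/xxx.log"));
        // 2. read(): each block is pulled from the closest DN holding a healthy replica;
        //    further block batches are fetched from the NN transparently as the read advances.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        // 3. closing the reader also closes the underlying FSDataInputStream.
        fs.close();
    }
}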

PID Files

By default the PID files are stored under /tmp (but /tmp has a 30-day cleanup mechanism, which also affects jps).
Workaround:
mkdir /home/ruoze/tmp
chmod -R 777 /home/ruoze/tmp
In hadoop-env.sh, set:
export HADOOP_PID_DIR=/home/ruoze/tmp
In yarn-env.sh, set:
export YARN_PID_DIR=/home/ruoze/tmp

Common Commands

hadoop fs ==> hdfs dfs (the two are equivalent)

[mao@JD hadoop]$ hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Of these, it is normally enough to remember the following:

[-cat [-ignoreCrc] <src> ...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]

But always check whether the trash (recycle bin) is enabled in the production environment; in CDH the trash is enabled by default.

<configuration>
    <property>
        <name>fs.trash.interval</name>
        <value>100</value>
    </property>
</configuration>
[mao@JD hadoop]$ hdfs dfs -rm /wordcount/input/1.log
19/12/06 00:21:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/06 00:21:16 INFO fs.TrashPolicyDefault: Moved: 'hdfs://JD:9000/wordcount/input/1.log' to trash at: hdfs://JD:9000/user/mao/.Trash/Current/wordcount/input/1.log

With the trash enabled, be very careful with -skipTrash.
Never run hdfs dfs -rm -skipTrash /rz.log!

Always use hdfs dfs -rm /rz.log so the file goes into the trash; CDH keeps it for 7 days by default and deletes it automatically afterwards.
fs.trash.interval 10080 (7 days, in minutes)
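
Programmatic deletes can go through the same trash mechanism. A minimal sketch of my own (not from the original notes), using the org.apache.hadoop.fs.Trash helper and assuming fs.trash.interval is greater than 0 on the cluster; /rz.log is just the example path from above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class SafeDelete {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path target = new Path("/rz.log");
        // Moves the path under the user's .Trash/Current directory instead of deleting it
        // outright; returns false when the trash is disabled (fs.trash.interval = 0).
        boolean moved = Trash.moveToAppropriateTrash(fs, target, conf);
        System.out.println(moved ? "moved to trash" : "trash disabled, nothing moved");
        fs.close();
    }
}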

Using HDFS safe mode (hdfs dfsadmin -safemode)

(figure: safe mode)


When the HDFS cluster has a failure and the NN log at startup shows it has entered safe mode, you normally take it out of safe mode manually.
Entering safe mode manually is rarely done; usually a switch is set upstream to hold back the incoming data instead.

Manually enter safe mode: hdfs dfsadmin -safemode enter

[mao@JD root]$ hdfs dfsadmin -safemode enter
19/12/06 18:54:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is ON

Safe mode only affects writes:

[mao@JD software]$ hdfs dfs -put mao.log /
19/12/06 19:41:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: Cannot create file/mao.log._COPYING_. Name node is in safe mode.
[mao@JD software]$ hdfs dfs -ls /
19/12/06 19:42:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwx------   - mao supergroup          0 2019-12-02 19:40 /tmp
drwx------   - mao supergroup          0 2019-12-06 00:21 /user
drwxr-xr-x   - mao supergroup          0 2019-12-02 19:40 /wordcount
[mao@JD software]$ hdfs dfs -cat /wordcount/output1/part-r-00000
19/12/06 19:44:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1	2
2	2
3	2
a	3
aaaa	1
afei	1
b	1
bcd	1
c	1

Command to leave safe mode: hdfs dfsadmin -safemode leave

[mao@JD root]$ hdfs dfsadmin -safemode leave
19/12/06 19:56:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF

[mao@JD root]$ hdfs fsck /
19/12/06 19:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://JD:50070/fsck?ugi=mao&path=%2F
FSCK started by mao (auth:SIMPLE) from /192.168.0.3 for path / at Fri Dec 06 19:57:07 CST 2019
......Status: HEALTHY
 Total size:	175079 B
 Total dirs:	18
 Total files:	6
 Total symlinks:		0
 Total blocks (validated):	5 (avg. block size 35015 B)
 Minimally replicated blocks:	5 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	1
 Average block replication:	1.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		1
 Number of racks:		1
FSCK ended at Fri Dec 06 19:57:07 CST 2019 in 4 milliseconds

The filesystem under path '/' is HEALTHY

Balancing data across the DN nodes:

[mao@JD sbin]$ start-balancer.sh 
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
The actual shell script code:
[mao@JD root]$ cat /home/mao/app/hadoop/sbin/start-balancer.sh
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

# Start balancer daemon.

"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer $@

"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script “$bin”/hdfs start balancer $@

Note:
the hdfs command itself has no "start balancer" subcommand:

[mao@JD root]$ hdfs
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  diskbalancer         Distributes data evenly among disks on a given node
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
						Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.

Threshold:
threshold = 10.0
Then take the average used disk space across the nodes:
(90 + 60 + 80) / 3 = 230 / 3 ≈ 76%
The difference between each node's used disk space and the cluster average should be smaller than this threshold:
90 - 76 = 14
76 - 60 = 16
80 - 76 = 4
Here the first two nodes deviate by more than 10, so the balancer moves data until every node is back within the threshold.

(figure: balancing data across multiple DNs)


How much data is migrated between disks is controlled by the program itself; the transfer bandwidth is throttled by:
dfs.datanode.balance.bandwidthPerSec 30m

Run it (start-balancer.sh):

[mao@JD root]$ start-balancer.sh
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved 
[mao@JD root]$ start-balancer.sh -threshold 5
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved

Schedule it as a crontab job that runs in the small hours every day.

Stop it (stop-balancer.sh):

[mao@JD root]$ stop-balancer.sh
no balancer to stop

Purpose:
scheduled every day, it balances the data and corrects spikes, keeping per-node usage within a bounded range.

Balancing data across the multiple disks of a single DN node:

Official documentation:
Apache Hadoop 3.3.1 – HDFS Disk Balancer

For example:
df -h
/data01 90%
/data02 60%
/data03 80%
/data04 0%
The dfs.disk.balancer.enabled parameter in hdfs-site.xml must be set to true.

Three steps are required:

  1. hdfs diskbalancer -plan JD (generates JD.plan.json)
  2. hdfs diskbalancer -execute JD.plan.json (executes the plan)
  3. hdfs diskbalancer -query JD (queries the status)

When should this be run, manually or on a schedule?

  1. When a new disk is added
  2. When monitoring shows a server's remaining disk space has dropped below the 10% threshold, an alert email is sent and the balancing is then run manually

Configuring multiple disks:

dfs.datanode.data.dir /data01,/data02,/data03,/data04 (the default is a single disk)
comma-delimited (the directories are separated by commas)

/data01 disk1
/data02 disk2
/data03 disk3

Why a DN in production mounts multiple physical disk directories:

For efficient writes and efficient reads, and to plan the storage capacity for the next 2-3 years in advance, avoiding the maintenance work of adding disks later.

(figure: disks within a single DN)
