Replica Placement Policy
In production, 3 replicas are recommended.
First replica:
If the uploading node is itself a DN, place the replica on that node; otherwise pick a random node whose disk is not too slow and whose CPU is not too busy.
Second replica:
Placed on a node in a different rack from the first replica.
Third replica:
Placed on a different node in the same rack as the second replica.
In practice many companies dedicate a separate node as the client node: it runs no DN or NN and only holds the cluster's XML configuration files, so it can communicate with the cluster and knows where data should be submitted.
CDH ships with a default rack, which can be thought of as one large, virtual rack; this default rack is normally left unchanged in CDH.
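To see how the DNs are currently mapped to racks, you can ask the NN to print the topology; a minimal check (the rack name in the comment is the usual default and is only an assumption for this setup):

# Print the rack-to-DataNode mapping known to the NameNode
hdfs dfsadmin -printTopology
# With no custom topology script, all DNs typically fall under /default-rack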
File Write Flow
hadoop fs -put xxx.log /
This is transparent to the user, who does not see how it is done underneath; internally an FSDataOutputStream object is used.
- When hadoop fs -put xxx.log / is executed, the Client calls FileSystem.create(filePath) on the distributed file system, i.e. it talks to the NN over RPC. The NN checks whether the path already exists and whether the user has permission to create it; if the path exists or permission is insufficient, an error is returned. If everything is OK, the NN creates a new file that is not yet associated with any block and returns an FSDataOutputStream object (the core object).
- The Client calls the FSDataOutputStream object's write() method. The first replica of the first block is written to the first DN; once written, the data is forwarded to the second DN; once the second replica is written, it is forwarded to the third DN. When the third replica is written, the third DN returns an ack packet to the second DN; the second DN, having received that ack and confirmed its own write, returns an ack packet to the first DN; the first DN, having received that ack and confirmed its own write, returns an ack packet to the FSDataOutputStream object, which marks all 3 replicas of the first block as written. The remaining blocks are then written one by one in the same way.
- When all data has been written, the Client calls FSDataOutputStream.close() to close the output stream.
- Finally, the Client calls FileSystem.complete() to tell the NN that the file was written successfully.
(A quick way to verify the resulting blocks and replica locations is shown below.)
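As a hands-on check of what the write flow produced, you can upload a file and then ask the NN for its blocks and replica locations; a minimal sketch (xxx.log follows the example above):

# Upload a local file to HDFS (the write flow described above runs underneath)
hadoop fs -put xxx.log /

# Ask the NameNode which blocks the file consists of and where each replica lives
hdfs fsck /xxx.log -files -blocks -locations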
File Read Flow
Internally an FSDataInputStream object is used.
- The Client calls FileSystem.open(filePath) and talks to the NN over RPC; the NN returns part or all of the file's block list (i.e. an FSDataInputStream object is returned).
- The Client calls the FSDataInputStream object's read() method:
a. It reads the first block from the nearest DN (preferring the local node); after the read completes, a check is done.
If it succeeds, the connection to that DN is closed; if it fails, the failed block + DN information is recorded so the corrupt replica is not read again, and the read goes to the DN holding that block's second replica.
b. It then reads the second block from its nearest DN, followed by the same check.
If it succeeds, the connection to that DN is closed; if it fails, the failed block + DN information is recorded so the corrupt replica is not read again, and the read goes to the DN holding that block's next replica.
c. If the block list has been read through but the file is not yet finished, the Client calls FileSystem again to fetch the next batch of the file's block list from the NN. (To the client this feels like one continuous data stream; the whole process is transparent.)
- The Client calls FSDataInputStream.close() to close the input stream.
(A minimal read example follows.)
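A minimal read example matching this flow, assuming the xxx.log uploaded earlier; which DN each block is actually read from is decided by the client library and is invisible here:

# Stream the whole file back to stdout (block locations are fetched from the NN transparently)
hdfs dfs -cat /xxx.log

# Or copy it back to the local file system
hdfs dfs -get /xxx.log ./xxx.log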
PID Files
By default the pid files are stored under /tmp (but /tmp has a 30-day cleanup mechanism, which also affects jps).
Fix:
mkdir /home/ruoze/tmp
chmod -R 777 /home/ruoze/tmp
In hadoop-env.sh change:
export HADOOP_PID_DIR=/home/ruoze/tmp
In yarn-env.sh change:
export YARN_PID_DIR=/home/ruoze/tmp
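The change only takes effect after the daemons are restarted; a quick check, assuming the usual hadoop-<user>-<daemon>.pid / yarn-<user>-<daemon>.pid file naming:

# Restart HDFS and YARN so the new pid directory takes effect
stop-dfs.sh  && start-dfs.sh
stop-yarn.sh && start-yarn.sh

# The pid files should now appear under the new directory instead of /tmp
ls /home/ruoze/tmp/*.pid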
Common Commands
hadoop fs ==> hdfs dfs (the two are equivalent)
[mao@JD hadoop]$ hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
Of these, it is normally enough to remember the following (a few usage examples follow the list):
[-cat [-ignoreCrc] <src> ...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
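A few illustrative invocations of these everyday commands (the paths are made up for the example):

hdfs dfs -mkdir -p /data/input            # create a directory (with parents)
hdfs dfs -put access.log /data/input      # upload a local file
hdfs dfs -ls -h /data/input               # list with human-readable sizes
hdfs dfs -cat /data/input/access.log      # print file contents
hdfs dfs -get /data/input/access.log ./   # download to the local file system
hdfs dfs -chmod -R 755 /data              # change permissions recursively
hdfs dfs -rm -r /data/input               # delete (goes to the trash if enabled)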
But always check first whether the trash is enabled in the production environment; in CDH the trash is enabled by default.
<property>
    <name>fs.trash.interval</name>
    <value>100</value>
</property>
</configuration>
[mao@JD hadoop]$ hdfs dfs -rm /wordcount/input/1.log
19/12/06 00:21:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/06 00:21:16 INFO fs.TrashPolicyDefault: Moved: 'hdfs://JD:9000/wordcount/input/1.log' to trash at: hdfs://JD:9000/user/mao/.Trash/Current/wordcount/input/1.log
With the trash enabled, be very careful with -skipTrash.
Never run hdfs dfs -rm -skipTrash /rz.log!
Always use hdfs dfs -rm /rz.log so the file goes to the trash; CDH keeps trashed files for 7 days by default and deletes them automatically afterwards.
fs.trash.interval = 10080 (in minutes, i.e. 7 days)
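To check the retention actually in effect on a cluster, and to restore a file that was deleted into the trash (the path below reuses the 1.log example from the output above):

# Show the trash retention currently in effect (in minutes)
hdfs getconf -confKey fs.trash.interval

# A trashed file can be moved back out of .Trash before the interval expires
hdfs dfs -mv /user/mao/.Trash/Current/wordcount/input/1.log /wordcount/input/1.log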
HDFS safe mode (hdfs dfsadmin -safemode):
After a cluster failure, the NN log at startup may show that it has entered safe mode; normally you make it leave safe mode manually.
Doing this by hand is actually rare; usually a switch is placed upstream to hold back the incoming data first.
Enter safe mode manually: hdfs dfsadmin -safemode enter
[mao@JD root]$ hdfs dfsadmin -safemode enter
19/12/06 18:54:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is ON
Safe mode only affects writes; reads are unaffected:
[mao@JD software]$ hdfs dfs -put mao.log /
19/12/06 19:41:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: Cannot create file/mao.log._COPYING_. Name node is in safe mode.
[mao@JD software]$ hdfs dfs -ls /
19/12/06 19:42:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwx------   - mao supergroup          0 2019-12-02 19:40 /tmp
drwx------   - mao supergroup          0 2019-12-06 00:21 /user
drwxr-xr-x   - mao supergroup          0 2019-12-02 19:40 /wordcount
[mao@JD software]$ hdfs dfs -cat /wordcount/output1/part-r-00000
19/12/06 19:44:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1       2
2       2
3       2
a       3
aaaa    1
afei    1
b       1
bcd     1
c       1
Leave safe mode: hdfs dfsadmin -safemode leave
[mao@JD root]$ hdfs dfsadmin -safemode leave
19/12/06 19:56:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
[mao@JD root]$ hdfs fsck /
19/12/06 19:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://JD:50070/fsck?ugi=mao&path=%2F
FSCK started by mao (auth:SIMPLE) from /192.168.0.3 for path / at Fri Dec 06 19:57:07 CST 2019
......Status: HEALTHY
 Total size:    175079 B
 Total dirs:    18
 Total files:   6
 Total symlinks:                0
 Total blocks (validated):      5 (avg. block size 35015 B)
 Minimally replicated blocks:   5 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Fri Dec 06 19:57:07 CST 2019 in 4 milliseconds

The filesystem under path '/' is HEALTHY
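Two related subcommands that are handy in scripts (both part of hdfs dfsadmin):

# Query the current safe mode state without changing it
hdfs dfsadmin -safemode get

# Block until the NN leaves safe mode on its own (useful in startup scripts)
hdfs dfsadmin -safemode wait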
Data balancing across DN nodes:
[mao@JD sbin]$ start-balancer.sh
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved

The actual shell script:

[mao@JD root]$ cat /home/mao/app/hadoop/sbin/start-balancer.sh
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

# Start balancer daemon.

"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer $@
"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script “$bin”/hdfs start balancer $@
Note:
hdfs itself has no "start balancer" subcommand; its usage only lists balancer, and the "start" part is handled by hadoop-daemon.sh.
[mao@JD root]$ hdfs
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  diskbalancer         Distributes data evenly among disks on a given node
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                                                Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.
Threshold:
threshold = 10.0
Take the average used percentage across the DNs, e.g. three nodes at 90%, 60% and 80%:
(90 + 60 + 80) / 3 ≈ 76.7%
The goal is that every node's used percentage differs from the cluster average by less than this threshold:
90 - 76.7 ≈ 13.3 (above the threshold, so data is moved off this node)
76.7 - 60 ≈ 16.7 (above the threshold, so data is moved onto this node)
80 - 76.7 ≈ 3.3 (within the threshold, nothing to do)
How much data gets migrated between nodes is decided by the balancer itself; the bandwidth it may use for migration is throttled by:
dfs.datanode.balance.bandwidthPerSec = 30m
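The bandwidth can also be raised temporarily at runtime without restarting the DNs; the value is in bytes per second (30 MB/s shown here to match the 30m above):

# Temporarily allow each DataNode to use up to 30 MB/s for balancing traffic
hdfs dfsadmin -setBalancerBandwidth 31457280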
Run it (start-balancer.sh):
[mao@JD root]$ start-balancer.sh
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
[mao@JD root]$ start-balancer.sh -threshold 5
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Schedule it as a crontab job in the early hours of every day; a sketch follows.
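A minimal crontab sketch (the schedule and log path are illustrative assumptions):

# Run the balancer at 01:00 every day with a 5% threshold; append output to a log file
0 1 * * * /home/mao/app/hadoop/sbin/start-balancer.sh -threshold 5 >> /home/mao/logs/balancer.cron.log 2>&1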
Stop it (stop-balancer.sh):
[mao@JD root]$ stop-balancer.sh
no balancer to stop
Purpose:
Scheduled daily, it rebalances the data, smooths out usage spikes, and keeps per-node usage within a bounded range.
Data balancing across the multiple disks of a single DN node:
Official documentation:
Apache Hadoop 3.3.1 – HDFS Disk Balancer
For example:
df -h
/data01 90%
/data02 60%
/data03 80%
/data04 0%
The dfs.disk.balancer.enabled parameter in hdfs-site.xml must be set to true.
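A quick way to confirm the setting is in effect (getconf reads the value actually seen by the cluster configuration):

# Should print "true" once dfs.disk.balancer.enabled is set in hdfs-site.xml
hdfs getconf -confKey dfs.disk.balancer.enabled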
Three steps are needed:
- hdfs diskbalancer -plan JD — generates JD.plan.json
- hdfs diskbalancer -execute JD.plan.json — executes the plan
- hdfs diskbalancer -query JD — queries the status
When should this be run manually or on a schedule?
- When a new disk is added.
- When monitoring shows a server's remaining disk space has dropped below the 10% threshold: an alert email is sent, and then the disk balancer is run manually (a monitoring sketch follows).
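A rough monitoring sketch for that alerting idea, assuming a mail/mailx command is available and using the mount points from the df example above; the threshold and recipient address are made up:

#!/bin/bash
# Alert when any DN data disk is more than 90% used (i.e. less than 10% free)
THRESHOLD=90
for mount in /data01 /data02 /data03 /data04; do
    used=$(df -P "$mount" | awk 'NR==2 {gsub("%",""); print $5}')
    if [ "$used" -gt "$THRESHOLD" ]; then
        echo "$mount on $(hostname) is ${used}% used" | mail -s "DN disk usage warning" ops@example.com
    fi
done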
Multi-disk parameter configuration:
dfs.datanode.data.dir = /data01,/data02,/data03,/data04 (the default is a single directory, i.e. one disk)
The value is comma-delimited.
/data01 disk1
/data02 disk2
/data03 disk3
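To see what a DN is actually configured with, the effective value can be read with getconf (the example value in the comment follows the directories above):

# Print the effective DataNode data directories (comma-delimited)
hdfs getconf -confKey dfs.datanode.data.dir
# e.g.  /data01,/data02,/data03,/data04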
Why a production DN mounts multiple physical disk directories:
for efficient writes and reads, and to plan 2-3 years of storage capacity in advance, avoiding the maintenance work of adding disks later.