I. HDFS Disk Storage Policies
1. Specify the storage type of each local directory
The data directory carries the HOT policy's storage type, DISK; the data1 directory carries the COLD policy's storage type, ARCHIVE. In hdfs-site.xml, prefix each path in dfs.datanode.data.dir with its storage type:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/opt/beh/data/namenode/dfs/data,[ARCHIVE]/opt/beh/data/namenode/dfs/data1</value>
</property>
Restart HDFS:
$ stop-dfs.sh
$ start-dfs.sh
2. Assign storage policies to HDFS directories
List the storage policies HDFS supports:
$ hdfs storagepolicies -listPolicies
Block Storage Policies:
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
Create two HDFS directories:
$ hadoop fs -mkdir /Cold_data
$ hadoop fs -mkdir /Hot_data
Set a storage policy on each directory:
$ hdfs storagepolicies -setStoragePolicy -path hdfs://breath:9000/Cold_data -policy COLD
Set storage policy COLD on hdfs://breath:9000/Cold_data
$ hdfs storagepolicies -setStoragePolicy -path hdfs://breath:9000/Hot_data -policy HOT
Set storage policy HOT on hdfs://breath:9000/Hot_data
Verify that both directories carry the expected policy:
$ hdfs storagepolicies -getStoragePolicy -path /Cold_data
The storage policy of /Cold_data:
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
$ hdfs storagepolicies -getStoragePolicy -path /Hot_data
The storage policy of /Hot_data:
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
3. Storage test
Check the size of each storage directory before uploading anything:
$ cd /opt/beh/data/namenode/dfs
$ du -sh *
38M data
16K data1
30M name
14M namesecondary
Generate a 1000 MB file:
$ dd if=/dev/zero of=test.txt bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 3.11214 s, 337 MB/s
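As a quick sanity check on dd's own report, the throughput figure can be re-derived from the byte count and elapsed time (dd reports decimal megabytes, i.e. 10^6 bytes):

```shell
# 1048576000 bytes copied in 3.11214 s, expressed in decimal MB/s:
awk 'BEGIN { printf "%.0f\n", 1048576000 / 3.11214 / 1000000 }'   # prints 337
```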
Upload the generated file to /Cold_data:
$ hadoop fs -put test.txt /Cold_data
Check the storage directory sizes again:
$ du -sh *
38M data
1008M data1
30M name
14M namesecondary
4. Test result
The uploaded file was stored entirely under the data1 directory.
/Cold_data on HDFS carries the COLD policy, which maps to the ARCHIVE storage type assigned to data1 in hdfs-site.xml, so the file landed exactly where intended and the test achieved its goal.
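The mapping the test relies on can be sketched in plain shell. The `policy_storage` helper below is hypothetical (not an HDFS tool); it just restates, per the `-listPolicies` output above, which storage type each policy prefers for newly written replicas:

```shell
# Hypothetical helper: map a storage policy name to the storage type(s)
# HDFS prefers for new replicas, per the -listPolicies output above.
policy_storage() {
  case "$1" in
    HOT)          echo "DISK" ;;
    COLD)         echo "ARCHIVE" ;;
    WARM)         echo "DISK,ARCHIVE" ;;    # one replica on DISK, the rest on ARCHIVE
    ONE_SSD)      echo "SSD,DISK" ;;
    ALL_SSD)      echo "SSD" ;;
    LAZY_PERSIST) echo "RAM_DISK,DISK" ;;
    *)            echo "UNKNOWN" ;;
  esac
}

policy_storage COLD   # blocks under /Cold_data land on the ARCHIVE dir (data1)
policy_storage HOT    # blocks under /Hot_data land on the DISK dir (data)
```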
II. HDFS Reserved Space Configuration
1. Modify the parameter
Edit hdfs-site.xml and add the reserved-space parameter:
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>32212254720</value>
</property>
Also change dfs.datanode.data.dir to keep only a single local storage directory.
Restart HDFS:
$ stop-dfs.sh
$ start-dfs.sh
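dfs.datanode.du.reserved is specified in bytes per volume; the arithmetic below (plain shell, no HDFS involved) confirms that the value above corresponds to 30 GiB:

```shell
# 30 GiB expressed in bytes, matching the configured value:
echo $(( 30 * 1024 * 1024 * 1024 ))   # prints 32212254720
```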
2. Upload files
Check the disk space:
$ df -h
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/centos-root 46G 14G 32G 31% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 8.5M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/vda1 497M 125M 373M 25% /boot
tmpfs 1.6G 0 1.6G 0% /run/user/0
tmpfs 1.6G 0 1.6G 0% /run/user/1000
Upload files to HDFS, one 2 GB file at a time:
$ hadoop fs -put test1.txt /Cold_data/test1.txt
$ hadoop fs -put test1.txt /Cold_data/test2.txt
...
$ hadoop fs -put test1.txt /Cold_data/test7.txt
$ hadoop fs -put test1.txt /Cold_data/test8.txt
16/11/12 16:30:54 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2239)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1451)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:600)
16/11/12 16:30:54 INFO hdfs.DFSClient: Abandoning BP-456596110-192.168.134.129-1450512233024:blk_1073744076_3254
16/11/12 16:30:54 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[10.10.1.31:50010,DS-01c3c362-44f4-46eb-a8d8-57d2c2d5f196,ARCHIVE]
16/11/12 16:30:54 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /Cold_data/test8.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3289)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:668)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1544)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:600)
put: File /Cold_data/test8.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
Analysis
At this point the data directory /opt/beh/data/namenode/dfs breaks down as follows:
$ cd /opt/beh/data/namenode/dfs
$ du -sh *
15G data
12K data1
34M name
19M namesecondary
Check the disk space at this point:
$ df -h
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/centos-root 46G 27G 19G 59% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 8.5M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/vda1 497M 125M 373M 25% /boot
tmpfs 1.6G 0 1.6G 0% /run/user/0
tmpfs 1.6G 0 1.6G 0% /run/user/1000
3. Summary
The error shows that the reserved-space setting took effect. But df reveals that the local directory's remaining free space does not equal the reserved space configured for HDFS.
HDFS measures the available capacity of a data directory against the total capacity of the disk the directory lives on (here the / filesystem, 46 GB), not against the directory's own free space.
The actual remaining HDFS space therefore works out as:
total capacity of the disk holding the directory (46 GB) - space already used by HDFS (15 GB) = 31 GB
Since the reserved space is 30 GB, HDFS has only about 1 GB left to write, so uploading another 2 GB file produces the error above.
Because this test placed the data directory straight on the / filesystem, non-HDFS files consumed part of the space as well. When HDFS data directories map one-to-one onto dedicated disks, HDFS simply stops writing to a disk once its remaining free space falls to roughly the configured reserved value.
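The capacity accounting above can be sketched in plain shell arithmetic. The GB figures are the rounded values reported by df and du, so this is an approximation rather than HDFS's exact byte-level bookkeeping:

```shell
disk_total_gb=46    # total capacity of the / filesystem (from df -h)
hdfs_used_gb=15     # space already used by HDFS data (from du -sh)
reserved_gb=30      # dfs.datanode.du.reserved, 32212254720 bytes = 30 GiB

# Space HDFS still considers writable on this volume:
writable_gb=$(( disk_total_gb - hdfs_used_gb - reserved_gb ))
echo "$writable_gb"   # prints 1: a 2 GB upload no longer fits
```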