0. Problem
On Hadoop 2.7.3, the datanode reports com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
The full stack trace:
2024-01-29 07:49:34,679 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.IllegalStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.runBlockOp(BlockManager.java:3778)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1323)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:171)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28756)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
Caused by: java.lang.IllegalStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:332)
at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:310)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:2090)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1837)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$1.call(NameNodeRpcServer.java:1326)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$1.call(NameNodeRpcServer.java:1323)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:3837)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:3816)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
at com.google.protobuf.CodedInputStream.readSInt64(CodedInputStream.java:363)
at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:326)
... 8 more
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:202)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:475)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:688)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
at java.lang.Thread.run(Thread.java:748)
1. Analysis
Reading the Hadoop source: when a datanode reports its block replica information to the namenode, the error is raised by CodedInputStream, the component that deserializes the report. Note that although the datanode logs the error, it is wrapped in a RemoteException, i.e. it is actually thrown on the namenode side while decoding the block report:
@Override
public BlockReportReplica next() {
  currentBlockIndex++;
  try {
    // zig-zag to reduce size of legacy blocks and mask off bits
    // we don't (yet) understand
    block.setBlockId(cis.readSInt64()); // per the stack trace, the exception escapes from this readSInt64()
    block.setNumBytes(cis.readRawVarint64() & NUM_BYTES_MASK);
    block.setGenerationStamp(cis.readRawVarint64());
    long state = cis.readRawVarint64() & REPLICA_STATE_MASK;
    block.setState(ReplicaState.getState((int)state));
  } catch (IOException e) {
    throw new IllegalStateException(e); // rethrown as the IllegalStateException seen in the trace
  }
  return block;
}
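As an aside, the "zig-zag" in the comment above refers to protobuf's signed-varint encoding: it interleaves negative and positive longs so values near zero, including the negative block ids of legacy blocks, encode in only a few bytes. A minimal sketch, assuming protobuf-java 2.5.0 on the classpath (the class name is mine for illustration):

import com.google.protobuf.CodedInputStream;
import com.google.protobuf.CodedOutputStream;

public class ZigZagDemo {
  public static void main(String[] args) {
    // zig-zag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... so small
    // magnitudes, positive or negative, need only one or two varint bytes
    long encoded = CodedOutputStream.encodeZigZag64(-1L);
    System.out.println(encoded);                                  // prints 1
    System.out.println(CodedInputStream.decodeZigZag64(encoded)); // prints -1
  }
}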
Each call to next() pulls four varints from one shared CodedInputStream, so the stream's cumulative byte count grows with every replica in the report. Consulting protobuf's CodedInputStream confirms the cause: the total bytes read exceed sizeLimit.
private boolean refillBuffer(boolean mustSucceed) throws IOException {
  if (this.bufferPos < this.bufferSize) {
    throw new IllegalStateException("refillBuffer() called when buffer wasn't empty.");
  } else if (this.totalBytesRetired + this.bufferSize == this.currentLimit) {
    if (mustSucceed) {
      throw InvalidProtocolBufferException.truncatedMessage();
    } else {
      return false;
    }
  } else {
    this.totalBytesRetired += this.bufferSize;
    this.bufferPos = 0;
    this.bufferSize = this.input == null ? -1 : this.input.read(this.buffer);
    if (this.bufferSize != 0 && this.bufferSize >= -1) {
      if (this.bufferSize == -1) {
        this.bufferSize = 0;
        if (mustSucceed) {
          throw InvalidProtocolBufferException.truncatedMessage();
        } else {
          return false;
        }
      } else {
        this.recomputeBufferSizeAfterLimit();
        int totalBytesRead = this.totalBytesRetired + this.bufferSize + this.bufferSizeAfterLimit;
        // checks the cumulative number of bytes read against sizeLimit
        if (totalBytesRead <= this.sizeLimit && totalBytesRead >= 0) {
          return true;
        } else {
          // throws sizeLimitExceeded; its message is exactly the one observed at the top
          throw InvalidProtocolBufferException.sizeLimitExceeded();
        }
      }
    } else {
      throw new IllegalStateException("InputStream#read(byte[]) returned invalid result: " + this.bufferSize + "\nThe InputStream implementation is buggy.");
    }
  }
}
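To see the limit mechanism in isolation, here is a minimal, self-contained sketch (not Hadoop code; the class name and the artificially small limit are mine) that drives a stream-backed CodedInputStream past its sizeLimit and reproduces the exact exception message from the log:

import com.google.protobuf.CodedInputStream;
import com.google.protobuf.InvalidProtocolBufferException;
import java.io.ByteArrayInputStream;

public class SizeLimitDemo {
  public static void main(String[] args) throws Exception {
    byte[] data = new byte[32]; // 32 raw bytes to stream
    CodedInputStream cis =
        CodedInputStream.newInstance(new ByteArrayInputStream(data));
    cis.setSizeLimit(16);   // default is 64MB; shrink it so the check trips quickly
    try {
      cis.readRawBytes(32); // forces refillBuffer() past the limit
    } catch (InvalidProtocolBufferException e) {
      // "Protocol message was too large. May be malicious. ..."
      System.out.println(e.getMessage());
    }
  }
}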
2. Solution
Because protobuf is a library bundled with Hadoop and this limit cannot be tuned externally, the only option is to recompile the library, swap in the new jar everywhere, and restart.
2.1 Recompile protobuf
In protobuf-2.5.0's CodedInputStream (source: https://github.com/protocolbuffers/protobuf/releases?page=15):
private static final int DEFAULT_RECURSION_LIMIT = 64;
private static final int DEFAULT_SIZE_LIMIT = 64 << 20; // 64MB
private static final int BUFFER_SIZE = 4096;
Change DEFAULT_SIZE_LIMIT to 64 << 21, i.e. 128MB, and recompile.
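After the edit, the constant reads:
private static final int DEFAULT_SIZE_LIMIT = 64 << 21; // 128MB (was 64 << 20 = 64MB)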
For the build steps, see here; thanks to that author for the clean, no-nonsense walkthrough!
Copy the recompiled jar into the corresponding directories on every node. There are 7 jars to replace on each node, so a small script helps (see the sketch after the list):
${HADOOP_HOME}/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/protobuf-java-2.5.0.jar
${HADOOP_HOME}/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar
${HADOOP_HOME}/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar
${HADOOP_HOME}/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar
${HADOOP_HOME}/share/hadoop/common/lib/protobuf-java-2.5.0.jar
${HADOOP_HOME}/share/hadoop/tools/lib/protobuf-java-2.5.0.jar
${HADOOP_HOME}/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/protobuf-java-2.5.0.jar
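A minimal sketch of such a script in Java (a hypothetical helper, not part of Hadoop; assumes HADOOP_HOME is set in the environment and the rebuilt jar sits at /tmp/protobuf-java-2.5.0.jar, adjust as needed):

import java.io.IOException;
import java.nio.file.*;

public class ReplaceProtobufJars {
  public static void main(String[] args) throws IOException {
    Path rebuilt = Paths.get("/tmp/protobuf-java-2.5.0.jar"); // the recompiled jar
    String home = System.getenv("HADOOP_HOME");
    if (home == null) throw new IllegalStateException("HADOOP_HOME is not set");
    String[] libDirs = {
        "share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib",
        "share/hadoop/yarn/lib",
        "share/hadoop/hdfs/lib",
        "share/hadoop/mapreduce/lib",
        "share/hadoop/common/lib",
        "share/hadoop/tools/lib",
        "share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib",
    };
    for (String dir : libDirs) {
      Path dest = Paths.get(home, dir, "protobuf-java-2.5.0.jar");
      // overwrite the bundled jar with the recompiled one
      Files.copy(rebuilt, dest, StandardCopyOption.REPLACE_EXISTING);
      System.out.println("replaced " + dest);
    }
  }
}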
2.2 Modify the Hadoop configuration
On every node, edit hdfs-site.xml and add the property below, raising ipc.maximum.data.length from the default 64MB to 128MB to match the recompiled protobuf limit, then restart the cluster (dynamic configuration is also an option; the exact steps are easy to search for):
<property>
  <name>ipc.maximum.data.length</name>
  <value>134217728</value>
</property>
3. Root cause
The business context is confidential, so straight to the root cause:
Storing a massive number of small files in Hadoop leaves each datanode holding an enormous number of block replicas. A full block report serializes an entry for every replica the node holds, so the report message itself grew past 64MB, i.e. larger than half of a default 128MB block.
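A rough order-of-magnitude check (my estimate, not a figure from the source): the decoder above reads four varints per replica, typically 10 to 20 bytes in total, so the 64MB default limit is exhausted at roughly 3 to 6 million replicas on a single datanode. With small files averaging around 1MB, that is only a few terabytes of data per node.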
Hadoop is built for very large files; when choosing a storage system, play to its strengths and steer around its weaknesses~