It started with data failing to be written; the error was:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /datas/hudi_datas/hive/hudi-ts/chuancan_TEst/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-0-0 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
At this point we should have recognized that the DataNodes had corrupt file blocks, but instead we chose to restart.
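In hindsight, the corruption could have been confirmed before restarting with the standard HDFS admin tools (a minimal sketch; both are stock hdfs subcommands):

hdfs fsck / -list-corruptfileblocks   # list files with corrupt blocks
hdfs dfsadmin -report                 # check DataNode liveness, capacity, and failed volumes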
grep "ERROR" xxx.log
HDFS is deployed in HA mode, and one of the NameNodes failed to start. The error was as follows:
2023-09-25 18:51:27,025 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [172.18.1.5x:8485, 172.18.1.5x:8485, 172.18.1.56:8485], stream=null))
The JournalNode errors were as follows:
2023-09-25 18:38:00,076 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Aborting current sync attempt.
2023-09-25 18:38:36,277 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: RECEIVED SIGNAL 15: SIGTERM
2023-09-25 18:51:08,294 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Download of Edit Log file for Syncing failed. Deleting temp file: /data/dfs/jn/nameservice1/edits.sync/edits_0000000000056969091-0000000000056969870
The JournalNodes run on nodes 3, 4, and 5, and only nodes 4 and 5 reported errors. Since recoverUnfinalizedSegments requires a quorum of healthy JournalNodes, two damaged journals out of three are enough to keep the NameNode down. We judged that node 3's edit-log metadata was intact and chose node 3 as the metadata recovery source.
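One way to sanity-check that choice is to compare the newest edit-log segments across the three JournalNodes (a sketch; node3/node4/node5 are placeholder hostnames, and the journal path comes from the log above):

for node in node3 node4 node5; do
  echo "== $node =="
  ssh "$node" 'ls /data/dfs/jn/nameservice1/current/ | grep "^edits_" | sort | tail -3'
done

The node whose finalized segments extend furthest, and whose directory was not reporting sync failures, is the safest recovery source.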
Solution:
1. First, back up the nameservice1 directory on nodes 3, 4, and 5.
2. Copy /data/dfs/jn/nameservice1 from node 3 to nodes 4 and 5, then start the JournalNodes (see the sketch below).
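A sketch of those two steps as shell commands. Assumptions: node3/node4/node5 are placeholder hostnames with passwordless SSH, the hdfs:hdfs ownership and the hdfs --daemon syntax assume a stock Hadoop 3 install; on a Cloudera-Manager-managed cluster, stop and start the JournalNode roles through CM instead.

# 1. Back up the journal directory on all three JournalNodes
for node in node3 node4 node5; do
  ssh "$node" 'tar -czf /tmp/nameservice1.bak.tar.gz -C /data/dfs/jn nameservice1'
done

# 2. Replace the damaged journals on nodes 4 and 5 with node 3's copy, then restart
for node in node4 node5; do
  ssh "$node" 'mv /data/dfs/jn/nameservice1 /data/dfs/jn/nameservice1.broken'
  scp -3 -r node3:/data/dfs/jn/nameservice1 "$node":/data/dfs/jn/
  ssh "$node" 'chown -R hdfs:hdfs /data/dfs/jn/nameservice1 && hdfs --daemon start journalnode'
done

Once both JournalNodes are back and in sync, the failed NameNode can be started again.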