Failed to open or create partition
com.cloudera.cmon.tstore.leveldb.LDBPartitionManager$LDBPartitionException: Unable to open DB in directory /var/lib/cloudera-service-monitor/ts/stream/partitions/stream_2019-03-18T12:37:18.736Z for partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2019-03-18T12:37:18.736Z, startTime=2019-03-18T12:37:18.736Z, endTime=null, version=2, state=CLOSED}
at com.cloudera.cmon.tstore.leveldb.LDBUtils.openOrCreatePartitionDB(LDBUtils.java:194)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.getOrOpenInternal(LDBPartitionManager.java:620)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.openOrCreatePartitionLDB(LDBPartitionManager.java:557)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.getPartition(LDBPartitionManager.java:451)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.getPartitionRange(LDBPartitionManager.java:872)
at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesStreamTable.read(LDBTimeSeriesStreamTable.java:229)
at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesStreamTable.read(LDBTimeSeriesStreamTable.java:420)
at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRawStreamTable.read(LDBTimeSeriesRawStreamTable.java:242)
at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesStore.readFromStreamTable(LDBTimeSeriesStore.java:646)
at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesStore.read(LDBTimeSeriesStore.java:582)
at com.cloudera.cmon.tstore.AggregatingTimeSeriesStore.read(AggregatingTimeSeriesStore.java:505)
at com.cloudera.cmon.kaiser.BulkMetricFetcher.issueQuery(BulkMetricFetcher.java:440)
at com.cloudera.cmon.kaiser.BulkMetricFetcher.access$000(BulkMetricFetcher.java:45)
at com.cloudera.cmon.kaiser.BulkMetricFetcher$1.call(BulkMetricFetcher.java:394)
at com.cloudera.cmon.kaiser.BulkMetricFetcher$1.call(BulkMetricFetcher.java:391)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 8 missing files; e.g.: /var/lib/cloudera-service-monitor/ts/stream/partitions/stream_2019-03-18T12:37:18.736Z/000005.sst
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:194)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:212)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at com.cloudera.cmon.tstore.leveldb.LDBUtils.openOrCreatePartitionDB(LDBUtils.java:184)
... 20 more
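After a power outage on the CDH cluster, CM could no longer monitor the cluster's status once everything was restarted. The role log showed the error above. The Caused by line shows that 8 files are missing from the partition. Renaming /var/lib/cloudera-service-monitor/ and then restarting CM resolved it.
This was a cluster I set up myself. I don't know what the consequences would be on a production cluster, so proceed with caution!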
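The rename step can be sketched in Python as follows. This is only a minimal illustration, not an official Cloudera procedure: the helper name `rename_aside` and the `.bak-<timestamp>` suffix are assumptions made for this example. It also assumes the Service Monitor role is stopped before the directory is moved, and that the script runs with enough privileges to rename the directory. Because the directory is renamed rather than deleted, the old data stays on disk, but Service Monitor will most likely start with an empty store and no historical metrics.

```python
import time
from pathlib import Path


def rename_aside(data_dir, suffix=None):
    """Rename data_dir to '<name>.bak-<suffix>' next to it and return the new path.

    Hypothetical helper for illustration; the suffix defaults to a timestamp.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"not a directory: {src}")
    if suffix is None:
        suffix = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak-{suffix}")
    if dst.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup already exists: {dst}")
    src.rename(dst)  # same-filesystem rename; the data is kept, not deleted
    return dst


if __name__ == "__main__":
    # Path taken from the stack trace above. Stop the Service Monitor role first.
    backup = rename_aside("/var/lib/cloudera-service-monitor")
    print(f"moved to {backup}; now restart CM")
```

Keeping the renamed directory around means it can be moved back if the restart does not help.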