cassandra的GC非常频繁,造成业务大量的写入失败

问题:
因业务增长,近期出现代码报错如下:

{"level":"ERROR","ts":"2021-06-12 10:16:20","file":"controller/network.go:30","msg":"UploadNetworkDetails error","error":"gocql: no response received from cassandra within timeout period","uuid":"89b6c07d-62de-447d-85e5-dfb9dbaa59e1"}
{"level":"ERROR","ts":"2021-06-12 10:16:54","file":"controller/network.go:30","msg":"UploadNetworkDetails error","error":"gocql: no response received from cassandra within timeout period","uuid":"59a927d2-1ca8-46df-9a34-58d9dc48c36b"}
{"level":"ERROR","ts":"2021-06-12 10:16:54","file":"controller/network.go:30","msg":"UploadNetworkDetails error","error":"gocql: no response received from cassandra within timeout period","uuid":"b22b589b-6e34-407d-8d9d-9ae74d21dd4a"}
{"level":"ERROR","ts":"2021-06-12 10:16:54","file":"controller/network.go:30","msg":"UploadNetworkDetails error","error":"gocql: no response received from cassandra within timeout period","uuid":"3c3bafc4-673a-4ce1-a7b7-9ff7e53737b1"}
{"level":"ERROR","ts":"2021-06-12 10:16:54","file":"controller/auto_check.go:30","msg":"UploadAutoCheck error","error":"gocql: no response received from cassandra within timeout period","uuid":"a11330ae-8a54-4c62-bae6-ab14af6f541d"}
{"level":"ERROR","ts":"2021-06-12 10:16:54","file":"controller/auto_check.go:30","msg":"UploadAutoCheck error","error":"gocql: no response received from cassandra within timeout period","uuid":"7e4813bd-417c-4473-ac6c-1f08c41cd7bb"}
{"level":"ERROR","ts":"2021-06-12 10:16:54","file":"controller/network.go:30","msg":"UploadNetworkDetails error","error":"gocql: no response received from cassandra within timeout period","uuid":"39f4f485-ca90-4701-bceb-28850b48c26e"}
{"level":"ERROR","ts":"2021-06-12 10:16:54","file":"controller/auto_check.go:30","msg":"UploadAutoCheck error","error":"gocql: no response received from cassandra within timeout period","uuid":"775fdbb2-1743-4f67-8098-5a18b4fec5dd"}
....

监控分析:
compaction pending和thread pending堆积
cpu和io压力明显

日志分析:
通过分析debug.log 和system.log
发现GC非常频繁

统计system.log

# cat /var/log/cassandra/system.log | grep 'GC '|grep "2021-06-12"|wc -l
193  --一天产生了近两百次GC操作,历史最多近300多次
[root@dcd212 2021-06-15 14:40:49 ~]
# cat /var/log/cassandra/system.log | grep 'GC '|grep "2021-06-12"|awk -F'[ ,:]' '{print $5,$6,$7}'|uniq -c |sort -n
    '只保留了一分钟内执行了两次以上的GC记录'
      3 2021-06-12 10 08
      3 2021-06-12 10 13
      3 2021-06-12 10 14
      3 2021-06-12 19 54
      3 2021-06-12 19 55
      4 2021-06-12 10 09
      4 2021-06-12 15 05
      4 2021-06-12 19 38
      5 2021-06-12 10 11
      5 2021-06-12 10 12
      5 2021-06-12 10 18
      6 2021-06-12 19 39
      8 2021-06-12 10 15
     10 2021-06-12 10 17
     13 2021-06-12 10 16

查看:2021-06-12 10:16 和2021-06-12 10:17的GC情况

# cat /var/log/cassandra/system.log | grep 'GC '|grep "2021-06-12 10:16"
INFO  [Service Thread] 2021-06-12 10:16:08,679 GCInspector.java:285 - G1 Young Generation GC in 424ms.  G1 Eden Space: 14881390592 -> 0; G1 Old Gen: 12725911968 -> 12729713040; G1 Survivor Space: 914358272 -> 1660944384; 
INFO  [Service Thread] 2021-06-12 10:16:12,857 GCInspector.java:285 - G1 Young Generation GC in 592ms.  G1 Eden Space: 9755951104 -> 0; G1 Old Gen: 12725518752 -> 14310792096; G1 Survivor Space: 1660944384 -> 520093696; 
INFO  [Service Thread] 2021-06-12 10:16:17,105 GCInspector.java:285 - G1 Young Generation GC in 443ms.  G1 Eden Space: 6341787648 -> 0; G1 Old Gen: 14310792128 -> 14818476488; G1 Survivor Space: 520093696 -> 478150656; 
INFO  [Service Thread] 2021-06-12 10:16:23,910 GCInspector.java:285 - G1 Young Generation GC in 555ms.  G1 Eden Space: 10745806848 -> 0; G1 Old Gen: 14814282200 -> 14826865104; G1 Survivor Space: 478150656 -> 1224736768; 
INFO  [Service Thread] 2021-06-12 10:16:25,076 GCInspector.java:285 - G1 Young Generation GC in 420ms.  G1 Eden Space: 3372220416 -> 0; G1 Old Gen: 14822670816 -> 16018047464; G1 Survivor Space: 1224736768 -> 142606336; 
INFO  [Service Thread] 2021-06-12 10:16:30,072 GCInspector.java:285 - G1 Young Generation GC in 450ms.  G1 Eden Space: 8128561152 -> 0; G1 Old Gen: 16018047496 -> 16026436104; G1 Survivor Space: 142606336 -> 662700032; 
INFO  [Service Thread] 2021-06-12 10:16:33,400 GCInspector.java:285 - G1 Young Generation GC in 334ms.  G1 Eden Space: 4638900224 -> 0; G1 Old Gen: 16022241816 -> 16680747552; G1 Survivor Space: 662700032 -> 394264576; 
INFO  [Service Thread] 2021-06-12 10:16:37,680 GCInspector.java:285 - G1 Young Generation GC in 276ms.  G1 Eden Space: 6870269952 -> 0; G1 Old Gen: 16676553264 -> 16684942376; G1 Survivor Space: 394264576 -> 855638016; 
INFO  [Service Thread] 2021-06-12 10:16:40,827 GCInspector.java:285 - G1 Young Generation GC in 352ms.  G1 Eden Space: 5930745856 -> 0; G1 Old Gen: 16684942376 -> 17511219752; G1 Survivor Space: 855638016 -> 394264576; 
INFO  [Service Thread] 2021-06-12 10:16:45,750 GCInspector.java:285 - G1 Young Generation GC in 332ms.  G1 Eden Space: 6811549696 -> 0; G1 Old Gen: 17511219784 -> 17515414600; G1 Survivor Space: 394264576 -> 838860800; 
INFO  [Service Thread] 2021-06-12 10:16:50,453 GCInspector.java:285 - G1 Young Generation GC in 404ms.  G1 Eden Space: 7281311744 -> 0; G1 Old Gen: 17515414632 -> 17884512880; G1 Survivor Space: 838860800 -> 1006632960; 
INFO  [Service Thread] 2021-06-12 10:16:54,643 GCInspector.java:285 - G1 Young Generation GC in 874ms.  G1 Eden Space: 6987710464 -> 0; G1 Old Gen: 17884512880 -> 18836619880; G1 Survivor Space: 1006632960 -> 419430400; 
INFO  [Service Thread] 2021-06-12 10:16:56,223 GCInspector.java:285 - G1 Young Generation GC in 320ms.  G1 Eden Space: 3565158400 -> 0; G1 Old Gen: 18832425592 -> 19260244600; G1 Survivor Space: 419430400 -> 176160768;
INFO  [Service Thread] 2021-06-12 10:17:04,411 GCInspector.java:285 - G1 Young Generation GC in 458ms.  G1 Eden Space: 8665432064 -> 0; G1 Old Gen: 19260244632 -> 19272827536; G1 Survivor Space: 176160768 -> 914358272; 
INFO  [Service Thread] 2021-06-12 10:17:05,816 GCInspector.java:285 - G1 Young Generation GC in 275ms.  G1 Eden Space: 3548381184 -> 0; G1 Old Gen: 19268633248 -> 20178797224; G1 Survivor Space: 914358272 -> 218103808; 
INFO  [Service Thread] 2021-06-12 10:17:09,467 GCInspector.java:285 - G1 Young Generation GC in 325ms.  G1 Eden Space: 7230980096 -> 0; G1 Survivor Space: 218103808 -> 595591168; 
INFO  [Service Thread] 2021-06-12 10:17:14,061 GCInspector.java:285 - G1 Young Generation GC in 328ms.  G1 Eden Space: 7038042112 -> 0; G1 Old Gen: 20178797256 -> 20367540944; G1 Survivor Space: 595591168 -> 746586112; 
INFO  [Service Thread] 2021-06-12 10:17:17,015 GCInspector.java:285 - G1 Young Generation GC in 276ms.  G1 Eden Space: 7222591488 -> 0; G1 Old Gen: 20367540944 -> 20791165640; G1 Survivor Space: 746586112 -> 394264576; 
INFO  [Service Thread] 2021-06-12 10:17:20,148 GCInspector.java:285 - G1 Young Generation GC in 229ms.  G1 Eden Space: 8044675072 -> 0; G1 Old Gen: 20791165640 -> 20791166144; G1 Survivor Space: 394264576 -> 494927872; 
INFO  [Service Thread] 2021-06-12 10:17:22,120 GCInspector.java:285 - G1 Young Generation GC in 209ms.  G1 Eden Space: 7977566208 -> 0; G1 Survivor Space: 494927872 -> 578813952; 
INFO  [Service Thread] 2021-06-12 10:17:25,177 GCInspector.java:285 - G1 Young Generation GC in 236ms.  G1 Eden Space: 7809794048 -> 0; G1 Old Gen: 20791166144 -> 21117620936; G1 Survivor Space: 578813952 -> 343932928; 
INFO  [Service Thread] 2021-06-12 10:17:36,034 GCInspector.java:285 - G1 Young Generation GC in 212ms.  G1 Eden Space: 7507804160 -> 0; G1 Old Gen: 21126009552 -> 21203301584; G1 Survivor Space: 562036736 -> 553648128; 
INFO  [Service Thread] 2021-06-12 10:17:54,560 GCInspector.java:285 - G1 Young Generation GC in 212ms.  G1 Eden Space: 6937378816 -> 0; G1 Old Gen: 21822890720 -> 21818696424; G1 Survivor Space: 494927872 -> 562036736;

统计分析debug.log

INFO  [SASI-General:651] 2021-06-12 10:16:38,951 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_autocheck-87df2b90c37211ebbae3a7203a91ffe6/md-429-big-SI_i_cl_autocheck_id.db_26, took 3131 ms.
INFO  [Service Thread] 2021-06-12 10:16:40,827 GCInspector.java:285 - G1 Young Generation GC in 352ms.  G1 Eden Space: 5930745856 -> 0; G1 Old Gen: 16684942376 -> 17511219752; G1 Survivor Space: 855638016 -> 394264576;
INFO  [SASI-General:652] 2021-06-12 10:16:42,088 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/c'l_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_8, took 3239 ms.
INFO  [Service Thread] 2021-06-12 10:16:45,750 GCInspector.java:285 - G1 Young Generation GC in 332ms.  G1 Eden Space: 6811549696 -> 0; G1 Old Gen: 17511219784 -> 17515414600; G1 Survivor Space: 394264576 -> 838860800;
INFO  [SASI-General:653] 2021-06-12 10:16:46,648 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_9, took 3208 ms.
INFO  [Service Thread] 2021-06-12 10:16:50,453 GCInspector.java:285 - G1 Young Generation GC in 404ms.  G1 Eden Space: 7281311744 -> 0; G1 Old Gen: 17515414632 -> 17884512880; G1 Survivor Space: 838860800 -> 1006632960;
INFO  [SASI-General:654] 2021-06-12 10:16:51,005 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_10, took 3338 ms.
INFO  [SASI-General:648] 2021-06-12 10:16:53,267 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_autocheck-87df2b90c37211ebbae3a7203a91ffe6/md-429-big-SI_i_cl_autocheck_id.db_27, took 3350 ms.
INFO  [Service Thread] 2021-06-12 10:16:54,643 GCInspector.java:285 - G1 Young Generation GC in 874ms.  G1 Eden Space: 6987710464 -> 0; G1 Old Gen: 17884512880 -> 18836619880; G1 Survivor Space: 1006632960 -> 419430400;
INFO  [Service Thread] 2021-06-12 10:16:56,223 GCInspector.java:285 - G1 Young Generation GC in 320ms.  G1 Eden Space: 3565158400 -> 0; G1 Old Gen: 18832425592 -> 19260244600; G1 Survivor Space: 419430400 -> 176160768;
NFO  [SASI-General:647] 2021-06-12 10:16:56,316 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_11, took 4122 ms.
INFO  [SASI-General:649] 2021-06-12 10:17:00,543 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_12, took 2896 ms.
INFO  [Service Thread] 2021-06-12 10:17:04,411 GCInspector.java:285 - G1 Young Generation GC in 458ms.  G1 Eden Space: 8665432064 -> 0; G1 Old Gen: 19260244632 -> 19272827536; G1 Survivor Space: 176160768 -> 914358272;
INFO  [Service Thread] 2021-06-12 10:17:05,816 GCInspector.java:285 - G1 Young Generation GC in 275ms.  G1 Eden Space: 3548381184 -> 0; G1 Old Gen: 19268633248 -> 20178797224; G1 Survivor Space: 914358272 -> 218103808;
INFO  [SASI-General:650] 2021-06-12 10:17:06,040 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_13, took 3807 ms.
INFO  [SASI-General:651] 2021-06-12 10:17:08,017 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_autocheck-87df2b90c37211ebbae3a7203a91ffe6/md-429-big-SI_i_cl_autocheck_id.db_28, took 3169 ms.
INFO  [Service Thread] 2021-06-12 10:17:09,467 GCInspector.java:285 - G1 Young Generation GC in 325ms.  G1 Eden Space: 7230980096 -> 0; G1 Survivor Space: 218103808 -> 595591168;
INFO  [SASI-General:652] 2021-06-12 10:17:10,357 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_14, took 3229 ms.
INFO  [CompactionExecutor:2337] 2021-06-12 10:17:11,779 PerSSTableIndexWriter.java:275 - Scheduling index flush to /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db
INFO  [SASI-General:654] 2021-06-12 10:17:12,158 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_16, took 379 ms.
INFO  [Service Thread] 2021-06-12 10:17:14,061 GCInspector.java:285 - G1 Young Generation GC in 328ms.  G1 Eden Space: 7038042112 -> 0; G1 Old Gen: 20178797256 -> 20367540944; G1 Survivor Space: 595591168 -> 746586112;
INFO  [SASI-General:653] 2021-06-12 10:17:14,455 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_liveinfo-c06c6770c37211ebbae3a7203a91ffe6/md-28-big-SI_i_cl_liveinfo_id.db_15, took 3247 ms.
INFO  [Service Thread] 2021-06-12 10:17:17,015 GCInspector.java:285 - G1 Young Generation GC in 276ms.  G1 Eden Space: 7222591488 -> 0; G1 Old Gen: 20367540944 -> 20791165640; G1 Survivor Space: 746586112 -> 394264576;
INFO  [Service Thread] 2021-06-12 10:17:20,148 GCInspector.java:285 - G1 Young Generation GC in 229ms.  G1 Eden Space: 8044675072 -> 0; G1 Old Gen: 20791165640 -> 20791166144; G1 Survivor Space: 394264576 -> 494927872;
INFO  [Service Thread] 2021-06-12 10:17:22,120 GCInspector.java:285 - G1 Young Generation GC in 209ms.  G1 Eden Space: 7977566208 -> 0; G1 Survivor Space: 494927872 -> 578813952;
INFO  [Service Thread] 2021-06-12 10:17:25,177 GCInspector.java:285 - G1 Young Generation GC in 236ms.  G1 Eden Space: 7809794048 -> 0; G1 Old Gen: 20791166144 -> 21117620936; G1 Survivor Space: 578813952 -> 343932928;
INFO  [Service Thread] 2021-06-12 10:17:36,034 GCInspector.java:285 - G1 Young Generation GC in 212ms.  G1 Eden Space: 7507804160 -> 0; G1 Old Gen: 21126009552 -> 21203301584; G1 Survivor Space: 562036736 -> 553648128;
INFO  [SASI-General:647] 2021-06-12 10:17:36,210 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_autocheck-87df2b90c37211ebbae3a7203a91ffe6/md-429-big-SI_i_cl_autocheck_id.db_30, took 3187 ms.
INFO  [SASI-General:649] 2021-06-12 10:17:50,517 PerSSTableIndexWriter.java:268 - Flushed index segment /data1/data/services/cassandra/data/class/cl_autocheck-87df2b90c37211ebbae3a7203a91ffe6/md-429-big-SI_i_cl_autocheck_id.db_31, took 2770 ms.
INFO  [Service Thread] 2021-06-12 10:17:54,560 GCInspector.java:285 - G1 Young Generation GC in 212ms.  G1 Eden Space: 6937378816 -> 0; G1 Old Gen: 21822890720 -> 21818696424; G1 Survivor Space: 494927872 -> 562036736;

从debug日志里可以清楚看到,每次PerSSTableIndexWriter.java:268 - Flushed index segment 都伴随着GC,而GC会造成服务停顿,一般情况下GC不会超过200ms,但是日志里的信息,有的会将近900ms。因此怀疑是GC造成的写入数据失败。

为什么会在id上创建索引:在每个表中为id创建一个索引的是为了在现在的业务数据是有mysql迁移至cassandra的,当时为了回滚查询设置的,在业务中并未使用。

处理方法:删除无用的id索引。

慎重加索引;优先考虑是用物化视图来解决搜索问题。

问题产生的原因及建议:

I think the answer is a lot simpler than what you think – the job/app/task which is loading the data to Cassandra is overloading the nodes leading to the long and frequent GC pauses which is then causing the successive writes to timeout.
The indexing does amplify the writes since an insert requires multiple writes – one write to save the mutation plus another write to save the index entry.
My suggestion is to either (a) throttle the data load, or (b) increase the capacity of your cluster by adding more nodes. Cheers!

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
当使用 Flink 将 Java Bean 包装的数据异步写入 Cassandra,你可以使用 Flink 的 `AsyncFunction` 和 Cassandra 的异步驱动程序来实现。下面是一个示例代码: 首先,确保你已经添加了 Flink 和 Cassandra 的依赖到你的项目中。 ```xml <dependencies> <!-- Flink --> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-core</artifactId> <version>${flink.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_${scala.binary.version}</artifactId> <version>${flink.version}</version> </dependency> <!-- Cassandra --> <dependency> <groupId>com.datastax.cassandra</groupId> <artifactId>cassandra-driver-core</artifactId> <version>${cassandra.driver.version}</version> </dependency> </dependencies> ``` 接下来,你可以创建一个实现了 `AsyncFunction` 接口的类来执行异步写入操作。这个类将接收一个 Java Bean 作为输入,并将其写入Cassandra 中。 ```java import org.apache.flink.streaming.api.functions.async.AsyncFunction; import org.apache.flink.streaming.api.functions.async.ResultFuture; import com.datastax.driver.core.*; public class CassandraAsyncWriter implements AsyncFunction<YourBeanClass, Void> { private Cluster cluster; private Session session; private PreparedStatement statement; public CassandraAsyncWriter() { // 创建 Cassandra 集群连接 cluster = Cluster.builder() .addContactPoint("your-cassandra-host") .withPort(9042) .build(); // 创建 Cassandra 会话 session = cluster.connect("your-keyspace"); // 准备 Cassandra 查询语句 String query = "INSERT INTO your_table (column1, column2, ...) VALUES (?, ?, ...)"; statement = session.prepare(query); } @Override public void asyncInvoke(YourBeanClass input, ResultFuture<Void> resultFuture) { // 使用 Cassandra 驱动程序执行异步写入操作 BoundStatement boundStatement = statement.bind(input.getField1(), input.getField2(), ...); ResultSetFuture resultSetFuture = session.executeAsync(boundStatement); resultSetFuture.addListener(() -> { // 处理异步写入结果,这里可以根据需要设置回调逻辑 resultFuture.complete(Collections.singleton(null)); }); } @Override public void close() { // 关闭 Cassandra 会话和集群连接 statement.close(); session.close(); cluster.close(); } } ``` 最后,你可以在 Flink 的数据流中使用 `AsyncDataStream.unorderedWait` 方法将异步写入操作应用到你的数据流中。 ```java import org.apache.flink.streaming.api.datastream.AsyncDataStream; DataStream<YourBeanClass> input = ...; // 输入数据流 AsyncFunction<YourBeanClass, Void> asyncFunction = new CassandraAsyncWriter(); DataStream<Void> result = AsyncDataStream.unorderedWait(input, asyncFunction, timeout, TimeUnit.MILLISECONDS, capacity); result.print(); // 可选的结果处理,这里只是简单地将结果打印出来 env.execute("Async Cassandra Writer"); ``` 这样,当 Flink 接收到一个 Java Bean 对象时,它会异步地将其写入Cassandra 中。请根据你的实际情况修改示例代码中的表名、字段名和连接信息。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值