CanalClient卡死宿主服务原因分析

canal Server日志

2021-10-12 17:46:28.688 [New I/O server worker #1-3] ERROR c.a.otter.canal.server.netty.handler.SessionHandler - something goes wrong with channel:[id: 0x15324335, /192.168.30.4:11715 => /192.168.6.51:11111], exception=java.io.IOE

xception: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.write0(Native Method)

at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)

at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)

at sun.nio.ch.IOUtil.write(IOUtil.java:51)

at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470)

at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$PooledSendBuffer.transferTo(SocketSendBufferPool.java:243)

at org.jboss.netty.channel.socket.nio.NioWorker.write0(NioWorker.java:470)

at org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:388)

at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:137)

at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)

at org.jboss.netty.channel.Channels.write(Channels.java:611)

at org.jboss.netty.channel.Channels.write(Channels.java:578)

at com.alibaba.otter.canal.server.netty.NettyUtils.write(NettyUtils.java:48)

at com.alibaba.otter.canal.server.netty.handler.SessionHandler.messageReceived(SessionHandler.java:202)

at org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler.handleUpstream(IdleStateAwareChannelHandler.java:48)

at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:276)

at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)

at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:526)

at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:507)

at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:444)

at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)

at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)

at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)

at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)

at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)

at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

jstack查看堆栈日志

jstack -l [pid] > jstack.log

发现好多canal client执行线程,WAITING(parking)状态,等待资源<0x00000006ca337b70>

“canal-execute-thread-8” #1912 prio=5 os_prio=0 tid=0x00007f4e1418d800 nid=0x39ff waiting on condition [0x00007f4dc4b4e000]

java.lang.Thread.State: WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

  • parking to wait for <0x00000006ca337b70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:2002)

at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1539)

at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1326)

at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1306)

at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1296)

at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:109)

at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:263)

at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:376)

at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:572)

at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:360)

at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:99)

at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)

at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:747)

at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689)

at com.keyou.evm.lpm.service.es.ElSearchStoreServiceImpl E n h a n c e r B y S p r i n g C G L I B EnhancerBySpringCGLIB EnhancerBySpringCGLIB4dc5f842.synchronous()

at com.keyou.evm.lpm.sync.StoreHandler.insert(StoreHandler.java:44)

at com.keyou.evm.lpm.sync.StoreHandler.insert(StoreHandler.java:23)

at top.javatool.canal.client.handler.impl.RowDataHandlerImpl.handlerRowData(RowDataHandlerImpl.java:35)

at top.javatool.canal.client.handler.impl.RowDataHandlerImpl.handlerRowData(RowDataHandlerImpl.java:16)

at top.javatool.canal.client.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:48)

at top.javatool.canal.client.handler.impl.AsyncMessageHandlerImpl.lambda$handleMessage$0(AsyncMessageHandlerImpl.java:30)

at top.javatool.canal.client.handler.impl.AsyncMessageHandlerImpl$$Lambda$1571/194399623.run(Unknown Source)

at org.springframework.cloud.sleuth.instrument.async.TraceRunnable.run(TraceRunnable.java:67)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Locked ownable synchronizers:

  • <0x00000006d11c0eb0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

分析结果:Canal-client代码中使用了线程池,线程池同步数据到es的时候,占用了大量的连接池(DruidDataSource获取连接),导致大量的处于WAITING和 TIMED_WATING 状态的线程。

解决方案


临时解决方案:

  1. 调优连接池回收策略:使用非公平锁、自动回收超时连接;

  2. druid默认使用非公平锁。但是,配置文件中现有一个参数maxWait,这个会导致执行setMaxWait()创建一个公平锁来代替非公平锁。

参考文档:https://github.com/alibaba/druid/issues/1160

#是否自动回收超时连接

removeAbandoned=true

#超时时间(以秒数为单位)

removeAbandonedTimeout=180

最终解决方案:

以上出现的问题,其根本原因还是单节点无法一股脑处理大批量的数据更新;所以,最终的方案一定是:使用MQ削峰填谷

这也是为什么在CanalServer端,天生支持很多种Mq的配置的原因。否则,Client客户端难以承受Mysql主库数据大批量变化的带来的影响。

在这里插入图片描述

感悟


正如题目所述,从来没有一个技术栈可以独立运行。

我们可以在各大学习网站看到各种各样的技术视频。但要做好架构的话,需要对该技术的生态有一个全面的了解,否则在未来的维护中会非常被动。

最后

自我介绍一下,小编13年上海交大毕业,曾经在小公司待过,也去过华为、OPPO等大厂,18年进入阿里一直到现在。

深知大多数Java工程师,想要提升技能,往往是自己摸索成长,自己不成体系的自学效果低效漫长且无助。

因此收集整理了一份《2024年Java开发全套学习资料》,初衷也很简单,就是希望能够帮助到想自学提升又不知道该从何学起的朋友,同时减轻大家的负担。

既有适合小白学习的零基础资料,也有适合3年以上经验的小伙伴深入学习提升的进阶课程,基本涵盖了95%以上Java开发知识点,不论你是刚入门Android开发的新手,还是希望在技术上不断提升的资深开发者,这些资料都将为你打开新的学习之门!

如果你觉得这些内容对你有帮助,需要这份全套学习资料的朋友可以戳我获取!!

由于文件比较大,这里只是将部分目录截图出来,每个节点里面都包含大厂面经、学习笔记、源码讲义、实战项目、讲解视频,并且会持续更新!
小伙伴深入学习提升的进阶课程,基本涵盖了95%以上Java开发知识点,不论你是刚入门Android开发的新手,还是希望在技术上不断提升的资深开发者,这些资料都将为你打开新的学习之门!**

如果你觉得这些内容对你有帮助,需要这份全套学习资料的朋友可以戳我获取!!

由于文件比较大,这里只是将部分目录截图出来,每个节点里面都包含大厂面经、学习笔记、源码讲义、实战项目、讲解视频,并且会持续更新!

ey Features Understand common performance and reliability pitfalls in ElasticSearch Use popular monitoring tools such as ElasticSearch-head, BigDesk, Marvel, Kibana, and more This is a step-by-step guide with lots of case studies on solving real-world ElasticSearch cluster issues Book Description ElasticSearch is a distributed search server similar to Apache Solr with a focus on large datasets, a schema-less setup, and high availability. This schema-free architecture allows ElasticSearch to index and search unstructured content, making it perfectly suited for both small projects and large big data warehouses with petabytes of unstructured data. This book is your toolkit to teach you how to keep your cluster in good health, and show you how to diagnose and treat unexpected issues along the way. You will start by getting introduced to ElasticSearch, and look at some common performance issues that pop up when using the system. You will then see how to install and configure ElasticSearch and the ElasticSearch monitoring plugins. Then, you will proceed to install and use the Marvel dashboard to monitor ElasticSearch. You will find out how to troubleshoot some of the common performance and reliability issues that come up when using ElasticSearch. Finally, you will analyze your cluster's historical performance, and get to know how to get to the bottom of and recover from system failures. This book will guide you through several monitoring tools, and utilizes real-world cases and dilemmas faced when using ElasticSearch, showing you how to solve them simply, quickly, and cleanly. What you will learn Explore your cluster with ElasticSearch-head and BigDesk Access the underlying data of the ElasticSearch monitoring plugins using the ElasticSearch API Analyze your cluster's performance with Marvel Troubleshoot some of the common performance and reliability issues that come up when using ElasticSearch Analyze a cluster's historical performance, and get to the bottom of and recover from system failures Use and install various other tools and plugins such as Kibana and Kopf, which is helpful to monitor ElasticSearch About the Author Dan Noble is a software engineer with a passion for writing secure, clean, and articulate code. He enjoys working with a variety of programming languages and software frameworks, particularly Python, Elasticsearch, and frontend technologies. Dan currently works on geospatial web applications and data processing systems. Dan has been a user and advocate of Elasticsearch since 2011. He has given talks about Elasticsearch at various meetup groups, and is the author of the Python Elasticsearch client rawes. Dan was also a technical reviewer for the Elasticsearch Cookbook, Second Edition, by Alberto Paro. Table of Contents Chapter 1. Introduction to Monitoring Elasticsearch Chapter 2. Installation and the Requirements for Elasticsearch Chapter 3. Elasticsearch-head and Bigdesk Chapter 4. Marvel Dashboard Chapter 5. System Monitoring Chapter 6. Troubleshooting Performance and Reliability Issues Chapter 7. Node Failure and Post-Mortem Analysis Chapter 8. Looking Forward
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值