记一次线上异常的排查和定位


title: 记一次线上异常的排查和定位
date: 2019-02-09 20:30:02
tags: debug

某段时间,线上服务不时会爆出dwr数据异常。
初步断定是后端接口返回异常所致。

由于当时我们的服务已经接入ELK,所以第一时间登录kibana后台,按照lucene语法搜索
(Tag:“study_online”) AND (message:“exception”),定位到服务器的异常

异常信息(由于当时的异常日志已经被冲掉了,这是我从网上摘录的):

java.lang.IllegalStateException: Timed out waiting to add Cmd: 1 Opaque: 1147840 Key: com.stubhub.user.business.entity.UserSession|63C21A07311389A1EC361D834BF46E72 Cas: 0 Exp: 7200 Flags: 1 Data Length: 1748(max wait=10000ms) 
at net.spy.memcached.protocol.TCPMemcachedNodeImpl.addOp(TCPMemcachedNodeImpl.java:362) 
at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:1267) 
at com.couchbase.client.CouchbaseConnection.addOperation(CouchbaseConnection.java:277) 
at net.spy.memcached.MemcachedConnection.enqueueOperation(MemcachedConnection.java:1185) 
at net.spy.memcached.MemcachedClient.asyncStore(MemcachedClient.java:328) 
at net.spy.memcached.MemcachedClient.set(MemcachedClient.java:929) 
at com.stubhub.common.cache.store.couchbase.CouchbaseStore.put(CouchbaseStore.java:148) 
at com.stubhub.common.cache.store.CompositeCacheStore.put(CompositeCacheStore.java:69) 
at com.stubhub.common.session.impl.UserSessionCacheManagerImpl.putCrossModuleValue(UserSessionCacheManagerImpl.java:48) 
at com.stubhub.user.business.manager.impl.UserSessionCacheMgrImpl.putUserSessionToStore(UserSessionCacheMgrImpl.java:269) 
at com.stubhub.user.business.manager.impl.UserSessionCacheMgrImpl.createUserSession(UserSessionCacheMgrImpl.java:806) 
at sun.reflect.GeneratedMethodAccessor1305.invoke(Unknown Source) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 

从日志中判断是因为memcached超时抛出的异常。
首先查看工程的memcached设置,代码如下

	<bean id="memcachedClient" class="com.xxx.xxx.cache.impl.KeyPrefixSupportedMemcachedClientFactory">
		<property name="servers" value="${memcached_address_list}" />
		<property name="namespace" value="study_" />
		<property name="protocol" value="BINARY" />
		<property name="transcoder">
			<bean class="net.spy.memcached.transcoders.SerializingTranscoder">
				<property name="compressionThreshold" value="16384" />
			</bean>
		</property>
		<property name="maxReconnectDelay" value="60" />
		<property name="opTimeout" value="200"/>
		<property name="opQueueMaxBlockTime" value="400"/>
		<property name="timeoutExceptionThreshold" value="20"/>
		<property name="hashAlg">
			<value type="net.spy.memcached.DefaultHashAlgorithm">KETAMA_HASH</value>
		</property>
		<property name="locatorType">
            <value type="net.spy.memcached.ConnectionFactoryBuilder.Locator">CONSISTENT</value>
        </property>
		<property name="failureMode">
            <value type="net.spy.memcached.FailureMode">Redistribute</value>
        </property>
		<property name="useNagleAlgorithm" value="false" />
	</bean>

可以看到超时时间opTimeout设置的是200毫秒,而opQueueMaxBlockTime是400毫秒
opTimeout是操作超时时间,opQueueMaxBlockTime是指操作的最大阻塞时间。
那究竟是哪个时间超时导致的呢?
可以通过下载memcacheClient源码,搜索“Timed out waiting to add”定位到如下源码

  /*
   * (non-Javadoc)
   *
   * @see net.spy.memcached.MemcachedNode#addOp(net.spy.memcached.ops.Operation)
   */
  public final void addOp(Operation op) {
    try {
      if (!authLatch.await(1, TimeUnit.SECONDS)) {
        op.cancel();
        getLogger().warn("Operation canceled because authentication "
                + "or reconnection and authentication has "
                + "taken more than one second to complete.");
        getLogger().debug("Canceled operation %s", op.toString());
        return;
      }
      if (!inputQueue.offer(op, opQueueMaxBlockTime, TimeUnit.MILLISECONDS)) {
        throw new IllegalStateException("Timed out waiting to add " + op
            + "(max wait=" + opQueueMaxBlockTime + "ms)");
      }
    } catch (InterruptedException e) {
      // Restore the interrupted status
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting to add " + op);
    }
  }

从源码中可以知晓,原因是inputQueue.offer没有正常返回导致的,超过了opQueueMaxBlockTime的时间限制

知道了原因,下面就开始思考怎么解决吧

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值