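Cause
In the morning, Grafana sent an email alert warning that the system was throwing too many exceptions.
Logged in to Grafana to take a look.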
Investigation
Logged in to Kibana to check the relevant logs.
Found error 1:
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.UnknownHostException: redis.marathon.l4lb.thisdcos.directory
at redis.clients.jedis.Connection.connect(Connection.java:207)
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:93)
at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1767)
at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:106)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:868)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at redis.clients.util.Pool.getResource(Pool.java:49)
... 119 common frames omitted
Caused by: java.net.UnknownHostException: redis.marathon.l4lb.thisdcos.directory
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at redis.clients.jedis.Connection.connect(Connection.java:184)
... 126 common frames omitted
The DC/OS VIP-to-IP mapping was broken, so the services could not resolve the Redis host.
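To confirm this kind of failure from the affected host, a quick name-resolution check can help. The following is a minimal sketch, not part of the original service: the class name `VipResolutionCheck` and the method `resolves` are made up for illustration, and it only uses the standard `java.net.InetAddress` API. It reports whether a host name resolves, catching the same `UnknownHostException` that appears in the stack trace above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration; not taken from the original code base.
class VipResolutionCheck {

    /** Returns true if the given host name resolves to at least one address. */
    static boolean resolves(String host) {
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            return addresses.length > 0;
        } catch (UnknownHostException e) {
            // Same exception type that Jedis wrapped in the stack trace.
            return false;
        }
    }
}
```

Calling `VipResolutionCheck.resolves("redis.marathon.l4lb.thisdcos.directory")` on the service host would return false while the VIP mapping is broken.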
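Next, checked the Redis dashboards in Grafana and found that both Redis I/O and the setnx command rate were abnormal, as shown in the figure below.
Checked the related services and found the following code: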
while (!flag && start <= System.currentTimeMillis() + time) {
    flag = redisTemplate.execute(new RedisCallback<Boolean>() {
        @Override
        public Boolean doInRedis(RedisConnection connection) throws DataAccessException {
            Jedis jedis = (Jedis) connection.getNativeConnection();
            if (jedis.setnx(key, UID) == 1L) {
                // Expire after 300 seconds to prevent deadlock.
                // If the JVM dies before this step, the lock is never released.
                jedis.expire(key, LOCK_DEATH_TIME);
                LOCK_MAP.put(RedisLock.this, 1);
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }
    });
}
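There is no sleep after a failed setnx call, so the loop spins continuously and floods Redis with requests.
Solution: add a sleep inside the while loop.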
while (!flag && start <= System.currentTimeMillis() + time) {
    flag = redisTemplate.execute(new RedisCallback<Boolean>() {
        @Override
        public Boolean doInRedis(RedisConnection connection) throws DataAccessException {
            Jedis jedis = (Jedis) connection.getNativeConnection();
            if (jedis.setnx(key, UID) == 1L) {
                // Expire after 300 seconds to prevent deadlock.
                // If the JVM dies before this step, the lock is never released.
                jedis.expire(key, LOCK_DEATH_TIME);
                LOCK_MAP.put(RedisLock.this, 1);
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            // Sleep between attempts while competing for the lock.
            try {
                TimeUnit.MILLISECONDS.sleep(20L);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can still observe it.
                Thread.currentThread().interrupt();
                log.error("lock error", e);
            }
            log.info("Competing for lock: " + key);
            return false;
        }
    });
}
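The comment in the code above points out a remaining gap: if the JVM dies between setnx and expire, the key never expires. Redis can set the value and the TTL in a single command (`SET key value NX EX seconds`), which closes that gap. The sketch below illustrates the idea under stated assumptions; it is not the original service code. `LockStore`, `InMemoryStore`, `SpinLockSketch`, `tryLock` and `retryDelayMillis` are hypothetical names, and the in-memory store only stands in for a real Redis client so the example runs on its own. The sketch also computes the deadline once up front and sleeps outside the Redis call, so no connection is held while waiting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names are illustrative and not from the original code.
class SpinLockSketch {

    /** Stand-in for a Redis client. A real one would issue: SET key value NX EX ttlSeconds. */
    interface LockStore {
        boolean setIfAbsent(String key, String value, long ttlSeconds);
    }

    /** In-memory store for demonstration only. It ignores the TTL. */
    static final class InMemoryStore implements LockStore {
        private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
        final AtomicInteger attempts = new AtomicInteger();

        @Override
        public boolean setIfAbsent(String key, String value, long ttlSeconds) {
            attempts.incrementAndGet();
            return data.putIfAbsent(key, value) == null;
        }
    }

    /**
     * Tries to take the lock until timeoutMillis has elapsed,
     * sleeping retryDelayMillis between failed attempts.
     */
    static boolean tryLock(LockStore store, String key, String owner,
                           long ttlSeconds, long timeoutMillis, long retryDelayMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (store.setIfAbsent(key, owner, ttlSeconds)) {
                return true; // value and TTL were set in one step
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // gave up after the timeout
            }
            try {
                TimeUnit.MILLISECONDS.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

With a 100 ms timeout and a 20 ms retry delay, a failed acquisition makes only a handful of attempts instead of spinning, which is the behavior the sleep in the fix above is meant to achieve.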