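1. Symptom
When the ES RestHighLevelClient connects to Elasticsearch, it reports a "Connection reset by peer" error: the TCP connection is dropped and the service cannot write data.
2. Error
Connection reset by peer: the connection between the client and the cluster was dropped unexpectedly.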
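3. Cause
kube-proxy runs in IPVS mode. IPVS applies a default timeout of 900 s to TCP connections and reclaims connections that stay idle longer than that.
Kernel parameters of the ES pod's host
Kernel parameters of the pod
At about 16:08, open a long-lived connection with telnet ESIP:PORT.
At the same time, keep watching the connection state with ipvsadm -lnc.
After 15 minutes, at 16:23, the connection is reset and reclaimed.
Elasticsearch has no keepalive parameter of its own and relies on the operating system's TCP keepalive. Checking the kernel parameters on the host and in the pod shows net.ipv4.tcp_keepalive_time=7200, which is the system default: the first keepalive probe is sent only after 7200 s of idle time. That is longer than the 900 s IPVS timeout, so IPVS reclaims the idle connection before any probe is sent.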
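The timing argument above can be checked with a few lines of Java. This is only a minimal sketch: the class and method names are hypothetical, and it models just the fully idle case, where no application traffic flows and the first keepalive probe is the only packet that could refresh the IPVS entry.

```java
// Hypothetical helper, not part of any library: compares the kernel's
// tcp_keepalive_time with the IPVS idle timeout for a fully idle connection.
final class KeepaliveCheck {

    // Returns true if the first keepalive probe is sent before IPVS
    // would reclaim the idle connection.
    static boolean probeArrivesInTime(long keepaliveTimeSec, long ipvsIdleTimeoutSec) {
        return keepaliveTimeSec < ipvsIdleTimeoutSec;
    }

    public static void main(String[] args) {
        // Values observed in this case: tcp_keepalive_time=7200, IPVS timeout 900 s.
        System.out.println(probeArrivesInTime(7200, 900)); // prints "false"
        // A shorter keepalive time, such as 600 s, probes before the timeout.
        System.out.println(probeArrivesInTime(600, 900));  // prints "true"
    }
}
```

With the default of 7200 s the check fails, which matches the reset observed after 15 minutes.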
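4. Solutions
4.1 Option 1
Adjust the timeouts of the RestHighLevelClient connection requests. The default connect timeout is 1000 ms; you can increase it, for example to 10000 ms. The sample configuration below sets both the connect and socket timeouts to 1000000 ms, and also configures a keep-alive strategy and TCP keepalive on the HTTP client.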
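import java.util.concurrent.TimeUnit;

import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Conditional;
import org.springframework.context.annotation.Configuration;

@Configuration
@Conditional(MsgMqDevCondition.class)
public class ElasticSearchDevConfig {

    @Value("${es.url}")
    private String esPath;

    @Value("${es.port}")
    private String esPort;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
            RestClient.builder(new HttpHost(esPath, Integer.parseInt(esPort), "http"))
                // Connect and socket timeouts, in milliseconds.
                .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                    .setConnectTimeout(1000000)
                    .setSocketTimeout(1000000))
                .setHttpClientConfigCallback(httpAsyncClientBuilder -> {
                    httpAsyncClientBuilder.disableAuthCaching();
                    // Keep-alive strategy: reuse a pooled connection for at most
                    // 3 minutes of idle time, well below the 900 s IPVS timeout.
                    httpAsyncClientBuilder.setKeepAliveStrategy(
                        (httpResponse, httpContext) -> TimeUnit.MINUTES.toMillis(3));
                    // TCP keepalive: enables SO_KEEPALIVE on the sockets. Probe timing
                    // still follows the kernel's net.ipv4.tcp_keepalive_* settings.
                    httpAsyncClientBuilder.setDefaultIOReactorConfig(
                        IOReactorConfig.custom().setSoKeepAlive(true).build());
                    return httpAsyncClientBuilder;
                })
        );
    }
}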
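4.2 Option 2
Set the kernel parameter net.ipv4.tcp_keepalive_time on the host node to a value below 900 s. If the ES client runs in a container, you can instead use an initContainer to set net.ipv4.tcp_keepalive_time for that pod alone.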
# InitContainers
initContainers:
  - name: init-sysctl
    command:
      - sysctl
      - -w
      - net.ipv4.tcp_keepalive_time=600
      - net.ipv4.tcp_keepalive_intvl=30
      - net.ipv4.tcp_keepalive_probes=10
    image: xxxxx
    securityContext:
      privileged: true
4.3 Option 3
In Spring Boot, create a scheduled task that pings ES periodically so that the connection does not stay idle.
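// Needs org.elasticsearch.client.RequestOptions and
// org.springframework.scheduling.annotation.Scheduled; scheduling must be
// enabled in the application (for example with @EnableScheduling).
@Scheduled(fixedRate = 60000, initialDelay = 60000)
public void keepConnectionAlive() {
    log.debug("Trying to ping Elasticsearch");
    try {
        // ping() sends a lightweight request to the cluster and returns true on success.
        final boolean alive = restHighLevelClient.ping(RequestOptions.DEFAULT);
        log.debug("Ping to Elasticsearch returned {}", alive);
    } catch (Exception e) {
        log.debug("Ping to Elasticsearch failed", e);
    }
}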
4.4 Option 4
Catch the exception in code and retry the request.
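One way to sketch option 4 is a small generic retry wrapper. Everything here is an assumption for illustration: the class name RetryOnReset, the method callWithRetry, the attempt count, and the linear backoff are hypothetical and not taken from the Elasticsearch client. It retries on any IOException, which is the exception type a "Connection reset by peer" failure typically surfaces as.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper: retries a request when it fails with an IOException.
final class RetryOnReset {

    static <T> T callWithRetry(Callable<T> request, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff before the next attempt.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        // All attempts failed: rethrow the last I/O error.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: the first call fails as if the connection was reset.
        String result = callWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new IOException("Connection reset by peer");
            }
            return "ok";
        }, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 2 attempts"
    }
}
```

In real code the lambda would wrap the actual client call. Blind retries are only safe for idempotent requests; a retried write may otherwise be applied twice.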