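Yesterday I ran into a problem: a Eureka cluster that ran fine outside k8s would not run once it was moved onto k8s.
Here is a record of how I tracked it down.
Checking the logs with kubectl logs -f XXX -n XXX showed this error: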
com.netflix.discovery.shared.transport.TransportException: Cannot execute request on any known server
    at com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute(RetryableEurekaHttpClient.java:111) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getApplications(EurekaHttpClientDecorator.java:134) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator$6.execute(EurekaHttpClientDecorator.java:137) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getApplications(EurekaHttpClientDecorator.java:134) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient.getAndStoreFullRegistry(DiscoveryClient.java:1013) [eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient.getAndUpdateDelta(DiscoveryClient.java:1055) [eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient.fetchRegistry(DiscoveryClient.java:929) [eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient.refreshRegistry(DiscoveryClient.java:1451) [eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient$CacheRefreshThread.run(DiscoveryClient.java:1418) [eureka-client-1.6.2.jar:1.6.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
My Eureka YAML file is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eureka01-deployment
  namespace: wx-prod
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: eureka01-prod
        regcenter: eureka
        track: stable
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchExpressions:
              - key: regcenter
                operator: In
                values:
                - eureka
      containers:
      - name: eureka
        image: harbor.prod.com/kube-prod/eureka:1.0
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "500m"
            memory: "1024Mi"
          limits:
            cpu: "1000m"
            memory: "2048Mi"
        ports:
        - containerPort: 8761
        env:
        - name: eureka.server.enable-self-preservation
          value: "false"
        - name: eureka.client.service-url.defaultZone
          value: http://wx:wx@eureka02-service.wx-prod.svc.cluster.local:8761/eureka/,http://vass_wx:vass_wx@eureka03-service.wx-prod.svc.cluster.local:8761/eureka/
      imagePullSecrets:
      - name: harbor-secret-name
  selector:
    matchLabels:
      app: eureka01-prod
---
apiVersion: v1
kind: Service
metadata:
  name: eureka01-service
  namespace: wx-prod
  labels:
    app: eureka01-svc
spec:
  type: NodePort
  selector:
    app: eureka01-prod
  ports:
  - port: 8761
    targetPort: 8761
    nodePort: 30001
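Taken literally, the error means the Eureka nodes cannot find each other. Since this is a k8s environment I had just set up, and the Eureka URLs use k8s internal domain names, my first thought was to check whether CoreDNS was working properly.
The troubleshooting steps were as follows.
I followed the steps from this post: https://blog.csdn.net/alva_xu/article/details/85160552
1. Run kubectl get pod -n kube-system and check that the CoreDNS pods are healthy;
2. Pull busybox and start it inside the k8s cluster;
3. Run kubectl exec -ti busybox -- nslookup kubernetes.default to confirm whether domain name resolution works.
Everything turned out to be completely normal.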
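The same check can be pointed at the actual peer hostnames instead of kubernetes.default. Below is a minimal Python sketch, assuming it runs inside a pod in the cluster; peer_hosts and resolve_all are hypothetical helper names, and the zone string is copied from the Deployment above.

```python
import socket
from urllib.parse import urlsplit


def peer_hosts(default_zone):
    """Split a comma-separated defaultZone value into (host, port) pairs."""
    pairs = []
    for url in default_zone.split(","):
        parts = urlsplit(url.strip())
        pairs.append((parts.hostname, parts.port))
    return pairs


def resolve_all(pairs):
    """Map each host to its resolved addresses, or to an error message."""
    results = {}
    for host, port in pairs:
        try:
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[host] = "unresolved: {}".format(exc)
    return results


# defaultZone value taken from the eureka01 Deployment.
zone = (
    "http://wx:wx@eureka02-service.wx-prod.svc.cluster.local:8761/eureka/,"
    "http://vass_wx:vass_wx@eureka03-service.wx-prod.svc.cluster.local:8761/eureka/"
)
```

Calling resolve_all(peer_hosts(zone)) from inside the cluster would show, per peer Service, either its resolved addresses or the resolver error, which narrows the problem down to DNS or to something else.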
The environment was fine, so the problem had to be a mismatch between the environment and the code. Next I checked the registry's bootstrap file:
server:
  port: ${hostPort}
eureka:
  client:
    service-url:
      defaultZone: http://wx:wx@${eureka.node01.name}:${eureka.node01.port}/eureka/,http://wx:wx@${eureka.node02.name}:${eureka.node02.port}/eureka/
    fetch-registry: true
    register-with-eureka: true
  instance:
    # requires hosts entries to be configured
    #hostname: ${eureka.hostname}
    instance-id: ${spring.application.name}:${server.port}
    prefer-ip-address: true
    ip-address: ${ipAddress}
  server:
    peer-node-read-timeout-ms: 1000
    # self-preservation: set to true in production
    enable-self-preservation: ${selfPreservation:true}
spring:
  application:
    name: eureka
security:
  basic:
    enabled: true
  user:
    name: wx
    password: wx
Then it hit me: if all three Eureka nodes register with the same application.name:port as their instance ID, could that be the cause? I changed the file to:
server:
  port: 8761
eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
    fetch-registry: true
    register-with-eureka: true
  instance:
    instance-id: ${spring.cloud.client.ipAddress}:${server.port}
    prefer-ip-address: true
  server:
    peer-node-read-timeout-ms: 1000
    # self-preservation: set to true in production
    enable-self-preservation: false
spring:
  application:
    name: eureka
security:
  basic:
    enabled: true
  user:
    name: wx
    password: wx
Because inside k8s I deploy with ClusterIP, each Eureka instance should have a different IP, so IP + port works as a unique instance ID.
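To make the reasoning concrete, here is a toy model of a registry keyed by application name and then by instance ID. It is only a sketch under stated assumptions, not Eureka's actual code: the register helper and the pod IPs are hypothetical, and the only point is that identical IDs collapse into one entry while IP-based IDs stay distinct.

```python
def register(registry, app, instance_id, ip):
    """Store an instance under (app, instance_id).

    A later entry with the same key overwrites the earlier one.
    """
    registry.setdefault(app, {})[instance_id] = ip
    return registry


pods = ["10.244.1.11", "10.244.2.12", "10.244.3.13"]  # hypothetical pod IPs
port = 8761

by_name = {}  # instance-id: ${spring.application.name}:${server.port}
by_ip = {}    # instance-id: ${spring.cloud.client.ipAddress}:${server.port}
for ip in pods:
    register(by_name, "EUREKA", "eureka:{}".format(port), ip)
    register(by_ip, "EUREKA", "{}:{}".format(ip, port), ip)

print(len(by_name["EUREKA"]))  # 1: all three nodes share the ID eureka:8761
print(len(by_ip["EUREKA"]))    # 3: one entry per pod
```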
After restarting, the problem was completely solved. Noting it here for future reference.
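One way to confirm the result is to look at the instance IDs the registry reports. The sketch below assumes the JSON body of the /eureka/apps endpoint (requested with Accept: application/json and the basic-auth user) has already been fetched; instance_ids is a hypothetical helper, and the sample payload is hand-written for illustration, not real output from this cluster.

```python
import json


def instance_ids(apps_json):
    """Collect instanceId values per application name from an /eureka/apps JSON body."""
    found = {}
    applications = json.loads(apps_json)["applications"]["application"]
    for app in applications:
        instances = app["instance"]
        # Defensive: tolerate a single object in place of a one-element list.
        if isinstance(instances, dict):
            instances = [instances]
        found[app["name"]] = sorted(inst["instanceId"] for inst in instances)
    return found


# Hand-written illustrative payload with hypothetical pod IPs.
sample = json.dumps({
    "applications": {
        "application": [{
            "name": "EUREKA",
            "instance": [
                {"instanceId": "10.244.1.11:8761", "status": "UP"},
                {"instanceId": "10.244.2.12:8761", "status": "UP"},
                {"instanceId": "10.244.3.13:8761", "status": "UP"},
            ],
        }]
    }
})

print(instance_ids(sample))
```

If the fix took effect, each Eureka node should appear under its own IP:port entry rather than under a single shared ID.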