Handling the error where a Flink TaskManager cannot connect to the JobManager on Kubernetes

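I recently deployed Flink on Kubernetes (K8s). With a single job running, everything worked. As soon as several jobs ran at the same time, the system started reporting errors such as failures to acquire resources. For example: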

java.io.IOException: Failed to fetch BLOB 04fe83f2b1ff5a167fe4e6c321226dc6/p-6521abaef7d048ff4aed8ce5d585b10ccdd308b-2a56a314e6d42d012caad6e3d38b0a85 from flink-jobmanager/xxx.xxx.xxx.xxx:6124 and store it under /tmp/blobStore-2cc077e6-6dee-4180-bc70-e110a5cf6c24/incoming/temp-00000350
    at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:167)
    at org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:166)
    at org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:187)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.createUserCodeClassLoader(BlobLibraryCacheManager.java:251)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:228)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:199)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:333)
    at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:983)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:632)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Could not connect to BlobServer at address flink-jobmanager/xxx.xxx.xxx.xxx:6124
    at org.apache.flink.runtime.blob.BlobClient.<init>(BlobClient.java:102)
    at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:137)
    ... 10 more
Caused by: java.net.UnknownHostException: flink-jobmanager
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at org.apache.flink.runtime.blob.BlobClient.<init>(BlobClient.java:96)
    ... 11 more
 

At first I assumed the problem was with the Kubernetes Service. Deleting the Service and recreating it fixed the problem, but only temporarily.

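Later, as the number of jobs grew, the problem came back. The real cause was the stability of the Kubernetes network itself: the innermost exception in the stack trace is java.net.UnknownHostException, which means the TaskManager could not resolve the Service name flink-jobmanager. To work around this, I added the following to the Flink config file: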

jobmanager.rpc.address: <clusterIP>

At the same time, I increased the TaskManager memory configuration.

After redeploying the JobManager and TaskManager, the problem was resolved.
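
The two changes might look roughly like this in flink-conf.yaml. This is only a sketch: <clusterIP> stays a placeholder for the ClusterIP of the flink-jobmanager Service, and both the memory key and its value are assumptions, since the post does not say which option was raised or by how much.

```yaml
# flink-conf.yaml (sketch; values are placeholders, not taken from the original setup)

# Use the ClusterIP of the flink-jobmanager Service instead of its DNS name,
# so TaskManagers no longer depend on resolving "flink-jobmanager".
jobmanager.rpc.address: <clusterIP>

# Assumed option and value: total TaskManager process memory
# (this key is available in Flink 1.10 and later; older versions use different keys).
taskmanager.memory.process.size: 4096m
```

The ClusterIP can be looked up with kubectl get service flink-jobmanager, in whichever namespace the Service was created. Note that a ClusterIP changes if the Service is deleted and recreated, so the config would need to be updated in that case.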
