spark / docker / java / kubernetes: getting the CPU core/thread count

We upgraded a service from spark2.3.0-hadoop2.8 to spark2.4.0-hadoop3.0.

A day later, the Spark Streaming Kafka consumers had built up a backlog.

The service is not deployed on YARN in the traditional way, but on Kubernetes (1.13.2): https://spark.apache.org/docs/latest/running-on-kubernetes.html

Since the cluster had recently seen some major operational work, I assumed the backlog was caused by a cluster I/O bottleneck and made several I/O-targeted optimizations, but they had no effect.

I kept watching the service logs and the servers' load.

Then something looked off: the CPU usage of the Spark services hovered between 100% and 200%, and sat at 100% for long stretches.

The machines involved have 32 cores; 100% CPU usage amounts to using a single core, so something was clearly wrong.

My guess: the backlog was most likely not an I/O bottleneck but a compute bottleneck (the service does CPU-intensive work internally: word segmentation, classification, clustering, and so on).

The program tunes itself according to the number of CPU cores.

The method used to get the core count of the environment:

```scala
def GetCpuCoreNum(): Int = {
  Runtime.getRuntime.availableProcessors
}
```
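To illustrate what "tuning by core count" looks like, here is a minimal Java sketch (my own illustration, not the service's actual code): the same `availableProcessors` value typically sizes a worker pool, so when it reports 1, every pool built this way collapses to a single thread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSizing {
    public static void main(String[] args) throws InterruptedException {
        // The same JDK call the Scala helper above wraps. Inside a container,
        // newer JDK 8 builds return the cgroup-limited count; older ones
        // return the host's core count.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Cpu core Num " + cores);

        // Size a CPU-bound worker pool from the detected core count.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < cores; i++) {
            final int id = i;
            pool.submit(() -> System.out.println("worker " + id));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```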

Print the core count at startup.

spark 2.4.0

root@consume-topic-qk-nwd-7d84585f5-kh7z5:/usr/spark-2.4.0# java -version

java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)

[cuidapeng@wx-k8s-4 ~]$ kb logs consume-topic-qk-nwd-7d84585f5-kh7z5 |more

2019-03-04 15:21:59 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Cpu core Num 1

2019-03-04 15:22:00 INFO SparkContext:54 - Running Spark version 2.4.0

2019-03-04 15:22:00 INFO SparkContext:54 - Submitted application: topic-quick
2019-03-04 15:22:00 INFO SecurityManager:54 - Changing view acls to: root
2019-03-04 15:22:00 INFO SecurityManager:54 - Changing modify acls to: root
2019-03-04 15:22:00 INFO SecurityManager:54 - Changing view acls groups to:
2019-03-04 15:22:00 INFO SecurityManager:54 - Changing modify acls groups to:
2019-03-04 15:22:00 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2019-03-04 15:22:00 INFO Utils:54 - Successfully started service 'sparkDriver' on port 33016.
2019-03-04 15:22:00 INFO SparkEnv:54 - Registering MapOutputTracker
2019-03-04 15:22:01 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-03-04 15:22:01 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-03-04 15:22:01 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-03-04 15:22:01 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-dc0c496e-e5ab-4d07-a518-440f2336f65c
2019-03-04 15:22:01 INFO MemoryStore:54 - MemoryStore started with capacity 4.5 GB
2019-03-04 15:22:01 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-03-04 15:22:01 INFO log:192 - Logging initialized @2888ms

Cpu core Num 1: the service has fallen back to single-core computation, and that is the cause of the backlog.

The guess turned out to be right, so I rolled back to 2.3.0.

After rolling back to spark 2.3.0:

root@consume-topic-dt-nwd-67b7fd6dd5-jztpb:/usr/spark-2.3.0# java -version

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

[cuidapeng@wx-k8s-4 ~]$ kb logs consume-topic-dt-nwd-67b7fd6dd5-jztpb | more

2019-03-04 15:16:22 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Cpu core Num 32

2019-03-04 15:16:23 INFO SparkContext:54 - Running Spark version 2.3.0

2019-03-04 15:16:23 INFO SparkContext:54 - Submitted application: topic-dt
2019-03-04 15:16:23 INFO SecurityManager:54 - Changing view acls to: root
2019-03-04 15:16:23 INFO SecurityManager:54 - Changing modify acls to: root
2019-03-04 15:16:23 INFO SecurityManager:54 - Changing view acls groups to:
2019-03-04 15:16:23 INFO SecurityManager:54 - Changing modify acls groups to:
2019-03-04 15:16:23 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2019-03-04 15:16:23 INFO Utils:54 - Successfully started service 'sparkDriver' on port 40616.
2019-03-04 15:16:23 INFO SparkEnv:54 - Registering MapOutputTracker
2019-03-04 15:16:23 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-03-04 15:16:23 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-03-04 15:16:23 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-03-04 15:16:23 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-5dbf1194-477a-4001-8738-3da01b5a3f01
2019-03-04 15:16:23 INFO MemoryStore:54 - MemoryStore started with capacity 6.2 GB
2019-03-04 15:16:23 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-03-04 15:16:24 INFO log:192 - Logging initialized @2867ms

Cpu core Num 32: 32 is the physical machine's core count.

So the backlog was not caused by I/O; it was caused by the number of cores available to the runtime shrinking. After the upgrade to spark 2.4.0, the service went from 32-core parallel execution to single-core execution.

This is actually not a Spark problem but a JDK problem.

A long time ago we had a requirement to limit core resources inside Docker, which needed the JDK to see the Docker-limited core count rather than the host's. My recollection was that this had been requested of the JDK and would only arrive in JDK 9 or 10; JDK 8 could not do it yet. So we dropped the plan of limiting cores through Docker and instead capped compute resources by spreading services across the cluster at scheduling time.

I did not expect JDK 8 to gain this behavior, and stepped into a pit here because of it.

Docker's CPU control options:

Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container

Options:
      --cpu-period int       Limit CPU CFS (Completely Fair Scheduler) period
      --cpu-quota int        Limit CPU CFS (Completely Fair Scheduler) quota
      --cpu-rt-period int    Limit CPU real-time period in microseconds
      --cpu-rt-runtime int   Limit CPU real-time runtime in microseconds
  -c, --cpu-shares int       CPU shares (relative weight)
      --cpus decimal         Number of CPUs
      --cpuset-cpus string   CPUs in which to allow execution (0-3, 0,1)
      --cpuset-mems string   MEMs in which to allow execution (0-3, 0,1)
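The two CFS options are the relevant ones here: a container-aware JVM derives its core count from cpu-quota divided by cpu-period (a quota of -1 meaning unlimited), which is also what `docker run --cpus N` sets under the hood. A small sketch of that arithmetic (the helper name and the rounding choice are my own illustration, not the JDK's exact code):

```java
public class CgroupCores {
    /**
     * Effective core count from CFS bandwidth settings, as configured by
     * `docker run --cpus` (or --cpu-quota / --cpu-period).
     * A quota of -1 (or 0) means "no limit".
     */
    static int effectiveCores(long quotaUs, long periodUs, int hostCores) {
        if (quotaUs <= 0 || periodUs <= 0) {
            return hostCores; // no CFS limit: the host's cores show through
        }
        // Round up so e.g. --cpus 1.5 still yields 2 usable workers.
        return (int) Math.ceil((double) quotaUs / periodUs);
    }

    public static void main(String[] args) {
        // docker run --cpus 12  ->  quota 1200000us over a 100000us period
        System.out.println(effectiveCores(1200000, 100000, 32)); // 12
        // no limit set: the host's 32 cores show through
        System.out.println(effectiveCores(-1, 100000, 32));      // 32
    }
}
```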

One more point: the service is scheduled by Kubernetes, which adds its own layer of resource management on top of Docker.

Kubernetes offers two approaches to CPU control:

one based on dedicated cores: https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/

one based on shares/percentages: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

Manually assigning CPU resources:

resources:
  requests:
    cpu: 12
    memory: "24Gi"
  limits:
    cpu: 12
    memory: "24Gi"
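For context, this stanza sits under each container in the pod spec. The surrounding manifest below is an abbreviated, hypothetical example (the name and image are placeholders, and selector/labels are omitted for brevity):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consume-topic-example   # placeholder name
spec:
  template:
    spec:
      containers:
      - name: spark-consumer    # placeholder name
        image: spark:2.4.0      # placeholder image
        resources:
          requests:
            cpu: 12
            memory: "24Gi"
          limits:
            cpu: 12
            memory: "24Gi"
```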

Then update the service:

[cuidapeng@wx-k8s-4 ~]$ kb logs consume-topic-dt-nwd-99cf6d789-6hkcg |more

2019-03-04 16:24:57 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Cpu core Num 12

2019-03-04 16:24:57 INFO SparkContext:54 - Running Spark version 2.4.0

2019-03-04 16:24:58 INFO SparkContext:54 - Submitted application: topic-dt
2019-03-04 16:24:58 INFO SecurityManager:54 - Changing view acls to: root
2019-03-04 16:24:58 INFO SecurityManager:54 - Changing modify acls to: root
2019-03-04 16:24:58 INFO SecurityManager:54 - Changing view acls groups to:
2019-03-04 16:24:58 INFO SecurityManager:54 - Changing modify acls groups to:
2019-03-04 16:24:58 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2019-03-04 16:24:58 INFO Utils:54 - Successfully started service 'sparkDriver' on port 36429.
2019-03-04 16:24:58 INFO SparkEnv:54 - Registering MapOutputTracker
2019-03-04 16:24:58 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-03-04 16:24:58 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-03-04 16:24:58 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-03-04 16:24:58 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-764f35a8-ea7f-4057-8123-22cbbe2d9a39
2019-03-04 16:24:58 INFO MemoryStore:54 - MemoryStore started with capacity 6.2 GB
2019-03-04 16:24:58 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-03-04 16:24:58 INFO log:192 - Logging initialized @2855ms

Cpu core Num 12: the setting took effect.

There is a compatibility issue over the core count between kubernetes (docker) and spark (the JDK):

jdk 1.8.0_131 inside Docker sees the host machine's core count.

jdk 1.8.0_202 inside Docker sees the Docker-limited core count; when Kubernetes specifies no resources, the default effectively limits it to 1. (JDK 8u191 backported the container awareness from JDK 10, which is why 8u202 honors cgroup limits while 8u131 does not.)

So either stay upgraded on spark2.4.0-hadoop3.0 (jdk 1.8.0_202) and have Kubernetes specify the core count explicitly, or switch back to an older JDK, which would require rebuilding the Docker image.
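A third option on newer JDK 8 builds (8u191 and later): they accept the -XX:ActiveProcessorCount=N flag, which pins the value availableProcessors() returns regardless of cgroup detection. A tiny probe class makes it easy to check what any given image reports (treat the flag as version-dependent):

```java
public class CoreProbe {
    public static void main(String[] args) {
        // Run plain:            java CoreProbe
        // Or with an override:  java -XX:ActiveProcessorCount=12 CoreProbe
        // (the flag exists only on JDK 8u191+ / JDK 10+; older JVMs will
        // refuse to start when given an unknown -XX option)
        System.out.println("availableProcessors = "
                + Runtime.getRuntime().availableProcessors());
    }
}
```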

With the core count specified, the node looks like this:

[cuidapeng@wx-k8s-4 ~]$ kb describe node wx-k8s-8

Name:               wx-k8s-8
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=wx-k8s-8
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"26:21:23:bb:3d:62"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.10.3.126
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 24 Jan 2019 14:11:15 +0800
Taints:             <none>
Unschedulable:      false
Conditions:
  Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----            ------  -----------------                 ------------------                ------                      -------
  MemoryPressure  False   Mon, 04 Mar 2019 17:27:16 +0800   Thu, 24 Jan 2019 14:11:15 +0800   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Mon, 04 Mar 2019 17:27:16 +0800   Thu, 24 Jan 2019 14:11:15 +0800   KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure     False   Mon, 04 Mar 2019 17:27:16 +0800   Thu, 24 Jan 2019 14:11:15 +0800   KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready           True    Mon, 04 Mar 2019 17:27:16 +0800   Thu, 24 Jan 2019 14:24:48 +0800   KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:  10.10.3.126
  Hostname:    wx-k8s-8
Capacity:
  cpu:                32
  ephemeral-storage:  1951511544Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65758072Ki
  pods:               110
Allocatable:
  cpu:                32
  ephemeral-storage:  1798513035973
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65655672Ki
  pods:               110
System Info:
  Machine ID:                 c4ef335760624cd8940eddc0cd568982
  System UUID:                4C4C4544-0056-3310-8036-B4C04F393632
  Boot ID:                    02925b6a-8fc8-4399-a12e-54a77f72b4f3
  Kernel Version:             3.10.0-693.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://17.3.2
  Kubelet Version:            v1.13.2
  Kube-Proxy Version:         v1.13.2
PodCIDR:                      10.244.7.0/24
Non-terminated Pods:          (15 in total)
  Namespace    Name                                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------    ----                                             ------------  ----------  ---------------  -------------  ---
  default      consume-buzz-8c5cd6f97-z542c                     0 (0%)        0 (0%)      4Gi (6%)         5Gi (7%)       6h18m

default consume-daoen-7c946bdf76-kmjp6 0 (0%) 0 (0%) 2Gi (3%) 3Gi (4%) 5h19m

default consume-mengqiu-autohome-koubei-cf5d4cb87-cnp2g 0 (0%) 0 (0%) 2Gi (3%) 3Gi (4%) 5h30m

default consume-mengqiu-car-7c6575f5fc-zskhw 0 (0%) 0 (0%) 2Gi (3%) 3Gi (4%) 5h19m

default consume-mengqiu-dt-5c6d7f8c5c-jf4hr 8 (25%) 0 (0%) 8Gi (12%) 9Gi (14%) 36s

default consume-mengqiu-ec-768c647d7b-5wkss 0 (0%) 0 (0%) 2Gi (3%) 3Gi (4%) 5h30m

default consume-mengqiu-qk-7c6d96c85-24kwp 8 (25%) 0 (0%) 8Gi (12%) 13Gi (20%) 36s

default consume-mengqiu-yp-848c89dd97-6mqsb 0 (0%) 0 (0%) 2Gi (3%) 3Gi (4%) 5h19m

default consume-qb-799c98f996-njczw 2 (6%) 0 (0%) 4Gi (6%) 5Gi (7%) 36s

default consume-xiaohongshu-6cfcd554f6-gdc9g 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5h30m

default consume-yunjiao-article-5bcb58ddcf-lsqj8 0 (0%) 0 (0%) 2Gi (3%) 3Gi (4%) 5h30m

default consume-zhihu-6764ff956-zgt79 0 (0%) 0 (0%) 4Gi (6%) 5Gi (7%) 6h18m

default consume-zjx-6cf67885c-g5h2s 2 (6%) 0 (0%) 4Gi (6%) 5Gi (7%) 36s

kube-system kube-flannel-ds-l594f 100m (0%) 100m (0%) 50Mi (0%) 50Mi (0%) 11d

kube-system kube-proxy-vckxf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 39d

Allocated resources:

(Total limits may be over 100 percent, i.e., overcommitted.)

Resource           Requests       Limits
--------           --------       ------
cpu                20100m (62%)   100m (0%)
memory             45106Mi (70%)  61490Mi (95%)
ephemeral-storage  0 (0%)         0 (0%)

Events:

