服务器部署 配置jetty运行参数_skywalking—docker镜像构建k8s部署

fe1156ecb7323fb6cf661bd862b284b2.png

前言

skywalking是个非常不错的apm产品,但是在使用过程中有个非常蛋疼的问题,在基于es的存储情况下,es的数据一有问题,就会导致整个skywalking web ui服务不可用,然后需要agent端一个服务一个服务的停用,然后服务重新部署后好,全部走一遍。这种问题同样也会存在skywalking的版本升级迭代中。而且apm 这种过程数据是允许丢弃的,默认skywalking中关于trace的数据记录只保存了90分钟。故博主准备将skywalking的部署容器化,一键部署升级。下文是整个skywalking 容器化部署的过程。

目标:将skywalking的docker镜像运行在k8s的集群环境中提供服务

docker镜像构建

FROM registry.cn-xx.xx.com/keking/jdk:1.8ADD apache-skywalking-apm-incubating/  /opt/apache-skywalking-apm-incubating/RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai  /etc/localtime     && echo 'Asia/Shanghai' >/etc/timezone     && chmod +x /opt/apache-skywalking-apm-incubating/config/setApplicationEnv.sh     && chmod +x /opt/apache-skywalking-apm-incubating/webapp/setWebAppEnv.sh     && chmod +x /opt/apache-skywalking-apm-incubating/bin/startup.sh     && echo "tail -fn 100 /opt/apache-skywalking-apm-incubating/logs/webapp.log" >> /opt/apache-skywalking-apm-incubating/bin/startup.shEXPOSE 8080 10800 11800 12800CMD /opt/apache-skywalking-apm-incubating/config/setApplicationEnv.sh      && sh /opt/apache-skywalking-apm-incubating/webapp/setWebAppEnv.sh      && /opt/apache-skywalking-apm-incubating/bin/startup.sh

在编写Dockerfile时需要考虑几个问题:skywalking中哪些配置需要动态配置(运行时设置)?怎么保证进程一直运行(skywalking 的startup.sh和tomcat中 的startup.sh类似)?

application.yml

#cluster:#  zookeeper:#    hostPort: localhost:2181#    sessionTimeout: 100000naming:  jetty:    #OS real network IP(binding required), for agent to find collector cluster    host: 0.0.0.0    port: 10800    contextPath: /cache:#  guava:  caffeine:remote:  gRPC:    # OS real network IP(binding required), for collector nodes communicate with each other in cluster. collectorN --(gRPC) --> collectorM    host: #real_host    port: 11800agent_gRPC:  gRPC:    #os real network ip(binding required), for agent to uplink data(trace/metrics) to collector. agent--(grpc)--> collector    host: #real_host    port: 11800    # Set these two setting to open ssl    #sslCertChainFile: $path    #sslPrivateKeyFile: $path    # Set your own token to active auth    #authentication: xxxxxxagent_jetty:  jetty:    # OS real network IP(binding required), for agent to uplink data(trace/metrics) to collector through HTTP. agent--(HTTP)--> collector    # SkyWalking native Java/.Net/node.js agents don't use this.    # Open this for other implementor.    host: 0.0.0.0    port: 12800    contextPath: /analysis_register:  default:analysis_jvm:  default:analysis_segment_parser:  default:    bufferFilePath: ../buffer/    bufferOffsetMaxFileSize: 10M    bufferSegmentMaxFileSize: 500M    bufferFileCleanWhenRestart: trueui:  jetty:    # Stay in `localhost` if UI starts up in default mode.    # Change it to OS real network IP(binding required), if deploy collector in different machine.    host: 0.0.0.0    port: 12800    contextPath: /storage:  elasticsearch:    clusterName: #elasticsearch_clusterName    clusterTransportSniffer: true    clusterNodes: #elasticsearch_clusterNodes    indexShardsNumber: 2    indexReplicasNumber: 0    highPerformanceMode: true    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html    bulkActions: 2000 # Execute the bulk every 2000 requests    bulkSize: 20 # flush the bulk every 20mb    flushInterval: 10 # flush the bulk every 10 seconds whatever the number of requests    concurrentRequests: 2 # the number of concurrent requests    # Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.    traceDataTTL: 2880 # Unit is minute    minuteMetricDataTTL: 90 # Unit is minute    hourMetricDataTTL: 36 # Unit is hour    dayMetricDataTTL: 45 # Unit is day    monthMetricDataTTL: 18 # Unit is month#storage:#  h2:#    url: jdbc:h2:~/memorydb#    userName: saconfiguration:  default:    #namespace: xxxxx    # alarm threshold    applicationApdexThreshold: 2000    serviceErrorRateThreshold: 10.00    serviceAverageResponseTimeThreshold: 2000    instanceErrorRateThreshold: 10.00    instanceAverageResponseTimeThreshold: 2000    applicationErrorRateThreshold: 10.00    applicationAverageResponseTimeThreshold: 2000    # thermodynamic    thermodynamicResponseTimeStep: 50    thermodynamicCountOfResponseTimeSteps: 40    # max collection's size of worker cache collection, setting it smaller when collector OutOfMemory crashed.    workerCacheMaxSize: 10000#receiver_zipkin:#  default:#    host: localhost#    port: 9411#    contextPath: /

webapp.yml

ed7e3bf2ee82d8853b6aa62a2fdd242a.png

动态配置:密码,grpc等需要绑定主机的ip都需要运行时设置,这里我们在启动skywalking的startup.sh只之前,先执行了两个设置配置的脚本,通过k8s在运行时设置的环境变量来替换需要动态配置的参数

setApplicationEnv.sh

#!/usr/bin/env shsed -i "s/#elasticsearch_clusterNodes/${elasticsearch_clusterNodes}/g" /opt/apache-skywalking-apm-incubating/config/application.ymlsed -i "s/#elasticsearch_clusterName/${elasticsearch_clusterName}/g" /opt/apache-skywalking-apm-incubating/config/application.ymlsed -i "s/#real_host/${real_host}/g" /opt/apache-skywalking-apm-incubating/config/application.yml

setWebAppEnv.sh

#!/usr/bin/env shsed -i "s/#skywalking_password/${skywalking_password}/g" /opt/apache-skywalking-apm-incubating/webapp/webapp.ymlsed -i "s/#real_host/${real_host}/g" /opt/apache-skywalking-apm-incubating/webapp/webapp.yml

保持进程存在:通过在skywalking 启动脚本startup.sh末尾追加"tail -fn 100 /opt/apache-skywalking-apm-incubating/logs/webapp.log",来让进程保持运行,并不断输出webapp.log的日志

Kubernetes中部署

apiVersion: extensions/v1beta1kind: Deploymentmetadata:  name: skywalking  namespace: uatspec:  replicas: 1  selector:    matchLabels:      app: skywalking  template:    metadata:      labels:        app: skywalking    spec:      imagePullSecrets:      - name: registry-pull-secret      nodeSelector:         apm: skywalking      containers:      - name: skywalking        image: registry.cn-xx.xx.com/keking/kk-skywalking:5.2        imagePullPolicy: Always        env:        - name: elasticsearch_clusterName          value: elasticsearch        - name: elasticsearch_clusterNodes          value: 172.16.16.129:31300        - name: skywalking_password          value: xxx        - name: real_host          valueFrom:            fieldRef:              fieldPath: status.podIP        resources:          limits:            cpu: 1000m            memory: 4Gi          requests:            cpu: 700m            memory: 2Gi---apiVersion: v1kind: Servicemetadata:  name: skywalking  namespace: uat  labels:    app: skywalkingspec:  selector:    app: skywalking  ports:  - name: web-a    port: 8080    targetPort: 8080    nodePort: 31180  - name: web-b    port: 10800    targetPort: 10800    nodePort: 31181  - name: web-c    port: 11800    targetPort: 11800    nodePort: 31182  - name: web-d    port: 12800    targetPort: 12800    nodePort: 31183  type: NodePort

Kubernetes部署脚本中唯一需要注意的就是env中关于pod ip的获取,skywalking中有几个ip必须绑定容器的真实ip,这个地方可以通过环境变量设置到容器里面去

结语

整个skywalking容器化部署从测试到可用大概耗时1天,其中花了个多小时整了下谭兄的skywalking-docker镜像(https://hub.docker.com/r/wutang/skywalking-docker/),发现有个脚本有权限问题(谭兄反馈已解决,还没来的及测试),以及有几个地方自己不是很好控制,便build了自己的docker镜像,其中最大的问题还是解决集群中网络通讯的问题,一开始我把skywalking中的服务ip都设置为0.0.0.0,然后通过集群的nodePort映射出来,这个时候的agent通过集群ip+31181是可以访问到naming服务的,然后通过naming服务获取到的collector gRPC服务缺变成了0.0.0.0:11800, 这个地址agent肯定访问不到collector的,后面通过绑定pod ip的方式解决了这个问题。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值