Preface
If you are reading this, you probably already know Kubernetes fairly well. Many companies now deploy their applications on K8S, and it works well for releases, but viewing logs remains inconvenient: host access cannot be handed out freely; a company-scale cluster runs many applications with many replicas each, so when troubleshooting you often do not know which replica's log to start with; and after a container restarts you may need the logs from before the restart. All of this makes log viewing in a K8S cluster painful.
Hence the solution design below.
Solution Overview
This solution uses an ELK + Filebeat architecture.
ELK alone can already collect, query, and display logs, but Logstash is heavy (800 MB+ of memory). Elastic rewrote part of Logstash's functionality in Go as Filebeat, which needs only about 30 MB, an order-of-magnitude drop in resource usage; its log parsing, however, is still limited. So the design here uses Filebeat to collect logs and Logstash to parse them.
Filebeat is deployed as a DaemonSet, collecting all application logs on each node and shipping them to Logstash; Logstash processes the data once and pushes it to ES; finally Kibana displays it.
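For reference, the Logstash side of this chain can be a very small pipeline. The sketch below is an illustrative assumption, not taken from the charts: 5044 is Filebeat's default Logstash port, and the host and index names are placeholders.
input {
  beats {
    port => 5044    # Filebeat's output.logstash ships here by default
  }
}
filter {
  # parsing/enrichment goes here, e.g. grok or json filters per application
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]    # placeholder ES address
    index => "%{[kubernetes][pod][name]}"     # illustrative: one index per pod, see notes below
  }
}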
Architecture Diagram
A simple architecture diagram to illustrate:
By default a K8S application's stdout log is written to a file that is symlinked into the node's /var/log/containers directory, so deploying one Filebeat per node to collect that directory is enough.
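You can see this layout on any node; the commands and output below are illustrative (names and container IDs will differ in your cluster):
# /var/log/containers/*.log are symlinks into /var/log/pods, which in turn
# link into the Docker data directory:
ls -l /var/log/containers/my-app-xxx.log
# -> /var/log/pods/default_my-app-xxx_<uid>/my-app/0.log
readlink -f /var/log/containers/my-app-xxx.log
# /var/lib/docker/containers/<container-id>/<container-id>-json.log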
Installation and Deployment
The related charts have been uploaded to github.com: ELK+Filebeat (source download).
Also available on Gitee (码云).
We install with Helm here. Installing Helm itself is trivial, so it is not covered; the real work is writing the charts, the details of which are also out of scope (see the linked reference). The install commands are at the end; the main manifests are excerpted below:
Elasticsearch manifest
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: enervated-puma-elasticsearch
labels:
heritage: "Tiller"
release: "enervated-puma"
chart: "elasticsearch"
app: "enervated-puma-elasticsearch"
annotations:
esMajorVersion: "7"
spec:
serviceName: enervated-puma-elasticsearch-headless
selector:
matchLabels:
app: "enervated-puma-elasticsearch"
replicas: 1
podManagementPolicy: Parallel
updateStrategy:
type: RollingUpdate
template:
metadata:
name: "enervated-puma-elasticsearch"
labels:
heritage: "Tiller"
release: "enervated-puma"
chart: "elasticsearch"
app: "enervated-puma-elasticsearch"
annotations:
spec:
securityContext:
fsGroup: 1000
runAsUser: 1000
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- "enervated-puma-elasticsearch"
topologyKey: kubernetes.io/hostname
terminationGracePeriodSeconds: 120
volumes:
initContainers:
- name: configure-sysctl
securityContext:
runAsUser: 0
privileged: true
image: "docker.elastic.co/elasticsearch/elasticsearch:7.5.0"
imagePullPolicy: "Always"
command: ["sysctl", "-w", "vm.max_map_count=262144"]
resources:
{}
containers:
- name: "elasticsearch"
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
image: "docker.elastic.co/elasticsearch/elasticsearch:7.5.0"
imagePullPolicy: "Always"
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
exec:
command:
- sh
- -c
- |
#!/usr/bin/env bash -e
# If the node is starting up wait for the cluster to be ready (request params: 'wait_for_status=green&timeout=1s' )
# Once it has started only check that the node itself is responding
START_FILE=/tmp/.es_start_file
http () {
local path="${1}"
if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
BASIC_AUTH="-u ${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
else
BASIC_AUTH=''
fi
curl -XGET -s -k --fail ${BASIC_AUTH} http://127.0.0.1:9200${path}
}
if [ -f "${START_FILE}" ]; then
echo 'Elasticsearch is already running, lets check the node is healthy and there are master nodes available'
http "/_cluster/health?timeout=0s"
else
echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
if http "/_cluster/health?wait_for_status=green&timeout=1s" ; then
touch ${START_FILE}
exit 0
else
echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
exit 1
fi
fi
ports:
- name: http
containerPort: 9200
- name: transport
containerPort: 9300
resources:
limits:
cpu: 1000m
memory: 4Gi
requests:
cpu: 100m
memory: 2Gi
env:
- name: node.name
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: cluster.initial_master_nodes
value: "enervated-puma-elasticsearch-0,"
- name: discovery.seed_hosts
value: "enervated-puma-elasticsearch-headless"
- name: cluster.name
value: "enervated-puma"
- name: network.host
value: "0.0.0.0"
- name: ES_JAVA_OPTS
value: "-Xmx1g -Xms1g"
- name: node.data
value: "true"
- name: node.ingest
value: "true"
- name: node.master
value: "true"
volumeMounts:
Logstash manifest
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations:
log.mount.launch.status: '[]'
log.mount.policy.launcher: '[]'
meta.helm.sh/release-name: lsh-mcp-logstash
meta.helm.sh/release-namespace: default
creationTimestamp: "2021-03-17T07:49:03Z"
generation: 1
labels:
app: lsh-mcp-logstash-lsh-mcp-logstash
app.kubernetes.io/managed-by: Helm
chart: lsh-mcp-logstash
heritage: Helm
release: lsh-mcp-logstash
name: lsh-mcp-logstash-lsh-mcp-logstash
namespace: default
resourceVersion: "2422055"
selfLink: /apis/apps/v1/namespaces/default/statefulsets/lsh-mcp-logstash-lsh-mcp-logstash
uid: bce7a7ce-9765-493c-b2b8-8fe2bfdbe0da
spec:
podManagementPolicy: Parallel
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: lsh-mcp-logstash-lsh-mcp-logstash
release: lsh-mcp-logstash
serviceName: lsh-mcp-logstash-lsh-mcp-logstash
template:
metadata:
annotations:
configchecksum: 4c78faf0f98fb6aa26019a156449fec2197a7b42cbfc7a666c879aadf4875a8
pipelinechecksum: 9ccfaf557f6835ce7ae626767421d6e50bcd859ffd08564f5271722bcc5f022
labels:
app: lsh-mcp-logstash-lsh-mcp-logstash
chart: lsh-mcp-logstash
heritage: Helm
release: lsh-mcp-logstash
name: lsh-mcp-logstash-lsh-mcp-logstash
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- lsh-mcp-logstash-lsh-mcp-logstash
topologyKey: kubernetes.io/hostname
containers:
- env:
- name: LS_JAVA_OPTS
value: -Xmx1g -Xms1g
image: docker.elastic.co/logstash/logstash:7.5.0
imagePullPolicy: IfNotPresent
name: lsh-mcp-logstash
resources:
limits:
cpu: "1"
memory: 1536Mi
requests:
cpu: 100m
memory: 500Mi
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/share/logstash/config/logstash.yml
name: logstashconfig
subPath: logstash.yml
- mountPath: /usr/share/logstash/pipeline
name: logstashpipeline
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000
runAsUser: 1000
terminationGracePeriodSeconds: 120
volumes:
- configMap:
defaultMode: 420
name: lsh-mcp-logstash-lsh-mcp-logstash-config
name: logstashconfig
- configMap:
defaultMode: 420
name: lsh-mcp-logstash-lsh-mcp-logstash-pipeline
name: logstashpipeline
updateStrategy:
type: RollingUpdate
Kibana manifest
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
log.mount.launch.status: '[]'
log.mount.policy.launcher: '[]'
meta.helm.sh/release-name: lsh-mcp-kibana
meta.helm.sh/release-namespace: default
creationTimestamp: "2021-03-17T01:43:57Z"
generation: 1
labels:
app: lsh-mcp-kibana
app.kubernetes.io/managed-by: Helm
heritage: Helm
release: lsh-mcp-kibana
name: lsh-mcp-kibana-lsh-mcp-kibana
namespace: default
resourceVersion: "2533966"
selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/lsh-mcp-kibana-lsh-mcp-kibana
uid: 449fe647-4767-4f0f-8b06-b53c2c4a30b8
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: lsh-mcp-kibana
release: lsh-mcp-kibana
strategy:
type: Recreate
template:
metadata:
labels:
app: lsh-mcp-kibana
release: lsh-mcp-kibana
spec:
containers:
- env:
- name: ELASTICSEARCH_HOSTS
value: http://lsh-mcp-elasticsearch-lsh-mcp-elasticsearch:9200
- name: SERVER_HOST
value: 0.0.0.0
- name: NODE_OPTIONS
value: --max-old-space-size=1800
image: docker.elastic.co/kibana/kibana:7.5.0
imagePullPolicy: Always
name: kibana
ports:
- containerPort: 5601
protocol: TCP
readinessProbe:
exec:
command:
- sh
- -c
- |
#!/usr/bin/env bash -e
# Disable nss cache to avoid filling dentry cache when calling curl
# This is required with Kibana Docker using nss < 3.52
export NSS_SDB_USE_CACHE=no
http () {
local path="${1}"
set -- -XGET -s --fail -L
if [ -n "${ELASTICSEARCH_USERNAME}" ] && [ -n "${ELASTICSEARCH_PASSWORD}" ]; then
set -- "$@" -u "${ELASTICSEARCH_USERNAME}:${ELASTICSEARCH_PASSWORD}"
fi
STATUS=$(curl --output /dev/null --write-out "%{http_code}" -k "$@" "http://localhost:5601${path}")
if [[ "${STATUS}" -eq 200 ]]; then
exit 0
fi
echo "Error: Got HTTP code ${STATUS} but expected a 200"
exit 1
}
http "/app/kibana"
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: 100m
memory: 800Mi
Filebeat manifest
apiVersion: apps/v1
kind: DaemonSet
metadata:
creationTimestamp: "2021-03-01T10:46:09Z"
generation: 1
labels:
app: lsh-mcp-filebeat-lsh-mcp-filebeat
chart: lsh-mcp-filebeat-7.5.0
heritage: Tiller
release: lsh-mcp-filebeat
name: lsh-mcp-filebeat
namespace: default
resourceVersion: "10275"
selfLink: /apis/extensions/v1beta1/namespaces/default/daemonsets/lsh-mcp-filebeat
uid: 78427beb-d2f4-40e0-ad9c-acfb943e0d2a
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
app: lsh-mcp-filebeat-lsh-mcp-filebeat
template:
metadata:
annotations:
configChecksum: 2f66a41b31553cfe7a7e89de6648f4208a638e11dc101b89295807fdd7123ad
labels:
app: lsh-mcp-filebeat-lsh-mcp-filebeat
chart: lsh-mcp-filebeat-7.5.0
heritage: Tiller
release: lsh-mcp-filebeat
name: lsh-mcp-filebeat-lsh-mcp-filebeat
spec:
containers:
- args:
- -e
- -E
- http.enabled=true
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: docker.elastic.co/beats/filebeat:7.5.0
imagePullPolicy: Always
name: lsh-mcp-filebeat
resources:
limits:
cpu: "1"
memory: 400Mi
requests:
cpu: 100m
memory: 100Mi
securityContext:
privileged: false
runAsUser: 0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/share/filebeat/filebeat.yml
name: filebeat-config
readOnly: true
subPath: filebeat.yml
- mountPath: /usr/share/filebeat/config
name: filebeat-reload-config
- mountPath: /usr/share/filebeat/data
name: data
- mountPath: /var/lib/docker/containers
name: varlibdockercontainers
readOnly: true
- mountPath: /data/docker/containers
name: data-docker-containers
- mountPath: /var/log
name: varlog
readOnly: true
- mountPath: /var/run/docker.sock
name: varrundockersock
readOnly: true
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: lsh-mcp-filebeat-lsh-mcp-filebeat
serviceAccountName: lsh-mcp-filebeat-lsh-mcp-filebeat
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 384
name: lsh-mcp-filebeat-config
name: filebeat-config
- configMap:
defaultMode: 420
name: lsh-mcp-filebeat-reload-config
name: filebeat-reload-config
- hostPath:
path: /var/lib/lsh-mcp-filebeat-lsh-mcp-filebeat-default-data
type: DirectoryOrCreate
name: data
- hostPath:
path: /var/lib/docker/containers
type: ""
name: varlibdockercontainers
- hostPath:
path: /var/log
type: ""
name: varlog
- hostPath:
path: /data/docker/containers
type: ""
name: data-docker-containers
- hostPath:
path: /var/run/docker.sock
type: ""
name: varrundockersock
Notes on Installation Issues
The ELK parts of the deployment need no changes. This section explains Filebeat's collection settings in the filebeat/values.yaml file:
filebeat.yml: |
filebeat.inputs:
- type: container
paths:
- /var/log/containers/*.log
processors:
- add_kubernetes_metadata:
host: ${NODE_NAME}
matchers:
- logs_path:
logs_path: "/var/log/containers/"
multiline.type: pattern
multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
multiline.negate: false
multiline.match: after
output.logstash:
hosts: ["lsh-mcp-logstash-lsh-mcp-logstash"]
Per the official Elastic documentation on collecting container logs, the input type must be set to type: container; together with the add_kubernetes_metadata processor, this attaches container information to each event, such as the pod's namespace and label values, which makes searching and filtering logs much easier later. An illustrative event is sketched below.
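With that in place, each shipped event carries fields roughly like the following (field layout per the add_kubernetes_metadata processor; the values here are made up):
{
  "message": "GET /healthz 200",
  "kubernetes": {
    "namespace": "default",
    "pod": { "name": "my-app-6f9c4b7d-abcde" },
    "labels": { "app": "my-app" },
    "container": { "name": "my-app" }
  }
}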
multiline.type: pattern
multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
multiline.negate: false
multiline.match: after
The multiline settings merge multi-line output: Java and similar applications sometimes emit a single event across several log lines (stack traces), which is awkward to read as separate events, so this configuration merges them; see the official docs for details. A concrete example follows.
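For example, with the pattern above a Java stack trace such as:
Exception in thread "main" java.lang.NullPointerException
    at com.example.Foo.bar(Foo.java:42)
    at com.example.Main.main(Main.java:10)
Caused by: java.lang.IllegalStateException: not initialized
    at com.example.Foo.init(Foo.java:13)
is shipped as one event: every line starting with whitespace plus "at" (or with "Caused by:") matches the pattern and, with negate: false and match: after, is appended to the preceding non-matching line.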
The path of the K8S stdout log files: paths: /var/log/containers/*.log
Another point that needs special attention: the K8S stdout log file is reached through two levels of symlinks, and both directories the links point into must be mounted (as directories) into the Filebeat pod, otherwise nothing is collected. The mount configuration is as follows, with a quick verification sketch after it:
volumeMounts:
- name: data
mountPath: /usr/share/filebeat/data
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: data-docker-containers
mountPath: /data/docker/containers
- name: varlog
mountPath: /var/log
readOnly: true
volumes:
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: varlog
hostPath:
path: /var/log
- name: data-docker-containers
hostPath:
path: /data/docker/containers
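A quick way to sanity-check the mounts is to resolve the symlink chain from inside a running Filebeat pod (the pod name below is a placeholder):
kubectl exec -it <filebeat-pod> -- sh -c 'readlink -f /var/log/containers/*.log | head -n 3'
# If the resolved targets are readable inside the pod, both link levels are mounted correctly.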
In this deployment, every ELK component except Filebeat runs as a single replica. It collects the stdout logs of all K8S pods, and ES creates one index per pod stdout file name. If you need more replicas, change replicas: 1 in values.yaml.
If the data volume is large, consider adding a buffer such as Kafka or Redis: Filebeat pushes data to the buffer, and Logstash pulls it from the buffer, processes it, and writes it to ES. A sketch follows.
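For example, switching Filebeat from output.logstash to a Kafka buffer only changes the output section of filebeat.yml; the broker addresses and topic name below are assumptions:
output.kafka:
  hosts: ["kafka-0:9092", "kafka-1:9092"]   # assumed broker addresses
  topic: "k8s-logs"                         # assumed topic name
  compression: gzip
Logstash would then consume from the same topic with a kafka input instead of a beats input.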
Kibana's port is set to 30961.
To expose a different port, change nodePort: 30961 in kibana/values.yaml to the port you want:
service:
type: NodePort
loadBalancerIP: ""
port: 5601
nodePort: 30961
All of the changes mentioned above are already applied in the charts; just run the install commands below to deploy.
Install Commands
Install with helm3.
Install ES:
helm3 install elasticsearch lsh-mcp-elasticsearch
After installation, check that the pods are running, e.g.:
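kubectl get pods | grep elasticsearch
# wait until the pod shows STATUS Running and READY 1/1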
Install Logstash:
helm3 install logstash lsh-mcp-logstash
After installation, check that the pods are running.
Install Filebeat:
helm3 install filebeat lsh-mcp-filebeat
After installation, check that the pods are running.
Install Kibana:
helm3 install kibana lsh-mcp-kibana
After installation, check that the pods are running.
If you are using helm2, the command is helm install --name elasticsearch lsh-mcp-elasticsearch.
The others follow the same pattern: start with helm, pass --name for the release name, and give the chart folder as the final argument.
The final result: