Note: This article was verified on OpenShift 4.18 with Tempo Operator 0.14.1-1 and Red Hat build of OpenTelemetry Operator 0.113.0-1.
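OpenTelemetry and Tempo in Distributed Tracing
OpenTelemetry is a vendor-neutral, open source observability framework. It provides a standardized set of APIs, SDKs, and tools for generating, collecting, processing, and exporting telemetry data: metrics, logs, and traces. Its goal is a unified observability solution across programming languages and systems, so that developers can instrument applications more easily and send the data to a variety of analysis backends.
Tempo is an open source product from Grafana Labs for tracing requests in cloud-native applications. It is dedicated to distributed tracing data and focuses on storing and querying traces from large-scale distributed systems. As a tracing backend it aims to be efficient and scalable, helping users store, query, and analyze traces quickly in order to understand the performance and behavior of a distributed system. Its functionality is roughly comparable to Jaeger: it receives trace data sent by the traced targets over OpenTelemetry or Zipkin, stores that data centrally, and presents it through the Jaeger UI or Grafana.
OpenTelemetry suits any scenario where observability needs to be built into the application, whether the application is a monolith or a distributed system. It lets developers collect and report observability data in a uniform way across languages and frameworks, which makes it easier to monitor and understand the internal state of an application, troubleshoot failures, and tune performance.
OpenTelemetry itself does not store data. Instead, it can send the distributed tracing data it collects to Tempo for storage and querying. Once an application is instrumented with OpenTelemetry tracing and configured with a suitable exporter, its trace data can be sent directly to the Tempo backend.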
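As a minimal sketch of the exporter configuration described above: many OpenTelemetry SDKs and auto-instrumentation agents read the standard environment variables defined by the OpenTelemetry specification. The endpoint below assumes the distributor service of a TempoStack named my-tempo in the tempo-demo project, which is the setup built later in this article, and my-app is a placeholder service name. Whether a given application honors these variables depends on how it was instrumented.

```shell
# Standard OpenTelemetry SDK environment variables.
# Assumption: the application uses an SDK or agent that reads them.
export OTEL_SERVICE_NAME="my-app"            # placeholder service name shown in the tracing UI
export OTEL_TRACES_EXPORTER="otlp"           # export traces over OTLP
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"    # 4317 is the OTLP/gRPC port
# Assumed endpoint: the Tempo distributor service used in this article.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://tempo-my-tempo-distributor.tempo-demo.svc:4317"
```

In a Deployment these would normally be set in the container's env section rather than exported in a shell.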
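As a dedicated trace store, Tempo applies a number of optimizations to make storing and querying trace data efficient. It is mainly used for trace processing in large-scale distributed systems, especially those with demanding requirements for trace storage and query performance. In a microservice architecture, for example, when a performance problem spanning several services has to be located and fixed quickly, Tempo can serve as the backend storage and query engine for distributed tracing.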
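Installing the Prerequisites
Installing the Operators
- Install the Tempo Operator published by Red Hat with the default settings.
- Install the Red Hat build of OpenTelemetry Operator with the default settings.
Installing MinIO
- Create a project.
$ oc new-project minio
- Deploy the MinIO resources in the minio project using the following YAML.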
$ oc -n minio apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app.kubernetes.io/name: minio
  name: minio
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: minio
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: minio
    spec:
      containers:
        - command:
            - /bin/sh
            - -c
            - |
              mkdir -p /storage/tempo && \
              minio server /storage
          env:
            - name: MINIO_ACCESS_KEY
              value: tempo
            - name: MINIO_SECRET_KEY
              value: supersecret
          image: quay.io/minio/minio
          name: minio
          ports:
            - containerPort: 9000
          volumeMounts:
            - mountPath: /storage
              name: storage
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: minio
---
apiVersion: v1
kind: Service
metadata:
  name: minio
spec:
  ports:
    - port: 9000
      protocol: TCP
      targetPort: 9000
  selector:
    app.kubernetes.io/name: minio
  type: ClusterIP
EOF
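Before moving on, it can help to confirm that MinIO actually started, since Tempo cannot store traces without it. The helper below is a sketch only; the function name check_minio and the 120-second timeout are illustrative choices, not part of the original procedure.

```shell
# Hypothetical helper: waits for the MinIO Deployment created above, then lists its resources.
check_minio() {
  # Block until the Deployment has rolled out, or give up after the timeout.
  oc -n minio rollout status deployment/minio --timeout=120s || return 1
  # Show the PersistentVolumeClaim, Service, and Pod in the minio project.
  oc -n minio get pvc,svc,pod
}
```

Run check_minio after the oc apply command returns; a non-zero exit status means the rollout did not complete in time.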
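Installing and Configuring Tempo
Configuring a Standard TempoStack
- Create a project.
$ oc new-project tempo-demo
- In the tempo-demo project, create a Secret that gives access to the MinIO service, using the following YAML.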
$ oc apply -f - << EOF
apiVersion: v1
kind: Secret
metadata:
  name: minio
  namespace: tempo-demo
stringData:
  endpoint: http://minio.minio.svc:9000
  bucket: tempo
  access_key_id: tempo
  access_key_secret: supersecret
type: Opaque
EOF
- In the tempo-demo project, create a TempoStack instance using the following YAML. It references the Secret created in the previous step.
$ oc apply -f - << EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: my-tempo
  namespace: tempo-demo
spec:
  storage:
    secret:
      name: minio
      type: s3
  storageSize: 1Gi
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          route:
            termination: edge
          type: route
EOF
- List the Tempo Deployments and StatefulSets.
$ oc get deployment,statefulset -n tempo-demo
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/tempo-my-tempo-compactor 1/1 1 1 30m
deployment.apps/tempo-my-tempo-distributor 1/1 1 1 30m
deployment.apps/tempo-my-tempo-querier 1/1 1 1 30m
deployment.apps/tempo-my-tempo-query-frontend 1/1 1 1 30m
NAME READY AGE
statefulset.apps/tempo-my-tempo-ingester 1/1 30m
- List the Tempo services. tempo-my-tempo-distributor is the service to which OpenTelemetry on the application side pushes its trace data.
$ oc get svc -n tempo-demo -l app.kubernetes.io/name=tempo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tempo-my-tempo-compactor ClusterIP 10.217.4.119 <none> 7946/TCP,3200/TCP 16m
tempo-my-tempo-distributor ClusterIP 10.217.5.226 <none> 4317/TCP,3200/TCP 16m
tempo-my-tempo-gossip-ring ClusterIP None <none> 7946/TCP 16m
tempo-my-tempo-ingester ClusterIP 10.217.4.249 <none> 3200/TCP,9095/TCP 16m
tempo-my-tempo-querier ClusterIP 10.217.4.88 <none> 7946/TCP,3200/TCP,9095/TCP 16m
tempo-my-tempo-query-frontend ClusterIP 10.217.4.144 <none> 3200/TCP,9095/TCP,16686/TCP,16687/TCP 16m
tempo-my-tempo-query-frontend-discovery ClusterIP None <none> 3200/TCP,9095/TCP,9096/TCP,16686/TCP,16687/TCP 16m
- Get the address of the Route named tempo-my-tempo-query-frontend. Opening that address in a browser shows the Jaeger UI.
$ oc get route tempo-my-tempo-query-frontend -n tempo-demo -o jsonpath='{.spec.host}'
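Testing the Standard TempoStack
Test Application 1
- Run a test application in the trace-app project.
Note: The test application produces its trace data through manual instrumentation. For the source code, see https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/tracegen/v0.55.0/tracegen/main.go
Important: The trace data is pushed to Tempo's receiving service at tempo-my-tempo-distributor.tempo-demo.svc:4317.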
$ oc new-project trace-app
$ oc apply -f - << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: tracegen
  namespace: trace-app
spec:
  template:
    spec:
      containers:
        - name: tracegen
          image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/tracegen:latest
          command:
            - "./tracegen"
          args:
            - -otlp-endpoint=tempo-my-tempo-distributor.tempo-demo.svc:4317
            - -otlp-insecure
            - -duration=1800s
            - -workers=1
      restartPolicy: Never
  backoffLimit: 4
EOF
- Refresh the Jaeger console and select the tracegen service to see the traces for its requests.
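If the tracegen service does not show up in Jaeger, the Job itself is the first thing to check. The following is a small sketch; the function name check_tracegen and the 20-line tail are illustrative choices.

```shell
# Hypothetical helper: shows the state of the tracegen Job and its most recent log lines.
check_tracegen() {
  # Fails if the Job does not exist in the trace-app project.
  oc -n trace-app get job tracegen || return 1
  # Print the last lines logged by the Job's pod.
  oc -n trace-app logs job/tracegen --tail=20
}
```

Connection errors in the log usually point at the -otlp-endpoint value or at the distributor service not being ready.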
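Test Application 2
- Deploy the test application in the trace-app project using the following YAML.
Note: The test application produces its trace data through manual instrumentation. For the source code, see https://github.com/rbaumgar/otelcol-demo-app/blob/main/src/main/java/org/acme/opentelemetry/TracedResource.java
Important: The OTELCOL_SERVER variable in the Deployment is set to the address of the tempo-my-tempo-distributor service shown above.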
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: otel-demo-app
  name: otel-demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-demo-app
  template:
    metadata:
      labels:
        app: otel-demo-app
    spec:
      containers:
        - image: quay.io/rbaumgar/otelcol-demo-app-jvm
          imagePullPolicy: IfNotPresent
          name: otel-demo-app
          env:
            - name: OTELCOL_SERVER
              value: 'http://tempo-my-tempo-distributor.tempo-demo.svc:4317'
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: otel-demo-app
  name: otel-demo-app
spec:
  ports:
    - port: 8080
      protocol: TCP
      targetPort: 8080
      name: web
  selector:
    app: otel-demo-app
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: otel-demo-app
  name: otel-demo-app
spec:
  path: /
  to:
    kind: Service
    name: otel-demo-app
  port:
    targetPort: web
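The YAML above is shown without an apply command. One way to deploy it, sketched here under the assumption that the manifest has been saved to a local file (otel-demo-app.yaml is a made-up file name), is:

```shell
# Hypothetical helper: applies a saved copy of the manifest above to the trace-app project
# and waits for the otel-demo-app Deployment to finish rolling out.
deploy_demo_app() {
  manifest="${1:-otel-demo-app.yaml}"   # assumed file name, not from the article
  oc apply -n trace-app -f "$manifest" || return 1
  oc -n trace-app rollout status deployment/otel-demo-app --timeout=180s
}
```

Calling deploy_demo_app with no argument uses the default file name; pass a path to use a different file.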
- Run the following commands to call the test application.
$ export URL=$(oc get route otel-demo-app -n trace-app -o jsonpath='{.spec.host}')
$ curl $URL/hello
hello
$ curl $URL/sayHello/demo1
hello: demo1
$ curl $URL/sayRemote/demo2
hello: demo2 from http://otel-demo-app-trace-app.apps-crc.testing/
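A handful of requests produces only a handful of traces. To get more data into Jaeger, the calls can be generated in a loop. The sketch below only builds the request URLs for the /sayHello endpoint shown above; the function name demo_urls is illustrative, and the curl invocation is left as a commented usage line because it needs the Route to be reachable.

```shell
# Hypothetical helper: prints COUNT request URLs of the form BASE/sayHello/demoN, one per line.
demo_urls() {
  base="$1"
  count="$2"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '%s/sayHello/demo%s\n' "$base" "$i"
    i=$((i + 1))
  done
}

# Usage, assuming URL was exported as in the commands above:
# demo_urls "$URL" 10 | xargs -n 1 curl -s
```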
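- Refresh the Jaeger console and select the my-service service to see the traces for these requests.
Configuring and Testing a Multitenant TempoStack
Tempo is a distributed tracing system with multitenancy support, which it implements through an HTTP header named X-Scope-OrgID.
- Create a project.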
$ oc new-project observability
- In the observability project, create a Secret that gives access to the MinIO service, using the following YAML.
$ oc apply -f - << EOF
apiVersion: v1
kind: Secret
metadata:
  name: minio
  namespace: observability
stringData:
  endpoint: http://minio.minio.svc:9000
  bucket: tempo
  access_key_id: tempo
  access_key_secret: supersecret
type: Opaque
EOF
- Create a TempoStack with a tenants configuration. The tenantId values can be anything as long as they are unique.
$ oc apply -f - << EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: multitent-tempo
  namespace: observability
spec:
  storage:
    secret:
      name: minio
      type: s3
  storageSize: 1Gi
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  tenants:
    mode: openshift
    authentication:
      - tenantName: dev
        tenantId: "1610b0c3-c509-4592-a256-a1871353dbfa"
      - tenantName: prod
        tenantId: "6094b0f1-711d-4395-82c0-30c2720c6648"
  template:
    gateway:
      enabled: true
    queryFrontend:
      jaegerQuery:
        enabled: true
EOF
- Confirm that the following resources were created.
$ oc get route,svc,deploy,statefulset -n observability
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/tempo-multitent-tempo-gateway tempo-multitent-tempo-gateway-observability.apps-crc.testing tempo-multitent-tempo-gateway public passthrough None
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/tempo-multitent-tempo-compactor ClusterIP 10.217.4.191 <none> 7946/TCP,3200/TCP 91s
service/tempo-multitent-tempo-distributor ClusterIP 10.217.5.179 <none> 4318/TCP,4317/TCP,3200/TCP 91s
service/tempo-multitent-tempo-gateway ClusterIP 10.217.4.177 <none> 8090/TCP,8081/TCP,8080/TCP 91s
service/tempo-multitent-tempo-gossip-ring ClusterIP None <none> 7946/TCP 91s
service/tempo-multitent-tempo-ingester ClusterIP 10.217.5.197 <none> 3200/TCP,9095/TCP 91s
service/tempo-multitent-tempo-querier ClusterIP 10.217.4.104 <none> 7946/TCP,3200/TCP,9095/TCP 91s
service/tempo-multitent-tempo-query-frontend ClusterIP 10.217.5.83 <none> 3200/TCP,9095/TCP,16685/TCP,16686/TCP,16687/TCP 91s
service/tempo-multitent-tempo-query-frontend-discovery ClusterIP None <none> 3200/TCP,9095/TCP,9096/TCP,16685/TCP,16686/TCP,16687/TCP 91s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/tempo-multitent-tempo-compactor 1/1 1 1 91s
deployment.apps/tempo-multitent-tempo-distributor 1/1 1 1 91s
deployment.apps/tempo-multitent-tempo-gateway 1/1 1 1 91s
deployment.apps/tempo-multitent-tempo-querier 1/1 1 1 91s
deployment.apps/tempo-multitent-tempo-query-frontend 1/1 1 1 91s
NAME READY AGE
statefulset.apps/tempo-multitent-tempo-ingester 1/1 91s
- Create two ClusterRoles, one that allows get and one that allows create. Users in the system:authenticated group are granted the tempostack-traces-reader role, while only the newly created ServiceAccount named dev-collector is granted the tempostack-traces-write role.
$ oc apply -f - << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tempostack-traces-reader
rules:
  - apiGroups:
      - 'tempo.grafana.com'
    resources:
      - dev
    resourceNames:
      - traces
    verbs:
      - 'get'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tempostack-traces-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tempostack-traces-reader
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dev-collector
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tempostack-traces-write
rules:
  - apiGroups:
      - 'tempo.grafana.com'
    resources:
      - dev
    resourceNames:
      - traces
    verbs:
      - 'create'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tempostack-traces
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tempostack-traces-write
subjects:
  - kind: ServiceAccount
    name: dev-collector
    namespace: observability
EOF
- Create an OpenTelemetryCollector for the dev tenant. It exports the traces it receives to Tempo through the tempo-multitent-tempo-gateway service.
$ oc apply -f - << EOF
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: dev
  namespace: observability
spec:
  serviceAccount: dev-collector
  config:
    extensions:
      bearertokenauth:
        filename: "/var/run/secrets/kubernetes.io/serviceaccount/token"
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
      jaeger:
        protocols:
          thrift_binary: {}
          thrift_compact: {}
          thrift_http: {}
          grpc: {}
    exporters:
      debug: {}
      otlp:
        endpoint: tempo-multitent-tempo-gateway.observability.svc:8090
        tls:
          insecure: false
          ca_file: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
        auth:
          authenticator: bearertokenauth
        headers:
          X-Scope-OrgID: "dev"
    service:
      extensions: [bearertokenauth]
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          exporters: [otlp, debug]
EOF
- Confirm that the dev-collector service was created.
$ oc get svc dev-collector -n observability
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dev-collector ClusterIP 172.30.216.243 <none> 14250/TCP,4317/TCP,4318/TCP,14268/TCP,6831/UDP,6832/UDP 131m
- Get the address of the tempo-multitent-tempo-gateway route. Accessing it returns a hint about the available paths.
$ oc get route tempo-multitent-tempo-gateway -ojsonpath={.spec.host} -n observability
tempo-multitent-tempo-gateway-observability.apps-crc.testing
$ curl -k https://$(oc get route tempo-multitent-tempo-gateway -ojsonpath={.spec.host} -n observability)
{
"paths": [
"/api/traces/v1/{tenant}/*",
"/{tenant}"
]
}
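- Replace {tenant} with dev and open tempo-multitent-tempo-gateway-observability.apps-crc.testing/dev in a browser. After logging in, the Jaeger UI appears.
- Run the test application, which sends test data to the target given by the --otlp-endpoint=dev-collector.observability.svc:4317 argument. Note: "Test Application 1" from earlier can also be used.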
$ oc apply -f - << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: telemetrygen
  namespace: observability
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: telemetrygen
          image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.74.0
          args: [traces, --otlp-endpoint=dev-collector.observability.svc:4317, --otlp-insecure, --duration=1800s, --rate=4]
EOF
- The application's activity can now be traced in Jaeger.
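If no traces appear for the dev tenant, the collector log is a useful place to look, because the debug exporter configured above writes a summary of what the pipeline processed, and export failures toward the gateway are logged there too. The sketch below assumes the OpenTelemetry Operator created a Deployment named dev-collector, matching the service name seen earlier; the function name is illustrative.

```shell
# Hypothetical helper: prints the last N log lines (default 20) of the collector.
# Assumption: the collector runs as a Deployment named dev-collector in the observability project.
show_collector_logs() {
  oc -n observability logs deployment/dev-collector --tail="${1:-20}"
}
```

For example, show_collector_logs 50 prints the last 50 lines. Authentication or TLS errors in that output typically relate to the bearertokenauth extension or the ca_file setting in the exporter.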
References
https://grafana.com/docs/tempo/latest/setup/operator/quickstart/
https://developers.redhat.com/articles/2024/08/14/introducing-tempo-monolithic-mode
https://developers.redhat.com/articles/2023/08/01/how-deploy-new-grafana-tempo-operator-openshift
https://github.com/rbaumgar/otelcol-demo-app/blob/main/OpenTelemetry_with_Tempo.md
https://github.com/grafana/tempo-operator/tree/main/tests/e2e-openshift/multitenancy
https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html-single/red_hat_build_of_opentelemetry/index
https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html-single/distributed_tracing/index#distributed-tracing-platform-tempo