搭建一个 SkyWalking 集群环境,步骤如下:
第一步,搭建一个 Elasticsearch 集群。
第二步, 搭建一个注册中心的集群。目前 SkyWalking 支持 Zookeeper、Kubernetes、Consul、Nacos 作为注册中心。
第三步,搭建一个 SkyWalking OAP 服务的集群, 将 SkyWalking OAP 服务注册到注册中心上。
第四步,启动一个 Spring Boot 应用,并配置 SkyWalking Agent。另外,在设置 SkyWaling Agent 的 SW_AGENT_COLLECTOR_BACKEND_SERVICES 地址时,需要设置多个 SkyWalking OAP 服务的地址数组。
第五步,搭建一个 SkyWalking UI 服务,另外,在设置 SkyWalking UI 的 collector.ribbon.listOfServers 地址时,也可设置多个 SkyWalking OAP 服务的地址数组。
软件版本
- apache-skywalking-apm-8.7.0.tar.gz
- apache-zookeeper-3.6.1-bin.tar.gz
- elasticsearch-6.5.4.tar.gz
- jdk-8u202-linux-x64.tar.gz
下载地址
https://archive.apache.org/dist/skywalking/8.7.0/apache-skywalking-apm-8.7.0.tar.gz
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.tar.gz
http://archive.apache.org/dist/zookeeper/zookeeper-3.6.1/apache-zookeeper-3.6.1-bin.tar.gz
skywalking各端口作用:
- 5000:skywalking web UI
- 5005:skywalking grpc自身用,agent用
- 5006:skywalking web ui用来查询
部署步骤
JDK安装
# 部署jdk-8u202-linux-x64.tar.gz
$ tar zxf jdk-8u202-linux-x64.tar.gz -C /opt # 解压jdk安装包
# 添加环境变量
$ cat << EOF >> /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_202
export JRE_HOME=/opt/jdk1.8.0_202/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
EOF
$ source /etc/profile
$ java -version # 如出现如下则安装成功
openjdk version "1.8.0_202"
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)
ES 安装
略
zookeeper集群
三台机器分别部署,分别修改server.1/2/3的ip地址即可。
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.6.1/apache-zookeeper-3.6.1-bin.tar.gz
tar zxf apache-zookeeper-3.6.1-bin.tar.gz -C /opt
mv apache-zookeeper-3.6.1-bin /opt/zookeeper
useradd -s /bin/bash -U zookeeper
mkdir -p /opt/data/zookeeper-data/
chown -Rf zookeeper.zookeeper /opt/data/zookeeper-data/
chown -Rf zookeeper.zookeeper /opt/zookeeper
echo '1' > /opt/data/zookeeper-data/myid
cat << EOF > /opt/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/data/zookeeper-data
dataLogDir=/opt/zookeeper/logs
clientPort=2181
maxClientCnxns=60
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=10.0.33.193:2888:3888
server.2=10.0.33.194:2888:3888
server.3=10.0.33.195:2888:3888
EOF
cat << EOF > /usr/lib/systemd/system/zookeeper.service
[Unit]
Description=zookeeper.service
After=network.target
[Service]
User=zookeeper
Group=zookeeper
Type=forking
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
ExecReload=/opt/zookeeper/bin/zkServer.sh restart
[Install]
EOF
systemctl start zookeeper.service
systemctl enable zookeeper.service
systemctl status zookeeper.service
安装skywalking
(启动skywalking前保证elastricsearch集群,zookeeper集群已启动成功)
- Backend(也称oap。包括core、cluster、storage、query配置)
- UI(也称webapp)
三台机器均部署
wget https://archive.apache.org/dist/skywalking/8.7.0/apache-skywalking-apm-8.7.0.tar.gz
tar -zxvf apache-skywalking-apm-8.7.0.tar.gz -C /opt
vim /opt/apache-skywalking-apm-bin/config/application.yml
cluster:
zookeeper: # 表示使用zookeeper,standalone表示使用单节点
nameSpace: ${SW_NAMESPACE:""}
hostPort: ${SW_CLUSTER_ZK_HOST_PORT:10.0.33.193:2182,10.0.33.194:2182,10.0.33.195:2182}
baseSleepTimeMs: ${SW_CLUSTER_ZK_SLEEP_TIME:1000} # initial amount of time to wait between retries
maxRetries: ${SW_CLUSTER_ZK_MAX_RETRIES:3} # max number of times to retry
enableACL: ${SW_ZK_ENABLE_ACL:false} # disable ACL in default
schema: ${SW_ZK_SCHEMA:digest} # only support digest schema
expression: ${SW_ZK_EXPRESSION:skywalking:skywalking}
core:
selector: ${SW_CORE:default}
default:
role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
restHost: ${SW_CORE_REST_HOST:0.0.0.0}
restPort: ${SW_CORE_REST_PORT:12800}
restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/}
restMinThreads: ${SW_CORE_REST_JETTY_MIN_THREADS:1}
restMaxThreads: ${SW_CORE_REST_JETTY_MAX_THREADS:200}
restIdleTimeOut: ${SW_CORE_REST_JETTY_IDLE_TIMEOUT:30000}
restAcceptorPriorityDelta: ${SW_CORE_REST_JETTY_DELTA:0}
restAcceptQueueSize: ${SW_CORE_REST_JETTY_QUEUE_SIZE:0}
httpMaxRequestHeaderSize: ${SW_CORE_HTTP_MAX_REQUEST_HEADER_SIZE:8192}
gRPCHost: ${SW_CORE_GRPC_HOST:0.0.0.0}
gRPCPort: ${SW_CORE_GRPC_PORT:5006}
maxConcurrentCallsPerConnection: ${SW_CORE_GRPC_MAX_CONCURRENT_CALL:0}
maxMessageSize: ${SW_CORE_GRPC_MAX_MESSAGE_SIZE:0}
gRPCThreadPoolQueueSize: ${SW_CORE_GRPC_POOL_QUEUE_SIZE:-1}
gRPCThreadPoolSize: ${SW_CORE_GRPC_THREAD_POOL_SIZE:-1}
gRPCSslEnabled: ${SW_CORE_GRPC_SSL_ENABLED:false}
gRPCSslKeyPath: ${SW_CORE_GRPC_SSL_KEY_PATH:""}
gRPCSslCertChainPath: ${SW_CORE_GRPC_SSL_CERT_CHAIN_PATH:""}
gRPCSslTrustedCAPath: ${SW_CORE_GRPC_SSL_TRUSTED_CA_PATH:""}
downsampling:
- Hour
- Day
enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} #
dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5} #
recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:3} # Unit is day
metricsDataTTL: ${SW_CORE_METRICS_DATA_TTL:7} # Unit is day
l1FlushPeriod: ${SW_CORE_L1_AGGREGATION_FLUSH_PERIOD:500}
storageSessionTimeout: ${SW_CORE_STORAGE_SESSION_TIMEOUT:70000}
enableDatabaseSession: ${SW_CORE_ENABLE_DATABASE_SESSION:true}
topNReportPeriod: ${SW_CORE_TOPN_REPORT_PERIOD:10} # top_n record worker report cycle, unit is minute
activeExtraModelColumns: ${SW_CORE_ACTIVE_EXTRA_MODEL_COLUMNS:false}
serviceNameMaxLength: ${SW_SERVICE_NAME_MAX_LENGTH:70}
instanceNameMaxLength: ${SW_INSTANCE_NAME_MAX_LENGTH:70}
endpointNameMaxLength: ${SW_ENDPOINT_NAME_MAX_LENGTH:150}
searchableTracesTags: ${SW_SEARCHABLE_TAG_KEYS:http.method,status_code,db.type,db.instance,mq.queue,mq.topic,mq.broker}
searchableLogsTags: ${SW_SEARCHABLE_LOGS_TAG_KEYS:level}
searchableAlarmTags: ${SW_SEARCHABLE_ALARM_TAG_KEYS:level}
prepareThreads: ${SW_CORE_PREPARE_THREADS:2}
enableEndpointNameGroupingByOpenapi: ${SW_CORE_ENABLE_ENDPOINT_NAME_GROUPING_BY_OPAENAPI:true}
storage:
elasticsearch:
nameSpace: ${SW_NAMESPACE:"sk-es-new"}
clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:10.0.63.22:5010,10.0.63.23:5011,10.0.63.24:5012}
protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
connectTimeout: ${SW_STORAGE_ES_CONNECT_TIMEOUT:500}
socketTimeout: ${SW_STORAGE_ES_SOCKET_TIMEOUT:30000}
user: ${SW_ES_USER:""}
password: ${SW_ES_PASSWORD:""}
trustStorePath: ${SW_STORAGE_ES_SSL_JKS_PATH:""}
trustStorePass: ${SW_STORAGE_ES_SSL_JKS_PASS:""}
secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} #
dayStep: ${SW_STORAGE_DAY_STEP:1} # Represent the number of days in the one minute/hour/day index.
indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:1} # Shard number of new indexes
indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:1} # Replicas number of new indexes
superDatasetDayStep: ${SW_SUPERDATASET_STORAGE_DAY_STEP:-1}
superDatasetIndexShardsFactor: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR:5} #
superDatasetIndexReplicasNumber: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER:0} #
indexTemplateOrder: ${SW_STORAGE_ES_INDEX_TEMPLATE_ORDER:0} # the order of index template
bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} #
flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
profileTaskQueryMaxSize: ${SW_STORAGE_ES_QUERY_PROFILE_TASK_SIZE:200}
oapAnalyzer: ${SW_STORAGE_ES_OAP_ANALYZER:"{\"analyzer\":{\"oap_analyzer\":{\"type\":\"stop\"}}}"} # the oap analyzer.
oapLogAnalyzer: ${SW_STORAGE_ES_OAP_LOG_ANALYZER:"{\"analyzer\":{\"oap_log_analyzer\":{\"type\":\"standard\"}}}"} # the oap log analyzer. It could be customized by the ES analyzer configuration to support more language log formats, such as Chinese log, Japanese log and etc.
advanced: ${SW_STORAGE_ES_ADVANCED:""}
agent-analyzer:
selector: ${SW_AGENT_ANALYZER:default}
default:
sampleRate: ${SW_TRACE_SAMPLE_RATE:10000} #
slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:200,mongodb:100} #
forceSampleErrorSegment: ${SW_FORCE_SAMPLE_ERROR_SEGMENT:true} #
segmentStatusAnalysisStrategy: ${SW_SEGMENT_STATUS_ANALYSIS_STRATEGY:FROM_SPAN_STATUS}
noUpstreamRealAddressAgents: ${SW_NO_UPSTREAM_REAL_ADDRESS:6000,9000}
slowTraceSegmentThreshold: ${SW_SLOW_TRACE_SEGMENT_THRESHOLD:-1} #
meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:} #
log-analyzer:
selector: ${SW_LOG_ANALYZER:default}
default:
lalFiles: ${SW_LOG_LAL_FILES:default}
malFiles: ${SW_LOG_MAL_FILES:""}
event-analyzer:
selector: ${SW_EVENT_ANALYZER:default}
default:
receiver-sharing-server:
selector: ${SW_RECEIVER_SHARING_SERVER:default}
default:
# For Jetty server
restHost: ${SW_RECEIVER_SHARING_REST_HOST:0.0.0.0}
restPort: ${SW_RECEIVER_SHARING_REST_PORT:0}
restContextPath: ${SW_RECEIVER_SHARING_REST_CONTEXT_PATH:/}
restMinThreads: ${SW_RECEIVER_SHARING_JETTY_MIN_THREADS:1}
restMaxThreads: ${SW_RECEIVER_SHARING_JETTY_MAX_THREADS:200}
restIdleTimeOut: ${SW_RECEIVER_SHARING_JETTY_IDLE_TIMEOUT:30000}
restAcceptorPriorityDelta: ${SW_RECEIVER_SHARING_JETTY_DELTA:0}
restAcceptQueueSize: ${SW_RECEIVER_SHARING_JETTY_QUEUE_SIZE:0}
httpMaxRequestHeaderSize: ${SW_RECEIVER_SHARING_HTTP_MAX_REQUEST_HEADER_SIZE:8192}
# For gRPC server
gRPCHost: ${SW_RECEIVER_GRPC_HOST:0.0.0.0}
gRPCPort: ${SW_RECEIVER_GRPC_PORT:0}
maxConcurrentCallsPerConnection: ${SW_RECEIVER_GRPC_MAX_CONCURRENT_CALL:0}
maxMessageSize: ${SW_RECEIVER_GRPC_MAX_MESSAGE_SIZE:0}
gRPCThreadPoolQueueSize: ${SW_RECEIVER_GRPC_POOL_QUEUE_SIZE:0}
gRPCThreadPoolSize: ${SW_RECEIVER_GRPC_THREAD_POOL_SIZE:0}
gRPCSslEnabled: ${SW_RECEIVER_GRPC_SSL_ENABLED:false}
gRPCSslKeyPath: ${SW_RECEIVER_GRPC_SSL_KEY_PATH:""}
gRPCSslCertChainPath: ${SW_RECEIVER_GRPC_SSL_CERT_CHAIN_PATH:""}
authentication: ${SW_AUTHENTICATION:""}
receiver-register:
selector: ${SW_RECEIVER_REGISTER:default}
default:
receiver-trace:
selector: ${SW_RECEIVER_TRACE:default}
default:
receiver-jvm:
selector: ${SW_RECEIVER_JVM:default}
default:
receiver-clr:
selector: ${SW_RECEIVER_CLR:default}
default:
receiver-profile:
selector: ${SW_RECEIVER_PROFILE:default}
default:
receiver-zabbix:
selector: ${SW_RECEIVER_ZABBIX:-}
default:
port: ${SW_RECEIVER_ZABBIX_PORT:10051}
host: ${SW_RECEIVER_ZABBIX_HOST:0.0.0.0}
activeFiles: ${SW_RECEIVER_ZABBIX_ACTIVE_FILES:agent}
service-mesh:
selector: ${SW_SERVICE_MESH:default}
default:
envoy-metric:
selector: ${SW_ENVOY_METRIC:default}
default:
acceptMetricsService: ${SW_ENVOY_METRIC_SERVICE:true}
alsHTTPAnalysis: ${SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS:""}
alsTCPAnalysis: ${SW_ENVOY_METRIC_ALS_TCP_ANALYSIS:""}
k8sServiceNameRule: ${K8S_SERVICE_NAME_RULE:"${pod.metadata.labels.(service.istio.io/canonical-name)}"}
prometheus-fetcher:
selector: ${SW_PROMETHEUS_FETCHER:-}
default:
enabledRules: ${SW_PROMETHEUS_FETCHER_ENABLED_RULES:"self"}
maxConvertWorker: ${SW_PROMETHEUS_FETCHER_NUM_CONVERT_WORKER:-1}
kafka-fetcher:
selector: ${SW_KAFKA_FETCHER:-}
default:
bootstrapServers: ${SW_KAFKA_FETCHER_SERVERS:localhost:9092}
namespace: ${SW_NAMESPACE:""}
partitions: ${SW_KAFKA_FETCHER_PARTITIONS:3}
replicationFactor: ${SW_KAFKA_FETCHER_PARTITIONS_FACTOR:2}
enableNativeProtoLog: ${SW_KAFKA_FETCHER_ENABLE_NATIVE_PROTO_LOG:false}
enableNativeJsonLog: ${SW_KAFKA_FETCHER_ENABLE_NATIVE_JSON_LOG:false}
isSharding: ${SW_KAFKA_FETCHER_IS_SHARDING:false}
consumePartitions: ${SW_KAFKA_FETCHER_CONSUME_PARTITIONS:""}
kafkaHandlerThreadPoolSize: ${SW_KAFKA_HANDLER_THREAD_POOL_SIZE:-1}
kafkaHandlerThreadPoolQueueSize: ${SW_KAFKA_HANDLER_THREAD_POOL_QUEUE_SIZE:-1}
receiver-meter:
selector: ${SW_RECEIVER_METER:default}
default:
receiver-otel:
selector: ${SW_OTEL_RECEIVER:-}
default:
enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"oc"}
enabledOcRules: ${SW_OTEL_RECEIVER_ENABLED_OC_RULES:"istio-controlplane"}
receiver_zipkin:
selector: ${SW_RECEIVER_ZIPKIN:-}
default:
host: ${SW_RECEIVER_ZIPKIN_HOST:0.0.0.0}
port: ${SW_RECEIVER_ZIPKIN_PORT:9411}
contextPath: ${SW_RECEIVER_ZIPKIN_CONTEXT_PATH:/}
jettyMinThreads: ${SW_RECEIVER_ZIPKIN_JETTY_MIN_THREADS:1}
jettyMaxThreads: ${SW_RECEIVER_ZIPKIN_JETTY_MAX_THREADS:200}
jettyIdleTimeOut: ${SW_RECEIVER_ZIPKIN_JETTY_IDLE_TIMEOUT:30000}
jettyAcceptorPriorityDelta: ${SW_RECEIVER_ZIPKIN_JETTY_DELTA:0}
jettyAcceptQueueSize: ${SW_RECEIVER_ZIPKIN_QUEUE_SIZE:0}
instanceNameRule: ${SW_RECEIVER_ZIPKIN_INSTANCE_NAME_RULE:[spring.instance_id,node_id]}
receiver_jaeger:
selector: ${SW_RECEIVER_JAEGER:-}
default:
gRPCHost: ${SW_RECEIVER_JAEGER_HOST:0.0.0.0}
gRPCPort: ${SW_RECEIVER_JAEGER_PORT:14250}
receiver-browser:
selector: ${SW_RECEIVER_BROWSER:default}
default:
sampleRate: ${SW_RECEIVER_BROWSER_SAMPLE_RATE:10000}
receiver-log:
selector: ${SW_RECEIVER_LOG:default}
default:
query:
selector: ${SW_QUERY:graphql}
graphql:
path: ${SW_QUERY_GRAPHQL_PATH:/graphql}
alarm:
selector: ${SW_ALARM:default}
default:
telemetry:
selector: ${SW_TELEMETRY:none}
none:
prometheus:
host: ${SW_TELEMETRY_PROMETHEUS_HOST:0.0.0.0}
port: ${SW_TELEMETRY_PROMETHEUS_PORT:1234}
sslEnabled: ${SW_TELEMETRY_PROMETHEUS_SSL_ENABLED:false}
sslKeyPath: ${SW_TELEMETRY_PROMETHEUS_SSL_KEY_PATH:""}
sslCertChainPath: ${SW_TELEMETRY_PROMETHEUS_SSL_CERT_CHAIN_PATH:""}
configuration:
zookeeper: # 表示使用zookeeper做高可用
period: ${SW_CONFIG_ZK_PERIOD:60} # Unit seconds, sync period. Default fetch every 60 seconds.
nameSpace: ${SW_CONFIG_ZK_NAMESPACE:/default}
hostPort: ${SW_CONFIG_ZK_HOST_PORT:localhost:2182}
# Retry Policy
baseSleepTimeMs: ${SW_CONFIG_ZK_BASE_SLEEP_TIME_MS:1000} # initial amount of time to wait between retries
maxRetries: ${SW_CONFIG_ZK_MAX_RETRIES:3} # max number of times to retry
exporter:
selector: ${SW_EXPORTER:-}
grpc:
targetHost: ${SW_EXPORTER_GRPC_HOST:127.0.0.1}
targetPort: ${SW_EXPORTER_GRPC_PORT:9870}
health-checker:
selector: ${SW_HEALTH_CHECKER:-}
default:
checkIntervalSeconds: ${SW_HEALTH_CHECKER_INTERVAL_SECONDS:5}
configuration-discovery:
selector: ${SW_CONFIGURATION_DISCOVERY:default}
default:
disableMessageDigest: ${SW_DISABLE_MESSAGE_DIGEST:false}
receiver-event:
selector: ${SW_RECEIVER_EVENT:default}
default:
skywalking web ui配置文件修改
cat > /opt/apache-skywalking-apm-bin/webapp/webapp.yml << EOF
server:
port: 5000
spring:
cloud:
gateway:
routes:
- id: oap-route
uri: lb://oap-service
predicates:
- Path=/graphql/**
discovery:
client:
simple:
instances:
oap-service:
- uri: http://10.0.33.193:12800
- uri: http://10.0.33.194:12800
- uri: http://10.0.33.195:12800
mvc:
throw-exception-if-no-handler-found: true
web:
resources:
add-mappings: true
management:
server:
base-path: /manage
启动
/opt/apache-skywalking-apm-bin/bin/startup.sh
SkyWalking OAP started successfully!
SkyWalking Web Application started successfully!
测试
java
-javaagent:/opt/apache-skywalking-apm-bin/agent/skywalking-agent.jar
-Dskywalking.agent.namespace=prod
-Dskywalking.agent.service_name=spring-boot-docker
-Dskywalking.collector.backend_service=10.0.33.193:5006,10.0.33.194:5006,10.0.33.195:5006
-jar /root/spring-boot-docker/target/spring-boot-docker-0.1.0.jar
客户端配置
在各个微服务侧配配置java 启动参数:
-Dskywalking.collector.backend_service=10.0.33.193:5006,10.0.33.194:5006,10.0.33.195:5006