How SkyWalking Generates the traceId in the Trace Context TraceContext: An Implementation Walkthrough

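Conclusion First

[Conclusion]
SkyWalking uses bytecode enhancement, in the spirit of dependency injection and inversion of control, to weave the traceId into the trace context TraceContext in its own way.

Interesting, isn't it?

[Takeaway]
Choose the plugins enabled under the skywalking-agent plugins/ directory deliberately. The more components are enabled, the more complex the traces and the topology become, the wider the impact, and the more unknown and uncontrollable factors appear.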

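Background

Finding the Problem

In production, the same traceId showed up on N requests from different time periods. The requests were all strung into one trace, which distorted both trace reconstruction and the topology view.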

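import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// NamedThreadFactory comes from the project's own dependencies; its import is not shown in the original.

@Configuration
public class ThreadPoolConfig {

    @Bean(name = "eventThreadPool")
    public ThreadPoolExecutor commonThreadPool() {
//        int corePoolSize = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, // core pool size, set to 1 on purpose during analysis so the problem reproduces 100% of the time
                1, // maximum pool size
                1, // keep-alive time
                TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(50000),
                new NamedThreadFactory("wanda_event"),
                new ThreadPoolExecutor.CallerRunsPolicy());
        return executor;
    }
}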
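
The sketch below is an illustration only, not a diagnosis of the production issue. It assumes nothing about SkyWalking: `SingleWorkerDemo` and its `BOUND` thread-local are hypothetical names. It shows why a pool with core and maximum size 1 makes the problem easy to reproduce: every task runs on the same worker thread, so any thread-bound state that is not removed by one task is still visible to the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class SingleWorkerDemo {
    // Hypothetical stand-in for any thread-bound state (not a SkyWalking field).
    static final ThreadLocal<String> BOUND = new ThreadLocal<>();

    /** Runs up to 16 tasks on a 1/1 pool; each task reports "threadName|valueSeenOnEntry". */
    static List<String> run(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 1, TimeUnit.SECONDS, new ArrayBlockingQueue<>(16));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int n = i;
                futures.add(pool.submit(() -> {
                    String seen = BOUND.get();              // value left behind by an earlier task, if any
                    if (seen == null) {
                        BOUND.set("value-from-task-" + n);  // bound once and never removed
                    }
                    return Thread.currentThread().getName() + "|" + seen;
                }));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) {
                out.add(f.get());
            }
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        run(3).forEach(System.out::println);
    }
}
```

The first task sees `null`; the later tasks see the value bound by the first one, on the same thread name.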

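Analyzing the Problem

We need to find out how the traceId seen on the thread pool's threads is generated.

[Notes]

  • The skywalking-agent.jar version in use is 8.13.0, with the default plugins/ list, which includes apm-guava-eventbus-plugin.
  • The bootstrap plugins in bootstrap-plugins/ (which include apm-jdk-threadpool-plugin) were not enabled; enabling them means copying them into plugins/. SkyWalking leaves the bootstrap plugins off by default because their impact is broad: they can noticeably affect both application performance and trace data.

[Thinking]

  • The traceId is created at the root of a request and is immutable; for the rest of the request lifecycle it is only passed through. So pinning down the source that generates the traceId is key.
  • Where is that source? We need to understand the traceId generation logic at the implementation level.
  • An application instance contains many threads, so the name of the thread that generates the traceId also matters.

In summary, the core troubleshooting approach is: new traceId generation + thread name.

How traceId Generation Is Implemented

org.apache.skywalking:java-agent:9.1.0
The analysis uses the source code of v9.1.0, the latest version at the time of writing. The code is almost identical in the two versions (8.13.0 and 9.1.0).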

TraceContext.traceId()

org.apache.skywalking.apm.toolkit.trace.TraceContext#traceId

TraceContext is the trace context of a request. Call TraceContext.traceId() to get the traceId.

package org.apache.skywalking.apm.toolkit.trace;

import java.util.Optional;

/**
 * Try to access the sky-walking tracer context. The context is not existed, always. only the middleware, component, or
 * rpc-framework are supported in the current invoke stack, in the same thread, the context will be available.
 * <p>
 */
public class TraceContext {

    /**
     * Try to get the traceId of current trace context.
     *
     * @return traceId, if it exists, or empty {@link String}.
     */
    public static String traceId() {
        return "";
    }

    /**
     * Try to get the segmentId of current trace context.
     *
     * @return segmentId, if it exists, or empty {@link String}.
     */
    public static String segmentId() {
        return "";
    }

    /**
     * Try to get the spanId of current trace context. The spanId is a negative number when the trace context is
     * missing.
     *
     * @return spanId, if it exists, or empty {@link String}.
     */
    public static int spanId() {
        return -1;
    }

    /**
     * Try to get the custom value from trace context.
     *
     * @return custom data value.
     */
    public static Optional<String> getCorrelation(String key) {
        return Optional.empty();
    }

    /**
     * Put the custom key/value into trace context.
     *
     * @return previous value if it exists.
     */
    public static Optional<String> putCorrelation(String key, String value) {
        return Optional.empty();
    }

}

1. How is the traceId set into the trace context?

Search the GitHub skywalking:java-agent repository for org.apache.skywalking.apm.toolkit.trace.TraceContext:
repo:apache/skywalking-java org.apache.skywalking.apm.toolkit.trace.TraceContext language:Java
Alternatively, search the skywalking:java-agent source code in IDEA for org.apache.skywalking.apm.toolkit.trace.TraceContext.

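[Conclusion]
SkyWalking uses bytecode enhancement, in the spirit of dependency injection and inversion of control, to weave the traceId into the trace context TraceContext in its own way.

One more way to implement a data update, isn't it...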

TraceContextActivation

org.apache.skywalking.apm.toolkit.activation.trace.TraceContextActivation

TraceContextActivation activates the trace context: it registers TraceIDInterceptor to intercept TraceContext.traceId(), so that the traceId is supplied to the trace context TraceContext.

package org.apache.skywalking.apm.toolkit.activation.trace;

import net.bytebuddy.description.method.MethodDescription;
import net.bytebuddy.matcher.ElementMatcher;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.ClassStaticMethodsEnhancePluginDefine;
import org.apache.skywalking.apm.agent.core.plugin.match.ClassMatch;
import org.apache.skywalking.apm.agent.core.plugin.match.NameMatch;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.StaticMethodsInterceptPoint;

import static net.bytebuddy.matcher.ElementMatchers.named;

/**
 * Active the toolkit class "TraceContext". Should not dependency or import any class in
 * "skywalking-toolkit-trace-context" module. Activation's classloader is diff from "TraceContext", using direct will
 * trigger classloader issue.
 * <p>
 */
public class TraceContextActivation extends ClassStaticMethodsEnhancePluginDefine {

    // Interceptor class for the traceId
    public static final String TRACE_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor";
    public static final String SEGMENT_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.SegmentIDInterceptor";
    public static final String SPAN_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.SpanIDInterceptor";
    // Enhanced class: the trace context
    public static final String ENHANCE_CLASS = "org.apache.skywalking.apm.toolkit.trace.TraceContext";
    // Name of the static method that returns the traceId
    public static final String ENHANCE_TRACE_ID_METHOD = "traceId";
    public static final String ENHANCE_SEGMENT_ID_METHOD = "segmentId";
    public static final String ENHANCE_SPAN_ID_METHOD = "spanId";
    public static final String ENHANCE_GET_CORRELATION_METHOD = "getCorrelation";
    public static final String INTERCEPT_GET_CORRELATION_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.CorrelationContextGetInterceptor";
    public static final String ENHANCE_PUT_CORRELATION_METHOD = "putCorrelation";
    public static final String INTERCEPT_PUT_CORRELATION_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.CorrelationContextPutInterceptor";

    /**
     * @return the target class, which needs active.
     */
    @Override
    protected ClassMatch enhanceClass() {
        // Class to enhance
        return NameMatch.byName(ENHANCE_CLASS);
    }

    /**
     * @return the collection of {@link StaticMethodsInterceptPoint}, represent the intercepted methods and their
     * interceptors.
     */
    @Override
    public StaticMethodsInterceptPoint[] getStaticMethodsInterceptPoints() {
        // Static method intercept points
        return new StaticMethodsInterceptPoint[] {
            new StaticMethodsInterceptPoint() {
                @Override
                public ElementMatcher<MethodDescription> getMethodsMatcher() {
                    // Name of the static method that returns the traceId
                    return named(ENHANCE_TRACE_ID_METHOD);
                }

                @Override
                public String getMethodsInterceptor() {
                    // Interceptor class for the traceId
                    return TRACE_ID_INTERCEPT_CLASS;
                }

                @Override
                public boolean isOverrideArgs() {
                    return false;
                }
            },
            // ...
        };
    }
}

TraceIDInterceptor

org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor

TraceIDInterceptor is the traceId interceptor: it calls ContextManager.getGlobalTraceId() to get the traceId and returns it as the result of TraceContext.traceId().

package org.apache.skywalking.apm.toolkit.activation.trace;

import java.lang.reflect.Method;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.StaticMethodsAroundInterceptor;
import org.apache.skywalking.apm.agent.core.context.ContextManager;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.MethodInterceptResult;

public class TraceIDInterceptor implements StaticMethodsAroundInterceptor {

    private static final ILog LOGGER = LogManager.getLogger(TraceIDInterceptor.class);

    @Override
    public void beforeMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        MethodInterceptResult result) {
        // Get the first global traceId and define it as the method's return value
        result.defineReturnValue(ContextManager.getGlobalTraceId());
    }

    @Override
    public Object afterMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        Object ret) {
        // Return the traceId
        return ret;
    }

    @Override
    public void handleMethodException(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        Throwable t) {
        LOGGER.error("Failed to getDefault trace Id.", t);
    }
}
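
The "define the return value before the method runs" idea above can be sketched in plain Java. This is a simplified model under stated assumptions, not the agent's code: `InterceptSketch`, `Result`, `Around`, and `intercept` are hypothetical names, and real bytecode enhancement is replaced by an ordinary method call. It only shows the control flow: if the interceptor defines a return value in its "before" step, the original stub body (which returns "") is never called.

```java
import java.util.function.Supplier;

final class InterceptSketch {
    /** Hypothetical, simplified counterpart of a method-intercept result holder. */
    static final class Result {
        private boolean continued = true;
        private Object ret;

        void defineReturnValue(Object value) {
            this.continued = false;   // tells the dispatcher to skip the original method
            this.ret = value;
        }

        boolean isContinue() { return continued; }
        Object ret() { return ret; }
    }

    /** Hypothetical interceptor contract: before() may short-circuit, after() may adjust the result. */
    interface Around {
        void before(Result result);
        Object after(Object ret);
    }

    /** Calls the original body only when before() did not define a return value. */
    static Object intercept(Supplier<Object> original, Around interceptor) {
        Result result = new Result();
        interceptor.before(result);
        Object ret = result.isContinue() ? original.get() : result.ret();
        return interceptor.after(ret);
    }

    /** Wires a stub that returns "" to an interceptor that supplies the given id. */
    static Object demo(String id) {
        Supplier<Object> stub = () -> "";
        Around supplyId = new Around() {
            @Override public void before(Result r) { r.defineReturnValue(id); }
            @Override public Object after(Object ret) { return ret; }
        };
        return intercept(stub, supplyId);
    }

    public static void main(String[] args) {
        System.out.println(demo("demo-trace-id"));
    }
}
```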

ContextManager.getGlobalTraceId()

org.apache.skywalking.apm.agent.core.context.ContextManager#getGlobalTraceId

ContextManager is the trace context manager.
ContextManager.getGlobalTraceId() returns the first global traceId; it calls AbstractTracerContext.getReadablePrimaryTraceId() to obtain it.

package org.apache.skywalking.apm.agent.core.context;

import java.util.Objects;
import org.apache.skywalking.apm.agent.core.boot.BootService;
import org.apache.skywalking.apm.agent.core.boot.ServiceManager;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegment;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.sampling.SamplingService;
import org.apache.skywalking.apm.util.StringUtil;

import static org.apache.skywalking.apm.agent.core.conf.Config.Agent.OPERATION_NAME_THRESHOLD;

/**
 * {@link ContextManager} controls the whole context of {@link TraceSegment}. Any {@link TraceSegment} relates to
 * single-thread, so this context use {@link ThreadLocal} to maintain the context, and make sure, since a {@link
 * TraceSegment} starts, all ChildOf spans are in the same context. <p> What is 'ChildOf'?
 * https://github.com/opentracing/specification/blob/master/specification.md#references-between-spans
 *
 * <p> Also, {@link ContextManager} delegates to all {@link AbstractTracerContext}'s major methods.
 */
public class ContextManager implements BootService {
    private static final String EMPTY_TRACE_CONTEXT_ID = "N/A";
    private static final ILog LOGGER = LogManager.getLogger(ContextManager.class);
    // Thread-local variable that holds the tracer context
    private static ThreadLocal<AbstractTracerContext> CONTEXT = new ThreadLocal<AbstractTracerContext>();
    private static ThreadLocal<RuntimeContext> RUNTIME_CONTEXT = new ThreadLocal<RuntimeContext>();
    private static ContextManagerExtendService EXTEND_SERVICE;

    private static AbstractTracerContext getOrCreate(String operationName, boolean forceSampling) {
        AbstractTracerContext context = CONTEXT.get();
        if (context == null) {
            if (StringUtil.isEmpty(operationName)) {
                if (LOGGER.isDebugEnable()) {
                    LOGGER.debug("No operation name, ignore this trace.");
                }
                context = new IgnoredTracerContext();
            } else {
                if (EXTEND_SERVICE == null) {
                    EXTEND_SERVICE = ServiceManager.INSTANCE.findService(ContextManagerExtendService.class);
                }
                context = EXTEND_SERVICE.createTraceContext(operationName, forceSampling);

            }
            CONTEXT.set(context);
        }
        return context;
    }

    /**
     * @return the first global trace id when tracing. Otherwise, "N/A".
     */
    public static String getGlobalTraceId() {
        // Tracer context bound to the current thread
        AbstractTracerContext context = CONTEXT.get();
        // Get the global traceId, or "N/A" when no context is bound
        return Objects.nonNull(context) ? context.getReadablePrimaryTraceId() : EMPTY_TRACE_CONTEXT_ID;
    }

    /**
     * @return the current segment id when tracing. Otherwise, "N/A".
     */
    public static String getSegmentId() {
        AbstractTracerContext context = CONTEXT.get();
        return Objects.nonNull(context) ? context.getSegmentId() : EMPTY_TRACE_CONTEXT_ID;
    }

    /**
     * @return the current span id when tracing. Otherwise, the value is -1.
     */
    public static int getSpanId() {
        AbstractTracerContext context = CONTEXT.get();
        return Objects.nonNull(context) ? context.getSpanId() : -1;
    }

    // ...

}
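
The thread-local lookup in getGlobalTraceId() can be modeled with the JDK alone. The sketch below is a simplification with hypothetical names (`ContextLookupSketch`, `bind`, `clear`); it stores an id string directly instead of a tracer context. It shows the two outcomes of the lookup: a thread with a bound context gets its id, and a thread without one gets the "N/A" fallback.

```java
final class ContextLookupSketch {
    private static final String EMPTY = "N/A";  // same fallback literal as in the listing above
    // Hypothetical simplification: the thread-local holds an id string, not a tracer context.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    static void bind(String traceId) { CONTEXT.set(traceId); }

    static void clear() { CONTEXT.remove(); }

    static String globalTraceId() {
        String id = CONTEXT.get();
        return id != null ? id : EMPTY;
    }

    /** Binds an id on the calling thread, then reads it from the caller and from a fresh thread. */
    static String[] demo() {
        String[] seen = new String[2];
        bind("id-on-caller");
        try {
            seen[0] = globalTraceId();
            Thread other = new Thread(() -> seen[1] = globalTraceId());
            other.start();
            other.join();   // join() makes the write to seen[1] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            clear();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("caller thread: " + seen[0]);
        System.out.println("fresh thread:  " + seen[1]);
    }
}
```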

AbstractTracerContext.getReadablePrimaryTraceId()

org.apache.skywalking.apm.agent.core.context.AbstractTracerContext#getReadablePrimaryTraceId

AbstractTracerContext is the interface that defines a tracer context.
This method returns the global traceId.

package org.apache.skywalking.apm.agent.core.context;

import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;

/**
 * The <code>AbstractTracerContext</code> represents the tracer context manager.
 */
public interface AbstractTracerContext {
    /**
     * Get the global trace id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    String getReadablePrimaryTraceId();

    /**
     * Prepare for the cross-process propagation. How to initialize the carrier, depends on the implementation.
     *
     * @param carrier to carry the context for crossing process.
     */
    void inject(ContextCarrier carrier);

    /**
     * Build the reference between this segment and a cross-process segment. How to build, depends on the
     * implementation.
     *
     * @param carrier carried the context from a cross-process segment.
     */
    void extract(ContextCarrier carrier);

    /**
     * Capture a snapshot for cross-thread propagation. It's a similar concept with ActiveSpan.Continuation in
     * OpenTracing-java How to build, depends on the implementation.
     *
     * @return the {@link ContextSnapshot} , which includes the reference context.
     */
    ContextSnapshot capture();

    /**
     * Build the reference between this segment and a cross-thread segment. How to build, depends on the
     * implementation.
     *
     * @param snapshot from {@link #capture()} in the parent thread.
     */
    void continued(ContextSnapshot snapshot);

    /**
     * Get the current segment id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    String getSegmentId();

    /**
     * Get the active span id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    int getSpanId();

    /**
     * Create an entry span
     *
     * @param operationName most likely a service name
     * @return the span represents an entry point of this segment.
     */
    AbstractSpan createEntrySpan(String operationName);

    /**
     * Create a local span
     *
     * @param operationName most likely a local method signature, or business name.
     * @return the span represents a local logic block.
     */
    AbstractSpan createLocalSpan(String operationName);

    /**
     * Create an exit span
     *
     * @param operationName most likely a service name of remote
     * @param remotePeer    the network id(ip:port, hostname:port or ip1:port1,ip2,port, etc.). Remote peer could be set
     *                      later, but must be before injecting.
     * @return the span represent an exit point of this segment.
     */
    AbstractSpan createExitSpan(String operationName, String remotePeer);

    /**
     * @return the active span of current tracing context(stack)
     */
    AbstractSpan activeSpan();

    /**
     * Finish the given span, and the given span should be the active span of current tracing context(stack)
     *
     * @param span to finish
     * @return true when context should be clear.
     */
    boolean stopSpan(AbstractSpan span);

    /**
     * Notify this context, current span is going to be finished async in another thread.
     *
     * @return The current context
     */
    AbstractTracerContext awaitFinishAsync();

    /**
     * The given span could be stopped officially.
     *
     * @param span to be stopped.
     */
    void asyncStop(AsyncSpan span);

    /**
     * Get current correlation context
     */
    CorrelationContext getCorrelationContext();

    /**
     * Get current primary endpoint name
     */
    String getPrimaryEndpointName();
}

AbstractTracerContext has two implementations: IgnoredTracerContext and TracingContext.

IgnoredTracerContext.getReadablePrimaryTraceId()

org.apache.skywalking.apm.agent.core.context.IgnoredTracerContext#getReadablePrimaryTraceId

IgnoredTracerContext is the tracer context for traces that should be ignored.
This method returns "Ignored_Trace".

package org.apache.skywalking.apm.agent.core.context;

import java.util.LinkedList;
import java.util.List;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopSpan;
import org.apache.skywalking.apm.agent.core.profile.ProfileStatusContext;

/**
 * The <code>IgnoredTracerContext</code> represent a context should be ignored. So it just maintains the stack with an
 * integer depth field.
 * <p>
 * All operations through this will be ignored, and keep the memory and gc cost as low as possible.
 */
public class IgnoredTracerContext implements AbstractTracerContext {
    private static final NoopSpan NOOP_SPAN = new NoopSpan();
    private static final String IGNORE_TRACE = "Ignored_Trace";

    private final CorrelationContext correlationContext;
    private final ExtensionContext extensionContext;
    private final ProfileStatusContext profileStatusContext;

    private int stackDepth;

    public IgnoredTracerContext() {
        this.stackDepth = 0;
        this.correlationContext = new CorrelationContext();
        this.extensionContext = new ExtensionContext();
        this.profileStatusContext = ProfileStatusContext.createWithNone();
    }

    // ...

    @Override
    public String getReadablePrimaryTraceId() {
        // Get the global traceId (here always the "Ignored_Trace" placeholder)
        return IGNORE_TRACE;
    }

    @Override
    public String getSegmentId() {
        return IGNORE_TRACE;
    }

    @Override
    public int getSpanId() {
        return -1;
    }

    // ...

}
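
At this point three placeholder values have appeared in the listings: the empty string from the un-enhanced TraceContext stub, "N/A" from ContextManager when no context is bound, and "Ignored_Trace" from IgnoredTracerContext. A caller that logs the traceId may want to tell these apart from a real id. The helper below is a minimal sketch; `TraceIdValues` and `isUsable` are hypothetical names, not part of the SkyWalking toolkit.

```java
final class TraceIdValues {
    private TraceIdValues() {
    }

    /** Returns true only when the value is none of the three placeholders seen in the listings. */
    static boolean isUsable(String traceId) {
        return traceId != null
                && !traceId.isEmpty()                  // stub value when the agent did not enhance TraceContext
                && !"N/A".equals(traceId)              // no tracer context bound to the current thread
                && !"Ignored_Trace".equals(traceId);   // the context is an IgnoredTracerContext
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"", "N/A", "Ignored_Trace", "abc.1.2"}) {
            System.out.println("'" + candidate + "' usable: " + isUsable(candidate));
        }
    }
}
```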

TracingContext.getReadablePrimaryTraceId()

org.apache.skywalking.apm.agent.core.context.TracingContext#getReadablePrimaryTraceId

TracingContext is the tracer context used for traces that are actually recorded.
This method returns the id field of DistributedTraceId.

package org.apache.skywalking.apm.agent.core.context;

import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
import java.util.concurrent.locks.ReentrantLock;
import lombok.AccessLevel;
import lombok.Getter;
import org.apache.skywalking.apm.agent.core.boot.ServiceManager;
import org.apache.skywalking.apm.agent.core.conf.Config;
import org.apache.skywalking.apm.agent.core.conf.dynamic.watcher.SpanLimitWatcher;
import org.apache.skywalking.apm.agent.core.context.ids.DistributedTraceId;
import org.apache.skywalking.apm.agent.core.context.ids.PropagatedTraceId;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractTracingSpan;
import org.apache.skywalking.apm.agent.core.context.trace.EntrySpan;
import org.apache.skywalking.apm.agent.core.context.trace.ExitSpan;
import org.apache.skywalking.apm.agent.core.context.trace.ExitTypeSpan;
import org.apache.skywalking.apm.agent.core.context.trace.LocalSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopExitSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopSpan;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegment;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegmentRef;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.profile.ProfileStatusContext;
import org.apache.skywalking.apm.agent.core.profile.ProfileTaskExecutionService;
import org.apache.skywalking.apm.util.StringUtil;

import static org.apache.skywalking.apm.agent.core.conf.Config.Agent.CLUSTER;

/**
 * The <code>TracingContext</code> represents a core tracing logic controller. It build the final {@link
 * TracingContext}, by the stack mechanism, which is similar with the codes work.
 * <p>
 * In opentracing concept, it means, all spans in a segment tracing context(thread) are CHILD_OF relationship, but no
 * FOLLOW_OF.
 * <p>
 * In skywalking core concept, FOLLOW_OF is an abstract concept when cross-process MQ or cross-thread async/batch tasks
 * happen, we used {@link TraceSegmentRef} for these scenarios. Check {@link TraceSegmentRef} which is from {@link
 * ContextCarrier} or {@link ContextSnapshot}.
 */
public class TracingContext implements AbstractTracerContext {
    private static final ILog LOGGER = LogManager.getLogger(TracingContext.class);
    private long lastWarningTimestamp = 0;

    /**
     * @see ProfileTaskExecutionService
     */
    private static ProfileTaskExecutionService PROFILE_TASK_EXECUTION_SERVICE;

    /**
     * The final {@link TraceSegment}, which includes all finished spans.
     * (A trace segment covers all calls made within the same thread.)
     */
    private TraceSegment segment;

    /**
     * Active spans stored in a Stack, usually called 'ActiveSpanStack'. This {@link LinkedList} is the in-memory
     * storage-structure. <p> I use {@link LinkedList#removeLast()}, {@link LinkedList#addLast(Object)} and {@link
     * LinkedList#getLast()} instead of {@link #pop()}, {@link #push(AbstractSpan)}, {@link #peek()}
     */
    private LinkedList<AbstractSpan> activeSpanStack = new LinkedList<>();

    /**
     * @since 8.10.0 replace the removed "firstSpan"(before 8.10.0) reference. see {@link PrimaryEndpoint} for more details.
     */
    private PrimaryEndpoint primaryEndpoint = null;

    /**
     * A counter for the next span.
     */
    private int spanIdGenerator;

    /**
     * The counter indicates
     */
    @SuppressWarnings("unused") // updated by ASYNC_SPAN_COUNTER_UPDATER
    private volatile int asyncSpanCounter;
    private static final AtomicIntegerFieldUpdater<TracingContext> ASYNC_SPAN_COUNTER_UPDATER =
        AtomicIntegerFieldUpdater.newUpdater(TracingContext.class, "asyncSpanCounter");
    private volatile boolean isRunningInAsyncMode;
    private volatile ReentrantLock asyncFinishLock;

    private volatile boolean running;

    private final long createTime;

    /**
     * profile status
     */
    private final ProfileStatusContext profileStatus;
    @Getter(AccessLevel.PACKAGE)
    private final CorrelationContext correlationContext;
    @Getter(AccessLevel.PACKAGE)
    private final ExtensionContext extensionContext;

    //CDS watcher
    private final SpanLimitWatcher spanLimitWatcher;

    /**
     * Initialize all fields with default value.
     */
    TracingContext(String firstOPName, SpanLimitWatcher spanLimitWatcher) {
        this.segment = new TraceSegment();
        this.spanIdGenerator = 0;
        isRunningInAsyncMode = false;
        createTime = System.currentTimeMillis();
        running = true;

        // profiling status
        if (PROFILE_TASK_EXECUTION_SERVICE == null) {
            PROFILE_TASK_EXECUTION_SERVICE = ServiceManager.INSTANCE.findService(ProfileTaskExecutionService.class);
        }
        this.profileStatus = PROFILE_TASK_EXECUTION_SERVICE.addProfiling(
            this, segment.getTraceSegmentId(), firstOPName);

        this.correlationContext = new CorrelationContext();
        this.extensionContext = new ExtensionContext();
        this.spanLimitWatcher = spanLimitWatcher;
    }

    /**
     * @return the first global trace id.
     */
    @Override
    public String getReadablePrimaryTraceId() {
        // Get the id field of the distributed trace id
        return getPrimaryTraceId().getId();
    }

    private DistributedTraceId getPrimaryTraceId() {
        // Get the distributed trace id related to the trace segment
        return segment.getRelatedGlobalTrace();
    }

    @Override
    public String getSegmentId() {
        return segment.getTraceSegmentId();
    }

    @Override
    public int getSpanId() {
        return activeSpan().getSpanId();
    }

    // ...

}

DistributedTraceId

org.apache.skywalking.apm.agent.core.context.ids.DistributedTraceId#id

DistributedTraceId is the distributed trace id; it represents one distributed call chain.

package org.apache.skywalking.apm.agent.core.context.ids;

import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.RequiredArgsConstructor;
import lombok.ToString;

/**
 * The <code>DistributedTraceId</code> presents a distributed call chain.
 * <p>
 * This call chain has a unique (service) entrance,
 * <p>
 * such as: Service : http://www.skywalking.com/cust/query, all the remote, called behind this service, rest remote, db
 * executions, are using the same <code>DistributedTraceId</code> even in different JVM.
 * <p>
 * The <code>DistributedTraceId</code> contains only one string, and can NOT be reset, creating a new instance is the
 * only option.
 */
@RequiredArgsConstructor
@ToString
@EqualsAndHashCode
public abstract class DistributedTraceId {
    @Getter
    private final String id;
}

DistributedTraceId has two subclasses: PropagatedTraceId and NewDistributedTraceId.

PropagatedTraceId

org.apache.skywalking.apm.agent.core.context.ids.PropagatedTraceId

PropagatedTraceId is a propagated trace id: a DistributedTraceId that was propagated from the peer.

package org.apache.skywalking.apm.agent.core.context.ids;

/**
 * The <code>PropagatedTraceId</code> represents a {@link DistributedTraceId}, which is propagated from the peer.
 */
public class PropagatedTraceId extends DistributedTraceId {
    public PropagatedTraceId(String id) {
        // Pass the received traceId through unchanged
        super(id);
    }
}

NewDistributedTraceId

org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId

NewDistributedTraceId is a DistributedTraceId with a newly generated id.
Its default constructor calls GlobalIdGenerator.generate() to generate a new global id, which is the traceId.

package org.apache.skywalking.apm.agent.core.context.ids;

/**
 * The <code>NewDistributedTraceId</code> is a {@link DistributedTraceId} with a new generated id.
 */
public class NewDistributedTraceId extends DistributedTraceId {
    public NewDistributedTraceId() {
        // Generate a new global id, which becomes the traceId
        super(GlobalIdGenerator.generate());
    }
}
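
The relationship between the two id kinds can be sketched as follows. The stack trace in Case 1 of this article shows the TraceSegment constructor creating a NewDistributedTraceId; how the agent later swaps in a propagated id is not shown in this article, so the `adoptUpstream` method below is an assumption made for illustration, and `SegmentSketch` is a hypothetical class, not SkyWalking code. The sketch models the rule stated earlier: the id is created at the root and only passed through afterwards.

```java
import java.util.function.Supplier;

final class SegmentSketch {
    private String traceId;
    private boolean locallyGenerated = true;

    /** Every segment starts with a freshly generated id. */
    SegmentSketch(Supplier<String> newIdGenerator) {
        this.traceId = newIdGenerator.get();
    }

    /**
     * Assumed behavior: an id received from upstream replaces the locally generated one,
     * and only the first upstream id is adopted.
     */
    void adoptUpstream(String upstreamTraceId) {
        if (locallyGenerated) {
            traceId = upstreamTraceId;
            locallyGenerated = false;
        }
    }

    String traceId() {
        return traceId;
    }

    public static void main(String[] args) {
        SegmentSketch root = new SegmentSketch(() -> "new-1");
        System.out.println("root segment:  " + root.traceId());

        SegmentSketch downstream = new SegmentSketch(() -> "new-2");
        downstream.adoptUpstream(root.traceId());
        System.out.println("downstream:    " + downstream.traceId());
    }
}
```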

GlobalIdGenerator.generate()

org.apache.skywalking.apm.agent.core.context.ids.GlobalIdGenerator#generate

GlobalIdGenerator is the global id generator.
This method generates a new global id; it is where the traceId is actually produced.

package org.apache.skywalking.apm.agent.core.context.ids;

import java.util.UUID;

import org.apache.skywalking.apm.util.StringUtil;

public final class GlobalIdGenerator {
    // Id of the application instance process
    private static final String PROCESS_ID = UUID.randomUUID().toString().replaceAll("-", "");
    // Per-thread context that holds the id sequence
    private static final ThreadLocal<IDContext> THREAD_ID_SEQUENCE = ThreadLocal.withInitial(
        () -> new IDContext(System.currentTimeMillis(), (short) 0));

    private GlobalIdGenerator() {
    }

    /**
     * Generate a new id, combined by three parts.
     * <p>
     * The first one represents application instance id.
     * <p>
     * The second one represents thread id.
     * <p>
     * The third one also has two parts, 1) a timestamp, measured in milliseconds 2) a seq, in current thread, between
     * 0(included) and 9999(included)
     *
     * @return unique id to represent a trace or segment
     */
    public static String generate() {
        return StringUtil.join(
            '.',
            PROCESS_ID,
            String.valueOf(Thread.currentThread().getId()),
            String.valueOf(THREAD_ID_SEQUENCE.get().nextSeq())
        );
    }

    private static class IDContext {
        private long lastTimestamp;
        private short threadSeq;

        // Just for considering time-shift-back only.
        private long lastShiftTimestamp;
        private int lastShiftValue;

        private IDContext(long lastTimestamp, short threadSeq) {
            this.lastTimestamp = lastTimestamp;
            this.threadSeq = threadSeq;
        }

        private long nextSeq() {
            return timestamp() * 10000 + nextThreadSeq();
        }

        private long timestamp() {
            long currentTimeMillis = System.currentTimeMillis();

            if (currentTimeMillis < lastTimestamp) {
                // Just for considering time-shift-back by Ops or OS. @hanahmily 's suggestion.
                if (lastShiftTimestamp != currentTimeMillis) {
                    lastShiftValue++;
                    lastShiftTimestamp = currentTimeMillis;
                }
                return lastShiftValue;
            } else {
                lastTimestamp = currentTimeMillis;
                return lastTimestamp;
            }
        }

        private short nextThreadSeq() {
            if (threadSeq == 10000) {
                threadSeq = 0;
            }
            return threadSeq++;
        }
    }
}
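
Because the id is built as processId.threadId.(timestamp * 10000 + seq), a traceId can be split back into its parts, which is handy when checking which thread created it and when. The parser below is a minimal sketch; `TraceIdParts` is a hypothetical helper, not part of SkyWalking. Note one caveat from the listing: when the clock moves backwards, timestamp() returns a small shift counter instead of the wall-clock time, so the decoded timestamp is not meaningful in that case.

```java
final class TraceIdParts {
    final String processId;
    final long threadId;
    final long timestampMillis;
    final int seq;

    private TraceIdParts(String processId, long threadId, long timestampMillis, int seq) {
        this.processId = processId;
        this.threadId = threadId;
        this.timestampMillis = timestampMillis;
        this.seq = seq;
    }

    /** Splits "processId.threadId.(timestamp * 10000 + seq)" into its four values. */
    static TraceIdParts parse(String traceId) {
        String[] parts = traceId.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 dot-separated parts: " + traceId);
        }
        long third = Long.parseLong(parts[2]);
        return new TraceIdParts(
                parts[0],
                Long.parseLong(parts[1]),
                third / 10000,            // inverse of "timestamp() * 10000"
                (int) (third % 10000));   // inverse of "+ nextThreadSeq()"
    }

    public static void main(String[] args) {
        // Made-up sample id: process id, thread 246, timestamp 1709610765000 ms, sequence 7.
        long third = 1709610765000L * 10000 + 7;
        TraceIdParts p = parse("0123456789abcdef0123456789abcdef.246." + third);
        System.out.println(p.processId + " thread=" + p.threadId
                + " ts=" + p.timestampMillis + " seq=" + p.seq);
    }
}
```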

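Hands-On Cases

Real knowledge comes from practice.
Without understanding the underlying implementation, it would be hard to come up with these interception points.
See the monitor/watch/trace section of the Arthas command list.

// [Interception point] generation of a new traceId + threads whose names start with wanda_event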
stack org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'

watch org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '{target, returnObj}' '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")' -x 6

// [Interception point] reading the global traceId + threads whose names start with wanda_event
stack org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getReadablePrimaryTraceId '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'

watch org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getPrimaryTraceId '{target, returnObj}' '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")' -x 6

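[Case 1] Who generates the new traceId on the wanda event thread?

Is this behavior reasonable?

The Arthas stack command shows the call stack that generates a new global traceId.
The call stack shows that traceId generation is triggered by Subscriber.invokeSubscriberMethod, the subscriber of the Guava event bus.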

[arthas@7]$ stack org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 432 ms, listenerId: 5
ts=2024-03-05 11:52:45;thread_name=wanda_event-thread-1;id=f6;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@8dfe921
    @org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId.<init>()
        at org.apache.skywalking.apm.agent.core.context.trace.TraceSegment.<init>(TraceSegment.java:74)
        at org.apache.skywalking.apm.agent.core.context.TracingContext.<init>(TracingContext.java:122)
        at org.apache.skywalking.apm.agent.core.context.ContextManagerExtendService.createTraceContext(ContextManagerExtendService.java:91)
        at org.apache.skywalking.apm.agent.core.context.ContextManager.getOrCreate(ContextManager.java:60)
        at org.apache.skywalking.apm.agent.core.context.ContextManager.createLocalSpan(ContextManager.java:123)
        // guava-eventbus-plugin
        // Invokes the method interceptor
        at org.apache.skywalking.apm.plugin.guava.eventbus.EventBusSubscriberInterceptor.beforeMethod(EventBusSubscriberInterceptor.java:38)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInterWithOverrideArgs.intercept(InstMethodsInterWithOverrideArgs.java:75)
        // Original method
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:145)
        at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

This interception is installed by the apm-guava-eventbus-plugin plugin, whose EventBusSubscriberInstrumentation rewrites the bytecode of Subscriber.invokeSubscriberMethod.
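The stack above follows the chain createLocalSpan -> getOrCreate -> createTraceContext -> NewDistributedTraceId.<init>, which means a new traceId is minted only when the current thread has no tracing context yet. The following is a minimal sketch of that get-or-create pattern, not the agent's code. All names (ContextLifecycleSketch, SketchContext, runTwoTasks) are hypothetical, a UUID stands in for the agent's own ID format, and the cleanup rule in stopSpan is a simplifying assumption. It illustrates why, on a pooled thread, whether the context is cleared between tasks decides whether the next task gets a fresh traceId.

```java
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContextLifecycleSketch {

    /** Hypothetical stand-in for a tracing context: owns one trace ID for its lifetime. */
    static final class SketchContext {
        final String traceId = UUID.randomUUID().toString(); // assumption: the real agent uses its own ID format
        int activeSpans = 0;
    }

    // One context slot per thread: same thread, same context.
    private static final ThreadLocal<SketchContext> CONTEXT = new ThreadLocal<>();

    /** Reuse the thread's context if present; otherwise create one, and with it a new trace ID. */
    static SketchContext getOrCreate() {
        SketchContext ctx = CONTEXT.get();
        if (ctx == null) {
            ctx = new SketchContext();
            CONTEXT.set(ctx);
        }
        return ctx;
    }

    /** What an interceptor's "before" hook would do: open a span on the current context. */
    static String createLocalSpan() {
        SketchContext ctx = getOrCreate();
        ctx.activeSpans++;
        return ctx.traceId;
    }

    /** What the "after" hook would do: close the span and drop the context once no span is left. */
    static void stopSpan() {
        SketchContext ctx = CONTEXT.get();
        if (ctx != null && --ctx.activeSpans == 0) {
            CONTEXT.remove();
        }
    }

    /** Runs two tasks on a fresh single-thread pool and returns the trace ID seen by each task. */
    static String[] runTwoTasks(boolean closeSpans) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String[] ids = new String[2];
            for (int i = 0; i < 2; i++) {
                ids[i] = pool.submit(() -> {
                    String id = createLocalSpan();
                    if (closeSpans) {
                        stopSpan();
                    }
                    return id;
                }).get();
            }
            return ids;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] balanced = runTwoTasks(true);
        String[] leaked = runTwoTasks(false);
        System.out.println("balanced spans, same id? " + balanced[0].equals(balanced[1]));
        System.out.println("span left open, same id? " + leaked[0].equals(leaked[1]));
    }
}
```

In this sketch the two tasks with balanced spans see different IDs, while the two tasks whose span is never stopped see the same ID, because the single worker thread still holds the earlier context.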

[Case 2] Within the trace segment of the wanda event thread, where is the traceId read?

Are these reads reasonable?

The method LeoaoJsonLayout.addCustomDataToJsonMap(LeoaoJsonLayout.java:29) calls TraceContext.traceId().

[arthas@7]$ stack org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getReadablePrimaryTraceId '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 1) cost in 423 ms, listenerId: 3
ts=2024-03-04 21:03:59;thread_name=wanda_event-thread-1;id=140;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@67fe380b
    @org.apache.skywalking.apm.agent.core.context.TracingContext.getReadablePrimaryTraceId()
        at org.apache.skywalking.apm.agent.core.context.ContextManager.getGlobalTraceId(ContextManager.java:77)
        at org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor.beforeMethod(TraceIDInterceptor.java:35)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.StaticMethodsInter.intercept(StaticMethodsInter.java:73)
        at org.apache.skywalking.apm.toolkit.trace.TraceContext.traceId(TraceContext.java:-1)
        // the frames above are the SkyWalking core call path
        // call site of TraceContext.traceId()
        at com.leoao.lpaas.logback.LeoaoJsonLayout.addCustomDataToJsonMap(LeoaoJsonLayout.java:29)
        at ch.qos.logback.contrib.json.classic.JsonLayout.toJsonMap(null:-1)
        at ch.qos.logback.contrib.json.classic.JsonLayout.toJsonMap(null:-1)
        at ch.qos.logback.contrib.json.JsonLayoutBase.doLayout(null:-1)
        at ch.qos.logback.core.encoder.LayoutWrappingEncoder.encode(LayoutWrappingEncoder.java:115)
        at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:230)
        at ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:235)
        at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
        at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:398)
        at ch.qos.logback.classic.Logger.info(Logger.java:583)
        // the log statement that triggers the layout
        // log.info("receive event persistUserPositionEvent=[{}]", event);
        at com.lefit.wanda.domain.event.listener.PersistUserPositionEventListener.change(PersistUserPositionEventListener.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod$original$ToNcZpNk(Subscriber.java:88)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod$original$ToNcZpNk$accessor$utMvob4N(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$auxiliary$8fYqzzq0.call(null:-1)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInterWithOverrideArgs.intercept(InstMethodsInterWithOverrideArgs.java:85)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:145)
        at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

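The top frames of this stack (TraceContext.traceId -> StaticMethodsInter.intercept -> TraceIDInterceptor.beforeMethod -> ContextManager.getGlobalTraceId) suggest that the toolkit method is a placeholder whose return value the agent supplies before the original body runs. Below is a minimal sketch of that pattern in plain Java, assuming a hook that may define the return value up front. InterceptResult, BeforeHook and the method names are hypothetical and only imitate the idea; they are not the agent's API.

```java
import java.util.function.Supplier;

class StaticInterceptSketch {

    /** Holds an optional override for the intercepted method's return value. */
    static final class InterceptResult {
        private boolean defined;
        private String value;

        void defineReturnValue(String value) {
            this.defined = true;
            this.value = value;
        }
    }

    /** Hypothetical hook type: runs before the original method body. */
    interface BeforeHook {
        void beforeMethod(InterceptResult result);
    }

    /** Call the hook first; run the original body only if the hook defined no return value. */
    static String intercept(BeforeHook hook, Supplier<String> original) {
        InterceptResult result = new InterceptResult();
        hook.beforeMethod(result);
        return result.defined ? result.value : original.get();
    }

    /** Stand-in for the un-enhanced toolkit method: without an agent it yields an empty string. */
    static String originalTraceId() {
        return "";
    }

    /** No override defined: the original body runs. */
    static String plainTraceId() {
        return intercept(result -> { }, StaticInterceptSketch::originalTraceId);
    }

    /** Override defined: the hook supplies the ID of the (simulated) active context. */
    static String enhancedTraceId(String activeTraceId) {
        return intercept(result -> result.defineReturnValue(activeTraceId),
                StaticInterceptSketch::originalTraceId);
    }

    public static void main(String[] args) {
        System.out.println("without override: [" + plainTraceId() + "]");
        System.out.println("with override:    [" + enhancedTraceId("example-trace-id") + "]");
    }
}
```

Seen this way, every log line rendered by the layout reads the ID of whatever context is active on the logging thread at that moment; in the stack above that is the context opened by the EventBus subscriber interceptor further down the same call stack.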

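[Takeaway]
Choose the plugins enabled under plugins/ for skywalking-agent deliberately. The more components are enabled, the more complex the traces and topology become, the wider the impact, and the more unknown, uncontrollable factors are introduced.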


Have fun, everyone! ˇˍˇ

简放, Hangzhou