SkyWalking Source Code, Part 3: Boot Services

GRPCChannelManager

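Purpose: monitor the network connection and notify other services so they can reconnect.

Steps:

  1. Start a scheduled thread (every 30s) that creates a new gRPC ManagedChannel whenever reconnect == true.
  2. Notify all registered listener services. Each of them uses the new ManagedChannel to recreate its gRPC stub, which completes the reconnect.
  3. When one of these services hits an exception, it calls GRPCChannelManager#reportError, which sets reconnect = true again. This is how GRPCChannelManager learns about a network failure.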

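The registered listener services are as follows:
[Figure: list of the services registered as GRPCChannelManager listeners]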
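
The reconnect cycle in the steps above can be sketched in plain Java without any gRPC dependency. This is a minimal illustration, not the agent's actual code: FakeChannel, ChannelListener and ReconnectSketch are hypothetical names, and tick() stands in for one run of the 30s timer.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for a gRPC ManagedChannel; it only carries an id. */
final class FakeChannel {
    final int id;

    FakeChannel(int id) {
        this.id = id;
    }
}

/** Hypothetical listener that is told whenever a new channel is available. */
interface ChannelListener {
    void onChannelChanged(FakeChannel channel);
}

/** Rebuilds the channel only when the reconnect flag is set. */
final class ReconnectSketch {
    private final List<ChannelListener> listeners = new CopyOnWriteArrayList<>();
    private volatile boolean reconnect = true; // start in the "not connected" state
    private int created = 0;

    void addListener(ChannelListener listener) {
        listeners.add(listener);
    }

    /** Called by a service when one of its remote calls fails. */
    void reportError(Throwable cause) {
        reconnect = true;
    }

    /** One run of the periodic task. */
    void tick() {
        if (!reconnect) {
            return; // channel is considered healthy, nothing to do
        }
        FakeChannel channel = new FakeChannel(++created);
        reconnect = false;
        for (ChannelListener listener : listeners) {
            listener.onChannelChanged(channel); // listeners rebuild their stubs here
        }
    }

    int createdCount() {
        return created;
    }
}
```

A listener would typically keep the channel it receives and build its stub from it, so a later reportError followed by the next tick() transparently swaps the stub.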

ServiceManagementClient

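Purpose: periodically send heartbeats to OAP and receive the commands that OAP sends back.

Steps:

  1. Start a scheduled thread (every 30s) that sends a heartbeat to OAP.
  2. On success, the response carries commands from OAP, which are handed to CommandService for execution.
  3. On failure, tell GRPCChannelManager that the network is broken.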
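
As a rough sketch of this heartbeat flow, the task below takes the remote call, the command sink and the error reporter as plain functional interfaces. HeartbeatSketch and its constructor parameters are hypothetical; commands are modelled as strings only to keep the example self-contained.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

/** One heartbeat run: call the server, forward commands, report failures. */
final class HeartbeatSketch implements Runnable {
    private final Callable<List<String>> keepAlive;      // stands in for the gRPC keep-alive call
    private final Consumer<List<String>> commandSink;    // stands in for CommandService
    private final Consumer<Throwable> errorReporter;     // stands in for GRPCChannelManager#reportError

    HeartbeatSketch(Callable<List<String>> keepAlive,
                    Consumer<List<String>> commandSink,
                    Consumer<Throwable> errorReporter) {
        this.keepAlive = keepAlive;
        this.commandSink = commandSink;
        this.errorReporter = errorReporter;
    }

    @Override
    public void run() {
        List<String> commands;
        try {
            commands = keepAlive.call();
        } catch (Exception e) {
            errorReporter.accept(e); // only failures of the remote call count as network errors
            return;
        }
        commandSink.accept(commands);
    }
}
```

To run it on a timer, an instance can be passed to `Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(task, 0, 30, TimeUnit.SECONDS)`.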

CommandService

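Purpose: continuously take commands and execute them. Other services submit commands through the receiveCommand method.

Steps:

  1. Start a thread that keeps taking commands from a LinkedBlockingQueue and hands them to CommandExecutorService for dispatch. A command is never executed twice; duplicates are detected by the command's serial number.
  2. CommandExecutorService dispatches each command to a CommandExecutor chosen by command type.

Commands and their executors:

  1. ProfileTaskCommand => ProfileTaskCommandExecutor
  2. ConfigurationDiscoveryCommand => ConfigurationDiscoveryCommandExecutor
  3. Any other command => NoopCommandExecutor (does nothing)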
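
The queue, the duplicate check and the type-based dispatch can be illustrated with a small sketch. Cmd and CommandDispatchSketch are hypothetical names, and drainOnce() processes whatever is queued at the moment instead of blocking in a loop as a dedicated thread would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical command: a type used for dispatch and a serial number used for de-duplication. */
final class Cmd {
    final String type;
    final String serialNumber;

    Cmd(String type, String serialNumber) {
        this.type = type;
        this.serialNumber = serialNumber;
    }
}

final class CommandDispatchSketch {
    private final LinkedBlockingQueue<Cmd> queue = new LinkedBlockingQueue<>(64);
    private final Set<String> executedSerials = new HashSet<>();
    private final Map<String, Consumer<Cmd>> executors = new HashMap<>();
    private final Consumer<Cmd> noop = cmd -> { }; // fallback for unknown command types

    void register(String type, Consumer<Cmd> executor) {
        executors.put(type, executor);
    }

    /** Non-blocking submit; returns false when the queue is full. */
    boolean receiveCommand(Cmd cmd) {
        return queue.offer(cmd);
    }

    /** Executes every queued command once and returns how many were executed. */
    int drainOnce() {
        List<Cmd> batch = new ArrayList<>();
        queue.drainTo(batch);
        int executed = 0;
        for (Cmd cmd : batch) {
            if (!executedSerials.add(cmd.serialNumber)) {
                continue; // this serial number was already executed
            }
            executors.getOrDefault(cmd.type, noop).accept(cmd);
            executed++;
        }
        return executed;
    }
}
```

The sketch remembers every serial number forever; a long-running version would bound that set, for example by keeping only the most recent entries.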

TraceSegmentServiceClient

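Purpose: send TraceSegments to OAP.

Steps:

  1. Create a DataCarrier buffer pool (5 channels by default, each holding up to 300 items).
  2. Bind the channels of the pool to consumer threads (1 thread by default, so it consumes all channels).
  3. The consumer threads share a single TraceSegmentServiceClient, which sends the TraceSegments to OAP. On success, OAP replies with commands, which are handed to CommandService for execution. On failure, GRPCChannelManager is told that the network is broken.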
    @Override
    public void boot() {
        lastLogTime = System.currentTimeMillis();
        segmentUplinkedCounter = 0;
        segmentAbandonedCounter = 0;
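        // Create the buffer pool: 5 channels by default, each holding up to 300 items
        carrier = new DataCarrier<>(CHANNEL_SIZE, BUFFER_SIZE, BufferStrategy.IF_POSSIBLE);
        // The second argument is the number of consumer threads; each thread consumes the channels
        // assigned to it. Because the first argument is this instance, all of those threads share
        // this TraceSegmentServiceClient to send trace data to OAP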
        carrier.consume(this, 1);
    }

    // Send trace data to OAP
    @Override
    public void consume(List<TraceSegment> data) {
        if (CONNECTED.equals(status)) {
            final GRPCStreamServiceStatus status = new GRPCStreamServiceStatus(false);
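            // Client-side gRPC streaming: the client sends its request messages one after another,
            // and the server replies once after it has received all of them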
            StreamObserver<SegmentObject> upstreamSegmentStreamObserver = serviceStub.withDeadlineAfter(
                Config.Collector.GRPC_UPSTREAM_TIMEOUT, TimeUnit.SECONDS
            ).collect(new StreamObserver<Commands>() {
                @Override
                public void onNext(Commands commands) {
                    ServiceManager.INSTANCE.findService(CommandService.class)
                                           .receiveCommand(commands);
                }

                @Override
                public void onError(
                    Throwable throwable) {
                    status.finished();
                    if (LOGGER.isErrorEnable()) {
                        LOGGER.error(
                            throwable,
                            "Send UpstreamSegment to collector fail with a grpc internal exception."
                        );
                    }
                    ServiceManager.INSTANCE
                        .findService(GRPCChannelManager.class)
                        .reportError(throwable);
                }

                @Override
                public void onCompleted() {
                    status.finished();
                }
            });

            try {
                for (TraceSegment segment : data) {
                    // Convert the segment into its protobuf representation
                    SegmentObject upstreamSegment = segment.transform();
                    // Send it to OAP over the gRPC stream
                    upstreamSegmentStreamObserver.onNext(upstreamSegment);
                }
            } catch (Throwable t) {
                LOGGER.error(t, "Transform and send UpstreamSegment to collector fail.");
            }

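            // Signal that the request stream is complete; the server's reply then arrives
            // on the StreamObserver registered above
            upstreamSegmentStreamObserver.onCompleted();
            // Block until the stream has finished, i.e. all trace segments have been sent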
            status.wait4Finish();
            segmentUplinkedCounter += data.size();
        } else {
            segmentAbandonedCounter += data.size();
        }

        printUplinkStatus();
    }
	
    // Called when a TraceSegment finishes, via TracingContext.ListenerManager.notifyFinish(finishedSegment)
    @Override
    public void afterFinished(TraceSegment traceSegment) {
        if (traceSegment.isIgnore()) {
            return;
        }
        // Put the trace segment into the buffer pool
        if (!carrier.produce(traceSegment)) {
            if (LOGGER.isDebugEnable()) {
                LOGGER.debug("One trace segment has been abandoned, cause by buffer is full.");
            }
        }
    }

The DataCarrier code is as follows:

public class DataCarrier<T> {
	private Channels<T> channels;

    public DataCarrier consume(Class<? extends IConsumer<T>> consumerClass, int num, long consumeCycle) {
        if (driver != null) {
            driver.close(channels);
        }
        driver = new ConsumeDriver<T>(this.name, this.channels, consumerClass, num, consumeCycle);
        // Bind the channels to the consumer threads
        driver.begin(channels);
        return this;
    }
}
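
To make the "several bounded channels, one consumer, drop when full" idea concrete, here is a simplified sketch. It is not DataCarrier's real implementation: CarrierSketch is a hypothetical class, the round-robin channel selection is an assumption made for the example, and the consumer is reduced to a single drainAll() call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

/** A fixed number of bounded channels with a non-blocking, drop-when-full producer. */
final class CarrierSketch<T> {
    private final List<ArrayBlockingQueue<T>> channels = new ArrayList<>();
    private int next = 0; // index of the channel to try first

    CarrierSketch(int channelCount, int bufferSize) {
        for (int i = 0; i < channelCount; i++) {
            channels.add(new ArrayBlockingQueue<T>(bufferSize));
        }
    }

    /** Tries every channel once, starting at a rotating index; returns false if all are full. */
    synchronized boolean produce(T item) {
        int size = channels.size();
        for (int i = 0; i < size; i++) {
            int index = (next + i) % size;
            if (channels.get(index).offer(item)) {
                next = (index + 1) % size;
                return true;
            }
        }
        return false; // every channel is full, the caller drops the item
    }

    /** What a single consumer thread does per cycle: collect the content of every channel. */
    List<T> drainAll() {
        List<T> batch = new ArrayList<>();
        for (ArrayBlockingQueue<T> channel : channels) {
            channel.drainTo(batch);
        }
        return batch;
    }
}
```

With the defaults quoted above (5 channels of 300), such a pool would buffer at most 1500 segments before produce starts returning false, which matches the "abandoned, buffer is full" debug log in afterFinished.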

ProfileTaskChannelService

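Purpose: (1) periodically fetch newly created Trace Profiling tasks from OAP, which come back as ProfileTaskCommand; (2) periodically send thread snapshots to OAP.

Steps:

  1. Start a scheduled thread (every 20s by default) that fetches Trace Profiling tasks from OAP; the response is a ProfileTaskCommand.
  2. Hand the ProfileTaskCommand to CommandService for execution.
  3. CommandService passes the ProfileTaskCommand to ProfileTaskCommandExecutor, which converts it into a ProfileTask and hands that to ProfileTaskExecutionService, where the actual execution happens.
  4. Start a second scheduled thread (every 500ms by default) that takes thread snapshots from a BlockingQueue<TracingThreadSnapshot> and hands them to the ProfileSnapshotSender service, which sends them to OAP.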
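
Step 4 is a classic "bounded queue drained in batches by a timer" pattern. The sketch below shows one way to write it; SnapshotUploaderSketch is a hypothetical class, snapshots are modelled as strings, and the sender is an injected callback rather than the real ProfileSnapshotSender.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Buffers snapshots and ships everything buffered so far on each run. */
final class SnapshotUploaderSketch implements Runnable {
    private final BlockingQueue<String> snapshots;
    private final Consumer<List<String>> sender;

    SnapshotUploaderSketch(int capacity, Consumer<List<String>> sender) {
        this.snapshots = new LinkedBlockingQueue<>(capacity);
        this.sender = sender;
    }

    /** Called by the profiling thread; never blocks, returns false if the buffer is full. */
    boolean addProfilingSnapshot(String snapshot) {
        return snapshots.offer(snapshot);
    }

    /** One run of the periodic upload task. */
    @Override
    public void run() {
        List<String> batch = new ArrayList<>();
        snapshots.drainTo(batch);
        if (!batch.isEmpty()) {
            sender.accept(batch); // skip the remote call when there is nothing to send
        }
    }
}
```

Using offer instead of put keeps the profiling thread from stalling when uploads fall behind; the cost is that snapshots are dropped once the buffer is full.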

ProfileTaskExecutionService

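Purpose: actually execute a ProfileTask.

Steps:

  1. Stop the previous ProfileTask first.
  2. Create a new ProfileTaskExecutionContext(ProfileTask) and store it in the global reference AtomicReference<ProfileTaskExecutionContext> taskExecutionContext.
  3. Create a new ProfileThread(ProfileTaskExecutionContext).
  4. Submit the ProfileThread to the thread pool so it starts running.
  5. ProfileThread reads all slots from the ProfileTaskExecutionContext, i.e. the AtomicReferenceArray profilingSegmentSlots. There are 5 slots by default, so at most 5 threads can be sampled at the same time.
    When are entries put into profilingSegmentSlots?
    Before the agent intercepts an entry method (e.g. in Tomcat), if the request is handled by a new thread, that thread creates a new TracingContext. It first reads the current ProfileTaskExecutionContext from the global reference taskExecutionContext, then wraps the current thread in a ThreadProfiler, and decides from the request endpoint and the maximum sampling count whether to insert it into profilingSegmentSlots.
  6. Iterate over profilingSegmentSlots and use each ThreadProfiler to build a snapshot (mainly the thread's stack trace), then add it to ProfileTaskChannelService so the snapshot can be sent to OAP.
  7. Cancel the thread once the sampling duration has elapsed.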
public class ProfileTaskCommandExecutor implements CommandExecutor {

    @Override
    public void execute(BaseCommand command) throws CommandExecutionException {
        final ProfileTaskCommand profileTaskCommand = (ProfileTaskCommand) command;

        // build profile task
        final ProfileTask profileTask = new ProfileTask();
        profileTask.setTaskId(profileTaskCommand.getTaskId());
        // Endpoint to sample
        profileTask.setFirstSpanOPName(profileTaskCommand.getEndpointName());
        // Sampling duration
        profileTask.setDuration(profileTaskCommand.getDuration());
        // Minimum duration threshold: a request is sampled only if
        // (current time - request start time) exceeds this value
        profileTask.setMinDurationThreshold(profileTaskCommand.getMinDurationThreshold());
        // Thread dump interval
        profileTask.setThreadDumpPeriod(profileTaskCommand.getDumpPeriod());
        // Maximum number of samples
        profileTask.setMaxSamplingCount(profileTaskCommand.getMaxSamplingCount());
        // Sampling start time
        profileTask.setStartTime(profileTaskCommand.getStartTime());
        profileTask.setCreateTime(profileTaskCommand.getCreateTime());

        // send to executor
        ServiceManager.INSTANCE.findService(ProfileTaskExecutionService.class).addProfileTask(profileTask);
    }

}

public class ProfileTaskExecutionService implements BootService, TracingThreadListener {
    // Cached reference to the execution context of the current task
    private final AtomicReference<ProfileTaskExecutionContext> taskExecutionContext = new AtomicReference<>();
     
     public void addProfileTask(ProfileTask task) {
        // update last command create time
        if (task.getCreateTime() > lastCommandCreateTime) {
            lastCommandCreateTime = task.getCreateTime();
        }

        // check profile task limit
        final CheckResult dataError = checkProfileTaskSuccess(task);
        if (!dataError.isSuccess()) {
            LOGGER.warn(
                "check command error, cannot process this profile task. reason: {}", dataError.getErrorReason());
            return;
        }

        // add task to list
        profileTaskList.add(task);

        // Start executing at the requested startTime
        long timeToProcessMills = task.getStartTime() - System.currentTimeMillis();
        PROFILE_TASK_SCHEDULE.schedule(() -> processProfileTask(task), timeToProcessMills, TimeUnit.MILLISECONDS);
    }
    
    private synchronized void processProfileTask(ProfileTask task) {
        // make sure prev profile task already stopped
        stopCurrentProfileTask(taskExecutionContext.get());

        // make stop task schedule and task context
        final ProfileTaskExecutionContext currentStartedTaskContext = new ProfileTaskExecutionContext(task);
        taskExecutionContext.set(currentStartedTaskContext);

        // start profiling this task
        currentStartedTaskContext.startProfiling(PROFILE_EXECUTOR);
		
        // Stop the profiling thread once the task duration has elapsed
        PROFILE_TASK_SCHEDULE.schedule(
            () -> stopCurrentProfileTask(currentStartedTaskContext), task.getDuration(), TimeUnit.MINUTES);
    }
}

public class ProfileTaskExecutionContext {
	private final ProfileTask task;
	private volatile AtomicReferenceArray<ThreadProfiler> profilingSegmentSlots;
    
    public ProfileTaskExecutionContext(ProfileTask task) {
        this.task = task;
        profilingSegmentSlots = new AtomicReferenceArray<>(Config.Profile.MAX_PARALLEL);
    }

    public ProfileStatusReference attemptProfiling(TracingContext tracingContext,
                                                   String traceSegmentId,
                                                   String firstSpanOPName) {
        // check has available slot
        final int usingSlotCount = currentProfilingCount.get();
        if (usingSlotCount >= Config.Profile.MAX_PARALLEL) {
            return ProfileStatusReference.createWithNone();
        }

        // check first operation name matches
        if (!Objects.equals(task.getFirstSpanOPName(), firstSpanOPName)) {
            return ProfileStatusReference.createWithNone();
        }

        // if out limit started profiling count then stop add profiling
        if (totalStartedProfilingCount.get() > task.getMaxSamplingCount()) {
            return ProfileStatusReference.createWithNone();
        }

        // try to occupy slot
        if (!currentProfilingCount.compareAndSet(usingSlotCount, usingSlotCount + 1)) {
            return ProfileStatusReference.createWithNone();
        }

        final ThreadProfiler threadProfiler = new ThreadProfiler(
            tracingContext, traceSegmentId, Thread.currentThread(), this);
        int slotLength = profilingSegmentSlots.length();
        for (int slot = 0; slot < slotLength; slot++) {
            if (profilingSegmentSlots.compareAndSet(slot, null, threadProfiler)) {
                return threadProfiler.profilingStatus();
            }
        }
        return ProfileStatusReference.createWithNone();
    }
}
    
public class ProfileThread implements Runnable {
    public ProfileThread(ProfileTaskExecutionContext taskExecutionContext) {
        this.taskExecutionContext = taskExecutionContext;
        profileTaskExecutionService = ServiceManager.INSTANCE.findService(ProfileTaskExecutionService.class);
        profileTaskChannelService = ServiceManager.INSTANCE.findService(ProfileTaskChannelService.class);
    }
    
    @Override
    public void run() {
        try {
            profiling(taskExecutionContext);
        } catch (InterruptedException e) {
            // ignore interrupted
            // means current task has stopped
        } catch (Exception e) {
            LOGGER.error(e, "Profiling task fail. taskId:{}", taskExecutionContext.getTask().getTaskId());
        } finally {
            // finally stop current profiling task, tell execution service task has stop
            profileTaskExecutionService.stopCurrentProfileTask(taskExecutionContext);
        }

    }

    private void profiling(ProfileTaskExecutionContext executionContext) throws InterruptedException {

        int maxSleepPeriod = executionContext.getTask().getThreadDumpPeriod();

        // run loop when current thread still running
        long currentLoopStartTime = -1;
        while (!Thread.currentThread().isInterrupted()) {
            currentLoopStartTime = System.currentTimeMillis();

            // Iterate over all sampling slots. profilingSegmentSlots is filled before the agent
            // intercepts an entry method (e.g. in Tomcat): creating a new TracingContext registers
            // a ThreadProfiler holding Thread.currentThread(), whose stack is dumped here later
            AtomicReferenceArray<ThreadProfiler> profilers = executionContext.threadProfilerSlots();
            int profilerCount = profilers.length();
            for (int slot = 0; slot < profilerCount; slot++) {
                ThreadProfiler currentProfiler = profilers.get(slot);
                if (currentProfiler == null) {
                    continue;
                }

                switch (currentProfiler.profilingStatus().get()) {
                    case PENDING:
                        // Switches the status to PROFILING once
                        // System.currentTimeMillis() - tracingContext.createTime()
                        // exceeds executionContext.getTask().getMinDurationThreshold()
                        currentProfiler.startProfilingIfNeed();
                        break;

                    case PROFILING:
                        // Build a thread snapshot and add it to ProfileTaskChannelService so it gets sent to OAP
                        TracingThreadSnapshot snapshot = currentProfiler.buildSnapshot();
                        if (snapshot != null) {
                            profileTaskChannelService.addProfilingSnapshot(snapshot);
                        } else {
                            // tell execution context current tracing thread dump failed, stop it
                            executionContext.stopTracingProfile(currentProfiler.tracingContext());
                        }
                        break;

                }
            }

            // sleep to next period
            // if out of period, sleep one period
            long needToSleep = (currentLoopStartTime + maxSleepPeriod) - System.currentTimeMillis();
            needToSleep = needToSleep > 0 ? needToSleep : maxSleepPeriod;
            Thread.sleep(needToSleep);
        }
    }
 }
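
The two core mechanics of the listings above, claiming a slot with compare-and-set and dumping the stack of each occupied slot, can be reproduced in a few lines. SlotSamplerSketch is a hypothetical class: it stores the Thread directly instead of a ThreadProfiler and ignores the PENDING/PROFILING status handling. nextSleep mirrors the needToSleep computation at the end of the profiling loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

final class SlotSamplerSketch {
    private final AtomicReferenceArray<Thread> slots;

    SlotSamplerSketch(int maxParallel) {
        this.slots = new AtomicReferenceArray<>(maxParallel);
    }

    /** Claims the first empty slot for the thread; returns the slot index, or -1 if none is free. */
    int attempt(Thread thread) {
        for (int i = 0; i < slots.length(); i++) {
            if (slots.compareAndSet(i, null, thread)) {
                return i;
            }
        }
        return -1;
    }

    /** Takes one stack dump per occupied slot; threads that are not running yield no dump. */
    List<StackTraceElement[]> sampleOnce() {
        List<StackTraceElement[]> dumps = new ArrayList<>();
        for (int i = 0; i < slots.length(); i++) {
            Thread thread = slots.get(i);
            if (thread == null) {
                continue;
            }
            StackTraceElement[] stack = thread.getStackTrace();
            if (stack.length > 0) {
                dumps.add(stack);
            }
        }
        return dumps;
    }

    /** Sleep until the next period boundary, or one full period if the loop overran. */
    static long nextSleep(long loopStart, long period, long now) {
        long remaining = (loopStart + period) - now;
        return remaining > 0 ? remaining : period;
    }
}
```

Because the slot array has a fixed length, the number of threads sampled in parallel is capped by construction, which is the same effect Config.Profile.MAX_PARALLEL has in attemptProfiling.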

ProfileSnapshotSender

Purpose: send thread snapshots to OAP.

ConfigurationDiscoveryService

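Purpose: periodically pull the remote configuration and, when it has changed, hand it to the corresponding watchers.

Steps:

  1. Start a scheduled thread (every 20s by default) that pulls the latest configuration from OAP. OAP returns a ConfigurationDiscoveryCommand, which is handed to CommandService. If nothing has changed, the ConfigurationDiscoveryCommand carries the same UUID as before.
  2. CommandService eventually calls ConfigurationDiscoveryService#handleConfigurationDiscoveryCommand, which uses the UUID to decide whether the configuration changed. If it did not, the method returns immediately; otherwise it continues with the next step.
  3. Convert the ConfigurationDiscoveryCommand into key/value form.
  4. Iterate over all keys and find the watchers interested in each key. If the new value differs from the value the watcher currently holds, the configuration has changed and the watcher updates its value. Other services register watchers through the registerAgentConfigChangeWatcher method.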
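
The UUID short-circuit and the per-key watcher notification can be sketched as follows. ConfigWatchSketch and its method names are hypothetical, watchers are reduced to a callback per key, and the configuration is assumed to arrive already converted into a key/value map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

final class ConfigWatchSketch {
    private final Map<String, Consumer<String>> watchers = new HashMap<>();
    private final Map<String, String> currentValues = new HashMap<>();
    private String lastUuid;

    /** Registers a watcher for one key together with the value it currently holds. */
    void register(String key, String initialValue, Consumer<String> onChange) {
        watchers.put(key, onChange);
        currentValues.put(key, initialValue);
    }

    /** Handles one pulled configuration; returns how many watchers were notified. */
    int handle(String uuid, Map<String, String> config) {
        if (Objects.equals(uuid, lastUuid)) {
            return 0; // same UUID as last time: nothing changed on the server
        }
        int notified = 0;
        for (Map.Entry<String, String> entry : config.entrySet()) {
            Consumer<String> watcher = watchers.get(entry.getKey());
            if (watcher == null) {
                continue; // nobody is interested in this key
            }
            if (Objects.equals(currentValues.get(entry.getKey()), entry.getValue())) {
                continue; // value is unchanged for this watcher
            }
            currentValues.put(entry.getKey(), entry.getValue());
            watcher.accept(entry.getValue());
            notified++;
        }
        lastUuid = uuid;
        return notified;
    }
}
```

Checking the UUID first keeps the common case cheap: with a 20s polling interval most responses are unchanged and never reach the per-key comparison.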