


Introduction to Zipkin

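Zipkin is a distributed tracing system open-sourced by Twitter and based on the Dapper paper. It collects timing information about the execution of distributed services in order to trace service call chains and analyze service latency.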







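  • Event types

    cs: Client Send, the client sends a request
    sr: Server Receive, the server receives the request
    ss: Server Send, the server finishes processing and sends the response
    cr: Client Receive, the client receives the response
  • When it is generated
    An Annotation is generated when the client sends a Request or receives a Response, and when the server receives a Request or sends a Response. Every Annotation belongs to a Span, so a newly generated Annotation must be appended to the annotations array of the Span in the current context.

  • Thrift data structure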

    /**
     * Associates an event that explains latency with a timestamp.
     * Unlike log statements, annotations are often codes: for example "sr".
     */
    struct Annotation {
      /**
       * Microseconds from epoch.
       * This value should use the most precise value possible. For example,
       * gettimeofday or syncing nanoTime against a tick of currentTimeMillis.
       */
      1: i64 timestamp
      /** Usually a short tag indicating an event, like "sr" or "finagle.retry". */
      2: string value
      /** The host that recorded the value, primarily for query by service name. */
      3: optional Endpoint host
      // don't reuse 4: optional i32 OBSOLETE_duration // how long did the operation take? microseconds
    }
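
The rule above (each new Annotation is appended to the Span in the current context) can be sketched in Python. This is a minimal illustration, not Zipkin's own client code: the `Annotation` and `Span` dataclasses are simplified stand-ins for the Thrift structs, `host` is reduced to a plain string instead of an `Endpoint`, and `annotate` and `now_micros` are hypothetical helper names.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    timestamp: int              # microseconds since the epoch
    value: str                  # event code such as "cs", "sr", "ss", "cr"
    host: Optional[str] = None  # simplified stand-in for the Thrift Endpoint


@dataclass
class Span:
    name: str
    annotations: List[Annotation] = field(default_factory=list)


def now_micros() -> int:
    """Current wall-clock time in microseconds since the epoch."""
    return time.time_ns() // 1_000


def annotate(span: Span, value: str, host: Optional[str] = None) -> Annotation:
    """Create an Annotation and append it to the given span."""
    annotation = Annotation(timestamp=now_micros(), value=value, host=host)
    span.annotations.append(annotation)
    return annotation


span = Span(name="get_order")
annotate(span, "cs", host="testService")  # client sends the request
annotate(span, "cr", host="testService")  # client receives the response
print([a.value for a in span.annotations])  # ['cs', 'cr']
```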



  • When is it generated?

  • Thrift data structure

    /**
     * Binary annotations are tags applied to a Span to give it context. For
     * example, a binary annotation of "http.uri" could be the path to a resource
     * in an RPC call.
     * Binary annotations of type STRING are always queryable, though more a
     * historical implementation detail than a structural concern.
     * Binary annotations can repeat, and vary on the host. Similar to Annotation,
     * the host indicates who logged the event. This allows you to tell the
     * difference between the client and server side of the same key. For example,
     * the key "http.uri" might be different on the client and server side due to
     * rewriting, like "/api/v1/myresource" vs "/myresource". Via the host field,
     * you can see the different points of view, which often help in debugging.
     */
    struct BinaryAnnotation {
      /** Name used to lookup spans, such as "http.uri" or "finagle.version". */
      1: string key,
      /**
       * Serialized thrift bytes, in TBinaryProtocol format.
       * For legacy reasons, byte order is big-endian. See THRIFT-3217.
       */
      2: binary value,
      /**
       * The thrift type of value, most often STRING.
       * annotation_type shouldn't vary for the same key.
       */
      3: AnnotationType annotation_type,
      /**
       * The host that recorded value, allowing query by service name or address.
       * There are two exceptions: when key is "ca" or "sa", this is the source or
       * destination of an RPC. This exception allows zipkin to display network
       * context of uninstrumented services, such as browsers or databases.
       */
      4: optional Endpoint host
    }
  • AnnotationType structure

    /** A subset of thrift base types, except BYTES. */
    enum AnnotationType {
      BOOL, BYTES, I16, I32, I64, DOUBLE, STRING
    }
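
To make the relationship between `value` and `annotation_type` concrete, here is a rough Python sketch of turning a native value into big-endian bytes according to its type. It is an illustration under assumptions rather than Zipkin's actual serializer: the `encode_value` helper is hypothetical, the enum values simply follow declaration order, and STRING values are assumed to be UTF-8 encoded.

```python
import struct
from enum import Enum


class AnnotationType(Enum):
    # Values follow the declaration order of the Thrift enum (an assumption here).
    BOOL = 0
    BYTES = 1
    I16 = 2
    I32 = 3
    I64 = 4
    DOUBLE = 5
    STRING = 6


# ">" selects big-endian byte order in the struct module.
_NUMERIC_FORMATS = {
    AnnotationType.I16: ">h",
    AnnotationType.I32: ">i",
    AnnotationType.I64: ">q",
    AnnotationType.DOUBLE: ">d",
}


def encode_value(value, annotation_type: AnnotationType) -> bytes:
    """Encode a native value as the bytes stored in BinaryAnnotation.value."""
    if annotation_type is AnnotationType.STRING:
        return value.encode("utf-8")
    if annotation_type is AnnotationType.BYTES:
        return bytes(value)
    if annotation_type is AnnotationType.BOOL:
        return b"\x01" if value else b"\x00"
    return struct.pack(_NUMERIC_FORMATS[annotation_type], value)


print(encode_value("/api/v1/myresource", AnnotationType.STRING))
print(encode_value(200, AnnotationType.I16))  # b'\x00\xc8'
```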


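A Span represents one complete RPC call and consists of a set of Annotations and BinaryAnnotations. It is the basic unit for tracing service calls; multiple Spans form a tree that makes up one Trace record. Spans have parent-child relationships. For example, Client A, Client A -> B, B -> C, and C -> D produce four Spans in total. When Client A receives a request it generates Span A; when Client A sends a request to B it generates another Span, Span A-B, and Span A is the parent of Span A-B.

  • When it is generated

    • When a service receives a Request that is not yet associated with any Span, it generates a Span, including a Span ID and a TraceID
    • When a service sends a Request to a downstream service, it must generate a new Span and set that Span's parent to the Span generated in the previous step
  • Thrift structure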

    /**
     * A trace is a series of spans (often RPC calls) which form a latency tree.
     * Spans are usually created by instrumentation in RPC clients or servers, but
     * can also represent in-process activity. Annotations in spans are similar to
     * log statements, and are sometimes created directly by application developers
     * to indicate events of interest, such as a cache miss.
     * The root span is where parent_id = Nil; it usually has the longest duration
     * in the trace.
     * Span identifiers are packed into i64s, but should be treated opaquely.
     * String encoding is fixed-width lower-hex, to avoid signed interpretation.
     */
    struct Span {
      /** Unique 8-byte identifier for a trace, set on all spans within it. */
      1: i64 trace_id
      /**
       * Span name in lowercase, rpc method for example. Conventionally, when the
       * span name isn't known, name = "unknown".
       */
      3: string name,
      /**
       * Unique 8-byte identifier of this span within a trace. A span is uniquely
       * identified in storage by (trace_id, id).
       */
      4: i64 id,
      /** The parent's Span.id; absent if this is the root span in a trace. */
      5: optional i64 parent_id,
      /**
       * Associates events that explain latency with a timestamp. Unlike log
       * statements, annotations are often codes: for example SERVER_RECV("sr").
       * Annotations are sorted ascending by timestamp.
       */
      6: list<Annotation> annotations,
      /**
       * Tags a span with context, usually to support query or aggregation. For
       * example, a binary annotation key could be "http.uri".
       */
      8: list<BinaryAnnotation> binary_annotations
      /** True is a request to store this span even if it overrides sampling policy. */
      9: optional bool debug = 0
      /**
       * Epoch microseconds of the start of this span, absent if this is an
       * incomplete span.
       * This value should be set directly by instrumentation, using the most
       * precise value possible. For example, gettimeofday or syncing nanoTime
       * against a tick of currentTimeMillis.
       * For compatibility with instrumentation that precedes this field, collectors
       * or span stores can derive this via Annotation.timestamp.
       * For example, SERVER_RECV.timestamp or CLIENT_SEND.timestamp.
       * Timestamp is nullable for input only. Spans without a timestamp cannot be
       * presented in a timeline: Span stores should not output spans missing a
       * timestamp.
       * There are two known edge-cases where this could be absent: both cases
       * exist when a collector receives a span in parts and a binary annotation
       * precedes a timestamp. This is possible when..
       *  - The span is in-flight (e.g. not yet received a timestamp)
       *  - The span's start event was lost
       */
      10: optional i64 timestamp,
      /**
       * Measurement in microseconds of the critical path, if known.
       * This value should be set directly, as opposed to implicitly via annotation
       * timestamps. Doing so encourages precision decoupled from problems of
       * clocks, such as skew or NTP updates causing time to move backwards.
       * For compatibility with instrumentation that precedes this field, collectors
       * or span stores can derive this by subtracting Annotation.timestamp.
       * For example, SERVER_SEND.timestamp - SERVER_RECV.timestamp.
       * If this field is persisted as unset, zipkin will continue to work, except
       * duration query support will be implementation-specific. Similarly, setting
       * this field non-atomically is implementation-specific.
       * This field is i64 vs i32 to support spans longer than 35 minutes.
       */
      11: optional i64 duration
    }
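
Two details from the comments in this struct lend themselves to a short Python sketch: the fixed-width lower-hex encoding of ids, and deriving `timestamp` and `duration` from annotations when instrumentation did not set them directly. The helper names `to_lower_hex` and `derive_timing` are hypothetical, and the preference for the client-side pair ("cs"/"cr") over the server-side pair ("sr"/"ss") is an assumption of this sketch, not something the struct mandates.

```python
from typing import Dict, List, Optional, Tuple

U64_MASK = 0xFFFFFFFFFFFFFFFF


def to_lower_hex(value: int) -> str:
    """Render a signed 64-bit id as 16 lower-hex characters."""
    # Masking reinterprets a negative i64 as unsigned, which avoids a "-" sign.
    return format(value & U64_MASK, "016x")


def derive_timing(
    annotations: List[Tuple[int, str]]
) -> Tuple[Optional[int], Optional[int]]:
    """Derive (timestamp, duration) in microseconds from (timestamp, value) pairs."""
    by_value: Dict[str, int] = {value: ts for ts, value in annotations}
    for start_code, end_code in (("cs", "cr"), ("sr", "ss")):
        if start_code in by_value:
            start = by_value[start_code]
            end = by_value.get(end_code)
            duration = end - start if end is not None else None
            return start, duration
    return None, None  # no start event seen: the span is incomplete


print(to_lower_hex(-1))   # ffffffffffffffff
print(to_lower_hex(255))  # 00000000000000ff
print(derive_timing([(1_000, "sr"), (1_700, "ss")]))  # (1000, 700)
```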



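  • Trace ID: the TraceID generated by the originating (root) service
  • Span ID: the Span ID generated when calling the downstream service
  • Parent Span ID: the ID of the parent Span
  • Is Sampled: whether the request should be sampled
  • Flags: tells the downstream service whether this is a debug Request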
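
As a sketch of how these five fields might travel with a request, the Python below writes them into a header map and reads them back on the receiving side. The header names are the B3 HTTP header names used by Zipkin instrumentation; they do not appear in the text above, and a Thrift transport would carry the same fields in its own header format. The `inject` and `extract` functions are hypothetical helpers, and ids are assumed to be non-negative integers.

```python
from typing import Dict, Optional


def inject(trace_id: int, span_id: int, parent_id: Optional[int] = None,
           sampled: Optional[bool] = None, debug: bool = False) -> Dict[str, str]:
    """Build the headers sent to the downstream service."""
    headers = {
        "X-B3-TraceId": format(trace_id, "016x"),
        "X-B3-SpanId": format(span_id, "016x"),
    }
    if parent_id is not None:
        headers["X-B3-ParentSpanId"] = format(parent_id, "016x")
    if sampled is not None:
        headers["X-B3-Sampled"] = "1" if sampled else "0"
    if debug:
        headers["X-B3-Flags"] = "1"
    return headers


def extract(headers: Dict[str, str]) -> dict:
    """Parse the trace context on the receiving side."""
    parent = headers.get("X-B3-ParentSpanId")
    sampled = headers.get("X-B3-Sampled")
    return {
        "trace_id": int(headers["X-B3-TraceId"], 16),
        "span_id": int(headers["X-B3-SpanId"], 16),
        "parent_id": int(parent, 16) if parent is not None else None,
        "sampled": None if sampled is None else sampled == "1",
        "debug": headers.get("X-B3-Flags") == "1",
    }


outgoing = inject(trace_id=0x1, span_id=0x2, parent_id=0x1, sampled=True)
print(outgoing["X-B3-TraceId"])  # 0000000000000001
print(extract(outgoing))
```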

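Composition of the Trace Tree

A complete Trace consists of a set of Spans that must all share the same TraceID. Spans have parent-child relationships, and every child Span must carry a parent_id. Each Span in turn consists of a set of Annotations and BinaryAnnotations. The whole Trace Tree is tied together by the Trace ID, Span ID, and parent Span ID.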
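
The way the three ids tie a tree together can be shown with a small Python sketch. It assumes spans are plain dicts with `trace_id`, `id`, `parent_id`, and `name` keys; `build_trace_tree` is a hypothetical helper and the span names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List


def build_trace_tree(spans: List[dict]) -> List[str]:
    """Return an indented outline of the trace, one line per span."""
    if len({span["trace_id"] for span in spans}) != 1:
        raise ValueError("all spans of a trace must share the same trace_id")

    children: Dict[int, List[dict]] = defaultdict(list)
    roots = []
    for span in spans:
        if span.get("parent_id") is None:
            roots.append(span)  # the root span has no parent_id
        else:
            children[span["parent_id"]].append(span)

    def render(span: dict, depth: int) -> List[str]:
        lines = ["  " * depth + span["name"]]
        for child in children[span["id"]]:
            lines.extend(render(child, depth + 1))
        return lines

    return [line for root in roots for line in render(root, 0)]


spans = [
    {"trace_id": 1, "id": 1, "parent_id": None, "name": "a"},
    {"trace_id": 1, "id": 2, "parent_id": 1, "name": "a-b"},
    {"trace_id": 1, "id": 3, "parent_id": 2, "name": "b-c"},
]
print("\n".join(build_trace_tree(spans)))
```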


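  • At the Web entry point, record information such as the SessionID, the UserID (if logged in), and the user's IP in a BinaryAnnotation
  • Key sub-calls should also be traced with Zipkin. For example, when the order service calls MySQL, the time taken by that call should also be recorded in an Annotation
  • Key error logs and exceptions should also be recorded in a BinaryAnnotation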



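testService (Web service) -> OrderServ (Thrift) -> StockServ & PayServ (Thrift). There are four services in total: testService calls OrderServ, and OrderServ calls StockServ and PayServ at the same time. The Trace information that needs to be generated is as follows:

  • When testService receives the HTTP Request, it generates a TraceID, a SpanID, and a Span object at the entry point. Call this Span1.
  • When testService sends a Thrift Request to OrderServ, it generates a new Span2 and sets its parent ID to the span ID of Span1. At the same time it modifies the Thrift Header to pass Span2's span ID, parent ID, and TraceID to the downstream service. It also generates a "cs" Annotation and attaches it to Span2; when it receives the Response from OrderServ, it generates a "cr" Annotation and attaches that to Span2 as well.
  • When OrderServ receives the Thrift Request, it parses the TraceID, parent ID, and Span ID (Span2) from the Thrift Header and keeps them in its context. It also generates an "sr" Annotation and attaches it to Span2; when processing finishes and the response is sent, it generates an "ss" Annotation and attaches it to Span2.
  • When OrderServ sends a Thrift Request to StockServ, it generates a new Span3 and sets its parent ID to the span ID of the previous step (Span2). Annotations are handled as above.
  • When OrderServ sends a request to PayServ, it generates a new Span4 and sets its parent ID to the span ID of Span2. Annotations are handled as above.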
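
The walkthrough above can be simulated in a few lines of Python. This is a single-process sketch under stated assumptions: spans are plain dicts, ids come from a counter and timestamps from a fake clock so the output is deterministic (real tracers use random 64-bit ids and real clocks), header propagation is implied rather than implemented, and the span names are hypothetical.

```python
import itertools

_ids = itertools.count(1)           # demo-only id source
_clock = itertools.count(1000, 10)  # demo-only microsecond clock


def new_span(name, trace_id=None, parent_id=None):
    span_id = next(_ids)
    return {
        "trace_id": trace_id if trace_id is not None else span_id,
        "id": span_id,
        "parent_id": parent_id,
        "name": name,
        "annotations": [],
    }


def annotate(span, value, service):
    span["annotations"].append((next(_clock), value, service))


# testService receives the HTTP request: root span.
span1 = new_span("http-request")

# testService -> OrderServ: Span2 is a child of Span1.
span2 = new_span("orderserv-call", span1["trace_id"], span1["id"])
annotate(span2, "cs", "testService")
annotate(span2, "sr", "OrderServ")  # OrderServ reuses the propagated Span2 ids

# OrderServ -> StockServ and OrderServ -> PayServ: both are children of Span2.
span3 = new_span("stockserv-call", span2["trace_id"], span2["id"])
span4 = new_span("payserv-call", span2["trace_id"], span2["id"])
for span, callee in ((span3, "StockServ"), (span4, "PayServ")):
    annotate(span, "cs", "OrderServ")
    annotate(span, "sr", callee)
    annotate(span, "ss", callee)
    annotate(span, "cr", "OrderServ")

annotate(span2, "ss", "OrderServ")
annotate(span2, "cr", "testService")

for span in (span1, span2, span3, span4):
    print(span["name"], "id:", span["id"], "parent:", span["parent_id"])
```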


Zipkin Architecture

(Figure: Zipkin architecture)

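The Collector, Storage, API, and UI together make up the Zipkin Server, which corresponds to the openzipkin/zipkin project on GitHub. The component that collects timing information about calls inside an application and reports it lives alongside the application and has implementations in many languages; the Java implementation is openzipkin/brave on GitHub. Besides the Java client, openzipkin also provides implementations for many other languages, including Go, PHP, JavaScript, .NET, and Ruby. For the full list, see Zipkin's Existing instrumentations page.

How Zipkin Works

When a user initiates a call, the Zipkin client generates a globally unique trace id for the whole call chain at the entry point, and generates a span id for each distributed call in that chain. Spans can be nested as parent and child, which represents the upstream and downstream relationship in a distributed call. Spans can also be siblings, which represents two sub-calls made under the same call. A trace consists of a set of spans and can be viewed as a tree with the trace as the root node and the spans as its child nodes.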


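Spans are delimited by call boundaries. In Zipkin, a call boundary is represented by the following four annotations:

  • cs - Client Send: the client has sent the request
  • sr - Server Receive: the server has received the request
  • ss - Server Send: the server has finished processing and sends the response to the client
  • cr - Client Receive: the client has received the result

From the timestamps on these four annotations it is easy to work out how long each stage of a complete call took. For example:

  • sr - cs is the time the request spent on the network
  • ss - sr is the time the server spent processing the request
  • cr - ss is the time the response spent on the network
  • cr - cs is the total time of the call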
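
The four differences above can be written out as a small Python sketch. The function name `latency_breakdown` and the timestamps are made up for illustration. Note that sr - cs and cr - ss compare clocks on two different hosts, so in practice they are only as accurate as the clock synchronization between client and server.

```python
def latency_breakdown(cs: int, sr: int, ss: int, cr: int) -> dict:
    """Split one call into stages; all timestamps are in microseconds."""
    return {
        "request_network": sr - cs,    # request in flight
        "server_processing": ss - sr,  # server handling the request
        "response_network": cr - ss,   # response in flight
        "total": cr - cs,              # whole call as seen by the client
    }


print(latency_breakdown(cs=1_000, sr=1_200, ss=1_900, cr=2_050))
# {'request_network': 200, 'server_processing': 700, 'response_network': 150, 'total': 1050}
```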

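Zipkin propagates trace-related information along the call chain and, at the end of each call boundary, asynchronously reports the timing information of the current call to the Zipkin Server. After receiving the trace information, the Zipkin Server stores it. The storage types Zipkin supports are in-memory, MySQL, Cassandra, and Elasticsearch. The Zipkin Web UI then retrieves the trace information from storage through the API, analyzes it, and displays it, as shown in the figure below: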











