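Aeron: Aeron Tooling

1. Aeron Stat

Aeron Stat outputs the key counters from Aeron, along with the positions and key counters for all active and recently active streams.

To use Aeron Stat, you must supply the Media Driver directory to inspect. For example, if you configured the Media Driver context as: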

final MediaDriver.Context mediaDriverCtx = new MediaDriver.Context()
 .aeronDirectoryName("/dev/shm/md");

then the path to give AeronStat is as follows:

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.AeronStat

Output (from a running Archive Replication Client):

17:03:52 - Aeron Stat (CnC v0.2.0), pid 2771, heartbeat age 451ms
======================================================================
0:               60,704 - Bytes sent
1:              122,848 - Bytes received
2:                    0 - Failed offers to ReceiverProxy
3:                    0 - Failed offers to SenderProxy
4:                    0 - Failed offers to DriverConductorProxy
5:                    0 - NAKs sent
6:                    0 - NAKs received
7:                1,875 - Status Messages sent
8:                  941 - Status Messages received
9:                1,865 - Heartbeats sent
10:                3,610 - Heartbeats received
11:                    0 - Retransmits sent
12:                    0 - Flow control under runs
13:                    0 - Flow control over runs
14:                    0 - Invalid packets
15:                    0 - Errors
16:                    0 - Short sends
17:                    0 - Failed attempts to free log buffers
18:                    0 - Sender flow control limits, i.e. back-pressure events
19:                    0 - Unblocked Publications
20:                    0 - Unblocked Control Commands
21:                    0 - Possible TTL Asymmetry
22:                    0 - ControllableIdleStrategy status
23:                    0 - Loss gap fills
24:                    0 - Client liveness timeouts
25:                    0 - Resolution changes: driverName=null hostname=archive-client
26:          150,858,350 - Conductor max cycle time doing its work in ns: SHARED
27:                    0 - Conductor work cycle exceeded threshold count: threshold=1000000000ns SHARED
28:          149,104,126 - Sender max cycle time doing its work in ns: SHARED
29:                    0 - Sender work cycle exceeded threshold count: threshold=1000000000ns SHARED
30:          149,144,918 - Receiver max cycle time doing its work in ns: SHARED
31:                    0 - Receiver work cycle exceeded threshold count: threshold=1000000000ns SHARED
32:            1,838,850 - NameResolver max time in ns
33:                    0 - NameResolver exceeded threshold count
36:    1,692,637,432,558 - client-heartbeat: 1
52:                    1 - rcv-channel: aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=10.1.0.4:0 10.1.0.4:45494
53:                    1 - rcv-local-sockaddr: 52 10.1.0.4:45494
54:                    1 - snd-channel: aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000 10.1.0.4:33378
55:                    1 - snd-local-sockaddr: 54 10.1.0.4:33378
56:                  448 - pub-pos (sampled): 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
57:               33,216 - pub-lmt: 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
58:                  448 - snd-pos: 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
59:               32,768 - snd-lmt: 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
60:                    0 - snd-bpe: 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
61:                  608 - sub-pos: 14 1817141198 20 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=10.1.0.4:0 @0
62:                  608 - rcv-hwm: 17 1817141198 20 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=10.1.0.4:0
63:                  608 - rcv-pos: 17 1817141198 20 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=10.1.0.4:0
64:                    1 - rcv-channel: aeron:udp?endpoint=10.1.0.4:0 10.1.0.4:33933
65:                    1 - rcv-local-sockaddr: 64 10.1.0.4:33933
66:                6,016 - sub-pos: 19 1817141199 200 aeron:udp?endpoint=10.1.0.4:0 @1280
67:                6,016 - rcv-hwm: 21 1817141199 200 aeron:udp?endpoint=10.1.0.4:0
68:                6,016 - rcv-pos: 21 1817141199 200 aeron:udp?endpoint=10.1.0.4:0
--

Core Counters

Row | Description
Top line | The most important value here is the heartbeat age: the time elapsed since the Media Driver last wrote its heartbeat to cnc.dat. If this number is large (over 1000 ms), check that the Media Driver is still running.
0 | Total bytes sent over UDP by this Media Driver, excluding IP headers. If this is not increasing at the rate the application expects, something is wrong.
1 | Total bytes received over UDP by this Media Driver, excluding IP headers. If this is not increasing at the rate the application expects, something is wrong.
2 | Failed offers to the Media Driver's Receiver Proxy. This indicates back pressure.
3 | Failed offers to the Media Driver's Sender Proxy. This indicates back pressure.
4 | Failed offers to the Media Driver's Conductor Proxy. This indicates back pressure.
5 | NAKs sent. The number of times this Media Driver has sent a NAK to request a lost packet.
6 | NAKs received. The number of times this Media Driver has received a NAK asking it to retransmit a lost packet to a remote Media Driver.
7 | Status Messages sent. A running count of the Status Messages this Media Driver has sent for flow control. It should increase over time.
8 | Status Messages received. A running count of the Status Messages this Media Driver has received for flow control. It should increase over time.
9 | Heartbeats sent. The number of heartbeats this Media Driver has sent to show liveness to another Media Driver when there was no data to send. It should increase over time.
10 | Heartbeats received. The number of heartbeats this Media Driver has received from another Media Driver that had no data to send. It should increase over time.
11 | Retransmits sent. The number of packet retransmits this Media Driver has sent as a result of a NAK message. This will typically stay zero or very low in a healthy network (and with well behaved processes).
12 | Flow control under runs. The count of packets which under-run the current flow control window for Images.
13 | Flow control over runs. The count of packets which over-run the current flow control window for Images.
14 | Count of invalid packets received by this Media Driver.
15 | Count of errors observed by this Media Driver. ErrorStat (see below) will provide details.
16 | Short send count. A short send happens when the Media Driver's Sender agent expects to send a given buffer of data over the network, but the socket did not take all the data from the buffer. Typically, Aeron will recover from this. When this increases beyond a low number, it can be a complex problem to solve, with causes ranging from incorrect buffer sizing to network equipment failure. The first place to look is typically the settings for the network buffer sizes: aeron.socket.so_rcvbuf and aeron.socket.so_sndbuf. aeron.rcv.initial.window.length must be less than or equal to aeron.socket.so_rcvbuf. Correct sizing can be an art, and can be especially challenging in a network with a large RTT variance. See also Bandwidth Delay Product. Note: you may need to update the maximum socket buffer sizes in your operating system.
17 | The number of times the Media Driver could not free a log buffer.
18 | Total number of back-pressure events over all streams. See also Back pressure.
19 | Count of times a publication has been unblocked after a client failed to commit() or abort() a tryClaim within the timeout (see Publication TryClaim and Log Buffer Unblocking).
20 | Count of times a command has been unblocked after a client failed to complete an offer within a timeout.
21 | The number of times a channel endpoint detected a possible TTL asymmetry between its config and a connection.
23 | The number of times a loss gap has been filled when NAKs have been disabled.
24 | The number of Aeron clients that have timed out without a graceful close (as in Aeron clients of this Media Driver).
25 | The number of times the endpoints have been re-resolved (i.e. name resolution) resulting in a change.
26 | The maximum time taken in a conductor duty cycle, in nanoseconds. Found in Aeron 1.33.0+.
27 | The number of times the time spent in a conductor duty cycle exceeded a configurable threshold (1s default). Found in Aeron 1.33.0+.
28 | The maximum time taken in a sender duty cycle, in nanoseconds.
29 | The number of times the time spent in a sender duty cycle exceeded a configurable threshold (1s default).
30 | The maximum time taken in a receiver duty cycle, in nanoseconds.
31 | The number of times the time spent in a receiver duty cycle exceeded a configurable threshold (1s default).
32 | The maximum time taken for Name Resolution, in nanoseconds. Found in Aeron 1.42.0+.
33 | The number of times the time spent in Name Resolution exceeded a configurable threshold. Found in Aeron 1.42.0+.
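
When the core counters need to be checked from a script rather than read by eye, the text rows shown earlier can be split into row number, value and label. The following is only a minimal sketch: the class name AeronStatLine is hypothetical, and it assumes the row layout and the comma thousands separator seen in the sample output, which may differ with locale or Aeron version.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses one counter row of AeronStat's text output. */
final class AeronStatLine {
    // Assumed layout: "<row>: <value with comma separators> - <label>"
    private static final Pattern ROW =
        Pattern.compile("^\\s*(\\d+):\\s+(-?[\\d,]+) - (.*)$");

    final int id;
    final long value;
    final String label;

    private AeronStatLine(int id, long value, String label) {
        this.id = id;
        this.value = value;
        this.label = label;
    }

    /** Returns the parsed row, or empty for the banner, separators and blank lines. */
    static Optional<AeronStatLine> parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        long value = Long.parseLong(m.group(2).replace(",", ""));
        return Optional.of(new AeronStatLine(Integer.parseInt(m.group(1)), value, m.group(3)));
    }

    public static void main(String[] args) {
        String row = "15:                    0 - Errors";
        // Flag a non-zero error counter; ErrorStat would then give the details.
        parse(row).ifPresent(c ->
            System.out.println(c.label + (c.value == 0 ? " OK" : " needs attention: " + c.value)));
    }
}
```

Matching on the label rather than the row number is probably the safer choice in practice, since the table above notes that some rows only exist in newer Aeron versions.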

Variable Counters

Row | Description
36 in above example; varies | Epoch millisecond value of the last client heartbeat from the given client. The client in this context is the Aeron Client on the Media Driver.
52 in above example; varies | Receive channel
53 in above example; varies | Receive socket address
54 in above example; varies | Send channel
55 in above example; varies | Send socket address

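Rows 56 to 68 contain position values. For more on how to interpret them, see Understanding Aeron Position. Rows with an @, such as sub-pos in row 61, give the join position of the subscription; in this case, the subscription joined at position 0.

Note: There is a C version of the Aeron Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.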

AeronStat options

Arg | Description
-h | Shows the help text.
watch=true or false | If set to true, refreshes every n seconds. If set to false, runs once and exits. Defaults to true.
delay=seconds | Specifies how often to refresh the output, as the delay in seconds between updates. Valid only if watch=true (or watch is not specified).
stream={regex} | Filters streams to only those that match the regex. Example: stream=101
type={regex} | Filters output type (as in the counter type) to only those that match.
session={regex} | Filters sessions to only those that match.
channel={regex} | Filters channels to only those that match.
identity={regex} | Filters identity to only those that match.

2. Error Stat

Error Stat prints all the errors that have occurred in the Aeron process. As with AeronStat, you must point ErrorStat at the Media Driver directory.

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.ErrorStat

When everything is running as expected, Error Stat produces the following output:

0 distinct errors observed.

Note: There is a C version of the Error Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.

3. Stream Stat

Stream Stat lives in the Aeron samples directory and can be launched from the aeron-all jar as shown below. As with AeronStat, you must point StreamStat at the Media Driver directory.

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.StreamStat

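Stream Stat provides a view of each stream within the Media Driver, including the publisher and sender views. It is similar to Aeron Stat, except that the view is flattened. To fit on the page, the single line 2 of the output is broken across lines 2-10 below.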

Command `n Control file /dev/shm/md/cnc.dat
sessionId=-1245628686 streamId=10 
 channel=aeron:udp?endpoint=localhost:40123 : 
 pub-pos (sampled):3:320 
 pub-lmt:3:8388992 
 snd-pos:3:384 
 snd-lmt:3:131456 
 sub-pos:1:384 
 rcv-hwm:4:384 
 rcv-pos:4:384

4. Backlog Stat

Backlog Stat is a tool that highlights backlogs in data streams. It works on both IPC and UDP channels. As with AeronStat, you must point BacklogStat at the Media Driver directory.

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.BacklogStat

Sample output:

sessionId=1155221173 streamId=8 channel=aeron:udp?endpoint=10.1.1.1:4000 :
┌─for publisher 77 the last sampled position is 187392 (~0 bytes before back-pressure)
└─sender 77 has to send 0 bytes (2031779 bytes remaining in the sender window)

sessionId=-614368527 streamId=9 channel=aeron:udp?endpoint=10.1.1.1:4001 :
┌─for publisher 6333 the last sampled position is 12739208 (~0 bytes before back-pressure)
└─sender 6333 has to send 65373 bytes (2031779 bytes remaining in the sender window)

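The tool highlights data backlog problems on a given channel. In the sample run above, the top session has no backlog, while the bottom session has an outstanding backlog of 65373 bytes. Use this information to investigate network, process and/or design problems.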
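
The figures in such a report can be related back to the position counters that Aeron Stat and Stream Stat show (pub-pos, pub-lmt, snd-pos, snd-lmt). The sketch below assumes that each figure is a simple difference between two of those counters; that reading is an assumption worth checking against the BacklogStat source for your Aeron version. The class name BacklogSketch and the counter values in main are hypothetical, chosen only so that the results line up with the second session of the sample output.

```java
/** Hypothetical sketch of the arithmetic assumed to sit behind a BacklogStat report. */
final class BacklogSketch {
    /** Bytes the publisher can still offer before hitting its limit (back pressure). */
    static long bytesBeforeBackPressure(long pubLmt, long pubPos) {
        return pubLmt - pubPos;
    }

    /** Bytes written by the publisher that the sender has not yet put on the wire. */
    static long senderBacklog(long pubPos, long sndPos) {
        return pubPos - sndPos;
    }

    /** Bytes the sender may still send before reaching its flow control limit. */
    static long senderWindowRemaining(long sndLmt, long sndPos) {
        return sndLmt - sndPos;
    }

    public static void main(String[] args) {
        // Hypothetical counter values, not taken from a real run.
        long pubPos = 12_739_208L;
        long pubLmt = 12_739_208L;
        long sndPos = 12_673_835L;
        long sndLmt = 14_705_614L;

        System.out.println("before back-pressure: " + bytesBeforeBackPressure(pubLmt, pubPos)); // 0
        System.out.println("sender backlog:       " + senderBacklog(pubPos, sndPos));           // 65373
        System.out.println("sender window left:   " + senderWindowRemaining(sndLmt, sndPos));   // 2031779
    }
}
```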

5. Loss Stat

LossStat records every data loss event that Aeron has suffered. Note that IPC data is never lost, so it does not appear in LossStat. As with AeronStat, you must point LossStat at the Media Driver directory.

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.LossStat

An example run:

#OBSERVATION_COUNT,TOTAL_BYTES_LOST,FIRST_OBSERVATION,LAST_OBSERVATION,SESSION_ID,STREAM_ID,CHANNEL,SOURCE
688,4167028,2020-08-16 13:53:39.053+0000,2020-08-16 13:53:41.003+0000,1155221173,8,aeron:udp?endpoint=10.1.1.1:4000;10.1.1.2:60950

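This tells us the following about session 1155221173 on channel aeron:udp?endpoint=10.1.1.1:4000, for stream 8 ⤌⤍ 10.1.1.2:60950 traffic:

  • there were 688 data loss events
  • a total of 4,167,028 bytes were affected
  • the loss first happened at 2020-08-16 13:53:39.053+0000
  • the last loss happened at 2020-08-16 13:53:41.003+0000

With this information you can investigate any network or host problems during that time window. Note that a small amount of loss is fairly common.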
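
Because the report is comma separated, it is easy to load into a small program and derive, for example, how long the loss window lasted. The following is a rough sketch only: LossRecord is a hypothetical class, the timestamp pattern is inferred from the example run, and the last field is kept verbatim rather than split into channel and source.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical value class for one data row of LossStat output. */
final class LossRecord {
    // Timestamp layout as it appears in the example run, e.g. 2020-08-16 13:53:39.053+0000
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSZ");

    final long observationCount;
    final long totalBytesLost;
    final OffsetDateTime firstObservation;
    final OffsetDateTime lastObservation;
    final int sessionId;
    final int streamId;
    final String channelAndSource; // kept verbatim, not split further

    private LossRecord(String[] f) {
        observationCount = Long.parseLong(f[0]);
        totalBytesLost = Long.parseLong(f[1]);
        firstObservation = OffsetDateTime.parse(f[2], TS);
        lastObservation = OffsetDateTime.parse(f[3], TS);
        sessionId = Integer.parseInt(f[4]);
        streamId = Integer.parseInt(f[5]);
        channelAndSource = f[6];
    }

    /** Parses a data row; header rows starting with '#' must be skipped by the caller. */
    static LossRecord parse(String line) {
        String[] f = line.split(",", 7);
        if (f.length < 7) {
            throw new IllegalArgumentException("unexpected row: " + line);
        }
        return new LossRecord(f);
    }

    /** Time between the first and the last observed loss. */
    Duration lossWindow() {
        return Duration.between(firstObservation, lastObservation);
    }

    public static void main(String[] args) {
        LossRecord r = parse("688,4167028,2020-08-16 13:53:39.053+0000,"
            + "2020-08-16 13:53:41.003+0000,1155221173,8,"
            + "aeron:udp?endpoint=10.1.1.1:4000;10.1.1.2:60950");
        System.out.println(r.observationCount + " events over " + r.lossWindow().toMillis() + " ms");
    }
}
```

For the example row this gives a loss window of 1950 ms, which is the period to line up against network and host monitoring.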

Note: There is a C version of the Loss Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.

6. Log Inspector

Log Inspector lives in the Aeron samples folder and can be launched from the aeron-all jar as shown below. You must point Log Inspector directly at a LogBuffer file.

java -cp aeron-all-*.jar io.aeron.samples.LogInspector <logbuffer file>

Log Inspector lets us inspect a Log Buffer file, including:

  • if the log buffer is connected
  • how many terms the log buffer has been through (see Log Buffers & Images)
  • the state of the 3 terms in the log buffer
  • the data within a term, dumped as hex. This includes details on which session and stream produced the data.

======================================================================
Thu Dec 31 09:46:19 EST 2020 Inspection dump for 3.logbuffer
======================================================================
   Is Connected: true
Initial term id: -1822262504
     Term Count: 20
   Active index: 2
    Term length: 67108864
     MTU length: 1408
      Page Size: 4096
   EOS Position: 9223372036854775807

default DATA Header{frame-length=0 version=0 flags=11000000 type=1 term-offset=0 session-id=301746870 stream-id=10 term-id=-1822262504 reserved-value=0}

Index 0 Term Meta Data termOffset=67108928 termId=-1822262486 rawTail=-7826557782030548928 position=1275068416
Index 1 Term Meta Data termOffset=67108928 termId=-1822262485 rawTail=-7826557777735581632 position=1342177280
Index 2 Term Meta Data termOffset=1822720 termId=-1822262484 rawTail=-7826557773505900544 position=1344000000

======================================================================
Index 0 Term Data

0: DATA Header{frame-length=0 version=0 flags=00000000 type=0 term-offset=0 session-id=0 stream-id=0 term-id=0 reserved-value=0}
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

======================================================================
Index 1 Term Data

0: DATA Header{frame-length=0 version=0 flags=00000000 type=0 term-offset=0 session-id=0 stream-id=0 term-id=0 reserved-value=0}
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

======================================================================
Index 2 Term Data

0: DATA Header{frame-length=36 version=0 flags=11000000 type=1 term-offset=0 session-id=301746870 stream-id=10 term-id=-1822262484 reserved-value=0}
02004001
64: DATA Header{frame-length=36 version=0 flags=11000000 type=1 term-offset=64 session-id=301746870 stream-id=10 term-id=-1822262484 reserved-value=0}
03004001
...
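
The Term Meta Data rows in this dump are internally consistent in a way that is useful when reading them: the figures fit rawTail carrying the term id in its upper 32 bits and the term offset in its lower 32 bits, and fit position being the number of terms since the initial term id multiplied by the term length, plus the term offset capped at the term length. The sketch below reproduces that arithmetic. It is inferred from the numbers in the dump rather than taken from Aeron's source, and TermTailSketch is a hypothetical name.

```java
/** Hypothetical sketch: decodes the Term Meta Data figures printed by Log Inspector. */
final class TermTailSketch {
    /** Upper 32 bits of the raw tail. */
    static int termId(long rawTail) {
        return (int) (rawTail >> 32);
    }

    /** Lower 32 bits of the raw tail; may run past the term length once a term is full. */
    static long termOffset(long rawTail) {
        return rawTail & 0xFFFF_FFFFL;
    }

    /** Stream position for a term, with the offset capped at the term length. */
    static long position(long rawTail, int initialTermId, long termLength) {
        long termCount = termId(rawTail) - initialTermId; // int subtraction, then widened
        return termCount * termLength + Math.min(termOffset(rawTail), termLength);
    }

    public static void main(String[] args) {
        int initialTermId = -1822262504;   // "Initial term id" in the dump
        long termLength = 67_108_864L;     // "Term length" in the dump
        long rawTail = -7826557773505900544L; // Index 2, the active term

        System.out.println("termId=" + termId(rawTail));         // termId=-1822262484
        System.out.println("termOffset=" + termOffset(rawTail)); // termOffset=1822720
        System.out.println("position=" + position(rawTail, initialTermId, termLength)); // position=1344000000
    }
}
```

The same arithmetic accounts for Index 0 and Index 1, whose term offsets (67108928) exceed the term length by 64 bytes yet whose positions land exactly on term boundaries.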

 
