Flink

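Introduction to Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.

Official website: https://flink.apache.org/
Source code: https://github.com/apache/flink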

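Flink Features

  1. Stream processing
    (1) High-throughput, low-latency, high-performance stream processing
    (2) Window operations based on event time
    (3) Exactly-once semantics for stateful computations
    (4) Highly flexible window operations: time-, count- and session-based windows, as well as data-driven windows
    (5) A continuous streaming model with backpressure
    (6) Fault tolerance based on lightweight distributed snapshots
    (7) A single runtime that supports both batch-on-streaming processing and streaming processing
    (8) Its own memory management inside the JVM
    (9) Iterative computation
    (10) Automatic program optimization: expensive operations such as shuffles and sorts are avoided in certain cases, and intermediate results are cached when necessary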

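  2. API support
    (1) DataStream API for streaming applications
    (2) DataSet API for batch applications (Java/Scala)

  3. Library support
    Machine learning (FlinkML), graph analytics (Gelly), relational data processing (Table) and complex event processing (CEP)

  4. Integration support
    Flink on YARN, HDFS, input data from Kafka, Apache HBase, Hadoop programs, Tachyon, ElasticSearch, RabbitMQ, Apache Storm, S3 and XtreemFS.

  5. Deploy applications anywhere
    Apache Flink is a distributed system and needs compute resources to execute applications. Flink integrates with all common cluster resource managers, such as Hadoop YARN, Apache Mesos and Kubernetes, but can also be set up to run as a standalone cluster.

  6. Run applications at any scale
    Flink is designed to run stateful streaming applications at any scale. An application can be parallelized into thousands of tasks that are distributed and executed concurrently in a cluster. As a result, an application can use virtually unlimited amounts of CPU, main memory, disk and network IO. Flink can also easily maintain very large application state. Its asynchronous and incremental checkpointing algorithm keeps the impact on processing latency minimal while guaranteeing exactly-once state consistency.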

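Storm, Spark and Flink Compared

Throughput

Spark performs micro-batch computation and is heavily optimized, so it has the highest throughput of the three. Note, however, that stateful computation (for example the updateStateByKey operator) has to maintain state through additional RDDs. This adds considerable overhead and noticeably reduces throughput.

Storm's fault-tolerance mechanism acks every single record, so fault tolerance has a large impact on throughput: throughput can drop by as much as 70%. Storm Trident is based on micro-batches and has medium throughput.

Flink's fault-tolerance mechanism is relatively lightweight and has little impact on throughput. Together with optimizations in its dataflow graph and scheduling, this lets Flink reach very high throughput.

The figure below, from the Flink website, compares Storm and Flink. Storm's throughput drops sharply once its ack-based fault tolerance is enabled. Flink's throughput changes little whether checkpointing is enabled or disabled, which shows that Flink's fault tolerance is indeed cheap.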
[Figure: Storm vs. Flink throughput with and without fault tolerance (from the Flink website)]

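Latency

Spark is based on micro-batches, so it gains throughput at the cost of latency. Spark's latency is typically on the order of seconds.

Storm is a native streaming implementation and easily reaches latencies of a few tens of milliseconds, the lowest of these frameworks. Storm Trident is based on micro-batches and has higher latency.

Flink is also a native streaming implementation and can also reach latencies on the order of a hundred milliseconds.

The figure below shows the Flink website's latency benchmark of Flink versus Storm. Storm's average latency is below 5 ms, and Flink's average latency is also below 30 ms. For both, 99% of the records are processed within 55 ms, so both perform very well.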
[Figure: Storm vs. Flink latency benchmark (from the Flink website)]

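Monitoring Solutions

Cluster Monitoring

Process Liveness Monitoring

A Flink cluster runs JobManager processes (StandaloneSessionClusterEntrypoint in a standalone session cluster) and TaskManager processes. A script can check separately whether each of these processes exists.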
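Below is a minimal sketch of such a liveness script. It assumes the JDK's `jps` tool is on the PATH of each node and that the processes show up under their main-class names. `StandaloneSessionClusterEntrypoint` comes from the text above. `TaskManagerRunner` as the TaskManager name is an assumption, so check `jps` output on your own version. The helper names are hypothetical.

```python
import subprocess

# Expected main-class names as printed by `jps`.
# "TaskManagerRunner" is an assumption; verify it against your Flink version.
EXPECTED = {"StandaloneSessionClusterEntrypoint", "TaskManagerRunner"}


def missing_processes(jps_output: str, expected=EXPECTED) -> set:
    """Return the expected process names that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <MainClassName>"
        if len(parts) >= 2:
            running.add(parts[1])
    return set(expected) - running


def check_local_node() -> set:
    """Run `jps` on this host and return the names of missing Flink processes."""
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    return missing_processes(out)
```

A cron entry or agent on each node could call `check_local_node()` and raise an alert whenever the returned set is non-empty. On TaskManager-only nodes, pass an `expected` set that contains only the TaskManager name.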

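Cluster Process Performance Monitoring

Flink provides an official Prometheus-based monitoring solution. Add the following configuration to flink/conf/flink-conf.yaml:

# Expose metrics through the PrometheusReporter class
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
# Port on which metrics are exposed; the default is 9249. A port range lets several Flink processes on one host each get a port
metrics.reporter.prom.port: 9249-9250

Then add the following entry to the static_configs of a scrape job in the Prometheus YAML configuration: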

# List the Flink processes and their metrics ports, and add labels as follows
- targets: ['30.0.0.20:9249','30.0.0.21:9249']
  labels:
    clusterName: 'FlinkCluster001'
    job: 'flink'
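Before wiring up Prometheus, it can help to check by hand that a reporter endpoint responds. The sketch below scrapes one endpoint and parses the text exposition format into a dict. It makes several assumptions: the reporter serves on the `/metrics` path, and metric names follow a `flink_<scope>_<name>` pattern such as `flink_jobmanager_numRegisteredTaskManagers`. The parser is deliberately minimal: it assumes label values contain no `}` and it ignores timestamps. The function names are hypothetical.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict:
    """Parse Prometheus text-format sample lines into {series: value}."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        brace = line.rfind("}")
        if brace != -1:
            # "name{labels} value [timestamp]"
            series, rest = line[: brace + 1], line[brace + 1 :].split()
        else:
            # "name value [timestamp]"
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples[series] = float(rest[0])  # optional timestamp is ignored
    return samples


def fetch_metrics(host: str, port: int = 9249, timeout: float = 5.0) -> dict:
    """Scrape one reporter endpoint (the /metrics path is an assumption)."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

For example, `fetch_metrics("30.0.0.20")` returns every exposed series. Filtering its keys by a metric-name prefix gives a quick sanity check before adding the target to Prometheus.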

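Job Monitoring

Job Liveness Monitoring

Flink jobs fall into batch jobs and streaming jobs. The bin/flink list command shows the currently running jobs.

  1. A streaming job stays in the running state indefinitely, so a script can periodically call bin/flink list to check whether the job still exists.
  2. A batch job exits when it finishes, after which bin/flink list no longer shows it. This can be solved at the scheduling level: launch batch jobs from a scheduler program instead of crontab. The CliFrontend process that submits the batch job to Flink then stays alive while the job runs, as long as the job is submitted in attached mode (without -d). The scheduler can therefore track whether the batch job still exists by monitoring the CliFrontend process.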
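For streaming jobs, a minimal sketch of the `bin/flink list` check could look like the code below. The regular expression assumes the line format printed by Flink 1.x, `dd.MM.yyyy HH:mm:ss : <32-hex job id> : <job name> (<STATE>)`. Treat that format as an assumption and adjust the pattern if your version prints something different. The helper names and the `flink_home` parameter are hypothetical.

```python
import re
import subprocess

# Assumed job line format of `bin/flink list` (Flink 1.x):
#   "22.06.2020 10:00:00 : 5e20cb6b0f357591171dfcca2eea09de : MyJob (RUNNING)"
JOB_LINE = re.compile(
    r"^\S+ \S+ : (?P<job_id>[0-9a-f]{32}) : (?P<name>.+) \((?P<state>[A-Z]+)\)\s*$"
)


def parse_flink_list(output: str) -> list:
    """Return (job_id, name, state) tuples for every job line in the output."""
    jobs = []
    for line in output.splitlines():
        m = JOB_LINE.match(line.strip())
        if m:
            jobs.append((m.group("job_id"), m.group("name"), m.group("state")))
    return jobs


def is_job_running(output: str, job_name: str) -> bool:
    """True if a job with exactly this name is listed in RUNNING state."""
    return any(name == job_name and state == "RUNNING"
               for _, name, state in parse_flink_list(output))


def list_jobs(flink_home: str) -> str:
    """Run `<flink_home>/bin/flink list` and return its standard output."""
    cmd = [f"{flink_home}/bin/flink", "list"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

A cron job could run `is_job_running(list_jobs("/opt/flink"), "Orders ETL")` for each expected streaming job and alert when it returns False. Matching on job name rather than job ID keeps the check valid after a job is resubmitted.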

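Business Monitoring

As a data processing engine, every Flink job revolves around data input and output. The volume of input and output data can therefore be monitored according to the job's actual business logic.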
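One way to turn input/output volumes into alerts is to sample two record counters periodically and compare their rates. The counters could be numRecordsIn / numRecordsOut from the IO metrics later in this article. The sketch below is a hypothetical helper, not part of Flink. It assumes you already collect the two counter values with a timestamp, and `min_out_rate` is an illustrative threshold you would tune per job. It also treats a counter going backwards as a sign of a restart, since a restarted job's counters start again from zero.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp_s: float  # when the counters were read (seconds)
    records_in: int     # hypothetical: an input record counter of the job
    records_out: int    # hypothetical: an output record counter of the job


def check_io(prev: Sample, curr: Sample, min_out_rate: float) -> list:
    """Compare two counter snapshots and return a list of alert messages."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        return ["non-increasing sample timestamps"]
    if curr.records_in < prev.records_in or curr.records_out < prev.records_out:
        # Counters start from zero again after a restart
        return ["counter went backwards, job may have restarted"]
    in_rate = (curr.records_in - prev.records_in) / elapsed
    out_rate = (curr.records_out - prev.records_out) / elapsed
    alerts = []
    if out_rate < min_out_rate:
        alerts.append(f"output rate {out_rate:.1f}/s below {min_out_rate}/s")
    if in_rate > 0 and out_rate == 0:
        alerts.append("input is arriving but nothing is being emitted")
    return alerts
```

Sampling once a minute and alerting on any non-empty result catches both a stalled source (low output rate) and a job that consumes data but emits nothing.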

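Metrics Overview

Metric Types

Flink supports four metric types: Counters, Gauges, Histograms and Meters.

Counter

A Counter is used to count something.

Gauge

A Gauge provides a value of any type on demand.

Histogram

A Histogram measures the distribution of long values.

Meter

A Meter measures average throughput.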
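To make the four types concrete, here is a small Python model of their semantics. It is not Flink's API: Flink's real interfaces are Java (`org.apache.flink.metrics.Counter`, `Gauge`, `Histogram`, `Meter`), and the method names here only mirror theirs in snake_case. The histogram window and the meter's 60-second span are illustrative choices, and the meter takes an injectable clock so it can be tested.

```python
import time
from collections import deque


class Counter:
    """Counts something; can be incremented and decremented."""
    def __init__(self):
        self._count = 0
    def inc(self, n=1):
        self._count += n
    def dec(self, n=1):
        self._count -= n
    def get_count(self):
        return self._count


class Gauge:
    """Returns a value of any type on demand, computed by a callback."""
    def __init__(self, fn):
        self._fn = fn
    def get_value(self):
        return self._fn()


class Histogram:
    """Records long values and answers questions about their distribution."""
    def __init__(self, window=1000):
        self._values = deque(maxlen=window)  # keep only the most recent values
        self._count = 0                      # total number of values seen
    def update(self, value):
        self._values.append(int(value))
        self._count += 1
    def get_count(self):
        return self._count
    def quantile(self, q):
        if not self._values:
            raise ValueError("no values recorded")
        ordered = sorted(self._values)
        idx = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[idx]


class Meter:
    """Measures average throughput: events per second over a time span."""
    def __init__(self, span_s=60.0, clock=time.monotonic):
        self._span = span_s
        self._clock = clock
        self._events = deque()  # (timestamp, number of events) pairs
    def mark_event(self, n=1):
        self._events.append((self._clock(), n))
    def get_rate(self):
        now = self._clock()
        # Drop events that fell out of the averaging span
        while self._events and now - self._events[0][0] > self._span:
            self._events.popleft()
        return sum(n for _, n in self._events) / self._span
```

In a Flink job, the Java equivalents are obtained from the operator's metric group, for example `getRuntimeContext().getMetricGroup().counter("myCounter")`.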

Metric Scope

When a metric is reported, it is tagged with an identifier and a set of key-value pairs.

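The identifier consists of three components: the user-defined name given when the metric is registered, an optional user-defined scope, and a system-provided scope. For example, if A.B is the system scope, C.D the user scope and E the name, the metric's identifier is A.B.C.D.E.

The delimiter used in the identifier can be changed with the metrics.scope.delimiter option in conf/flink-conf.yaml; the default is ".".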
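The composition rule above is easy to express in code. The sketch below joins system scope, optional user scope and name with a configurable delimiter, matching the A.B.C.D.E example. It also expands a scope format with `<variable>` placeholders. The format `<host>.taskmanager.<tm_id>` used in the usage example is assumed to be Flink's default TaskManager scope format, and both helpers are hypothetical.

```python
def expand_scope_format(fmt: str, variables: dict) -> list:
    """Expand a scope format like "<host>.taskmanager.<tm_id>" into components."""
    parts = []
    for part in fmt.split("."):
        if part.startswith("<") and part.endswith(">"):
            # Replace a known variable; leave unknown placeholders untouched
            part = variables.get(part[1:-1], part)
        parts.append(part)
    return parts


def metric_identifier(system_scope, user_scope, name, delimiter="."):
    """Join system scope, optional user scope and metric name."""
    return delimiter.join([*system_scope, *(user_scope or []), name])
```

For instance, `metric_identifier(["A", "B"], ["C", "D"], "E")` yields `A.B.C.D.E`, and passing `delimiter="_"` yields `A_B_C_D_E`, which is what changing metrics.scope.delimiter does.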

Metric Reference

CPU

Scope: Job-/TaskManager; Infix: Status.JVM.CPU
  • Load (Gauge): The current CPU usage of the JVM.
  • Time (Gauge): The CPU time used by the JVM.

Memory

Scope: Job-/TaskManager; Infix: Status.JVM.Memory
  • Heap.Used (Gauge): The amount of heap memory currently used (in bytes).
  • Heap.Committed (Gauge): The amount of heap memory guaranteed to be available to the JVM (in bytes).
  • Heap.Max (Gauge): The maximum amount of heap memory that can be used for memory management (in bytes).
  • NonHeap.Used (Gauge): The amount of non-heap memory currently used (in bytes).
  • NonHeap.Committed (Gauge): The amount of non-heap memory guaranteed to be available to the JVM (in bytes).
  • NonHeap.Max (Gauge): The maximum amount of non-heap memory that can be used for memory management (in bytes).
  • Direct.Count (Gauge): The number of buffers in the direct buffer pool.
  • Direct.MemoryUsed (Gauge): The amount of memory used by the JVM for the direct buffer pool (in bytes).
  • Direct.TotalCapacity (Gauge): The total capacity of all buffers in the direct buffer pool (in bytes).
  • Mapped.Count (Gauge): The number of buffers in the mapped buffer pool.
  • Mapped.MemoryUsed (Gauge): The amount of memory used by the JVM for the mapped buffer pool (in bytes).
  • Mapped.TotalCapacity (Gauge): The total capacity of all buffers in the mapped buffer pool (in bytes).

Notes:

  1. For the difference between used, committed and max heap, see:
    https://www.baeldung.com/java-heap-used-committed-max
  2. For an introduction to the direct buffer pool and the mapped buffer pool, see: https://stackoverflow.com/questions/15657837/what-is-mapped-buffer-pool-direct-buffer-pool-and-how-to-increase-their-size

Threads

Scope: Job-/TaskManager; Infix: Status.JVM.Threads
  • Count (Gauge): The total number of live threads.

Garbage Collection

Scope: Job-/TaskManager; Infix: Status.JVM.GarbageCollector
  • <GarbageCollector>.Count (Gauge): The total number of collections that have occurred.
  • <GarbageCollector>.Time (Gauge): The total time spent performing garbage collection.

Class Loading (ClassLoader)

Scope: Job-/TaskManager; Infix: Status.JVM.ClassLoader
  • ClassesLoaded (Gauge): The total number of classes loaded since the start of the JVM.
  • ClassesUnloaded (Gauge): The total number of classes unloaded since the start of the JVM.

Network

Scope: TaskManager; Infix: Status.Network
  • AvailableMemorySegments (Gauge): The number of unused memory segments.
  • TotalMemorySegments (Gauge): The number of allocated memory segments.

Scope: Task; Infix: buffers
  • inputQueueLength (Gauge): The number of queued input buffers (ignores LocalInputChannels, which use blocking subpartitions).
  • outputQueueLength (Gauge): The number of queued output buffers.
  • inPoolUsage (Gauge): An estimate of the input buffers usage (ignores LocalInputChannels).
  • inputFloatingBuffersUsage (Gauge): An estimate of the floating input buffers usage, dedicated to credit-based mode (ignores LocalInputChannels).
  • inputExclusiveBuffersUsage (Gauge): An estimate of the exclusive input buffers usage, dedicated to credit-based mode (ignores LocalInputChannels).
  • outPoolUsage (Gauge): An estimate of the output buffers usage.

Scope: Task; Infix: Network.<Input|Output>.<gate|partition> (only available if the taskmanager.net.detailed-metrics config option is set)
  • totalQueueLen (Gauge): Total number of queued buffers in all input/output channels.
  • minQueueLen (Gauge): Minimum number of queued buffers in all input/output channels.
  • maxQueueLen (Gauge): Maximum number of queued buffers in all input/output channels.
  • avgQueueLen (Gauge): Average number of queued buffers in all input/output channels.

Note:
For Flink's memory management mechanism, see: https://blog.csdn.net/lvwenyuan_1/article/details/103404591

Default shuffle service

Scope: TaskManager; Infix: Status.Shuffle.Netty
  • AvailableMemorySegments (Gauge): The number of unused memory segments.
  • TotalMemorySegments (Gauge): The number of allocated memory segments.

Scope: Task; Infix: Shuffle.Netty.Input.Buffers
  • inputQueueLength (Gauge): The number of queued input buffers.
  • inPoolUsage (Gauge): An estimate of the input buffers usage.

Scope: Task; Infix: Shuffle.Netty.Output.Buffers
  • outputQueueLength (Gauge): The number of queued output buffers.
  • outPoolUsage (Gauge): An estimate of the output buffers usage.

Scope: Task; Infix: Shuffle.Netty.<Input|Output>.<gate|partition> (only available if the taskmanager.net.detailed-metrics config option is set)
  • totalQueueLen (Gauge): Total number of queued buffers in all input/output channels.
  • minQueueLen (Gauge): Minimum number of queued buffers in all input/output channels.
  • maxQueueLen (Gauge): Maximum number of queued buffers in all input/output channels.
  • avgQueueLen (Gauge): Average number of queued buffers in all input/output channels.

Scope: Task; Infix: Shuffle.Netty.Input
  • numBytesInLocal (Counter): The total number of bytes this task has read from a local source.
  • numBytesInLocalPerSecond (Meter): The number of bytes this task reads from a local source per second.
  • numBytesInRemote (Counter): The total number of bytes this task has read from a remote source.
  • numBytesInRemotePerSecond (Meter): The number of bytes this task reads from a remote source per second.
  • numBuffersInLocal (Counter): The total number of network buffers this task has read from a local source.
  • numBuffersInLocalPerSecond (Meter): The number of network buffers this task reads from a local source per second.
  • numBuffersInRemote (Counter): The total number of network buffers this task has read from a remote source.
  • numBuffersInRemotePerSecond (Meter): The number of network buffers this task reads from a remote source per second.

Note:
For the definitions of job, task and subtask, see: https://stackoverflow.com/questions/53610342/difference-between-job-task-and-subtask-in-flink

Cluster

Scope: JobManager
  • numRegisteredTaskManagers (Gauge): The number of registered TaskManagers.
  • numRunningJobs (Gauge): The number of running jobs.
  • taskSlotsAvailable (Gauge): The number of available task slots.
  • taskSlotsTotal (Gauge): The total number of task slots.

Availability

Scope: Job (only available on JobManager)
  • restartingTime (Gauge): The time it took to restart the job, or how long the current restart has been in progress (in milliseconds).
  • uptime (Gauge): The time that the job has been running without interruption (in milliseconds). Returns -1 for completed jobs.
  • downtime (Gauge): For jobs currently in a failing/recovering situation, the time elapsed during this outage (in milliseconds). Returns 0 for running jobs and -1 for completed jobs.
  • fullRestarts (Gauge): The total number of full restarts since this job was submitted. Attention: since 1.9.2, this metric also includes fine-grained restarts.
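The special values of the downtime gauge and the growth of fullRestarts map directly onto alert rules. The sketch below interprets downtime using only the documented semantics (0 for running jobs, -1 for completed jobs, positive during an outage). It also flags a restart storm between two samples. The helper names and the `max_new` threshold are hypothetical.

```python
def job_state_from_downtime(downtime_ms: int) -> str:
    """Interpret the downtime gauge: 0 = running, -1 = completed, >0 = outage."""
    if downtime_ms == -1:
        return "completed"
    if downtime_ms == 0:
        return "running"
    if downtime_ms > 0:
        return "failing/recovering"
    return "unknown"


def restart_alert(prev_full_restarts: int, curr_full_restarts: int, max_new: int) -> bool:
    """True if fullRestarts grew by more than max_new between two samples."""
    return curr_full_restarts - prev_full_restarts > max_new
```

For a streaming job that should never finish, both a "completed" state and a "failing/recovering" state that lasts across several samples are worth alerting on.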

Checkpointing

Scope: Job (only available on JobManager)
  • lastCheckpointDuration (Gauge): The time it took to complete the last checkpoint (in milliseconds).
  • lastCheckpointSize (Gauge): The total size of the last checkpoint (in bytes).
  • lastCheckpointExternalPath (Gauge): The path where the last external checkpoint was stored.
  • lastCheckpointRestoreTimestamp (Gauge): Timestamp when the last checkpoint was restored at the coordinator (in milliseconds).
  • lastCheckpointAlignmentBuffered (Gauge): The number of bytes buffered during alignment over all subtasks for the last checkpoint (in bytes).
  • numberOfInProgressCheckpoints (Gauge): The number of in-progress checkpoints.
  • numberOfCompletedCheckpoints (Gauge): The number of successfully completed checkpoints.
  • numberOfFailedCheckpoints (Gauge): The number of failed checkpoints.
  • totalNumberOfCheckpoints (Gauge): The total number of checkpoints (in progress, completed, failed).

Scope: Task
  • checkpointAlignmentTime (Gauge): The time that the last barrier alignment took to complete, or how long the current alignment has taken so far (in nanoseconds).
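The checkpointing gauges lend themselves to a few simple health rules. Examples are new failures since the last sample, no completed checkpoint in a period, a checkpoint slower than the checkpoint interval, and state growing past a limit. The sketch below assumes you sample the JobManager gauges of the same names. The thresholds (`interval_ms`, `max_size_bytes`) and the helper names are hypothetical and should be tuned per job.

```python
from dataclasses import dataclass


@dataclass
class CheckpointStats:
    completed: int         # numberOfCompletedCheckpoints
    failed: int            # numberOfFailedCheckpoints
    last_duration_ms: int  # lastCheckpointDuration
    last_size_bytes: int   # lastCheckpointSize


def checkpoint_alerts(prev: CheckpointStats, curr: CheckpointStats,
                      interval_ms: int, max_size_bytes: int) -> list:
    """Compare two samples of the checkpointing gauges and return alert messages."""
    alerts = []
    new_failures = curr.failed - prev.failed
    if new_failures > 0:
        alerts.append(f"{new_failures} checkpoint(s) failed")
    if curr.completed == prev.completed:
        alerts.append("no checkpoint completed in this period")
    if curr.last_duration_ms > interval_ms:
        alerts.append("last checkpoint took longer than the checkpoint interval")
    if curr.last_size_bytes > max_size_bytes:
        alerts.append("last checkpoint size is above the limit")
    return alerts
```

The sampling period should be a few times the checkpoint interval. Otherwise the "no checkpoint completed" rule fires between two healthy checkpoints.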

RocksDB

IO

Scope: Job (only available on TaskManager)
  • <source_id>.<source_subtask_index>.<operator_id>.<operator_subtask_index>.latency (Histogram): The latency distributions from a given source subtask to an operator subtask (in milliseconds).

Scope: Task
  • numBytesInLocal (Counter): Attention: deprecated, use the Default shuffle service metrics.
  • numBytesInLocalPerSecond (Meter): Attention: deprecated, use the Default shuffle service metrics.
  • numBytesInRemote (Counter): Attention: deprecated, use the Default shuffle service metrics.
  • numBytesInRemotePerSecond (Meter): Attention: deprecated, use the Default shuffle service metrics.
  • numBuffersInLocal (Counter): Attention: deprecated, use the Default shuffle service metrics.
  • numBuffersInLocalPerSecond (Meter): Attention: deprecated, use the Default shuffle service metrics.
  • numBuffersInRemote (Counter): Attention: deprecated, use the Default shuffle service metrics.
  • numBuffersInRemotePerSecond (Meter): Attention: deprecated, use the Default shuffle service metrics.
  • numBytesOut (Counter): The total number of bytes this task has emitted.
  • numBytesOutPerSecond (Meter): The number of bytes this task emits per second.
  • numBuffersOut (Counter): The total number of network buffers this task has emitted.
  • numBuffersOutPerSecond (Meter): The number of network buffers this task emits per second.

Scope: Task/Operator
  • numRecordsIn (Counter): The total number of records this operator/task has received.
  • numRecordsInPerSecond (Meter): The number of records this operator/task receives per second.
  • numRecordsOut (Counter): The total number of records this operator/task has emitted.
  • numRecordsOutPerSecond (Meter): The number of records this operator/task sends per second.
  • numLateRecordsDropped (Counter): The number of records this operator/task has dropped due to arriving late.
  • currentInputWatermark (Gauge): The last watermark this operator/task has received (in milliseconds). Note: for operators/tasks with 2 inputs, this is the minimum of the last received watermarks.

Scope: Operator
  • currentInput1Watermark (Gauge): The last watermark this operator has received in its first input (in milliseconds). Note: only for operators with 2 inputs.
  • currentInput2Watermark (Gauge): The last watermark this operator has received in its second input (in milliseconds). Note: only for operators with 2 inputs.
  • currentOutputWatermark (Gauge): The last watermark this operator has emitted (in milliseconds).
  • numSplitsProcessed (Gauge): The total number of InputSplits this data source has processed (if the operator is a data source).

Connectors

Kafka Connectors

Scope: Operator
  • commitsSucceeded (Counter; user variables: n/a): The total number of successful offset commits to Kafka, if offset committing is turned on and checkpointing is enabled.
  • commitsFailed (Counter; user variables: n/a): The total number of offset commit failures to Kafka, if offset committing is turned on and checkpointing is enabled. Note that committing offsets back to Kafka is only a means to expose consumer progress, so a commit failure does not affect the integrity of Flink's checkpointed partition offsets.
  • committedOffsets (Gauge; user variables: topic, partition): The last successfully committed offsets to Kafka, for each partition. A particular partition's metric can be specified by topic name and partition id.
  • currentOffsets (Gauge; user variables: topic, partition): The consumer's current read offset, for each partition. A particular partition's metric can be specified by topic name and partition id.

Kinesis Connectors

Scope: Operator; user variables for all metrics below: stream, shardId
  • millisBehindLatest (Gauge): The number of milliseconds the consumer is behind the head of the stream, indicating how far behind current time the consumer is, for each Kinesis shard. A particular shard's metric can be specified by stream name and shard id. A value of 0 indicates record processing is caught up, and there are no new records to process at this moment. A value of -1 indicates that there is no reported value for the metric, yet.
  • sleepTimeMillis (Gauge): The number of milliseconds the consumer spends sleeping before fetching records from Kinesis. A particular shard's metric can be specified by stream name and shard id.
  • maxNumberOfRecordsPerFetch (Gauge): The maximum number of records requested by the consumer in a single getRecords call to Kinesis. If ConsumerConfigConstants.SHARD_USE_ADAPTIVE_READS is set to true, this value is adaptively calculated to maximize the 2 Mbps read limits from Kinesis.
  • numberOfAggregatedRecordsPerFetch (Gauge): The number of aggregated Kinesis records fetched by the consumer in a single getRecords call to Kinesis.
  • numberOfDeggregatedRecordsPerFetch (Gauge): The number of deaggregated Kinesis records fetched by the consumer in a single getRecords call to Kinesis.
  • averageRecordSizeBytes (Gauge): The average size of a Kinesis record in bytes, fetched by the consumer in a single getRecords call.
  • runLoopTimeNanos (Gauge): The actual time taken, in nanoseconds, by the consumer in the run loop.
  • loopFrequencyHz (Gauge): The number of calls to getRecords in one second.
  • bytesRequestedPerFetch (Gauge): The bytes requested (2 Mbps / loopFrequencyHz) in a single call to getRecords.

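System Resources

Metrics for operating-system resources are disabled by default and are not collected. They can be enabled with the metrics.system-resource option in conf/flink-conf.yaml.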

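Monitoring Checkpoints

Flink's web interface provides a tab for monitoring a job's checkpoints, and this information remains available after the job has terminated. Checkpoint statistics are shown in four tabs: Overview, History, Summary and Configuration. They are described in turn below.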
[Figure: checkpoint monitoring tabs in the Flink web UI]

Overview

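The Overview tab lists the following statistics. They are lost if the JobManager process dies.

  • Checkpoint Counts
    • Triggered: The total number of checkpoints triggered since the job started.
    • In Progress: The number of checkpoints currently in progress.
    • Completed: The total number of checkpoints successfully completed since the job started.
    • Failed: The total number of failed checkpoints since the job started.
    • Restored: The number of restore operations since the job started. This also tells you how many times the job has restarted since it was submitted. Note that an initial submission with a savepoint also counts as a restore, and the count is reset if the JobManager is lost.
  • Latest Completed Checkpoint: The most recently completed checkpoint. Click it for detailed subtask-level statistics.
  • Latest Failed Checkpoint: The most recent failed checkpoint. Click it for detailed subtask-level statistics.
  • Latest Savepoint: The most recent externally triggered savepoint. Click it for detailed subtask-level statistics.
  • Latest Restore: There are two kinds of restore operations.
    • Restore from Checkpoint: Restored from a regular, periodic checkpoint.
    • Restore from Savepoint: Restored from a savepoint.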
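The same counts can also be read without the web UI, which is handy for scripted monitoring. The sketch below assumes the JobManager REST endpoint `/jobs/<job_id>/checkpoints` (REST port 8081 by default). It also assumes the response contains a `counts` object with the keys `total`, `in_progress`, `completed`, `failed` and `restored`. Verify both against the REST API documentation for your version. It maps those keys onto the Overview labels and checks that Triggered equals In Progress + Completed + Failed. The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_checkpoint_stats(rest_url: str, job_id: str) -> dict:
    """GET <rest_url>/jobs/<job_id>/checkpoints from the JobManager REST API."""
    url = f"{rest_url}/jobs/{job_id}/checkpoints"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def overview_counts(stats: dict) -> dict:
    """Map the assumed "counts" object onto the Overview tab's labels."""
    c = stats["counts"]
    return {
        "Triggered": c["total"],
        "In Progress": c["in_progress"],
        "Completed": c["completed"],
        "Failed": c["failed"],
        "Restored": c["restored"],
    }


def counts_consistent(counts: dict) -> bool:
    """Every triggered checkpoint should be in progress, completed or failed."""
    return counts["Triggered"] == (
        counts["In Progress"] + counts["Completed"] + counts["Failed"]
    )
```

For example, `overview_counts(fetch_checkpoint_stats("http://localhost:8081", job_id))` returns the Overview numbers for one job. The `job_id` can come from `bin/flink list`.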

History

Summary

Configuration
