Streaming Systems

1 Streaming



What Is Streaming?

Streaming System
Two important dimensions define the shape of a given dataset:

cardinality and constitution.

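The cardinality of a dataset dictates its size.
The most salient aspect of cardinality is whether a given dataset is finite or infinite.
Two terms describe the coarse cardinality of a dataset:
Bounded data: A type of dataset that is finite in size.
Unbounded data: A type of dataset that is infinite in size (at least theoretically).
Unbounded cardinality imposes additional burdens on the data processing frameworks that consume it.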
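A minimal Python sketch of that burden, assuming a generator stands in for an unbounded source (all function names here are hypothetical). A consumer of bounded data can wait for the end and then compute its answer; a consumer of unbounded data never reaches the end, so it has to emit results incrementally.

```python
import itertools
from typing import Iterable, Iterator


def bounded_source() -> list[int]:
    # Bounded data: finite, so it can be fully materialized and its size is known.
    return [3, 1, 4, 1, 5]


def unbounded_source() -> Iterator[int]:
    # Unbounded data (hypothetical): a generator that never ends.
    # itertools.count() stands in for an endless feed of readings.
    for i in itertools.count():
        yield i % 7


def running_totals(values: Iterable[int]) -> Iterator[int]:
    # Emits a partial result after every element instead of waiting for the end,
    # so it works for both bounded and unbounded inputs.
    total = 0
    for v in values:
        total += v
        yield total


# Bounded: the consumer can simply wait for all the data.
print(sum(bounded_source()))  # 14

# Unbounded: sum(unbounded_source()) would never return, so the consumer
# takes partial results incrementally (here, only the first five).
print(list(itertools.islice(running_totals(unbounded_source()), 5)))  # [0, 1, 3, 6, 10]
```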
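The constitution of a dataset dictates its physical manifestation. Two important constitutions:
Table: A holistic view of a dataset at a specific point in time.
Stream: An element-by-element view of the evolution of a dataset over time.
It is the constitution that pipeline developers interact with directly in most data processing systems today (both batch and streaming).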
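To make the two constitutions concrete, here is a small Python sketch under an assumed data model (made-up `(time, key, delta)` records, hypothetical function name `table_at`): the stream is the element-by-element list of changes, and a table is what you get by applying every change up to a chosen point in time.

```python
from collections import defaultdict

# Stream: an element-by-element record of how the dataset evolves over time.
# Each element is (time, key, delta); the values are made up for illustration.
stream = [
    (1, "alice", 5),
    (2, "bob", 3),
    (3, "alice", 2),
    (4, "carol", 7),
    (5, "bob", -1),
]


def table_at(stream, t):
    """Return a table: a holistic view of the dataset as of time t.

    The table is built by applying every stream element with time <= t.
    """
    table = defaultdict(int)
    for time, key, delta in stream:
        if time <= t:
            table[key] += delta
    return dict(table)


print(table_at(stream, 3))  # {'alice': 7, 'bob': 3}
print(table_at(stream, 5))  # {'alice': 7, 'bob': 2, 'carol': 7}
```

The same stream yields a different table at each point in time, which is the sense in which a table is a snapshot and a stream is the evolution.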

constitution | how something is made up of different parts

On the Greatly Exaggerated Limitations Of Streaming

Lambda Architecture

The basic idea is that you run a streaming system alongside a batch system, both performing essentially the same calculation. The streaming system gives you low-latency but inaccurate results, and some time later the batch system provides the correct output.
Unfortunately, maintaining a Lambda system is a hassle: you need to build, provision, and maintain two independent versions of your pipeline and then also somehow merge the results from the two pipelines at the end.
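A toy Python sketch of the Lambda shape, purely illustrative: the page-view data and the functions `streaming_layer`, `batch_layer`, and `merge` are hypothetical, and the streaming layer's inaccuracy is simulated by dropping every fourth event. The point is structural: the same count is implemented twice, plus a merge step that serves streaming results until batch results are available.

```python
from collections import Counter

# Hypothetical input: page-view events keyed by page name.
events = ["home", "about", "home", "home", "pricing", "about"]


def streaming_layer(events):
    """Fast but possibly inaccurate count (simulated).

    Drops every fourth event (index % 4 == 3) to mimic data the
    streaming system lost or has not seen yet.
    """
    return Counter(e for i, e in enumerate(events) if i % 4 != 3)


def batch_layer(events):
    """The same calculation, reimplemented: exact, but produced later."""
    return Counter(events)


def merge(speed, batch):
    """Serve streaming results, overridden by batch results once available."""
    merged = dict(speed)
    if batch is not None:
        merged.update(batch)
    return merged


# Before the batch run finishes: only the low-latency, inexact view.
print(merge(streaming_layer(events), None))  # {'home': 2, 'about': 2, 'pricing': 1}
# After the batch run: the correct counts replace the streaming ones.
print(merge(streaming_layer(events), batch_layer(events)))  # {'home': 3, 'about': 2, 'pricing': 1}
```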

hassle
a situation that is annoying because it involves doing something difficult or complicated that needs a lot of effort (difficulty; trouble)

As someone who spent years working on a strongly consistent streaming engine, I also found the entire principle of the Lambda Architecture a bit unsavory.

unsavory > adj unpleasant, or morally offensive

corollary > noun something that results from something else

antiquity > the distant past (= a long time ago), especially before the sixth century

Event Time vs. Processing Time

cogently > in a way that is clearly expressed and is likely to persuade people

To speak cogently about unbounded data processing requires a clear understanding of the domains of time involved.

Event Time

This is the time at which events actually occurred.

Processing Time

This is the time at which events are observed in the system.
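A minimal Python sketch of the two time domains, assuming each record carries both timestamps (the `Event` class and the sample values are made up for illustration). Ordering by processing time gives the order in which the system saw the events, which need not match the order in which they happened.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    event_time: int       # when the event actually occurred (hypothetical seconds)
    processing_time: int  # when the system observed it


# Made-up sample: "b" occurred before "c" but was observed after it,
# e.g. because it was delayed in transit.
events = [
    Event("a", event_time=10, processing_time=12),
    Event("b", event_time=11, processing_time=20),
    Event("c", event_time=13, processing_time=14),
]

arrival_order = [e.name for e in sorted(events, key=lambda e: e.processing_time)]
occurrence_order = [e.name for e in sorted(events, key=lambda e: e.event_time)]

print(arrival_order)     # ['a', 'c', 'b']
print(occurrence_order)  # ['a', 'b', 'c']
```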

skew > verb to cause something to be not straight or exact; to twist or distort
adj not straight

In an ideal world, event time and processing time would always be equal, with events being processed immediately as they occur. Reality is not so kind, however, and the skew between event time and processing time is not only nonzero, but often a highly variable function of the characteristics of the underlying input sources, execution engine, and hardware. Factors that affect the level of skew include:

  • Shared resource limitations, like network congestion and network partitions
  • Software causes, such as distributed system logic and contention
  • Features of the data themselves, like key distribution, variance in throughput, or variance in disorder
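A quick way to see that the skew is nonzero and highly variable is to compute it per event. This Python sketch uses made-up `(event_time, processing_time)` pairs and defines skew simply as processing time minus event time; the two large values stand in for events held up by causes like those listed above.

```python
from statistics import mean

# Made-up (event_time, processing_time) pairs, in seconds.
observations = [(0, 1), (5, 6), (10, 25), (12, 13), (20, 42), (30, 31)]


def skew(event_time: int, processing_time: int) -> int:
    # Ideal world: 0. In practice, processing time lags behind event time.
    return processing_time - event_time


skews = [skew(et, pt) for et, pt in observations]
print(skews)                   # [1, 1, 15, 1, 22, 1]
print(min(skews), max(skews))  # 1 22
print(round(mean(skews), 2))   # 6.83
```

The mean hides the spread: most events arrive within a second, but a few lag by 15 to 22 seconds, so no single fixed delay describes the system.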

congestion | a situation in which a place is too blocked or crowded, causing difficulties

plot verb to mark or draw something on a piece of paper or a map
noun the story of a book, film, play, etc.

contention | the disagreement that results from opposing arguments
underlying > real but not immediately obvious

Data Processing Patterns

Bounded Data

Unbounded Data: Batch

Unbounded Data: Streaming
