Kafka is used for data pipelines, streaming analytics, data integration, and mission-critical applications.
Event streaming means storing event streams durably for later retrieval; manipulating, processing, and reacting to them in real time as well as retrospectively; and routing the event streams to different destination technologies as needed.
Event streaming is used across many industries and organizations.
What does it mean that Kafka is an event streaming platform? It combines three key capabilities:
- publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems (see the producer sketch after this list)
- store streams of events durably and reliably
- process streams of events as they occur or retrospectively
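A minimal sketch of the first capability using the Java producer client; the topic name `payments` and the `localhost:9092` broker address are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PublishExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Publish (write) one event to the assumed topic "payments"
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "Alice", "Made a payment of $200 to Bob"));
        }
    }
}
```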
How does Kafka work?
1. Servers and clients communicate over a high-performance TCP network protocol.
Servers
2. Kafka runs as a cluster of one or more servers that can span multiple datacenters or cloud regions. Some of these servers form the storage layer, called the brokers.
3. Other servers run Kafka Connect to continuously import and export data as event streams, integrating Kafka with existing systems.
4. If any of the servers fails, the other servers take over its work to ensure continuous operation without data loss.
Kafka Connect
- Kafka Connect is a tool for streaming data between Kafka and other systems. It can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. Export jobs can deliver data from Kafka topics into secondary storage and query systems, or into batch systems for offline analysis. A connector config sketch follows.
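As a sketch, a standalone source-connector configuration modeled on the file connector that ships with Kafka (the file and topic names below are the quickstart defaults, used here as placeholders):

```properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
```

Started with `bin/connect-standalone.sh`, this tails `test.txt` and publishes each new line as an event to the `connect-test` topic.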
Clients - they allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. Kafka ships with some such clients included, augmented by dozens of clients provided by the Kafka community.
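A minimal read-side sketch with the Java consumer client; the group id and topic name are assumptions. Running several copies of this program with the same `group.id` is what spreads the partitions across instances for parallel, fault-tolerant consumption:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "payments-app");             // consumers sharing a group.id split the partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));       // assumed topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```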
Main concepts and terminology
An event records the fact that something happened. It has an event key, value, timestamp, and optional metadata headers. An example event:
Event key: “Alice”
Event value: “Made a payment of $200 to Bob”
Event timestamp: “Jun. 25, 2020 at 2:06 p.m.”
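The same example event as a `ProducerRecord` in the Java client, using the constructor that takes an explicit timestamp (the epoch value assumes the 2:06 p.m. timestamp is UTC):

```java
import org.apache.kafka.clients.producer.ProducerRecord;

// (topic, partition, timestamp, key, value); a null partition lets the
// default partitioner choose the partition by hashing the key.
ProducerRecord<String, String> event = new ProducerRecord<>(
        "payments",        // assumed topic name
        null,              // partition chosen by key hash
        1593093960000L,    // Jun. 25, 2020 at 2:06 p.m. as epoch millis (UTC assumed)
        "Alice",
        "Made a payment of $200 to Bob");
```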
Producers are client applications that publish (write) events to Kafka; consumers are those that subscribe to (read and process) these events.
Producers and consumers are fully decoupled and agnostic of each other, which is a key design element to achieve the high scalability Kafka is known for; Kafka also provides various guarantees, such as the ability to process events exactly once.
Message delivery semantics
The semantic guarantees Kafka provides between producer and consumer (a config sketch follows the list):
- At most once—Messages may be lost but are never redelivered.
- At least once—Messages are never lost but may be redelivered.
- Exactly once—this is what people actually want, each message is delivered once and only once.
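A sketch of how these semantics map onto standard client configuration properties; this is one common setup, not the only one:

```java
import java.util.Properties;

public class DeliverySemanticsConfig {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        // At-least-once on the producer side: wait for acknowledgment from
        // all in-sync replicas and retry transient failures (retries are
        // what can introduce redelivery).
        producerProps.put("acks", "all");
        producerProps.put("retries", Integer.MAX_VALUE);
        // Idempotence de-duplicates producer retries, one of the building
        // blocks of Kafka's exactly-once support.
        producerProps.put("enable.idempotence", "true");

        Properties consumerProps = new Properties();
        // With auto-commit off, committing offsets *after* processing gives
        // at-least-once; committing *before* processing gives at-most-once.
        consumerProps.put("enable.auto.commit", "false");
    }
}
```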
When publishing, there is a notion of the message being "committed" to the log. A committed message will not be lost as long as at least one broker that replicates the partition it was written to remains alive.
Events are organized and durably stored in topics. A topic is similar to a folder in a filesystem, and the events are the files in that folder. Events are not deleted after consumption; instead, you define how long Kafka should retain events through a per-topic configuration setting.
Topics are partitioned: a topic is spread over a number of "buckets" located on different brokers.
This distributed placement of data is important for scalability because it lets client applications both write and read the data from many brokers at the same time. Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written (see the sketch below).
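A small illustration of that ordering guarantee, reusing the `KafkaProducer<String, String>` from the publish sketch above; topic and key are assumptions:

```java
// Both records share the key "Alice", so the default partitioner hashes
// them to the same partition of "payments", and any consumer of that
// partition reads them in exactly this order.
producer.send(new ProducerRecord<>("payments", "Alice", "opened an account"));
producer.send(new ProducerRecord<>("payments", "Alice", "made a payment of $200 to Bob"));
```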
To make your data fault-tolerant and highly available, every topic can be replicated, even across geo-regions or datacenters, so that there are always multiple brokers that have a copy of the data in case things go wrong.
A common production setting is a replication factor of 3, i.e. there will always be three copies of your data. Replication is performed at the level of topic-partitions.
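A sketch of creating such a topic programmatically with the Java `AdminClient`; the topic name, partition count, and retention value are assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions spread the topic over several brokers;
            // replication factor 3 keeps three copies of each partition.
            NewTopic topic = new NewTopic("payments", 3, (short) 3)
                    .configs(Map.of("retention.ms", "604800000")); // retain events for 7 days (assumed)
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```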