For Kafka:
each topic is split into partitions; a message's key determines which partition it goes to;
partitions live on brokers; each broker can hold partitions from different topics;
each consumer group contains multiple consumers; each consumer can receive data from multiple partitions;
each producer can write to multiple partitions of a topic.
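A minimal sketch of the two mappings above: key to partition, and partition to consumer within one group. The function names are made up for illustration, and CRC32 is only a stand-in for a stable hash (Kafka's default partitioner uses a different hash function), so treat this as a model of the idea rather than Kafka's actual behavior.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key) % num_partitions


def assign_round_robin(partitions, consumers):
    """Spread partitions over the consumers of one group.

    Each partition gets exactly one owner; a consumer may own several partitions.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

Because the hash is stable, all messages with the same key land in the same partition, which is what keeps per-key ordering possible.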
For YARN:
NodeManager: runs on each machine and is responsible for launching processes on that machine;
ResourceManager: talks to all of the NodeManagers to tell them what to run;
ApplicationMaster: application-specific code that runs in the YARN cluster.
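The three roles can be modeled with a toy simulation. This is not the YARN API: the class and method names, the slot-based capacity, and the "most free slots" placement policy are all assumptions made for illustration, and the real request/launch protocol is simplified away.

```python
class NodeManager:
    """Toy stand-in for the per-machine agent that launches processes."""

    def __init__(self, host, capacity):
        self.host = host
        self.capacity = capacity  # number of processes this machine can hold
        self.running = []         # names of processes launched on this machine

    def free_slots(self):
        return self.capacity - len(self.running)

    def launch(self, process_name):
        self.running.append(process_name)


class ResourceManager:
    """Toy stand-in for the scheduler that tells NodeManagers what to run."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, process_name):
        # Hypothetical policy: pick the machine with the most free slots.
        node = max(self.node_managers, key=lambda nm: nm.free_slots())
        if node.free_slots() <= 0:
            raise RuntimeError("no capacity left in the cluster")
        node.launch(process_name)
        return node.host


def run_application(rm, num_workers):
    """ApplicationMaster role: application-specific code asking for workers."""
    am_host = rm.allocate("application-master")
    worker_hosts = [rm.allocate(f"worker-{i}") for i in range(num_workers)]
    return am_host, worker_hosts
```

For example, with two machines of two slots each, `run_application(rm, 3)` fills all four slots and any further `allocate` call raises.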
Samza supports two kinds of processing:
stateless processing: does not retain any state associated with a message after it has been processed;
stateful processing: records some state about a message that is kept even after the message has been processed.
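The difference can be sketched in a few lines. This is plain Python, not the Samza API; the message fields `user` and `amount` are hypothetical, and the dictionary stands in for whatever state store a real job would use.

```python
from collections import defaultdict


def stateless_filter(messages, min_amount):
    """Stateless: each message is judged on its own; nothing is remembered."""
    return [m for m in messages if m["amount"] >= min_amount]


class RunningTotal:
    """Stateful: keeps a per-user total that outlives each individual message."""

    def __init__(self):
        self.totals = defaultdict(int)  # in-memory stand-in for a state store

    def process(self, message):
        self.totals[message["user"]] += message["amount"]
        return message["user"], self.totals[message["user"]]
```

The filter gives the same answer for a message no matter what came before it, while the running total's output depends on every earlier message for the same user.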
Samza supports two notions of time: processing time (when the message is processed) and event time (the timestamp embedded in the message at its source).
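A small sketch of why the choice of time matters, using tumbling-window counts. The field names `event_ts` and `arrival_ts` are hypothetical, and this is plain Python rather than Samza's windowing API.

```python
def window_start(timestamp_ms, window_ms):
    """Start of the tumbling window that contains the timestamp."""
    return timestamp_ms - (timestamp_ms % window_ms)


def count_per_window(events, window_ms, use_event_time):
    """Count events per tumbling window.

    'event_ts' is the timestamp embedded at the source; 'arrival_ts' is when
    the processor saw the event (both names are assumptions for this sketch).
    """
    counts = {}
    for e in events:
        ts = e["event_ts"] if use_event_time else e["arrival_ts"]
        start = window_start(ts, window_ms)
        counts[start] = counts.get(start, 0) + 1
    return counts
```

An event that was created in one window but arrives late is counted in different windows depending on which timestamp is used.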
Samza guarantees that each record is processed at least once.
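One common way to get an at-least-once guarantee is to checkpoint the read offset only after the records before it have been processed. The sketch below simulates that with a list as the log; the function name, the `crash_at` parameter, and the checkpoint interval are illustrative assumptions, not Samza's checkpointing API.

```python
def run_with_checkpoints(log, start_offset, sink, checkpoint_every, crash_at=None):
    """Process log[start_offset:] and return the last checkpointed offset.

    The offset is checkpointed only AFTER the records before it were processed,
    so a crash can cause reprocessing but never skips a record.
    """
    checkpoint = start_offset
    for offset in range(start_offset, len(log)):
        if crash_at is not None and offset == crash_at:
            return checkpoint         # simulated crash: uncheckpointed progress is lost
        sink.append(log[offset])      # the side effect of processing a record
        if (offset + 1 - start_offset) % checkpoint_every == 0:
            checkpoint = offset + 1   # next offset to read after a restart
    return len(log)                   # reached the end: everything is checkpointed
```

Restarting from the returned offset after a simulated crash replays any record processed since the last checkpoint, so duplicates are possible but gaps are not.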
Samza's coordinator supports both an embedded library model (as in Kafka) and a framework model (as in Flink).
Samza supports both in-order and out-of-order processing.
Each thread runs one or more tasks.
reference: http://samza.apache.org/learn/documentation/latest/core-concepts/core-concepts.html