1. bootstrap-server vs zookeeper params
In the previous Kafka version (before 0.9.0), the consumer needed the connection to Zookeeper for committing the offset and for getting topics metadata as well. Starting from 0.9.0, the consumer offeset is saved in a Kafka topic (__consumer_offset) and the connection to Zookeeper is not needed anymore.
What you specify in the --bootstrap-server
parameter is exactly what the name .. says. It's a bootstrap servers list: it means that the consumer connects to brokers you specify and ask for metadata about the topics it wants to consume. It's not limited to consume messages only from brokers listed in the --bootstrap-server
parameter. Let's say you specify "kafka1:9092" as bootstrap server (in a cluster where you have 3 brokers as you said). After connecting, the consumer sends a metadata request for getting information about "my_topic". The "kafka1" server could reply "I am not the leader for partition 0 of my_topic, here the broker which is leader for that kafka2". At this point, the consumer connects to "kafka2" broker for starting to get messages.
2. What is difference between Kafka broker-list
and bootstrap servers
I also hate reading "wall of text like" Kafka documentation :P
As far as I understand:
-
broker-list
- a full list of servers, if any missing producer may not work
- related to producer commands
-
bootstrap-servers
- one is enough to discover all others
- related to consumer commands
- Zookeeper involved
Sorry for being such... brief. Next time I will focus more on details to be more clear. To explain my point of view I will use Kafka 1.0.1 console scripts.
kafka-console-consumer.sh
The console consumer is a tool that reads data from Kafka and outputs it to standard output.
Option Description
------ -----------
--blacklist <String: blacklist> Blacklist of topics to exclude from
consumption.
--bootstrap-server <String: server to REQUIRED (unless old consumer is
connect to> used): The server to connect to.
--consumer-property <String: A mechanism to pass user-defined
consumer_prop> properties in the form key=value to
the consumer.
--consumer.config <String: config file> Consumer config properties file. Note
that [consumer-property] takes
precedence over this config.
--csv-reporter-enabled If set, the CSV metrics reporter will
be enabled
--delete-consumer-offsets If specified, the consumer path in
zookeeper is deleted when starting up
--enable-systest-events Log lifecycle events of the consumer
in addition to logging consumed
messages. (This is specific for
system tests.)
--formatter <String: class> The name of a class to use for
formatting kafka messages for
display. (default: kafka.tools.
DefaultMessageFormatter)
--from-beginning If the consumer does not already have
an established offset to consume
from, start with the earliest
message present in the log rather
than the latest message.
--group <String: consumer group id> The consumer group id of the consumer.
--isolation-level <String> Set to read_committed in order to
filter out transactional messages
which are not committed. Set to
read_uncommittedto read all
messages. (default: read_uncommitted)
--key-deserializer <String:
deserializer for key>
--max-messages <Integer: num_messages> The maximum number of messages to
consume before exiting. If not set,
consumption is continual.
--metrics-dir <String: metrics If csv-reporter-enable is set, and
directory> this parameter isset, the csv
metrics will be output here
--new-consumer Use the new consumer implementation.
This is the default, so this option
is deprecated and will be removed in
a future release.
--offset <String: consume offset> The offset id to consume from (a non-
negative number), or 'earliest'
which means from beginning, or
'latest' which means from end
(default: latest)
--partition <Integer: partition> The partition to consume from.
Consumption starts from the end of
the partition unless '--offset' is
specified.
--property <String: prop> The properties to initialize the
message formatter.
--skip-message-on-error If there is an error when processing a
message, skip it instead of halt.
--timeout-ms <Integer: timeout_ms> If specified, exit if no message is
available for consumption for the
specified interval.
--topic <String: topic> The topic id to consume on.
--value-deserializer <String:
deserializer for values>
--whitelist <String: whitelist> Whitelist of topics to include for
consumption.
--zookeeper <String: urls> REQUIRED (only when using old
consumer): The connection string for
the zookeeper connection in the form
host:port. Multiple URLS can be
given to allow fail-over.
kafka-console-producer.sh
Read data from standard input and publish it to Kafka.
Option Description
------ -----------
--batch-size <Integer: size> Number of messages to send in a single
batch if they are not being sent
synchronously. (default: 200)
--broker-list <String: broker-list> REQUIRED: The broker list string in
the form HOST1:PORT1,HOST2:PORT2.
--compression-codec [String: The compression codec: either 'none',
compression-codec] 'gzip', 'snappy', or 'lz4'.If
specified without value, then it
defaults to 'gzip'
--key-serializer <String: The class name of the message encoder
encoder_class> implementation to use for
serializing keys. (default: kafka.
serializer.DefaultEncoder)
--line-reader <String: reader_class> The class name of the class to use for
reading lines from standard in. By
default each line is read as a
separate message. (default: kafka.
tools.
ConsoleProducer$LineMessageReader)
--max-block-ms <Long: max block on The max time that the producer will
send> block for during a send request
(default: 600
As you can see bootstrap-server parameter occurs only for consumer. On the other side - broker-list is on parameter list only for producer.
Moreover:
kafka-console-consumer.sh --zookeeper localost:2181 --topic bets
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
So as cricket-007 noticed bootstrap-server and zookeeper looks to have similiar purpose. The difference is --zookeeper should points to Zookeeper nodes on the other side --bootstrap-server points Kafka nodes and ports.
Reasuming, bootstrap-server is being used as consumer parameter and broker-list as producer parameter.
reference: