Kafka production tuning parameters:
Producer:
  acks = all
  buffer.memory = 536870912
  compression.type = snappy
  retries = 100
  max.in.flight.requests.per.connection = 1
  batch.size = 10000                  (in bytes, not number of records)
  max.request.size = 2097152
  request.timeout.ms = 360000         (must be greater than the broker's replica.lag.time.max.ms)
  metadata.fetch.timeout.ms = 360000
  timeout.ms = 360000
  linger.ms = 5000                    (5 s; not used in production)
  max.block.ms = 1800000
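As a rough illustration, the producer settings above could be collected into a `java.util.Properties` object before constructing the producer. This is only a sketch: the `KafkaProducer` construction itself is omitted, and the values are copied verbatim from the list above.

```java
import java.util.Properties;

public class ProducerTuning {
    // Build the tuned producer configuration listed above.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("acks", "all");                                 // wait for all in-sync replicas
        props.put("buffer.memory", "536870912");                  // 512 MB send buffer
        props.put("compression.type", "snappy");
        props.put("retries", "100");
        props.put("max.in.flight.requests.per.connection", "1");  // preserve ordering across retries
        props.put("batch.size", "10000");                         // bytes, not record count
        props.put("max.request.size", "2097152");                 // 2 MB
        props.put("request.timeout.ms", "360000");                // > broker replica.lag.time.max.ms
        props.put("max.block.ms", "1800000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        System.out.println(props.getProperty("acks")); // prints "all"
    }
}
```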
Broker (CDH):
  message.max.bytes = 2621440         (2560 KB; maximum size of a single message)
  zookeeper.session.timeout.ms = 180000
  replica.fetch.max.bytes = 5242880   (5 MB; must be greater than message.max.bytes)
  num.replica.fetchers = 6
  replica.lag.max.messages = 6000
  replica.lag.time.max.ms = 15000
  log.flush.interval.messages = 10000
  log.flush.interval.ms = 5000        (5 s)
Consumer (Spark Streaming Kafka parameters; see https://issues.apache.org/jira/browse/SPARK-22968):
  "max.partition.fetch.bytes" -> (5242880: java.lang.Integer)  // default: 1048576
  "request.timeout.ms" -> (90000: java.lang.Integer)           // default: 60000
  "session.timeout.ms" -> (60000: java.lang.Integer)           // default: 30000
  "heartbeat.interval.ms" -> (5000: java.lang.Integer)
  "receive.buffer.bytes" -> (10485760: java.lang.Integer)
Minor changes are required for Kafka 0.10 and the new consumer compared to laughing_man's answer:
Broker: no changes, but you still need to increase message.max.bytes
and replica.fetch.max.bytes. message.max.bytes has to be equal to or smaller(*) than
replica.fetch.max.bytes.
Producer: increase max.request.size to send larger messages.
Consumer: increase max.partition.fetch.bytes to receive larger messages.
(*) Read the comments to learn more about message.max.bytes <= replica.fetch.max.bytes.
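That broker-side constraint can be expressed as a quick sanity check. The helper below is purely illustrative (it is not part of any Kafka API); the values come from the broker settings listed earlier.

```java
public class SizeCheck {
    // The broker-side invariant: message.max.bytes <= replica.fetch.max.bytes.
    // If it is violated, followers cannot fetch the largest allowed message,
    // so replication for that partition stalls.
    static boolean replicationSafe(long messageMaxBytes, long replicaFetchMaxBytes) {
        return messageMaxBytes <= replicaFetchMaxBytes;
    }

    public static void main(String[] args) {
        long messageMaxBytes = 2560L * 1024;          // 2560 KB, from the broker settings above
        long replicaFetchMaxBytes = 5L * 1024 * 1024; // 5 MB
        System.out.println(replicationSafe(messageMaxBytes, replicaFetchMaxBytes)); // prints "true"
    }
}
```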
2. Fields of a consumed record:
ConsumerRecord(
topic = onlinelogs, partition = 0,
offset = 1452002, CreateTime = -1, checksum = 3849965367,
serialized key size = -1, serialized value size = 305,
key = null,
value = {"hostname":"yws76","servicename":"namenode",
"time":"2018-03-21 20:11:30,090","logtype":"INFO",
"loginfo":
"org.apache.hadoop.hdfs.server.namenode.FileJournalManager:
Finalizing edits file /dfs/nn/current/edits_inprogress_0000000000001453017 -> /dfs/nn/current/edits_0000000000001453017-0000000000001453030"})
2.1 Explanation of the chart discussed earlier
2.2 key = null
  Partitioning strategy:
  When the key is not null: Utils.abs(key.hashCode) % numPartitions
  When the key is null, how Kafka picks a partition is explained here:
  http://www.2bowl.info/kafka%E6%BA%90%E7%A0%81%E8%A7%A3%E8%AF%BB-key%E4%B8%BAnulll%E6%97%B6kafka%E5%A6%82%E4%BD%95%E9%80%89%E6%8B%A9%E5%88%86%E5%8C%BApartition/
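The two cases above can be sketched as follows. This is a simplified illustration, not the actual Kafka source: for a null key the old Scala producer picked a random partition and cached the choice for topic.metadata.refresh.interval.ms, which is approximated here by a fresh random pick per call.

```java
import java.util.Random;

public class PartitionChooser {
    private final Random random = new Random();

    // Non-null key: mirrors Utils.abs(key.hashCode) % numPartitions.
    // Null key: pick a random partition (simplified; the real producer
    // reused its random choice for a metadata refresh interval).
    int choosePartition(Object key, int numPartitions) {
        if (key == null) {
            return random.nextInt(numPartitions);
        }
        // Mask to a non-negative int, like Kafka's Utils.abs,
        // which also handles Integer.MIN_VALUE safely.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionChooser chooser = new PartitionChooser();
        // Same non-null key always lands on the same partition.
        System.out.println(chooser.choosePartition("yws76", 3));
    }
}
```

Because the non-null path is a pure function of the key, records with the same key always land in the same partition, which is what makes per-key ordering possible.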
3.1
Notes on setting up a custom Kafka parcel repository for CDH, and the pitfalls hit when the Kafka service failed to install:
http://blog.itpub.net/30089851/viewspace-2136372/
3.2
A power outage corrupted a Kafka topic.
Symptom: in the CDH web UI the Kafka process showed green. We usually assume green means the process is OK, but that is not necessarily true:
producers and consumers could not work and kept throwing exceptions.
Troubleshooting:
Check the broker logs on the affected machine:
  kafka.common.NotAssignedReplicaException:
  Leader 186 failed to record follower 191's position -1
  since the replica is not recognized to be one of the assigned replicas 186
  for partition [__consumer_offsets,3].
Recovery:
1. Stop the service and delete the Kafka log directories on the broker nodes.
2. Delete Kafka's metadata in ZooKeeper.
3. Reinstall Kafka and recreate the topics.
Follow-up questions:
1. When replaying the data, how do we handle duplicates?
   The HBase put API is idempotent (it acts as insert + update), so replaying the same rows is safe.
2. What if the data lands on HDFS instead?
   From which version does Hive support UPDATE, and which parameters must be enabled?
3. Kafka only guarantees ordering within a partition; how do we preserve ordering across partitions (as of 0.11)?
   insert
   delete
   insert --> delete
   delete --> insert
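Question 1 above relies on HBase puts behaving as idempotent upserts. The effect can be sketched with a plain map standing in for an HBase table (illustrative only; no HBase client is involved):

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentReplay {
    // A Map stands in for an HBase table keyed by row key:
    // put(rowKey, value) inserts if absent and updates if present,
    // so replaying the same batch leaves the table unchanged.
    static Map<String, String> replay(Map<String, String> table, String[][] records) {
        for (String[] r : records) {
            table.put(r[0], r[1]); // upsert: insert + update in one call
        }
        return table;
    }

    public static void main(String[] args) {
        String[][] records = { {"row1", "a"}, {"row2", "b"} };
        Map<String, String> once = replay(new HashMap<>(), records);
        Map<String, String> twice = replay(replay(new HashMap<>(), records), records);
        System.out.println(once.equals(twice)); // prints "true": duplicates are harmless
    }
}
```

This is why replaying a Kafka source into HBase after a recovery is safe: at-least-once delivery plus an idempotent sink yields effectively-once results.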