While using pykafka from Python to produce and consume data, I ran into two problems that I want to share.
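Problem 1
1. When sending data through pykafka, messages went out only once every 5 seconds. With several million records to send, this badly hurt throughput.
The producer is obtained through the get_producer method, and the call currently sets only two parameters: ack_timeout_ms=1000 and linger_ms=5000.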
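I checked the official documentation: https://pykafka.readthedocs.io/en/latest/api/producer.html
The linger_ms parameter defaults to 5000, which is exactly 5 s. Setting it to 0 makes the producer send each message as soon as it arrives.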
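The change described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the helper names producer_settings, send_all and main, the broker address 127.0.0.1:9092 and the topic name are all assumptions made for the example. Only get_producer, linger_ms and ack_timeout_ms come from the text above.

```python
def producer_settings(linger_ms=0, ack_timeout_ms=1000):
    """Keyword arguments for topic.get_producer().

    linger_ms=0 asks the producer not to wait for a batch to fill up.
    (Hypothetical helper, introduced only for this example.)
    """
    return {"linger_ms": linger_ms, "ack_timeout_ms": ack_timeout_ms}


def send_all(topic, messages, **overrides):
    """Send every message (bytes) through a producer built from `topic`.

    `topic` is expected to be a pykafka Topic; returns the number of
    messages handed to the producer.
    """
    count = 0
    # Leaving the with-block stops the producer.
    with topic.get_producer(**producer_settings(**overrides)) as producer:
        for message in messages:
            producer.produce(message)
            count += 1
    return count


def main():
    # pykafka is imported here so the helpers above can be read and
    # tried without a broker. Host and topic name are placeholders.
    from pykafka import KafkaClient

    client = KafkaClient(hosts="127.0.0.1:9092")
    topic = client.topics[b"test_topic"]
    sent = send_all(topic, (("row %d" % i).encode("utf-8") for i in range(1000)))
    print("sent", sent, "messages")
```

Calling main() needs a reachable Kafka broker. Whether linger_ms=0 or a small non-zero value gives the best throughput for millions of records probably depends on the workload, so it is worth measuring both.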
linger_ms (int) – This setting gives the upper bound on the delay for batching: once the producer gets min_queued_messages worth of messages for a broker, it will be sent immediately regardless of this setting. However, if we have fewer than this many messages accumulated for this partition we will ‘linger’ for the specified time waiting for more records to show up. linger_ms=0 indicates no lingering - messages are sent as fast as possible after they are