Project Notes
This project follows the 尚硅谷 public course "智慧出行" (Smart Mobility) and implements three of its modules: data production, data consumption, and modeling.
Development tools and environment: IntelliJ IDEA, Scala, Redis, Kafka
Kafka
Producer-generated data:
Fields: checkpoint (卡口) id, vehicle speed
External Kafka configuration file: kafka.properties
# Producer settings
# 9092 is the default Kafka server port
# bootstrap.servers:
#   TYPE list  DEFAULT localhost:9092
#   A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
bootstrap.servers = 10.25.34.103:9092, 10.25.34.104:9092, 10.25.34.105:9092
# Messages are stored in the queue as key-value pairs.
# The producer must serialize keys and values, and the consumer must deserialize them
# (cross-process communication always requires serialization).
# (The serializer classes can be found in the kafka dependency under External Libraries in IDEA; copy reference.)
# key.serializer (value.serializer):
#   TYPE class
#   Serializer class for key (value) that implements the Serializer interface.
key.serializer = org.apache.kafka.common.serialization.StringSerializer
value.serializer = org.apache.kafka.common.serialization.StringSerializer
# To avoid losing messages, Kafka replicates them across brokers.
# acks:
#   The number of acknowledgments the producer requires the leader to have received
#   before considering a request complete. This controls the durability of records that are sent.
#   The following settings are allowed:
#   acks=0   The producer does not wait for any acknowledgment from the server at all.
#            The record is added to the socket buffer and immediately considered sent.
#            There is no guarantee the server actually received the record, the retries
#            setting has no effect, and the offset returned in the response is usually -1.
#   acks=1   The leader writes the record to its local log and responds without waiting
#            for acknowledgment from all followers. If the leader fails right after
#            acknowledging but before the followers have replicated the record, the record is lost.
#   acks=all The leader waits for the full set of in-sync replicas to acknowledge the record.
#            The record is not lost as long as at least one in-sync replica remains alive.
#            This is the strongest available guarantee, and is equivalent to acks=-1.
acks = all
# retries:
#   A value greater than zero causes the client to resend any record whose send fails
#   with a potentially transient error, improving delivery reliability.
#   Note that this retry is no different than if the client resent the record upon receiving the error.
#   Allowing retries without setting max.in.flight.requests.per.connection to 1 may reorder records:
#   if two batches are sent to the same partition, and the first fails and is retried while the
#   second succeeds, the second batch may appear in the partition first.
retries = 0
# Topic
kafka.topics = traffic
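
The producer side can be sketched in Scala as below, assuming kafka.properties is on the classpath and the kafka-clients dependency is available; the object name TrafficProducer is illustrative, and the record value follows the "checkpoint id, speed" field layout described above:

```scala
import java.util.Properties
import scala.util.Random
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object TrafficProducer {
  def main(args: Array[String]): Unit = {
    // Load the external configuration file from the classpath
    val props = new Properties()
    props.load(getClass.getResourceAsStream("/kafka.properties"))
    val topic = props.getProperty("kafka.topics") // "traffic"

    // Consumer-only keys in the same file (group.id, ...) are ignored with a warning
    val producer = new KafkaProducer[String, String](props)
    try {
      // One illustrative record: value is "<checkpoint id>,<speed>", keyed by checkpoint id
      val monitorId = "%04d".format(Random.nextInt(20))
      val speed = Random.nextInt(120)
      producer.send(new ProducerRecord[String, String](topic, monitorId, s"$monitorId,$speed"))
    } finally {
      producer.close() // flushes any buffered records before closing
    }
  }
}
```

Running this requires a reachable broker from bootstrap.servers; with acks=all, send() completes only after all in-sync replicas have the record.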
# Consumer settings
# The core consumer configs are group.id (required) and bootstrap.servers
# (zookeeper.connect is only used by the legacy consumer).
# group.id
#   TYPE string
#   A unique string that identifies the consumer group this consumer belongs to.
#   This property is required if the consumer uses either the group management functionality
#   by using subscribe(topic) or the Kafka-based offset management strategy.
group.id = g_traffic1
# enable.auto.commit:
# TYPE boolean DEFAULT true
# If true the consumer's offset will be periodically committed in the background.
enable.auto.commit = true
# Auto-commit every 30 s (overriding the 5000 ms default)
# auto.commit.interval.ms:
#   TYPE int  DEFAULT 5000
#   The frequency in milliseconds that the consumer offsets are auto-committed to Kafka
#   if enable.auto.commit is set to true.
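
A matching consumer sketch, again assuming kafka.properties is on the classpath and Scala 2.13+ (use scala.collection.JavaConverters on older versions); the object name TrafficConsumer is illustrative:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object TrafficConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.load(getClass.getResourceAsStream("/kafka.properties"))
    // Deserializers mirror the producer's String serializers
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    // group.id and enable.auto.commit are picked up from the file
    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList(props.getProperty("kafka.topics")))
    while (true) {
      // poll returns whatever arrived within the timeout; offsets are committed in the background
      val records = consumer.poll(Duration.ofMillis(500)).asScala
      for (r <- records)
        println(s"checkpoint=${r.key()} value=${r.value()}") // value: "<checkpoint id>,<speed>"
    }
  }
}
```

Because enable.auto.commit=true, offsets are committed automatically at the configured interval; records processed but not yet committed may be redelivered after a restart.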