Flume + Kafka + Kafka Streams Integration

1. Background

First, why use Flume together with Kafka at all?

Take a real-time stream-processing project as an example. The volume of collected data has peaks and valleys; in an e-commerce project the peak typically arrives during a flash sale. If the data aggregated by Flume were pushed straight into a distributed computing framework such as Storm, the load could exceed the cluster's processing capacity. Placing Kafka in between smooths out the peak: Kafka was designed for big-data scenarios and its high throughput lets it absorb bursts of peak traffic.

2. Integration Steps

  1. Start ZooKeeper and Kafka

For this test we start a single-node Kafka:

# Start ZooKeeper
zkServer.sh start

# Start Kafka
bin/kafka-server-start.sh config/server.properties
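
To confirm that both processes came up, a quick check (assuming the JDK's jps tool is on the PATH) is to list the running JVMs:

# ZooKeeper runs as QuorumPeerMain, the broker as Kafka
jps
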
  2. Create a topic

Create a topic named user_friends_raw; all the data collected by Flume will be published to it:

# Create the topic
bin/kafka-topics.sh --create \
--zookeeper hadoop101:2181 \
--replication-factor 1   \
--partitions 1 --topic user_friends_raw

# List topics to verify the creation
bin/kafka-topics.sh --zookeeper hadoop101:2181 --list
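
To double-check the partition count and replication factor, the topic can also be described:

# Describe the newly created topic
bin/kafka-topics.sh --zookeeper hadoop101:2181 --describe --topic user_friends_raw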

3. Write the Flume configuration file

cd /opt/flume/conf/jobkb09
vi kafka-flume-logger.conf

# Agent components
user_friend.sources=userFriendSource
user_friend.channels=userFriendChannel
user_friend.sinks=userFriendSink

# Spooling-directory source: pick up the daily userFriend CSV files
user_friend.sources.userFriendSource.type=spooldir
user_friend.sources.userFriendSource.spoolDir=/opt/flume/conf/jobkb09/dataSourceFile/userFriend
user_friend.sources.userFriendSource.deserializer=LINE
user_friend.sources.userFriendSource.deserializer.maxLineLength=320000
user_friend.sources.userFriendSource.includePattern=userFriend_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
# Drop the CSV header line ("user,friends")
user_friend.sources.userFriendSource.interceptors=head_filter
user_friend.sources.userFriendSource.interceptors.head_filter.type=regex_filter
user_friend.sources.userFriendSource.interceptors.head_filter.regex=^user,friends*
user_friend.sources.userFriendSource.interceptors.head_filter.excludeEvents=true

# Durable file channel
user_friend.channels.userFriendChannel.type=file
user_friend.channels.userFriendChannel.checkpointDir=/opt/flume/conf/jobkb09/checkPointFile/userFriend
user_friend.channels.userFriendChannel.dataDirs=/opt/flume/conf/jobkb09/dataChannelFile/userFriend

# Kafka sink: publish events to the user_friends_raw topic
user_friend.sinks.userFriendSink.type=org.apache.flume.sink.kafka.KafkaSink
user_friend.sinks.userFriendSink.batchSize=640
user_friend.sinks.userFriendSink.brokerList=192.168.195.20:9092
user_friend.sinks.userFriendSink.topic=user_friends_raw

# Wire the source and sink to the channel
user_friend.sources.userFriendSource.channels=userFriendChannel
user_friend.sinks.userFriendSink.channel=userFriendChannel
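
This agent watches the spooling directory for files named userFriend_YYYY-MM-DD.csv, drops the CSV header line via the head_filter interceptor, buffers events in a file channel, and publishes them to the user_friends_raw topic. The rows it forwards, and that the Kafka Streams code below parses, follow a "user,space-separated friend ids" layout. A hypothetical sample (the ids are made up; only the shape matters):

# header line, removed by the head_filter interceptor
user,friends
# data rows: one user id, then that user's friend ids separated by spaces
3197468391,1346449342 3873244116 4226080662
1895679477,2725961318 1056558062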

4. Start Flume

The agent name passed with --name must match the user_friend prefix used throughout the configuration file:

 ./bin/flume-ng agent --name user_friend --conf ./conf/ --conf-file ./conf/jobkb09/kafka-flume-logger.conf -Dflume.root.logger=INFO,console

5. Write the Kafka Streams code in IDEA

The application consumes the raw rows from user_friends_raw, splits each row into individual (user, friend) pairs, and writes the results to a new topic, user_friends:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

public class UserFriend {
    public static void main(String[] args) {
        Properties prop = new Properties();
        prop.put(StreamsConfig.APPLICATION_ID_CONFIG, "userfriendapp");
        prop.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.195.20:9092");
        prop.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 5000);        // commit every 5 seconds
        prop.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");  // options: earliest, latest, none
        prop.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        prop.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        prop.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Build the topology: read raw "user,friend1 friend2 ..." rows and
        // explode each row into individual "user,friend" records
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("user_friends_raw").flatMap((k, v) -> {
            List<KeyValue<String, String>> list = new ArrayList<>();
            String[] info = v.toString().split(",");
            if (info.length == 2) {
                String[] friends = info[1].split("\\s+");
                if (info[0].trim().length() > 0) {
                    for (String friend : friends) {
                        System.out.println(info[0] + "  " + friend);
                        list.add(new KeyValue<String, String>(null, info[0] + "," + friend));
                    }
                }
            }
            return list;
        }).to("user_friends");

        final Topology topo = builder.build();
        final KafkaStreams streams = new KafkaStreams(topo, prop);

        // Close the streams application cleanly on Ctrl+C
        final CountDownLatch latch = new CountDownLatch(1);
        Runtime.getRuntime().addShutdownHook(new Thread("stream") {
            @Override
            public void run() {
                streams.close();
                latch.countDown();
            }
        });

        streams.start();
        try {
            latch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}
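
The topology writes its output to a second topic, user_friends. If automatic topic creation is disabled on the broker, create it up front the same way as the raw topic (a sketch reusing the single-node settings from step 2):

bin/kafka-topics.sh --create \
--zookeeper hadoop101:2181 \
--replication-factor 1 \
--partitions 1 --topic user_friends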

6. Copy the data file into the spooling directory

cp ./user_friends.csv /opt/flume/conf/jobkb09/dataSourceFile/userFriend/userFriend_2020-11-30.csv
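
The target filename must match the includePattern from the Flume configuration (userFriend_YYYY-MM-DD.csv), which is why the file is renamed while copying. Once the spooling-directory source has ingested it, Flume renames the file again, by default adding a .COMPLETED suffix, so listing the directory is a quick way to confirm the pickup:

# after ingestion the file should carry Flume's .COMPLETED suffix
ls /opt/flume/conf/jobkb09/dataSourceFile/userFriend/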

7. Consume the data

kafka-console-consumer.sh --bootstrap-server 192.168.195.20:9092 --topic user_friends_raw --from-beginning
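
To verify the Kafka Streams job as well, consume the processed topic in the same way; each record should now be a single "user,friend" pair:

kafka-console-consumer.sh --bootstrap-server 192.168.195.20:9092 --topic user_friends --from-beginning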