Telecom Customer Service Project (Part 1), Day 3: data collection with randomly generated caller/callee phone numbers, and packaging the project for deployment to Linux
1. Data consumption: consuming the Flume-collected production data from the Kafka console
(1) Start ZooKeeper, then start the Kafka cluster
// Stop the firewall first
[root@flink102 ~]# systemctl stop firewalld.service
// Start ZooKeeper
[root@flink102 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/hadoop/module/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@flink102 bin]#
// Start the Kafka cluster
[root@flink102 kafka-2.11]# bin/kafka-server-start.sh config/server.properties &
// Create a topic named "test"
[root@flink102 kafka-2.11]# bin/kafka-topics.sh --create --zookeeper flink102:2181 --replication-factor 1 --partitions 1 --topic test
Created topic "test".
[root@flink102 kafka-2.11]#
(2) Create a Kafka console consumer
// Start the console consumer
[root@flink102 kafka-2.11]# bin/kafka-console-consumer.sh --bootstrap-server flink102:9092 --topic test --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
// List the current topics
[root@flink102 kafka]# bin/kafka-topics.sh --zookeeper flink102:2181 --list
__consumer_offsets
ct - marked for deletion
test
(3) Create the flume-kafka.conf file under the workProject directory
// Create the flume-kafka.conf file
[root@flink102 workProject]# touch flume-kafka.conf
// Edit the configuration
[root@flink102 workProject]# vim flume-kafka.conf
// Add the following configuration properties
# define
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F -c +0 /opt/workProject/call.log
a1.sources.r1.shell = /bin/bash -c

# sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = flink102:9092
a1.sinks.k1.kafka.topic = test
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1

# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
(4) Check whether a Flume process is already running (only the grep itself shows up, so Flume has not been started yet)
[root@flink102 flume-1.7.0]# ps -ef|grep flume
root 14236 6240 0 13:40 pts/2 00:00:00 grep --color=auto flume
(5) Start Flume to collect the data
// Start Flume to begin collection
[root@flink102 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a1 --conf-file /usr/hadoop/module/flume/flume-1.7.0/conf/flume-kafka.conf
The startup log scrolls past while the agent loads.
In the Kafka console consumer, the collected records now appear:
19565082510 16574556259 20181123141728 2095
16574556259 19602240179 20180817165119 1607
15781588029 17405139883 20180808164241 2686
19683537146 15781588029 20180228013242 2822
19154926260 14397114174 20180504172322 0791
16574556259 15244749863 20180920030546 1573
15305526350 14171709460 20181226062249 0808
19342117869 18840172592 20180125094856 0675
19313925217 15280214634 20180709060540 1261
17405139883 17336673697 20181025082738 0208
16160892861 16574556259 20180614083132 2230
13319935953 15884588694 20181208120518 1623
14410679238 16569963779 20180928093230 0280
14410679238 14171709460 20181205131816 0787
19313925217 19683537146 20180905015911 1940
19342117869 15647679901 20180415060636 1598
19565082510 16569963779 20180201044649 0532
15244749863 19683537146 20180620003146 1733
18101213362 19565082510 20180708051052 2886
19342117869 18101213362 20181226220346 2265
14410679238 15781588029 20181122175159 0698
16569963779 15781588029 20181113071022 0225
15280214634 14397114174 20180503215838 2340
15305526350 17405139883 20181020022236 0970
18840172592 19602240179 20180813062526 1378
17885275338 14397114174 20180716091215 1454
17405139883 19154926260 20180205134927 0657
19342117869 15280214634 20180308005855 0062
15280214634 15884588694 20180511235152 1508
13319935953 15280214634 20181026115750 2459
19602240179 15244749863 20180125113746 2290
19565082510 15884588694 20180206041231 1126
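Each consumed line is a call record with four whitespace-separated fields: caller number, callee number, a start time in yyyyMMddHHmmss form, and a fourth field that appears to be the call duration (the unit, seconds, is an assumption here). A small hypothetical parser for these lines, which the downstream consumer could reuse:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical parser for the whitespace-separated call-log lines above:
// caller, callee, start time (yyyyMMddHHmmss), duration (assumed seconds).
public class CallRecord {
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    final String caller;
    final String callee;
    final LocalDateTime startTime;
    final int durationSeconds;

    CallRecord(String caller, String callee, LocalDateTime startTime, int durationSeconds) {
        this.caller = caller;
        this.callee = callee;
        this.startTime = startTime;
        this.durationSeconds = durationSeconds;
    }

    static CallRecord parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields, got: " + line);
        }
        return new CallRecord(f[0], f[1],
                LocalDateTime.parse(f[2], TS), Integer.parseInt(f[3]));
    }

    public static void main(String[] args) {
        CallRecord r = parse("19565082510 16574556259 20181123141728 2095");
        System.out.println(r.caller + " called " + r.callee
                + " at " + r.startTime + " for " + r.durationSeconds + "s");
    }
}
```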
2. Data consumption: consuming the Flume-collected production data with the Kafka Java API
(1) In the IDEA project, under the study-project-ct module, create a child module named ct-consumer
Creation steps:
(2) Dependencies in the ct-consumer module's pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>BigData</artifactId>
        <groupId>org.study.gphone</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>ct-consumer</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.11.0.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.12</artifactId>
            <version>0.11.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.study.gphone</groupId>
            <artifactId>ct-common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
    </dependencies>
</project>
(3) Writing the ct-consumer module code
The code can be downloaded here:
BigData0327.zip
Note: to consume the data inside IDEA, create a Kafka child module under the study-project-ct project and write the corresponding consumer code.
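As an illustration of what that consumer code might look like, here is a hedged sketch using the kafka-clients API already declared in the pom. This is not the code from BigData0327.zip; the class and method names (ConsumerSketch, buildConsumerProps) are invented here, while the broker address flink102:9092 and topic test come from the session above.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Illustrative sketch of a ct-consumer class; names are hypothetical.
public class ConsumerSketch {

    // Builds the client configuration; factored out so it can be
    // inspected without contacting a broker.
    static Properties buildConsumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Mirrors --from-beginning for a group with no committed offsets.
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        // Requires the ZooKeeper/Kafka cluster from step 1 to be running.
        try (KafkaConsumer<String, String> consumer =
                     new KafkaConsumer<>(buildConsumerProps("flink102:9092", "ct-consumer"))) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                // poll(long) is the 0.11-era API used by this project.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> r : records) {
                    System.out.println(r.value());
                }
            }
        }
    }
}
```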
Then run the following commands on Linux.
Generate the data:
[root@flink102 workProject]# java -jar ct-producer.jar /opt/workProject/contact.log /opt/workProject/call.log
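The source of ct-producer.jar is not shown in this section; as a rough illustration of the "randomly generated caller/callee phone numbers" from the day's title, here is a hedged sketch in the same spirit. The phone pool, class name, and duration range are all invented for the example; only the output line format matches the records shown earlier.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of how a producer could emit random call-log lines;
// not the actual ct-producer.jar implementation.
public class ProducerSketch {
    // Invented sample pool; a real contact list would come from contact.log.
    private static final List<String> PHONES = Arrays.asList(
            "19565082510", "16574556259", "15781588029", "19683537146", "14410679238");
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final Random RAND = new Random();

    static String randomCallLine() {
        int i = RAND.nextInt(PHONES.size());
        int j;
        do { j = RAND.nextInt(PHONES.size()); } while (j == i); // caller != callee
        // Random 2018 timestamp (days capped at 28 to stay valid in every month).
        String ts = LocalDateTime.of(2018, 1 + RAND.nextInt(12), 1 + RAND.nextInt(28),
                RAND.nextInt(24), RAND.nextInt(60), RAND.nextInt(60)).format(TS);
        // Zero-padded 4-digit duration, like "0791" in the sample output.
        String duration = String.format("%04d", RAND.nextInt(3000));
        return PHONES.get(i) + " " + PHONES.get(j) + " " + ts + " " + duration;
    }

    public static void main(String[] args) {
        for (int k = 0; k < 5; k++) {
            System.out.println(randomCallLine());
        }
    }
}
```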
Collect the data with Flume:
[root@flink102 flume-1.7.0]# bin/flume-ng agent -c conf/ -n a1 -f /usr/hadoop/module/flume/flume-1.7.0/conf/flume-kafka.conf
Finally, run the Bootstrap class in IDEA; the consumed records are printed to the console.
(The console output is not reproduced here.)