2020-06-21 Spark Streaming study notes: Flume + ZooKeeper + Kafka

A small gripe about macOS Catalina: it made my machine run far too hot, so yesterday I flashed back to Mojave. I hit plenty of pitfalls along the way, and calling support felt less useful than just reading the docs myself. After a whole day I finally got back to Mojave 10.14.6. I'm never upgrading again, and I lost all the data on the machine.

Spark Streaming

Install Flume and configure the environment variables:
export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.16.2-bin
export PATH=$FLUME_HOME/bin:$PATH
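Assuming the two exports above live in ~/.bash_profile (adjust if you use a different shell profile), reload it and sanity-check the install:

source ~/.bash_profile
flume-ng version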


Look up what needs to be configured in the official docs; the 1.9 guide should be much the same as the 1.7 one:
https://flume.apache.org/releases/content/1.7.0/FlumeUserGuide.html

example.conf: A single-node Flume configuration

The key to using Flume is writing the configuration file:

A) Configure the source
B) Configure the channel
C) Configure the sink
D) Wire the three components together

a1: the agent name
r1: the source name
k1: the sink name
c1: the channel name

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = spark000
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


Start the agent:
flume-ng agent \
--name a1  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console    # print the logs to the console

Test with telnet from another terminal:

telnet spark000 44444


Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
An Event is the basic unit of data transfer in Flume:
Event = optional headers + byte array body
In the output above, the body bytes 68 65 6C 6C 6F 0D are simply the hex encoding of "hello" plus the carriage return sent by telnet.


Requirement 2: monitor a file
Agent selection: exec source + memory channel + logger sink
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent:
flume-ng agent \
--name a1  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/exec-memory-logger.conf \
-Dflume.root.logger=INFO,console

cd into the data directory and append some lines:
echo hello >> data.log
echo world >> data.log

Requirement 3: collect the logs on server A to server B in real time.
(Recap of all three requirements:
collect data from a specified network port and output it to the console;
monitor a file and output newly appended data to the console in real time;
collect the logs on server A to server B in real time.)

Avro Sink
This sink forms one half of Flume’s tiered collection support. Flume events sent to this sink are turned into Avro events and sent to the configured hostname / port pair. The events are taken from the configured Channel in batches of the configured batch size. Required properties are in bold.

Avro Source
Listens on Avro port and receives events from external Avro client streams. When paired with the built-in Avro Sink on another (previous hop) Flume agent, it can create tiered collection topologies. Required properties are in bold.

Technology selection (overall data flow sketched below):
    exec source + memory channel + avro sink
    avro source + memory channel + logger sink
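
The resulting data flow across the two agents (both run on spark000 in this single-machine walkthrough; in the real scenario the exec-memory-avro agent sits on server A and the avro-memory-logger agent on server B):

data.log --> exec source --> memory channel --> avro sink ==(network, port 44444)==> avro source --> memory channel --> logger sink (console)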

exec-memory-avro.conf

exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /home/hadoop/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = spark000
exec-memory-avro.sinks.avro-sink.port = 44444

exec-memory-avro.channels.memory-channel.type = memory

exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

avro-memory-logger.conf

avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = spark000
avro-memory-logger.sources.avro-source.port = 44444

avro-memory-logger.sinks.logger-sink.type = logger

avro-memory-logger.channels.memory-channel.type = memory

avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel


Start avro-memory-logger first, so the avro source is already listening before the avro sink tries to connect:
flume-ng agent \
--name avro-memory-logger  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
-Dflume.root.logger=INFO,console


Then start exec-memory-avro:
flume-ng agent \
--name exec-memory-avro  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
-Dflume.root.logger=INFO,console


The exec-memory-avro agent's console will not show the data; the avro sink ships events to the avro source, and the avro-memory-logger agent prints them to its console.

Again, cd into the data directory and append:
echo spark >> data.log
echo flink >> data.log


Kafka overview

It is a lot like producers and consumers:
    similar to a messaging system

    Message middleware: producers and consumers

    Mom: the producer
    You: the consumer
    Steamed buns: the data stream / the messages

        Normal case: one bun is produced, one bun is eaten
        Other cases:
            Buns keep being produced, but you get stuck on one (machine failure), and buns are lost
            Buns keep being produced faster than you can eat them, and buns are lost as well

        Solution: get a bowl/basket; finished buns go into the basket first, and you take them out of the basket when you want to eat

    The basket: Kafka
        What if the basket is full and no more buns fit?
        Prepare a few more baskets === scaling Kafka out

Kafka architecture
    producer: the one making the buns (mom)
    consumer: the one eating the buns (you)
    broker: the basket
    topic: a label attached to each bun; buns labeled topic A are for you, buns labeled topic B are for your younger brother

Configuration
First, set up ZooKeeper.

Unpack it into the app directory.

Configure the environment variables:
export ZK_HOME=/home/hadoop/app/zookeeper-3.4.5-cdh5.16.2
export PATH=$ZK_HOME/bin:$PATH

cd conf

cp zoo_sample.cfg zoo.cfg

Change the dataDir storage directory: under the app directory, mkdir a zktmp directory, then set:

dataDir=/home/hadoop/app/zktmp/zk
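
With zoo.cfg in place, start ZooKeeper before moving on to Kafka (zkServer.sh is in ZooKeeper's bin directory, which the PATH above already covers):

zkServer.sh start
zkServer.sh status    # should report standalone mode
jps                   # should show a QuorumPeerMain process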

Next up is Kafka. Same routine: unpack to the app directory and configure the environment variables:

export KAFKA_HOME=/home/hadoop/app/kafka_2.11-0.9.0.0
export PATH=$KAFKA_HOME/bin:$PATH


Go to the config directory and edit server.properties:
vi server.properties
broker.id can stay at its default on a single machine (with multiple brokers each one needs a different id), and the host can simply be localhost.
log.dirs=/home/hadoop/app/zktmp/kafka-logs
zookeeper.connect can also stay at its default for a single-node setup.
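
A minimal sketch of the three properties touched above for this single-node setup (values taken from this walkthrough; adjust the paths to your own environment):

broker.id=0
log.dirs=/home/hadoop/app/zktmp/kafka-logs
zookeeper.connect=localhost:2181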

Start the broker (ZooKeeper must already be running):
 kafka-server-start.sh $KAFKA_HOME/config/server.properties 

[2020-06-21 21:15:00,597] INFO [Kafka Server 0], started (kafka.server.KafkaServer)


Create a topic (registered via ZooKeeper):
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic hello_topic

List all topics:
kafka-topics.sh --list --zookeeper localhost:2181

Describe topics in detail:
kafka-topics.sh --describe --zookeeper localhost:2181

Produce messages (connects to the broker):
kafka-console-producer.sh --broker-list localhost:9092 --topic hello_topic

Consume messages (connects via ZooKeeper):
kafka-console-consumer.sh --zookeeper localhost:2181 --topic hello_topic --from-beginning


Test: on the producer side (kafka-console-producer.sh --broker-list localhost:9092 --topic hello_topic) type messages such as hello, world, spark; the consumer side receives them.


Using --from-beginning: with this flag the console consumer replays the topic from the earliest offset; without it, it only sees messages produced after it starts.
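For contrast, the same consumer without the flag only prints messages produced after it starts:

kafka-console-consumer.sh --zookeeper localhost:2181 --topic hello_topic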

Describe all topics: kafka-topics.sh --describe --zookeeper localhost:2181
Describe a specific topic: kafka-topics.sh --describe --zookeeper localhost:2181 --topic hello_topic

Single node, multiple brokers
server-1.properties
    log.dirs=/home/hadoop/app/tmp/kafka-logs-1
    listeners=PLAINTEXT://:9093
    broker.id=1

server-2.properties
    log.dirs=/home/hadoop/app/tmp/kafka-logs-2
    listeners=PLAINTEXT://:9094
    broker.id=2

server-3.properties
    log.dirs=/home/hadoop/app/tmp/kafka-logs-3
    listeners=PLAINTEXT://:9095
    broker.id=3

kafka-server-start.sh -daemon $KAFKA_HOME/config/server-1.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-2.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-3.properties &
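
A quick sanity check that all three brokers came up: jps should list three Kafka processes, one per server-N.properties started above.

jps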

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic

kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9095 --topic my-replicated-topic


kafka-console-consumer.sh --zookeeper localhost:2181 --topic my-replicated-topic


kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
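
In the describe output, each partition lists its Leader, Replicas, and Isr (in-sync replicas); with --replication-factor 3, all three broker ids (1, 2, 3) should appear under Replicas.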


Note: I did all of this on Alibaba Cloud and ran into a pile of pitfalls. It's a long story.
