2020-06-21 Spark Streaming study notes: Flume + ZooKeeper + Kafka

A small gripe about macOS Catalina: it made my machine run far too hot, so yesterday I flashed back to Mojave. I hit plenty of pitfalls along the way, and calling support felt less useful than just reading the docs myself. After a whole day I finally got back to Mojave 10.14.6. I'm never upgrading again, and I lost all the data on the machine.

Spark Streaming

Install Flume and configure the environment variables:
export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.16.2-bin
export PATH=$FLUME_HOME/bin:$PATH
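Assuming the two exports above live in ~/.bash_profile (adjust if you use a different shell profile), reload it and sanity-check the install:

source ~/.bash_profile
flume-ng version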


Look up what needs to be configured in the official docs; the 1.9 guide should be much the same as the 1.7 one:
https://flume.apache.org/releases/content/1.7.0/FlumeUserGuide.html

example.conf: A single-node Flume configuration

The key to using Flume is writing the configuration file:

A) Configure the source
B) Configure the channel
C) Configure the sink
D) Wire the three components together

a1: the agent name
r1: the source name
k1: the sink name
c1: the channel name

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = spark000
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


Start the agent:
flume-ng agent \
--name a1  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console    # print the logs to the console

Test with telnet from another terminal:

telnet spark000 44444


Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
An Event is the basic unit of data transfer in Flume:
Event = optional headers + byte array body
In the output above, the body bytes 68 65 6C 6C 6F 0D are simply the hex encoding of "hello" plus the carriage return sent by telnet.


Requirement 2: monitor a file
Agent selection: exec source + memory channel + logger sink
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent:
flume-ng agent \
--name a1  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/exec-memory-logger.conf \
-Dflume.root.logger=INFO,console

cd into the data directory and append some lines:
echo hello >> data.log
echo world >> data.log

Requirement 3: collect the logs on server A to server B in real time.
(Recap of all three requirements:
collect data from a specified network port and output it to the console;
monitor a file and output newly appended data to the console in real time;
collect the logs on server A to server B in real time.)

Avro Sink
This sink forms one half of Flume’s tiered collection support. Flume events sent to this sink are turned into Avro events and sent to the configured hostname / port pair. The events are taken from the configured Channel in batches of the configured batch size. Required properties are in bold.

Avro Source
Listens on Avro port and receives events from external Avro client streams. When paired with the built-in Avro Sink on another (previous hop) Flume agent, it can create tiered collection topologies. Required properties are in bold.

Technology selection (overall data flow sketched below):
    exec source + memory channel + avro sink
    avro source + memory channel + logger sink
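
The resulting data flow across the two agents (both run on spark000 in this single-machine walkthrough; in the real scenario the exec-memory-avro agent sits on server A and the avro-memory-logger agent on server B):

data.log --> exec source --> memory channel --> avro sink ==(network, port 44444)==> avro source --> memory channel --> logger sink (console)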

exec-memory-avro.conf

exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /home/hadoop/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = spark000
exec-memory-avro.sinks.avro-sink.port = 44444

exec-memory-avro.channels.memory-channel.type = memory

exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

avro-memory-logger.conf

avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = spark000
avro-memory-logger.sources.avro-source.port = 44444

avro-memory-logger.sinks.logger-sink.type = logger

avro-memory-logger.channels.memory-channel.type = memory

avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel


Start avro-memory-logger first, so the avro source is already listening before the avro sink tries to connect:
flume-ng agent \
--name avro-memory-logger  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
-Dflume.root.logger=INFO,console


Then start exec-memory-avro:
flume-ng agent \
--name exec-memory-avro  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
-Dflume.root.logger=INFO,console


The exec-memory-avro agent's console will not show the data; the avro sink ships events to the avro source, and the avro-memory-logger agent prints them to its console.

Again, cd into the data directory and append:
echo spark >> data.log
echo flink >> data.log


Kafka overview

It is a lot like producers and consumers:
    similar to a messaging system

    Message middleware: producers and consumers

    Mom: the producer
    You: the consumer
    Steamed buns: the data stream / the messages

        Normal case: one bun is produced, one bun is eaten
        Other cases:
            Buns keep being produced, but you get stuck on one (machine failure), and buns are lost
            Buns keep being produced faster than you can eat them, and buns are lost as well

        Solution: get a bowl/basket; finished buns go into the basket first, and you take them out of the basket when you want to eat

    The basket: Kafka
        What if the basket is full and no more buns fit?
        Prepare a few more baskets === scaling Kafka out

Kafka architecture
    producer: the one making the buns (mom)
    consumer: the one eating the buns (you)
    broker: the basket
    topic: a label attached to each bun; buns labeled topic A are for you, buns labeled topic B are for your younger brother

Configuration
First, set up ZooKeeper.

Unpack it into the app directory.

Configure the environment variables:
export ZK_HOME=/home/hadoop/app/zookeeper-3.4.5-cdh5.16.2
export PATH=$ZK_HOME/bin:$PATH

cd conf

cp zoo_sample.cfg zoo.cfg

Change the dataDir storage directory: under the app directory, mkdir a zktmp directory, then set:

dataDir=/home/hadoop/app/zktmp/zk
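
With zoo.cfg in place, start ZooKeeper before moving on to Kafka (zkServer.sh is in ZooKeeper's bin directory, which the PATH above already covers):

zkServer.sh start
zkServer.sh status    # should report standalone mode
jps                   # should show a QuorumPeerMain process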

Next up is Kafka. Same routine: unpack to the app directory and configure the environment variables:

export KAFKA_HOME=/home/hadoop/app/kafka_2.11-0.9.0.0
export PATH=$KAFKA_HOME/bin:$PATH


Go to the config directory and edit server.properties:
vi server.properties
broker.id can stay at its default on a single machine (with multiple brokers each one needs a different id), and the host can simply be localhost.
log.dirs=/home/hadoop/app/zktmp/kafka-logs
zookeeper.connect can also stay at its default for a single-node setup.
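
A minimal sketch of the three properties touched above for this single-node setup (values taken from this walkthrough; adjust the paths to your own environment):

broker.id=0
log.dirs=/home/hadoop/app/zktmp/kafka-logs
zookeeper.connect=localhost:2181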

Start the broker (ZooKeeper must already be running):
 kafka-server-start.sh $KAFKA_HOME/config/server.properties 

[2020-06-21 21:15:00,597] INFO [Kafka Server 0], started (kafka.server.KafkaServer)


Create a topic (registered via ZooKeeper):
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic hello_topic

List all topics:
kafka-topics.sh --list --zookeeper localhost:2181

Describe topics in detail:
kafka-topics.sh --describe --zookeeper localhost:2181

Produce messages (connects to the broker):
kafka-console-producer.sh --broker-list localhost:9092 --topic hello_topic

Consume messages (connects via ZooKeeper):
kafka-console-consumer.sh --zookeeper localhost:2181 --topic hello_topic --from-beginning


Test: on the producer side (kafka-console-producer.sh --broker-list localhost:9092 --topic hello_topic) type messages such as hello, world, spark; the consumer side receives them.


Using --from-beginning: with this flag the console consumer replays the topic from the earliest offset; without it, it only sees messages produced after it starts.
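For contrast, the same consumer without the flag only prints messages produced after it starts:

kafka-console-consumer.sh --zookeeper localhost:2181 --topic hello_topic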

Describe all topics: kafka-topics.sh --describe --zookeeper localhost:2181
Describe a specific topic: kafka-topics.sh --describe --zookeeper localhost:2181 --topic hello_topic

Single node, multiple brokers
server-1.properties
    log.dirs=/home/hadoop/app/tmp/kafka-logs-1
    listeners=PLAINTEXT://:9093
    broker.id=1

server-2.properties
    log.dirs=/home/hadoop/app/tmp/kafka-logs-2
    listeners=PLAINTEXT://:9094
    broker.id=2

server-3.properties
    log.dirs=/home/hadoop/app/tmp/kafka-logs-3
    listeners=PLAINTEXT://:9095
    broker.id=3

kafka-server-start.sh -daemon $KAFKA_HOME/config/server-1.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-2.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-3.properties &
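
A quick sanity check that all three brokers came up: jps should list three Kafka processes, one per server-N.properties started above.

jps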

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic

kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9095 --topic my-replicated-topic


kafka-console-consumer.sh --zookeeper localhost:2181 --topic my-replicated-topic


kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
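
In the describe output, each partition lists its Leader, Replicas, and Isr (in-sync replicas); with --replication-factor 3, all three broker ids (1, 2, 3) should appear under Replicas.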


Note: I did all of this on Alibaba Cloud and ran into a pile of pitfalls. It's a long story.
