Kafka + Flume + AWS S3 Installation and Configuration

Original content; please credit the source when reposting. Thanks!

1. Download, extract, and install the required versions

The Hadoop version used here is 2.9.2 and the Flume version is 1.9.0.

Hadoop download: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz

Flume download: https://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
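A rough sketch of this step (the /hadoop directory matches the HADOOP_HOME used in /etc/profile below; /flume is just an example location for the Flume install):

# download and extract Hadoop 2.9.2
mkdir -p /hadoop && cd /hadoop
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
tar -zxf hadoop-2.9.2.tar.gz

# download and extract Flume 1.9.0
mkdir -p /flume && cd /flume
wget https://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
tar -zxf apache-flume-1.9.0-bin.tar.gz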

2. Copy the dependency jars that Hadoop 2.9.2 provides for connecting to S3 into Flume's lib directory

/hadoop-2.9.2/share/hadoop/common/*.jar

/hadoop-2.9.2/share/hadoop/common/lib/*.jar

Also copy into Flume's lib directory the two AWS S3 jars located with find -name '*aws*.jar', as shown in the sketch below.
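A sketch of the copy step, assuming HADOOP_HOME=/hadoop/hadoop-2.9.2 (as in /etc/profile below) and FLUME_HOME pointing at your Flume 1.9.0 install; on Hadoop 2.9.2 the two AWS jars are typically hadoop-aws-2.9.2.jar and the aws-java-sdk-bundle jar under share/hadoop/tools/lib:

export HADOOP_HOME=/hadoop/hadoop-2.9.2
export FLUME_HOME=/flume/apache-flume-1.9.0-bin   # adjust to your install path

# copy the common Hadoop jars Flume needs in order to use the S3A filesystem
cp $HADOOP_HOME/share/hadoop/common/*.jar     $FLUME_HOME/lib/
cp $HADOOP_HOME/share/hadoop/common/lib/*.jar $FLUME_HOME/lib/

# find the two AWS S3 jars and copy them into Flume's lib directory as well
find $HADOOP_HOME -name '*aws*.jar' -exec cp {} $FLUME_HOME/lib/ \;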

3. Configure the relevant files

(1) Configure Hadoop's core-site.xml and hdfs-site.xml

<configuration>
        <property>
                <name>fs.s3a.impl</name>
                <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
        </property>
        <property>
                <name>fs.s3a.access.key</name>
                <value>admin</value>
        </property>
        <property>
                <name>fs.s3a.secret.key</name>
                <value>admin123</value>
        </property>
        <property>
                <name>fs.s3a.connection.ssl.enabled</name>
                <value>true</value>
        </property>
        <property>
                <name>fs.s3a.endpoint</name>
                <value>s3.cn-northwest-1.amazonaws.com.cn</value>
        </property>


</configuration>

Change fs.s3a.endpoint to the endpoint of your own AWS region, and fs.s3a.access.key / fs.s3a.secret.key to your own credentials. Set fs.s3a.connection.ssl.enabled to true to connect over HTTPS on the public internet (set it to false for plain HTTP).

Then copy the configured core-site.xml into Flume's conf directory.
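For example, assuming the same HADOOP_HOME and FLUME_HOME paths as in the sketch above:

cp $HADOOP_HOME/etc/hadoop/core-site.xml $FLUME_HOME/conf/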

(2) Set up the standard Flume configuration files

# Only the file names need to be copied here; no further changes are required in these files.
# The actual environment settings go in the system's /etc/profile.
cp flume-env.sh.template flume-env.sh
cp flume-conf.properties.template flume-conf.properties

(3) Edit /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64/jre
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/hadoop/hadoop-2.9.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Some of these are not strictly required, e.g. HADOOP_CONF_DIR
export FLUME_CLASSPATH=$HADOOP_HOME/share/hadoop/hdfs/lib/*
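After editing /etc/profile, a quick sanity check (not part of the original steps, just a way to confirm the environment is picked up):

source /etc/profile
java -version
hadoop version
echo $FLUME_CLASSPATH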

(4) Create the Flume agent configuration file kafka-flume-sink2s3.conf

# ------------------- define data source ----------------------
# source alias
agent.sources = source_from_kafka
# channels alias
agent.channels = mem_channel
# sink alias
agent.sinks = s3_sink


# define kafka source
agent.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.source_from_kafka.batchSize = 10
# set kafka broker address  
agent.sources.source_from_kafka.kafka.bootstrap.servers = 192.xx.xx.xx:xxxx
# set kafka topic
agent.sources.source_from_kafka.kafka.topics = test_tank007
# set kafka groupid
agent.sources.source_from_kafka.kafka.consumer.group.id = flumeTest1


# define hdfs sink
agent.sinks.s3_sink.type = hdfs
# set the S3 destination path (written through the HDFS sink via the s3a:// scheme)
agent.sinks.s3_sink.hdfs.path = s3a://bucket_name/upload

# roll files by time only: size- and count-based rolling are disabled (0), roll every 5 seconds
agent.sinks.s3_sink.hdfs.rollSize = 0
agent.sinks.s3_sink.hdfs.rollCount = 0
agent.sinks.s3_sink.hdfs.rollInterval = 5
#agent.sinks.s3_sink.hdfs.threadsPoolSize = 30
agent.sinks.s3_sink.hdfs.fileType = DataStream
agent.sinks.s3_sink.hdfs.writeFormat = Text

# define channel from kafka source to hdfs sink 
agent.channels.mem_channel.type = memory
# channel store size
agent.channels.mem_channel.capacity = 1000
# transaction size
agent.channels.mem_channel.transactionCapacity = 1000
agent.channels.mem_channel.byteCapacity = 800000
agent.channels.mem_channel.byteCapacityBufferPercentage = 20
agent.channels.mem_channel.keep-alive = 60

# specify the channel the sink should use  
agent.sources.source_from_kafka.channels = mem_channel
agent.sinks.s3_sink.channel = mem_channel

Start the Flume agent that consumes from Kafka and writes to the AWS S3 bucket (the agent name passed with -name must match the "agent" prefix used in the configuration file):

./bin/flume-ng agent --conf ./conf -f ./job/kafka-flume-sink2s3.conf -name agent -Dflume.root.logger=DEBUG,console

4. Verify that the Flume import works

1. Start a Kafka producer writing data to the test_tank007 topic; files should then start appearing in S3 (see the producer sketch after the log excerpt below).

2. The files show up as expected in the S3 client. How files are written and rolled is controlled by the relevant settings in kafka-flume-sink2s3.conf.

3. Normal output in the Flume log (no errors):

2019-06-22 15:57:24,561 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.kafka.clients.FetchSessionHandler$Builder.build(FetchSessionHandler.java:252)] [Consumer clientId=consumer-1, groupId=flumeTest1] Built incremental fetch (sessionId=601654046, epoch=3740) for node 0. Added 0 partition(s), altered 1 partition(s), removed 0 partition(s) out of 1 partition(s)
2019-06-22 15:57:24,561 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.kafka.clients.consumer.internals.Fetcher.sendFetches(Fetcher.java:216)] [Consumer clientId=consumer-1, groupId=flumeTest1] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(test_tank007-0), toForget=(), implied=()) to broker 172.19.32.68:9092 (id: 0 rack: null)
2019-06-22 15:57:24,562 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:472)] [Consumer clientId=consumer-1, groupId=flumeTest1] Using older server API v7 to send FETCH {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=601654046,epoch=3740,topics=[{topic=test_tank007,partitions=[{partition=0,fetch_offset=12689,log_start_offset=-1,max_bytes=1048576}]}],forgotten_topics_data=[]} with correlation id 4982 to node 0
2019-06-22 15:57:24,562 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.flume.source.kafka.KafkaSource.doProcess(KafkaSource.java:299)] Waited: 703 
2019-06-22 15:57:24,562 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.flume.source.kafka.KafkaSource.doProcess(KafkaSource.java:300)] Event #: 7
2019-06-22 15:57:24,661 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.kafka.clients.FetchSessionHandler.handleResponse(FetchSessionHandler.java:423)] [Consumer clientId=consumer-1, groupId=flumeTest1] Node 0 sent an incremental fetch response for session 601654046 with 1 response partition(s)
2019-06-22 15:57:24,662 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:242)] [Consumer clientId=consumer-1, groupId=flumeTest1] Fetch READ_UNCOMMITTED at offset 12689 for partition test_tank007-0 returned fetch data (error=NONE, highWaterMark=12690, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=86)
2019-06-22 15:57:24,662 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.kafka.clients.consumer.internals.Fetcher.prepareFetchRequests(Fetcher.java:914)] [Consumer clientId=consumer-1, groupId=flumeTest1] Added READ_UNCOMMITTED fetch request for partition test_tank007-0 at offset 12690 to node 172.19.32.68:9092 (id: 0 rack: null)
2019-06-22 15:57:24,662 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.kafka.clients.FetchSessionHandler$Builder.build(FetchSessionHandler.java:252)] [Consumer clientId=consumer-1, groupId=flumeTest1] Built incremental fetch (sessionId=601654046, epoch=3741) for node 0. Added 0 partition(s), altered 1 partition(s), removed 0 partition(s) out of 1 partition(s)
2019-06-22 15:57:24,662 (PollableSourceRunner-KafkaSource-source_from_kafka) [DEBUG - org.apache.kafka.clients.consumer.internals.Fetcher.sendFetches(Fetcher.java:216)] [Consumer clientId=consumer-1, groupId=flumeTest1] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(test_tank007-0), toForget=(), implied=()) to broker 172.19.32.68:9092 (id: 0 rack: null)
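A simple way to produce test messages is Kafka's console producer (the broker address is the same placeholder as in the config; $KAFKA_HOME is wherever Kafka is installed):

# produce a few test lines to the topic Flume is consuming
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.xx.xx.xx:xxxx --topic test_tank007

# if the AWS CLI is configured, list the objects Flume has written
aws s3 ls s3://bucket_name/upload/

With rollInterval = 5, new objects should appear under the upload/ prefix within a few seconds of sending messages.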

5. Troubleshooting

Flume reports: "Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight."

This error means the memory channel has filled up because the sink cannot drain events as fast as the Kafka source delivers them; raising the channel's capacity/byteCapacity, increasing the sink batch size, or slowing the source usually resolves it. See the separate post on this error for details.


Finally: if you found this helpful, please give it a like! (I ran into quite a few problems along the way and spent nearly a week getting this working; questions and discussion are welcome.)
