Alibaba Cloud: Consuming Kafka with Flume and Writing to OSS

Flume configuration

# Name the components on this agent
a1.sources = source1
a1.sinks = oss1
a1.channels = c1

# Describe/configure the source
a1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.source1.kafka.bootstrap.servers = xxxxxx:9092
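# kafka.topics / kafka.consumer.group.id replace the deprecated topic / groupId keys (Flume 1.7+)
a1.sources.source1.kafka.topics = <topic-name>
a1.sources.source1.kafka.consumer.group.id = flume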
a1.sources.source1.kafka.consumer.timeout.ms = 100
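# earliest:
#   if a partition has a committed offset, resume from it; otherwise consume from the beginning
# latest:
#   if a partition has a committed offset, resume from it; otherwise consume only newly produced data
# none:
#   if every partition has a committed offset, resume from them; if any partition has none, throw an exception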
a1.sources.source1.kafka.consumer.auto.offset.reset = earliest

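# Describe the sink (the HDFS sink is used to write to OSS)
a1.sinks.oss1.type = hdfs
# OSS path; replace <bucket-name> with your bucket
a1.sinks.oss1.hdfs.path = oss://<bucket-name>/kafka-flume-oss-test/%{topic}/%y-%m-%d
# Roll to a new file every 30 seconds; 0 disables rolling by size and by event count
a1.sinks.oss1.hdfs.rollInterval = 30
a1.sinks.oss1.hdfs.rollSize = 0
a1.sinks.oss1.hdfs.rollCount = 0
# Write events as a plain, uncompressed stream
a1.sinks.oss1.hdfs.fileType = DataStream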

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

# Bind the source and sink to the channel
a1.sources.source1.channels = c1
a1.sinks.oss1.channel = c1

For more detailed configuration options, see the Apache Flume documentation on the official Flume website.
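
With the configuration saved to a file, the agent can be started roughly as follows. This is only a sketch: it assumes FLUME_HOME points at the Flume installation, and the file name kafka-oss.conf is a hypothetical example. The value passed to --name must match the a1 prefix used in the configuration.

```shell
#!/usr/bin/env bash
# Start the Flume agent "a1" with the given properties file.
# Assumptions: FLUME_HOME is set; the config file name below is hypothetical.
start_flume_agent() {
  local conf_file="$1"   # e.g. "$FLUME_HOME/conf/kafka-oss.conf"
  "$FLUME_HOME/bin/flume-ng" agent \
    --conf "$FLUME_HOME/conf" \
    --conf-file "$conf_file" \
    --name a1 \
    -Dflume.root.logger=INFO,console
}

# Usage (uncomment and adjust the path):
# start_flume_agent "$FLUME_HOME/conf/kafka-oss.conf"
```

Logging to the console makes errors such as the one described in the next section visible right away.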

Problems encountered

java.lang.ClassNotFoundException: Class com.aliyun.emr.fs.oss.JindoOssFileSystem not found

ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:158)  - Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.aliyun.emr.fs.oss.JindoOssFileSystem not found
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:464)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.aliyun.emr.fs.oss.JindoOssFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:255)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:247)
        at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:727)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:724)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
Caused by: java.lang.ClassNotFoundException: Class com.aliyun.emr.fs.oss.JindoOssFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2273)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2367)
        ... 16 more

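Solution

The HDFS sink resolves the oss:// path through Hadoop's FileSystem API, which here is configured to use JindoOssFileSystem, but the jar containing that class is not on Flume's classpath.

  • The /opt/apps/extra-jars directory contains smartdata-jindofs-2.7.301.jar (the version number may differ on your cluster).
  • Copy this jar into the lib folder under the Flume installation directory, then restart the agent.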
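
The copy step could be scripted along these lines. This is a minimal sketch: the function name copy_jindofs_jar is made up for illustration, the glob is used because the jar version may differ, and FLUME_HOME is assumed to point at the Flume installation.

```shell
#!/usr/bin/env bash
# Copy the JindoFS jar(s) into Flume's lib directory so the HDFS sink can
# load com.aliyun.emr.fs.oss.JindoOssFileSystem.
# Assumptions: the helper name is hypothetical; the jar version may differ.
copy_jindofs_jar() {
  local src_dir="$1"   # e.g. /opt/apps/extra-jars
  local lib_dir="$2"   # e.g. "$FLUME_HOME/lib"
  local found=0
  local jar
  for jar in "$src_dir"/smartdata-jindofs-*.jar; do
    [ -e "$jar" ] || continue        # the glob matched nothing
    cp "$jar" "$lib_dir"/ || return 1
    echo "copied $jar -> $lib_dir"
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no smartdata-jindofs-*.jar found in $src_dir" >&2
    return 1
  fi
}

# Usage (uncomment and adjust):
# copy_jindofs_jar /opt/apps/extra-jars "$FLUME_HOME/lib"
```

The function returns a non-zero status when no matching jar is found, so a missing file is noticed before the agent is restarted.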