Flume: Kafka source, memory channel, HDFS sink

source:   org.apache.flume.source.kafka.KafkaSource
channels: memory
sinks:    hdfs

kafka_sources.sources  = source1
kafka_sources.channels = channel1
kafka_sources.sinks = sink1

kafka_sources.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
kafka_sources.sources.source1.kafka.bootstrap.servers = 172.168.1.17:9092,172.168.1.16:9092,172.168.1.14:9092
kafka_sources.sources.source1.kafka.topics = first_floor_test
kafka_sources.sources.source1.kafka.consumer.group.id = flume
kafka_sources.sources.source1.kafka.consumer.auto.commit.interval.ms = 60000
kafka_sources.sources.source1.channels = channel1

kafka_sources.channels.channel1.type = memory
kafka_sources.channels.channel1.capacity = 1000000
kafka_sources.channels.channel1.transactionCapacity = 1000000

kafka_sources.sinks.sink1.type = hdfs
kafka_sources.sinks.sink1.hdfs.path = hdfs://172.168.1.17:9000/first_floor/%Y%m%d
kafka_sources.sinks.sink1.hdfs.filePrefix = log
# hdfs.codeC is the HDFS sink's compression codec property; "compression.type" is not a recognized HDFS sink setting
kafka_sources.sinks.sink1.hdfs.codeC = snappy
kafka_sources.sinks.sink1.channel = channel1
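
To start this agent, a minimal launch sketch (assuming the configuration above is saved as conf/kafka_sources.conf under the Flume installation directory; the --name argument must match the kafka_sources prefix used in the properties file):

```
$ bin/flume-ng agent \
    --name kafka_sources \
    --conf conf \
    --conf-file conf/kafka_sources.conf \
    -Dflume.root.logger=INFO,console
```

Running this command is what surfaced the exception below.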

Exception:

2019-11-09 15:23:52,414 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:150)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
        at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:246)
        at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:453)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:106)
        at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 12 more
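
The NoClassDefFoundError points at org.apache.hadoop.io.SequenceFile, a class that ships in hadoop-common, meaning the HDFS sink cannot find the Hadoop client classes on Flume's classpath. A quick hedged check (assuming FLUME_HOME and HADOOP_HOME point at the two installation directories):

```
# Is any hadoop-common jar already on Flume's classpath?
$ ls ${FLUME_HOME}/lib | grep hadoop-common

# Confirm the missing class is shipped in hadoop-common
$ jar tf ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.7.2.jar \
    | grep 'SequenceFile\$CompressionType'
```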

Solution:

${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.7.2.jar

Find this jar file and copy it into the lib directory under the Flume installation directory, and the problem is solved.
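
For example, a sketch assuming FLUME_HOME is the Flume installation directory (restart the agent afterwards):

```
$ cp ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.7.2.jar ${FLUME_HOME}/lib/
```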

Here are the basic steps for using Flume to collect data from Kafka and write it to HDFS:

1. Install Flume and Kafka: you need Flume and Kafka installed on your machine; the official documentation or online tutorials cover the installation process.

2. Configure the Flume agent: the agent reads data from Kafka and writes it to HDFS. A simple example configuration file:

```
agent.sources = kafka-source
agent.channels = hdfs-channel
agent.sinks = hdfs-sink

agent.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafka-source.kafka.bootstrap.servers = localhost:9092
agent.sources.kafka-source.kafka.topics = my-topic
agent.sources.kafka-source.batchSize = 1000
agent.sources.kafka-source.batchDurationMillis = 1000
# bind the source to the channel
agent.sources.kafka-source.channels = hdfs-channel

agent.channels.hdfs-channel.type = memory
agent.channels.hdfs-channel.capacity = 10000
agent.channels.hdfs-channel.transactionCapacity = 1000

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = /path/to/hdfs
agent.sinks.hdfs-sink.hdfs.filePrefix = events-
agent.sinks.hdfs-sink.hdfs.fileSuffix = .log
agent.sinks.hdfs-sink.hdfs.rollInterval = 3600
agent.sinks.hdfs-sink.hdfs.rollSize = 0
agent.sinks.hdfs-sink.hdfs.rollCount = 10000
agent.sinks.hdfs-sink.channel = hdfs-channel
```

This configuration defines a source named kafka-source that reads data from the Kafka topic my-topic. The data flows through a memory channel and is written to HDFS by a sink named hdfs-sink.

3. Run the Flume agent with the following command:

```
$ bin/flume-ng agent -n agent -c conf -f /path/to/flume.conf
```

where /path/to/flume.conf is the path to your Flume configuration file.

These are the basic steps for collecting data from Kafka with Flume and writing it to HDFS; you can adapt them to your own requirements.
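
To verify the pipeline end to end, a minimal smoke test might look like the sketch below (assuming the Kafka command-line tools are on the PATH, and using the topic and HDFS path from the example configuration above):

```
# Produce a test message to the topic the source consumes
$ echo "hello flume" | kafka-console-producer.sh \
    --broker-list localhost:9092 --topic my-topic

# Once the sink rolls a file, list the output directory in HDFS
$ hdfs dfs -ls /path/to/hdfs
```

kafka-console-producer.sh reads one message per line from stdin; the HDFS sink writes to an in-progress .tmp file and renames it when a roll condition such as hdfs.rollInterval fires.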