Example 1
《《《《《《source: Hive log, channel: memory, sink: logger (console)》》》》》》
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c
#define the channel
a1.channels.c1.type = memory
# define the sink
a1.sinks.k1.type = logger
# bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
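To try any of these configs, a typical launch command is sketched below; the conf directory and the file name a1.conf are assumptions, adjust them to your layout.
bin/flume-ng agent \
  --conf conf \
  --conf-file conf/a1.conf \
  --name a1 \
  -Dflume.root.logger=INFO,console
The -Dflume.root.logger=INFO,console option makes the logger sink's output visible in the terminal.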
Example 2
《《《《《《source: Hive log, channel: file, sink: logger (console)》》》》》》
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c
#define the channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/datas/flume-ch/check
a1.channels.c1.dataDirs = /opt/datas/flume-ch/data
# define the sink
a1.sinks.k1.type = logger
# bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
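The file channel persists events to disk, so they survive an agent restart. A minimal preparation step, assuming the paths from the config above, is to create the two directories up front to avoid permission surprises:
mkdir -p /opt/datas/flume-ch/check /opt/datas/flume-ch/data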
Example 3
《《《《《《source: exec, channel: memory, sink: HDFS》》》》》》
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c
#define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/hdfs
a1.sinks.k1.hdfs.fileType = DataStream
## First: with no scheme in hdfs.path, the sink resolves it against the default HDFS (fs.defaultFS)
## Second: if the target directory does not exist, it is created automatically
# bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
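Once events are flowing, the result can be checked from the command line. FlumeData is the sink's default file prefix, so the exact file names may differ:
hdfs dfs -ls /flume/hdfs
hdfs dfs -text /flume/hdfs/FlumeData.*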
Example 4
《《《《《《roll files by size》》》》》》
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c
#define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/size
a1.sinks.k1.hdfs.fileType = DataStream
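# roll purely by size: rollSize is in bytes (10240 ≈ 10 KB);
# rollInterval = 0 and rollCount = 0 disable time- and event-count-based rolling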
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 10240
a1.sinks.k1.hdfs.rollCount = 0
# bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
Example 5
《《《《《《time-based partitioning》》》》》》
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c
#define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# define the sink
a1.sinks.k1.type = hdfs
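# %y-%m-%d/%H-%M in hdfs.path are escape sequences resolved from the event's timestamp header;
# useLocalTimeStamp = true below supplies that timestamp from the agent's local clock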
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.filePrefix = hive-log
# bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
Example 6
《《《《《《time-based partitioning (error case: useLocalTimeStamp unset)》》》》》》
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c
#define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M
a1.sinks.k1.hdfs.fileType = DataStream
# useLocalTimeStamp is deliberately left unset here to demonstrate the error described below
a1.sinks.k1.hdfs.filePrefix = hive-log
# bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
If a1.sinks.k1.hdfs.useLocalTimeStamp = true is not set, the agent reports an error like:
java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
The HDFS sink needs a timestamp to resolve the %y-%m-%d/%H-%M escapes in hdfs.path, and the exec source does not add a timestamp header; either set useLocalTimeStamp = true or add a timestamp interceptor to the source.
Example 7
《《《《《《monitoring a directory (spooldir)》》》》》》
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define the source
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /opt/datas/flume-ch/spdir
#define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/spdir
a1.sinks.k1.hdfs.fileType = DataStream
# bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
The spooling directory source works by polling, not by notification. As long as no file appears in the watched directory, the HDFS directory is not created; once a file shows up, the HDFS directory is created as well.
By default every file placed in the directory is uploaded. To exclude some of them (for example, files that are still being written), look up the source's ignorePattern option and configure it:
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define the source
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /opt/datas/flume-ch/spdir
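# regex of file names to skip; here, temp files that still end in .tmp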
a1.sources.s1.ignorePattern = ([^ ]*\.tmp$)
#define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/spdir
a1.sinks.k1.hdfs.fileType = DataStream
# bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
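A quick way to watch the source at work, assuming a hypothetical test file test.log: drop it into the watched directory and observe the rename once it has been ingested (.COMPLETED is the source's default suffix for processed files).
cp /opt/datas/test.log /opt/datas/flume-ch/spdir/
ls /opt/datas/flume-ch/spdir/
# -> test.log.COMPLETED once the file has been consumed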
Example 8
《《《《《《monitoring a directory and its files》》》》》》
What if you need to watch a directory and also keep reading data that is appended to the files inside it?
exec: tails a single file as it grows
spooldir: picks up new files dropped into a directory, but does not follow appends
This calls for the taildir source, which combines both behaviors. It only shipped with Apache Flume 1.7, so on older releases such as the CDH 5.3.6 build used here you have to compile it yourself.
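For reference, a minimal sketch of such a config once a taildir source is available; the property names follow the Flume 1.7 taildir source, and the paths are assumptions:
a1.sources = s1
a1.channels = c1
a1.sources.s1.type = TAILDIR
# remembers per-file read offsets across restarts
a1.sources.s1.positionFile = /opt/datas/flume-ch/taildir_position.json
# one or more groups of files to tail, each defined by a regex
a1.sources.s1.filegroups = f1
a1.sources.s1.filegroups.f1 = /opt/datas/flume-ch/logs/.*log.*
a1.sources.s1.channels = c1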