Big Data Technology in Practice

I. Flume

JDK version: 1.8.0_211

Flume version: 1.8.0

Download: omitted

Configuration:

  • System environment variables
    • export FLUME_HOME=/usr/local/flume/apache-flume-1.8.0-bin
    • export FLUME_CONF_DIR=$FLUME_HOME/conf
    • add $FLUME_HOME/bin to PATH
  • set JAVA_HOME in conf/flume-env.sh
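
With the variables in place, a quick sanity check (assuming the profile containing the exports above has been re-sourced into the current shell) is to ask Flume for its version:

    flume-ng version    #should report Flume 1.8.0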

1. Using Flume to receive events from an Avro source

  • conf/avro.conf
    a1.sources=r1
    a1.sinks=k1
    a1.channels=c1
    
    #Describe/configure the source
    a1.sources.r1.type=avro
    a1.sources.r1.channels=c1
    a1.sources.r1.bind=0.0.0.0
    a1.sources.r1.port=4141
    
    #Describe the sink
    a1.sinks.k1.type=logger
    
    #Use a channel which buffers events in memory
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=1000
    a1.channels.c1.transactionCapacity=100
    
    #Bind the source and sink to the channel
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1

  • Start the agent with console logging

    /usr/local/flume/apache-flume-1.8.0-bin/bin/flume-ng agent -c . -f /usr/local/flume/apache-flume-1.8.0-bin/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console

  • Open another shell and create a file under the Flume home directory

    sh -c 'echo "hello, world" > /usr/local/flume/apache-flume-1.8.0-bin/log.00'

  • Send the file to the Avro source

    ./bin/flume-ng avro-client --conf conf -H localhost -p 4141 -F /usr/local/flume/apache-flume-1.8.0-bin/log.00

  • Watch the first shell: the event has been printed by the logger sink.
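
Note that avro-client reads from standard input when no -F file is supplied (stdin is its documented default), so a quick one-off test can also be piped in, assuming the agent above is still running:

    echo "hello again" | ./bin/flume-ng avro-client --conf conf -H localhost -p 4141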

2. Using Flume to receive events from a Netcat source

  • conf/example.conf
    #example.conf: A single-node Flume configuration
    #Name the components on this agent
    a1.sources=r1
    a1.sinks=k1
    a1.channels=c1
    
    #Describe/configure the source
    a1.sources.r1.type=netcat
    a1.sources.r1.bind=localhost
    a1.sources.r1.port=44444
    
    #Describe the sink
    a1.sinks.k1.type=logger
    
    #Use a channel which buffers events in memory
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=1000
    a1.channels.c1.transactionCapacity=100
    
    #Bind the source and sink to the channel
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1

  • Start the agent with console logging
    /usr/local/flume/apache-flume-1.8.0-bin/bin/flume-ng agent --conf ./conf --conf-file ./conf/example.conf --name a1 -Dflume.root.logger=INFO,console

  • Open another shell and connect to the source
    telnet localhost 44444

  • Anything typed in that shell shows up as logged events in the agent's console.
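
If telnet is not installed, nc speaks the same plain-text protocol and works just as well against the netcat source:

    nc localhost 44444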

3. Using Flume to collect a local file

  • conf/exec1.conf
    #Name the components on this agent
    a1.sources=r1
    a1.sinks=k1
    a1.channels=c1
    
    #For each source, the type is defined
    a1.sources.r1.type=exec
    a1.sources.r1.command=tail -F /usr/local/hadoop/hadoop-2.7.7/logs/hadoop-root-datanode-bigdata.log
    #check the bash path with: whereis bash
    a1.sources.r1.shell=/usr/bin/bash -c
    
    #Each sink's type must be defined
    a1.sinks.k1.type=hdfs
    a1.sinks.k1.hdfs.path=hdfs://bigdata:9000/flume/%y%m%d/%H
    
    a1.sinks.k1.hdfs.filePrefix=logs-
    a1.sinks.k1.hdfs.round=true
    a1.sinks.k1.hdfs.roundValue=1
    a1.sinks.k1.hdfs.roundUnit=minute
    a1.sinks.k1.hdfs.useLocalTimeStamp=true
    a1.sinks.k1.hdfs.batchSize=100
    a1.sinks.k1.hdfs.fileType=DataStream
    a1.sinks.k1.hdfs.rollInterval=30
    a1.sinks.k1.hdfs.rollSize=134217700
    a1.sinks.k1.hdfs.rollCount=0
    a1.sinks.k1.hdfs.minBlockReplicas=1
    
    
    #Specify the channel to use
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=1000
    a1.channels.c1.transactionCapacity=100
    
    #Bind the source and sink to the channel
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1

  • Start the agent with console logging
    /usr/local/flume/apache-flume-1.8.0-bin/bin/flume-ng agent --conf ./conf --conf-file ./conf/exec1.conf --name a1 -Dflume.root.logger=INFO,console

  • Check the results on HDFS (example commands below)
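
The usual HDFS shell commands can be used for the check; note that the %y%m%d/%H escapes in hdfs.path produce date/hour subdirectories, so the exact paths vary:

    hdfs dfs -ls -R /flume
    hdfs dfs -cat /flume/*/*/logs-*    #adjust the glob to the actual date/hour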

4. Using Flume to collect a local directory

  • conf/spooldir1.conf
    #Name the agent's sources, channels, and sinks
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    #Define the source
    a1.sources.r1.type = spooldir
    a1.sources.r1.spoolDir = /usr/local/flume/apache-flume-1.8.0-bin/logs
    #Define the channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    #Define the sink
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://bigdata:9000/flume/%Y%m%d
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    #Do not roll files by event count
    a1.sinks.k1.hdfs.rollCount = 0
    #Roll a new file on HDFS when the current one reaches ~128 MB
    a1.sinks.k1.hdfs.rollSize = 134217700
    #Roll a new file on HDFS every 30 seconds
    a1.sinks.k1.hdfs.rollInterval = 30
    
    #Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

  • Start the agent with console logging
    /usr/local/flume/apache-flume-1.8.0-bin/bin/flume-ng agent --conf ./conf --conf-file ./conf/spooldir1.conf --name a1 -Dflume.root.logger=INFO,console

  • Check the results on HDFS (see the sketch below)
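
To exercise the spooling source, drop a text file into the watched directory; by default the source renames each fully ingested file with a .COMPLETED suffix, so progress is visible in the directory itself. A minimal sketch, assuming the agent is running:

    echo "spool test" > /usr/local/flume/apache-flume-1.8.0-bin/logs/test.log
    ls /usr/local/flume/apache-flume-1.8.0-bin/logs    #test.log becomes test.log.COMPLETED once ingested
    hdfs dfs -ls /flume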