Writing from Flume to HDFS

Reposted June 1, 2016, 17:09:50

Example: Writing from Flume to HDFS

Apache Flume is a service for collecting log data. You can capture events in Flume and store them in HDFS for analysis. For a conceptual description of Flume, see the Flume User Guide. This example is a quick walkthrough to get Flume up and running.

Flume Out of the Box

To use Flume in a fresh Quickstart VM:

  1. Import a new VM instance.
  2. Configure the new VM.
    1. Allocate a minimum of 10023 MB of memory.
    2. Allocate 2 CPUs.
    3. Allocate 20 MB of video memory.
    4. Consider setting the clipboard to bidirectional.
  3. Start the VM.
  4. Launch Cloudera Manager.
  5. In the browser, click the Cloudera Manager link.
  6. Start Hue.
  7. Start Flume.
  8. Use Telnet to test the default Flume implementation.
    1. Open a terminal window.
    2. Install Telnet with the command sudo yum install telnet.
    3. Launch Telnet with the command telnet localhost 10001.
    4. At the prompt, enter Hello world!.
    5. Open /var/log/flume-ng/flume-cmf-flume-AGENT-quickstart.cloudera.log.
    6. Scroll to the bottom of the log, which should have an entry similar to the following.
      2015-06-05 15:45:55,561 INFO org.apache.flume.sink.LoggerSink: 
      Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D
               Hello world!. }
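
The body field in that log entry is the raw bytes of the telnet line, printed as hex. As a quick sanity check (plain Python, nothing Flume-specific), the hex decodes back to the message, with a trailing 0D carriage return added by telnet:

```python
# Hex body copied from the LoggerSink entry above.
hex_body = "48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D"

# Parse each two-digit hex token into a byte.
body = bytes(int(h, 16) for h in hex_body.split())

print(body)  # b'Hello world!\r' -- the 0D is telnet's carriage return
```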

Writing from Flume to HDFS

You can configure Flume to write incoming messages to data files stored in HDFS for later processing.

To configure Flume to write to HDFS:

  1. In the VM web browser, open Hue.
  2. Click File Browser.
  3. Create the /flume/events directory.
    1. In the /user/cloudera directory, click New->Directory.
    2. Create a directory named flume.
    3. In the flume directory, create a directory named events.
    4. Check the box to the left of the events directory, then click the Permissions setting.
    5. Enable Write access for Group and Other users.
    6. Click Submit.
  4. Change the Flume configuration.
    1. Open Cloudera Manager in your web browser.
    2. In the list of services, click Flume.
    3. Click the Configuration tab.
    4. Scroll or search for the Configuration File item.
    5. Append the following lines to the Configuration File settings.
      tier1.sinks.sink1.type      = HDFS
      tier1.sinks.sink1.fileType  = DataStream
      tier1.sinks.sink1.channel   = channel1
      tier1.sinks.sink1.hdfs.path = hdfs://localhost:8020/user/cloudera/flume/events
    6. At the top of the settings list, click Save Changes.
  5. On the far right, choose Actions->Restart to restart Flume.
  6. When the restart is complete, click Close.
  7. Click the Home tab. If necessary, start the Yarn service.
  8. In a terminal window, launch Telnet with the command telnet localhost 10001.
  9. At the prompt, enter Hello HDFS!.
  10. In the Hue File Browser, open the /user/cloudera/flume/events directory.
  11. There will be a file named FlumeData with a serial number as the file extension. Click the file name link to view the data sent by Flume to HDFS. The output is similar to the following.
    0000000: 53 45 51 06 21 6f 72 67 2e 61 70 61 63 68 65 2e SEQ.!org.apache.
    0000010: 68 61 64 6f 6f 70 2e 69 6f 2e 4c 6f 6e 67 57 72 hadoop.io.LongWr
    0000020: 69 74 61 62 6c 65 22 6f 72 67 2e 61 70 61 63 68 itable"org.apach
    0000030: 65 2e 68 61 64 6f 6f 70 2e 69 6f 2e 42 79 74 65 e.hadoop.io.Byte
    0000040: 73 57 72 69 74 61 62 6c 65 00 00 00 00 00 00 85 sWritable.......
    0000050: a6 6f 46 0c f4 16 33 a6 eb 43 c2 21 5c 1b 4f 00 .oF...3..C.!\.O.
    0000060: 00 00 18 00 00 00 08 00 00 01 4d c6 1b 01 1f 00 ..........M.....
    0000070: 00 00 0c 48 65 6c 6c 6f 20 48 44 46 53 21 0d    ...Hello HDFS!.
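
Putting the pieces together, the full agent definition would look roughly like the sketch below. Only the sink lines come from the steps above; the source and channel sections are assumptions based on the Quickstart VM's default configuration (agent name tier1, netcat-style source listening on port 10001). Note that the upstream Flume HDFS sink documents the file type under the hdfs. prefix (hdfs.fileType); with the unprefixed fileType spelling, the sink falls back to its SequenceFile default, which is consistent with the SEQ dump shown above.

```
# Hypothetical complete agent config; source and channel names are assumptions.
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

tier1.sources.source1.type     = netcat
tier1.sources.source1.bind     = 127.0.0.1
tier1.sources.source1.port     = 10001
tier1.sources.source1.channels = channel1

tier1.channels.channel1.type     = memory
tier1.channels.channel1.capacity = 10000

tier1.sinks.sink1.type          = HDFS
tier1.sinks.sink1.channel       = channel1
tier1.sinks.sink1.hdfs.path     = hdfs://localhost:8020/user/cloudera/flume/events
# Upstream-documented spelling; DataStream writes the raw event body as text.
tier1.sinks.sink1.hdfs.fileType = DataStream
```

To inspect the result without Hue, hdfs dfs -text /user/cloudera/flume/events/FlumeData.* in a VM terminal prints the file contents; -text decodes SequenceFiles as well as plain text.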

Sentiment Analysis of Input from Flume

Now that Flume is sending data to HDFS, you can apply the Sentiment Analysis example to comments you enter.

All of the source for this example is provided in flumeToHDFS.tar.gz, which contains:
  • flume.config
  • makefile
  • Map.java
  • MrManager.java
  • Reduce.java
  • neg-words.txt
  • pos-words.txt
  • stop-words.txt
  • /shakespeare
    • comedies
    • histories
    • poems
    • tragedies
To test sentiment analysis with Flume input:
  1. Expand flumeToHDFS.tar.gz on the VM.
  2. In a terminal window, navigate to the /flume2hdfs directory.
  3. Launch Telnet with the command telnet localhost 10001.
  4. Enter the following lines, pressing Enter after each one. (Telnet returns the response OK to each line.)
    I enjoy using CDH. I think CDH is wonderful.
    I like the power and flexibility of CDH.
    I dislike brussels sprouts. I hate mustard greens.
    Flume is a great product. I have several use cases in mind for which it is well suited.
  5. Enter run_flume to start the Sentiment Analysis example via the makefile. The application returns results from all counters, ending with the custom counters and report.
    	org.myorg.Map$Gauge
    		NEGATIVE=2
    		POSITIVE=6
    **********
    Sentiment score = (6.0 - 2.0) / (6.0 + 2.0)
    Sentiment score = 0.5
    
    Positivity score = 6.0/(6.0+2.0)
    Positivity score = 75%
    ********** 
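
The final report is just arithmetic over the two custom counters. A minimal sketch of the scoring, using the counter values from the run above:

```python
# Counter values reported by the job above (org.myorg.Map$Gauge).
positive, negative = 6.0, 2.0

# Sentiment in [-1, 1]: net positive mentions over total mentions.
sentiment = (positive - negative) / (positive + negative)

# Positivity: fraction of mentions that were positive.
positivity = positive / (positive + negative)

print(f"Sentiment score = {sentiment}")        # 0.5
print(f"Positivity score = {positivity:.0%}")  # 75%
```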
Page generated October 23, 2015.
