Writing from Flume to HDFS

Example: Writing from Flume to HDFS

Apache Flume is a service for collecting log data. You can capture events in Flume and store them in HDFS for analysis. For a conceptual description of Flume, see the Flume User Guide. This example is a quick walkthrough to get Flume up and running.

Flume Out of the Box

To use Flume in a fresh Quickstart VM:

  1. Import a new VM instance.
  2. Configure the new VM.
    1. Allocate a minimum of 10023 MB memory.
    2. Allocate 2 CPUs.
    3. Allocate 20 MB video memory.
    4. Consider setting the clipboard to bidirectional.
  3. Start the VM.
  4. Launch Cloudera Manager.
  5. In the browser, click the Cloudera Manager link.
  6. Start Hue.
  7. Start Flume.
  8. Use Telnet to test the default Flume implementation.
    1. Open a terminal window.
    2. Install Telnet with the command sudo yum install telnet.
    3. Launch Telnet with the command telnet localhost 10001.
    4. At the prompt, enter Hello world!.
    5. Open /var/log/flume-ng/flume-cmf-flume-AGENT-quickstart.cloudera.log.
    6. Scroll to the bottom of the log, which should have an entry similar to the following.
      2015-06-05 15:45:55,561 INFO org.apache.flume.sink.LoggerSink: 
      Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D
               Hello world!. }
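
The body in the log entry is printed as hexadecimal bytes, followed by a text rendering in which non-printable bytes appear as a period. As a minimal sketch for reading such an entry, the following Python snippet converts the hex bytes back to text. The helper name decode_event_body is hypothetical and is not part of Flume.

```python
def decode_event_body(hex_dump: str) -> str:
    """Turn the space-separated hex bytes of a logged event body into text."""
    # bytes.fromhex ignores the spaces between byte pairs.
    return bytes.fromhex(hex_dump).decode("utf-8")


# Hex bytes copied from the sample log entry above.
body = decode_event_body("48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D")
print(repr(body))  # 'Hello world!\r'
```

The final byte, 0D, is a carriage return, which is why the log shows a trailing period after Hello world!.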

Writing from Flume to HDFS

You can configure Flume to write incoming messages to data files stored in HDFS for later processing.

To configure Flume to write to HDFS:

  1. In the VM web browser, open Hue.
  2. Click File Browser.
  3. Create the /flume/events directory.
    1. In the /user/cloudera directory, click New->Directory.
    2. Create a directory named flume.
    3. In the flume directory, create a directory named events.
    4. Check the box to the left of the events directory, then click the Permissions setting.
    5. Enable Write access for Group and Other users.
    6. Click Submit.
  4. Change the Flume configuration.
    1. Open Cloudera Manager in your web browser.
    2. In the list of services, click Flume.
    3. Click the Configuration tab.
    4. Scroll or search for the Configuration File item.
    5. Append the following lines to the Configuration File settings.
      tier1.sinks.sink1.type = HDFS
      tier1.sinks.sink1.fileType = DataStream
      tier1.sinks.sink1.channel = channel1
      tier1.sinks.sink1.hdfs.path = hdfs://localhost:8020/user/cloudera/flume/events
    6. At the top of the settings list, click Save Changes.
  5. On the far right, choose Actions->Restart to restart Flume.
  6. When the restart is complete, click Close.
  7. Click the Home tab. If necessary, start the YARN service.
  8. In a terminal window, launch Telnet with the command telnet localhost 10001.
  9. At the prompt, enter Hello HDFS!.
  10. In the Hue File Browser, open the /user/cloudera/flume/events directory.
  11. The directory contains a file named FlumeData with a serial number as the file extension. Click the file name link to view the data sent by Flume to HDFS. The output is similar to the following.
    0000000: 53 45 51 06 21 6f 72 67 2e 61 70 61 63 68 65 2e SEQ.!org.apache.
    0000010: 68 61 64 6f 6f 70 2e 69 6f 2e 4c 6f 6e 67 57 72 hadoop.io.LongWr
    0000020: 69 74 61 62 6c 65 22 6f 72 67 2e 61 70 61 63 68 itable"org.apach
    0000030: 65 2e 68 61 64 6f 6f 70 2e 69 6f 2e 42 79 74 65 e.hadoop.io.Byte
    0000040: 73 57 72 69 74 61 62 6c 65 00 00 00 00 00 00 85 sWritable.......
    0000050: a6 6f 46 0c f4 16 33 a6 eb 43 c2 21 5c 1b 4f 00 .oF...3..C.!\.O.
    0000060: 00 00 18 00 00 00 08 00 00 01 4d c6 1b 01 1f 00 ..........M.....
    0000070: 00 00 0c 48 65 6c 6c 6f 20 48 44 46 53 21 0d    ...Hello HDFS!.
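
To check what the file holds, you can rebuild the raw bytes from the hex view. The following Python sketch assumes the layout shown above: an offset, a colon, up to 16 hex byte pairs, then a text column. The helper name hexdump_to_bytes is hypothetical.

```python
import string

# Hex view copied from the sample output above.
DUMP = r"""
0000000: 53 45 51 06 21 6f 72 67 2e 61 70 61 63 68 65 2e SEQ.!org.apache.
0000010: 68 61 64 6f 6f 70 2e 69 6f 2e 4c 6f 6e 67 57 72 hadoop.io.LongWr
0000020: 69 74 61 62 6c 65 22 6f 72 67 2e 61 70 61 63 68 itable"org.apach
0000030: 65 2e 68 61 64 6f 6f 70 2e 69 6f 2e 42 79 74 65 e.hadoop.io.Byte
0000040: 73 57 72 69 74 61 62 6c 65 00 00 00 00 00 00 85 sWritable.......
0000050: a6 6f 46 0c f4 16 33 a6 eb 43 c2 21 5c 1b 4f 00 .oF...3..C.!\.O.
0000060: 00 00 18 00 00 00 08 00 00 01 4d c6 1b 01 1f 00 ..........M.....
0000070: 00 00 0c 48 65 6c 6c 6f 20 48 44 46 53 21 0d    ...Hello HDFS!.
"""


def hexdump_to_bytes(dump: str) -> bytes:
    """Rebuild raw bytes from 'offset: hex pairs  text' lines."""
    out = bytearray()
    for line in dump.strip().splitlines():
        _, _, rest = line.partition(":")
        for token in rest.split()[:16]:
            # Stop at the text column, whose tokens are not two hex digits.
            if len(token) != 2 or any(c not in string.hexdigits for c in token):
                break
            out.append(int(token, 16))
    return bytes(out)


data = hexdump_to_bytes(DUMP)
print(data[:3])    # b'SEQ'
print(data[-12:])  # b'Hello HDFS!\r'
```

The file begins with the SEQ marker and the names of Hadoop Writable classes, which suggests a SequenceFile container rather than plain text. The text entered in Telnet occupies the final 12 bytes, including the trailing carriage return.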

Sentiment Analysis of Input from Flume

Now that Flume is sending data to HDFS, you can apply the Sentiment Analysis example to comments you enter.

All of the source for this example is provided in flumeToHDFS.tar.gz, which contains:
  • flume.config
  • makefile
  • Map.java
  • MrManager.java
  • Reduce.java
  • neg-words.txt
  • pos-words.txt
  • stop-words.txt
  • /shakespeare
    • comedies
    • histories
    • poems
    • tragedies
To test sentiment analysis with Flume input:
  1. Expand flumeToHDFS.tar.gz on the VM.
  2. In a terminal window, navigate to the /flume2hdfs directory.
  3. Launch Telnet with the command telnet localhost 10001.
  4. Enter the following lines, pressing Enter after each line. (Telnet returns the response OK to each line.)
    I enjoy using CDH. I think CDH is wonderful.
    I like the power and flexibility of CDH.
    I dislike brussels sprouts. I hate mustard greens.
    Flume is a great product. I have several use cases in mind for which it is well suited.
  5. Enter run_flume to start the Sentiment Analysis example via the makefile. The application returns results from all counters, ending with the custom counters and report.
    	org.myorg.Map$Gauge
    		NEGATIVE=2
    		POSITIVE=6
    **********
    Sentiment score = (6.0 - 2.0) / (6.0 + 2.0)
    Sentiment score = 0.5
    
    Positivity score = 6.0/(6.0+2.0)
    Positivity score = 75%
    ********** 
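
The two scores in the report follow directly from the POSITIVE and NEGATIVE counters. As a minimal sketch of that arithmetic (not the code in MrManager.java, which is not shown here), the following Python function reproduces the figures above. The function name sentiment_scores is hypothetical.

```python
def sentiment_scores(positive: int, negative: int) -> tuple:
    """Return (sentiment score, positivity score) from the two counters."""
    total = positive + negative
    if total == 0:
        raise ValueError("no sentiment words were counted")
    sentiment = (positive - negative) / total   # ranges from -1.0 to 1.0
    positivity = positive / total               # ranges from 0.0 to 1.0
    return sentiment, positivity


# Counter values taken from the sample report above.
sentiment, positivity = sentiment_scores(positive=6, negative=2)
print(f"Sentiment score = {sentiment}")        # Sentiment score = 0.5
print(f"Positivity score = {positivity:.0%}")  # Positivity score = 75%
```

A sentiment score of 0 means positive and negative words are balanced, while the positivity score is the share of counted words that are positive.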
Page generated October 23, 2015.