Flume Performance Test Report (a translation of the report on the official Flume wiki)

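Anyone who uses Flume ends up looking into its performance sooner or later, and most of what turns up online is individual, self-run testing.
Here is a test report from the official wiki for reference.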



https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2


 

Test environment:

All of the tests below were run on a single agent.

Hadoop cluster: 20-node Hadoop cluster (1 name node and 19 data nodes).

Server hardware: 24 cores – Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM, 7200 rpm hard drive.

1.     File channel with HDFS Sink (Sequence File):

Tested on Flume 1.4 with four exec sources, a file channel, and HDFS sinks.

Flume version: 1.4

Source: 4 x Exec Source, 100k batchSize

HDFS Sink Batch size: 500,000

Event Size: 500 byte events.

Channel: File

Events/sec (- = not measured)

Sinks   1 data dir       2 data dirs   4 data dirs   6 data dirs   8 data dirs       10 data dirs
1       14.3k (7 MB/s)   -             -             -             -                 -
2       21.9k            -             -             -             -                 -
4       -                35.8k         -             -             -                 -
8       -                -             72.5k         77k           78.6k (37 MB/s)   76.6k
10      -                -             58k           -             -                 -
12      -                -             49.3k         49k           -                 -
 

 

Measurements were taken to get an idea of which configuration yields the best performance, so they were taken only for the points in the grid that made sense. For example, it was not necessary to measure multiple dataDirs with a single sink, as it was evident that a multiple HDFS sink configuration would do better than a single sink.

Running several sinks together performs better than a single sink.
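The byte rates quoted next to some of the event rates above can be cross-checked with simple arithmetic. The sketch below is only an illustration: the helper name `mib_per_sec` is made up here, and treating the report's unit as binary megabytes (1024 × 1024 bytes) is an assumption that happens to fit these two data points.

```python
def mib_per_sec(events_per_sec: float, event_size_bytes: int) -> float:
    """Convert an event rate into MiB/s (1 MiB = 1024 * 1024 bytes)."""
    return events_per_sec * event_size_bytes / (1024 * 1024)


# Two data points from the file channel table, both with 500-byte events.
points = [("1 sink, 1 data dir", 14_300), ("8 sinks, 8 data dirs", 78_600)]

for label, rate in points:
    print(f"{label}: {rate} events/s = {mib_per_sec(rate, 500):.1f} MiB/s")
```

Under that assumption the two results land close to the 7 and 37 figures given in the table.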

2.     HDFS Sink:

Unlike test 1, this test uses a memory channel.

Flume version: 1.4

Channel: Memory

Event Size: 500 byte events.

Events/sec

# HDFS sinks   Snappy,            Snappy,            Sequence File,
               batch sz 1.2 mil   batch sz 1.4 mil   batch sz 1.2 mil
1              34.3k (17 MB/s)    33k                33k
2              71k                75k                69k
4              141k               145k               141k
8              271k               273k               251k
12             382k               380k               370k
16             478k               538k (240 MB/s)    486k (232 MB/s)
 

 

Some simple observations:

  • Increasing the number of dataDirs helps file channel (FC) performance, even on single-disk systems
  • Increasing the number of sinks helps

Increasing the number of sinks has a significant effect.
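To put a number on how well the extra sinks pay off, the rates in the HDFS sink table can be turned into a speedup over the single-sink case and a per-sink efficiency. This is a rough sketch using the Snappy column with a 1.2 million batch size; the function name `scaling` and the efficiency metric are illustrative choices, not something the report itself defines.

```python
# Events/s from the HDFS sink table (memory channel, Snappy, batch size 1.2 million).
rates = {1: 34_300, 2: 71_000, 4: 141_000, 8: 271_000, 12: 382_000, 16: 478_000}


def scaling(rates_by_sinks: dict) -> dict:
    """Return {sinks: (speedup vs. 1 sink, speedup divided by sink count)}."""
    base = rates_by_sinks[1]
    return {n: (r / base, r / base / n) for n, r in rates_by_sinks.items()}


for sinks, (speedup, efficiency) in scaling(rates).items():
    print(f"{sinks:>2} sinks: {speedup:5.2f}x speedup, {efficiency:.0%} per-sink efficiency")
```

An efficiency near 100% means throughput grows roughly in proportion to the number of sinks; values below that indicate diminishing returns as sinks are added.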

3.     Hive Sink:

Hive sink with a memory channel, tested on Flume 1.5 and 1.6.

Flume version: 1.5 & 1.6

Channel: Memory

BatchSz:1million

Event Size: 500 byte events.

                             Flume 1.5            Flume 1.6
                             Events/s   MB/s      Events/s   MB/s
1 sink
  DELIMITED Text             36,885     18        138,461    66
  Json                       12,735     6         -          -
16 sinks (agent maxed out)
  DELIMITED Text             209,600    100       348,214    166
  Json                       25,751     12        31,135     14
 

 

Observation: Feeding JSON data to the Hive sink is much slower, potentially due in part to the higher overhead of parsing JSON.

Sending data as JSON is slower, and the time is lost mainly in JSON parsing.
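The size of the JSON penalty can be read off the Hive sink table by dividing the delimited-text rate by the JSON rate for each configuration that has both. The snippet below is a small sketch of that arithmetic; the dictionary layout and the name `json_slowdown` are illustrative only.

```python
# Events/s from the Hive sink table, keyed by (Flume version, number of sinks).
# Only configurations with both a delimited-text and a JSON measurement are listed.
hive_rates = {
    ("1.5", 1): {"delimited": 36_885, "json": 12_735},
    ("1.5", 16): {"delimited": 209_600, "json": 25_751},
    ("1.6", 16): {"delimited": 348_214, "json": 31_135},
}


def json_slowdown(row: dict) -> float:
    """How many times faster delimited text is than JSON for one configuration."""
    return row["delimited"] / row["json"]


for (version, sinks), row in hive_rates.items():
    print(f"Flume {version}, {sinks} sink(s): delimited text is {json_slowdown(row):.1f}x faster")
```

The gap widens with 16 sinks, which fits the report's guess that JSON parsing is at least part of the bottleneck, though the numbers alone do not prove the cause.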

 

4.     HBase Sink:

Flume version: 1.5

Channel: Memory

Serializer: RegexHbaseEventSerializer

Total Sinks: 1

Event size (bytes)   Batch sz 1   Batch sz 100   Batch sz 1000   Batch sz 10000
500                  -            11 MB/s        -               11 MB/s
1000                 0.5 MB/s     14 MB/s        22 MB/s         27 MB/s
 

 

5.     ASync HBase Sink:

Flume version: 1.5

Channel: Memory

Serializer: SimpleAsyncHbaseEventSerializer

Total Sinks: 1

Event size (bytes)   Batch sz 1   Batch sz 100   Batch sz 1000
500                  -            0.4 MB/s       0.5 MB/s
1000                 0.8 MB/s     0.8 MB/s       0.9 MB/s
 

 

6.     Kafka Source:

Flume version: 1.6

Channel: Memory

Sink: Null Sink

Event Size: 1000 bytes

Total Sinks: 1

Batch size (bytes)   MB/s
1,000                62
10,000               112
20,000               125
40,000               147
80,000               153
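The Kafka source numbers show throughput rising with batch size but flattening out. A quick way to see this is to compute the relative gain between consecutive rows of the table. The sketch below does only that; `step_gains` is a hypothetical helper name, and the values are copied from the table above.

```python
# (batch size, throughput in MB/s) pairs from the Kafka source table.
kafka_points = [(1_000, 62), (10_000, 112), (20_000, 125), (40_000, 147), (80_000, 153)]


def step_gains(points: list) -> list:
    """Return (batch size, % throughput gain over the previous row) for each step."""
    return [
        (batch, (rate - prev_rate) / prev_rate * 100)
        for (_, prev_rate), (batch, rate) in zip(points, points[1:])
    ]


for batch, gain in step_gains(kafka_points):
    print(f"batch size {batch:>6}: {gain:+.1f}% over the previous row")
```

The last doubling, from 40,000 to 80,000, adds only a few percent, so in this setup most of the benefit appears to have been captured by the smaller batch sizes.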

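Author: 小闪电

Source: http://www.cnblogs.com/yueyanyu/

Copyright of this article is shared by the author and cnblogs (博客园). Reposting and discussion are welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original must be placed in a prominent position on the article page. If you found this article helpful, likes and discussion are welcome. Some material on this blog comes from internet sources; if it infringes your rights, please contact the blogger to have it removed.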

 


 

Reposted from: https://www.cnblogs.com/yueyanyu/p/6972802.html
