A Hadoop data pipeline to analyze applicaction performance

本文介绍了一个基于Hadoop的数据管道,用于分析应用程序性能。通过Flume收集日志和数据库性能数据,存储在HDFS和HBase中,利用Pig进行ETL转换,Oozie触发作业,最后用Hive进行分析和报告。该系统设计包括数据收集、存储、处理和分析四个阶段,旨在识别和优化性能低效的RESTful API。
摘要由CSDN通过智能技术生成

1.  Introduction

In recent years, Hadoop has been under the spotlight for its flexible and scalable architecture to store and process big data on commodity machines. One of its common use cases is to analyze application log files, as the size of log files generated by applications keeps increasing (volume) and log files are often unstructured (variety).   

In this project, we have built a data pipeline to analyze application performance based on application performance data (appperfdata) extracted from log files and database performance data (db2perfdata) extracted from DBAU database. XXX is used as a sample application to analyze, but it can be tailored to analyze other applications as well.


In this sample use case, appperdata is the duration of restful APIs. For example, from below appperfdata, we can know how many milliseconds the restful API took to execute. In this example, it costs 283 miliseconds to complete API “/msqe/coreserverDEV2/webapp/maskingservice/needMasking/4964”. Therefore, by ordering the information based on API duration, we can see which restful APIs are poor performed and then do optimization accordingly.

2012-12-14-06-01        06:01:24.743    283    /XXX/webapp/maskingservice/needMasking/4964

On the other hand, from below db2perfdata, we can know how many select, read, update, insert, delete operations are performed at a certain time (currently it is collected at minute-level).
2012-12-14-06-01,3038,281910,383,365,0
By correlating appperfdata and db2perfdata based on timestamp, ‘2012-12-14-06-01’ in above examp

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值