A Hadoop data pipeline to analyze application performance

天外有菌

Category: Hadoop. Tags: Hadoop, Flume, Oozie, Pig, Hive

Article link: https://blog.csdn.net/zhangjun2915/article/details/9005329

1. Introduction

In recent years, Hadoop has been under the spotlight for its flexible and scalable architecture to store and process big data on commodity machines. One of its common use cases is to analyze application log files, as the size of log files generated by applications keeps increasing (volume) and log files are often unstructured (variety).

In this project, we have built a data pipeline that analyzes application performance using two inputs: application performance data (appperfdata) extracted from log files, and database performance data (db2perfdata) extracted from the DBAU database. XXX is used as the sample application, but the pipeline can be tailored to analyze other applications as well.

In this sample use case, appperfdata is the duration of RESTful API calls. For example, the appperfdata record below shows how many milliseconds a RESTful API call took to execute: the API “/msqe/coreserverDEV2/webapp/maskingservice/needMasking/4964” took 283 milliseconds to complete. By ordering the records by API duration, we can see which RESTful APIs perform poorly and optimize them accordingly.

2012-12-14-06-01 06:01:24.743 283 /XXX/webapp/maskingservice/needMasking/4964
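As a rough illustration of the ordering step described above, the following Python sketch parses appperfdata lines and ranks them by duration. It is not the pipeline's own code (the article's tags suggest Pig and Hive are used for that); the field layout (minute key, time of day, duration in milliseconds, API path, separated by whitespace) is an assumption read off the single sample line, and the names `AppPerfRecord`, `parse_appperf`, and `slowest` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class AppPerfRecord(NamedTuple):
    minute: str       # minute-level key, e.g. "2012-12-14-06-01"
    time: str         # time of day with milliseconds
    duration_ms: int  # API duration in milliseconds
    api: str          # RESTful API path


def parse_appperf(line: str) -> AppPerfRecord:
    # Assumption: four whitespace-separated fields, as in the sample line.
    minute, time, duration, api = line.split(maxsplit=3)
    return AppPerfRecord(minute, time, int(duration), api.strip())


def slowest(records: Iterable[AppPerfRecord], n: int = 10) -> List[AppPerfRecord]:
    # Order by duration, longest first, and keep the top n.
    return sorted(records, key=lambda r: r.duration_ms, reverse=True)[:n]


if __name__ == "__main__":
    lines = [
        "2012-12-14-06-01 06:01:24.743 283 /XXX/webapp/maskingservice/needMasking/4964",
    ]
    for rec in slowest(parse_appperf(line) for line in lines):
        print(rec.duration_ms, rec.api)
```

On a real data set the same ordering would more likely be expressed as a sort in Pig or an `ORDER BY` in Hive; the sketch only makes the record layout and the ranking explicit.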

The db2perfdata record below, on the other hand, shows how many select, read, update, insert, and delete operations were performed at a given time (currently collected at minute-level granularity).

2012-12-14-06-01,3038,281910,383,365,0

By correlating appperfdata and db2perfdata based on the timestamp, ‘2012-12-14-06-01’ in the above example, …
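The timestamp-based correlation can be sketched as a simple join on the minute key. This is a minimal Python illustration, not the pipeline's actual implementation. It assumes that the five db2perfdata counters appear in the order the text lists them (select, read, update, insert, delete) and that appperfdata fields are whitespace-separated; `Db2PerfRecord`, `parse_db2perf`, `parse_appperf`, and `correlate` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Db2PerfRecord(NamedTuple):
    minute: str   # minute-level key, e.g. "2012-12-14-06-01"
    selects: int
    reads: int
    updates: int
    inserts: int
    deletes: int


def parse_db2perf(line: str) -> Db2PerfRecord:
    # Assumption: counters follow the order given in the text
    # (select, read, update, insert, delete).
    minute, *counts = line.strip().split(",")
    return Db2PerfRecord(minute, *map(int, counts))


def parse_appperf(line: str) -> Tuple[str, int, str]:
    # Assumption: "<minute> <time> <duration_ms> <api>", whitespace-separated.
    minute, _time, duration, api = line.split(maxsplit=3)
    return minute, int(duration), api.strip()


def correlate(app_lines: Iterable[str], db2_lines: Iterable[str]) -> List[tuple]:
    # Index database counters by minute, then attach them to each API call
    # that falls in the same minute (an inner join on the timestamp key).
    db2_by_minute = {r.minute: r for r in map(parse_db2perf, db2_lines)}
    joined = []
    for line in app_lines:
        minute, duration_ms, api = parse_appperf(line)
        db2 = db2_by_minute.get(minute)
        if db2 is not None:
            joined.append((minute, api, duration_ms, db2))
    return joined


if __name__ == "__main__":
    rows = correlate(
        ["2012-12-14-06-01 06:01:24.743 283 /XXX/webapp/maskingservice/needMasking/4964"],
        ["2012-12-14-06-01,3038,281910,383,365,0"],
    )
    for minute, api, duration_ms, db2 in rows:
        print(minute, api, duration_ms, db2.selects, db2.reads)
```

Because db2perfdata is collected per minute, every API call within a given minute is paired with the same database counters.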
