eBay open sources a big, fast SQL-on-Hadoop database

miller_lover

Category: big data. Tags: ebay, big data open source, kylin


eBay has open sourced a database technology, called Kylin, that takes advantage of distributed processing and the HBase data store in order to return faster results for SQL queries over Hadoop data.

Online auction site eBay has open sourced a database technology called Kylin that the company says enables fast queries over even petabytes of data stored in Hadoop. eBay isn’t a big data user on par with companies like Google and Facebook, but it does run technologies such as Hadoop at a fairly large scale and Kylin seems a good example of the type of innovation it’s doing on top of them.

eBay detailed Kylin in a blog post on Wednesday, citing among other features its REST APIs, ANSI-SQL compatibility, connections to analysis tools Tableau and Excel, and sub-second latency on some queries. However, the most distinctive features of Kylin involve how it deals with scale. eBay says it can query billions of rows of data, on datasets more than 14 terabytes in size, at speeds much faster than using the traditional Apache Hive tool.

The way Kylin works, at a high level, is to take data from Hive; pre-process large queries using MapReduce; and then store those results as key-value “cuboids” in HBase. When a user runs a Kylin query using a particular set of variables, the values are ready to go without requiring them to be processed again. It’s not entirely dissimilar from the cubes that analytic databases have been using for years, but Kylin’s cuboids are designed with HBase’s preferred data structure in mind.
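To make the cuboid idea concrete, here is a minimal Python sketch of pre-aggregating a measure for every combination of dimensions and then answering queries by key lookup. It is only an illustration of the general technique: the sample rows, the dimension names and the `build_cuboids` and `query` helpers are all hypothetical, a plain dictionary stands in for HBase, and a simple loop stands in for the MapReduce job. It does not reflect Kylin's actual storage layout or code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows, standing in for a Hive table.
ROWS = [
    {"site": "US", "category": "toys", "year": 2014, "sales": 10.0},
    {"site": "US", "category": "books", "year": 2014, "sales": 5.0},
    {"site": "DE", "category": "toys", "year": 2014, "sales": 7.0},
    {"site": "DE", "category": "toys", "year": 2013, "sales": 3.0},
]
DIMENSIONS = ("site", "category", "year")


def build_cuboids(rows, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions`.

    Each subset of dimensions is one "cuboid". The result maps
    (dimension names, dimension values) -> summed measure, which is
    the kind of key-value shape a store like HBase can serve directly.
    """
    store = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                store[key] += row[measure]
    return dict(store)


def query(store, **filters):
    """Answer an equality-filtered SUM with a single lookup, no rescan."""
    # Keep the dimensions in the same order used when building the keys.
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = (dims, tuple(filters[d] for d in dims))
    return store.get(key, 0.0)


# The expensive pass over the rows happens once, ahead of query time.
STORE = build_cuboids(ROWS, DIMENSIONS, "sales")

print(query(STORE, site="US"))                    # total sales for the US site
print(query(STORE, category="toys", year=2014))   # toys sold in 2014, all sites
print(query(STORE))                               # grand total
```

The trade-off is the one the article describes: the build step does all the heavy work up front and the number of stored keys grows quickly with the number of dimensions, but each query afterwards is a lookup rather than a scan.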

Here’s how eBay says Kylin is used within the company:

At the time of open-sourcing Kylin, we already had several eBay business units using it in production. Our largest use case is the analysis of 12+ billion source records generating 14+ TB cubes. Its 90% query latency is less than 5 seconds. Now, our use cases target analysts and business users, who can access analytics and get results through the Tableau dashboard very easily – no more Hive query, shell command, and so on.

It would be interesting to know how Kylin stacks up against next-generation versions of Hive, Spark SQL and other options for SQL analysis in Hadoop that have emerged as a result of the YARN resource manager available in the latest versions of Apache Hadoop. My guess is it’s slower but more scalable than in-memory options or those not requiring MapReduce processing, but that it might be a solid option for the large percentage of Hadoop users still running earlier versions of the software.
