Comparison between Hive, Impala, Drill and SparkSQL

Hive

Impala

Drill

SparkSQL

Project Goal

Hive: offline batch processing; long-running jobs that perform data-heavy operations, such as joins on huge data sets.

Impala: running real-time queries on top of an existing Hadoop warehouse.

Drill: distributed query capability across multiple big data platforms. It can query data from any or all of those data sources at the same time and push query operations down into the underlying storage system.

SparkSQL: executing SQL queries, then working with the result sets.
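Push-down, mentioned in Drill's goal above, means evaluating parts of a query (typically filters and column projections) inside the storage layer. Fewer rows and columns then travel to the query engine. The following is a minimal, engine-agnostic sketch of that idea in Python. The `ToyStorage` class, its `scan` method and the two query helpers are hypothetical stand-ins, not Drill's storage-plugin API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class ToyStorage:
    """Hypothetical storage layer that can evaluate a filter and a projection itself."""

    rows: List[Row]
    rows_shipped: int = 0  # rows handed from storage to the engine

    def scan(self, predicate: Optional[Predicate] = None,
             columns: Optional[List[str]] = None) -> Iterator[Row]:
        for row in self.rows:
            # Pushed-down filter: rejected rows never leave the storage layer.
            if predicate is not None and not predicate(row):
                continue
            self.rows_shipped += 1
            # Pushed-down projection: only the requested columns are shipped.
            yield {c: row[c] for c in columns} if columns else dict(row)


def query_without_pushdown(storage: ToyStorage, predicate: Predicate,
                           columns: List[str]) -> List[Row]:
    # The engine pulls every full row, then filters and projects on its side.
    return [{c: r[c] for c in columns} for r in storage.scan() if predicate(r)]


def query_with_pushdown(storage: ToyStorage, predicate: Predicate,
                        columns: List[str]) -> List[Row]:
    # The engine hands the filter and the projection to the storage layer.
    return list(storage.scan(predicate=predicate, columns=columns))


def is_paris(row: Row) -> bool:
    return row["city"] == "Paris"


if __name__ == "__main__":
    data = [{"id": i, "city": "Paris" if i % 4 == 0 else "Oslo", "payload": "x" * 100}
            for i in range(1000)]
    plain, pushed = ToyStorage(data), ToyStorage(data)
    assert query_without_pushdown(plain, is_paris, ["id"]) == \
        query_with_pushdown(pushed, is_paris, ["id"])
    print(plain.rows_shipped, pushed.rows_shipped)  # prints: 1000 250
```

Both paths return the same rows. What changes is how much data crosses the storage/engine boundary, and reducing that is the point of push-down.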

Similarity

Hive and Impala: Impala is designed based on Hive and uses the same metadata (the Hive metastore). Both are designed for the Hadoop environment.

Drill and SparkSQL: both can query data from a variety of data sources (RDBMS, NoSQL, files, JSON, ...).

All four: JDBC/ODBC drivers are available.
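All four engines can be reached through standard JDBC/ODBC drivers, so client code can stay largely engine-agnostic. In Python the analogous standard is the DB-API (PEP 249). The sketch below writes a helper against that interface only. It exercises the helper with the standard-library `sqlite3` module as a local stand-in, since that needs no running cluster. Connecting to a real Hive, Impala, Drill or Spark endpoint would need a separate driver, which is assumed and not shown. The `run_query` helper and the `page_views` table are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List, Sequence


def run_query(conn, sql: str, params: Sequence[Any] = ()) -> List[Dict[str, Any]]:
    """Run a query on any PEP 249 connection and return the rows as dicts.

    Only DB-API calls are used (cursor, execute, description, fetchall, close).
    Note: the placeholder syntax in `sql` depends on the driver's `paramstyle`
    (sqlite3 uses "?"; other drivers may use "%s" or named parameters).
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        names = [col[0] for col in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]
    finally:
        cur.close()


if __name__ == "__main__":
    # sqlite3 is only a local stand-in for a real engine's driver.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (user_id INTEGER, url TEXT)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                     [(1, "/a"), (1, "/b"), (2, "/a")])
    rows = run_query(conn, "SELECT url, COUNT(*) AS views FROM page_views "
                           "GROUP BY url ORDER BY url")
    print(rows)  # [{'url': '/a', 'views': 2}, {'url': '/b', 'views': 1}]
    conn.close()
```

Keeping the helper to the standard interface is the same design choice JDBC/ODBC enables: swapping engines mostly means swapping the connection object (and, in Python, minding the placeholder style).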

Difference

Hive: suitable for offline data processing.

Impala: focuses on online, real-time data processing.

Drill: not limited to the Hadoop ecosystem.

Drill: schema-free; all data is internally represented as either a simple or a complex JSON data structure.

Drill: fully supports SQL queries (ANSI SQL:2003), but provides SQL query capabilities only.

SparkSQL: supports a subset of SQL (SQL-like).

Supported by many BI tools

Better security support for data access.
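The schema-free point above (Apache Drill's JSON-based data model) means a record's structure is discovered when the data is read rather than declared up front. The toy sketch below only illustrates that schema-on-read idea in plain Python. It flattens nested JSON records into dotted column paths, takes the union of paths seen as the "schema", and fills fields missing from a record with `None`, much as a SQL engine reports NULL. The function names are hypothetical, and this is not how Drill is implemented internally.

```python
import json
from typing import Any, Dict, Iterable, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    out: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, path + "."))
        else:
            out[path] = value  # scalars and lists are kept as-is
    return out


def read_schema_free(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON lines and discover the column set at read time."""
    records = [flatten(json.loads(line)) for line in lines if line.strip()]
    # The "schema" is simply the sorted union of the paths seen in the data.
    columns = sorted({path for rec in records for path in rec})
    # Fields a record does not have come back as None (NULL).
    return [{col: rec.get(col) for col in columns} for rec in records]


if __name__ == "__main__":
    raw = [
        '{"id": 1, "name": "a", "geo": {"city": "Oslo"}}',
        '{"id": 2, "tags": ["x", "y"]}',
    ]
    for row in read_schema_free(raw):
        print(row)
    # {'geo.city': 'Oslo', 'id': 1, 'name': 'a', 'tags': None}
    # {'geo.city': None, 'id': 2, 'name': None, 'tags': ['x', 'y']}
```

No table definition is needed before reading. Records with different shapes still land in one result, and absent fields simply come back empty.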

Author: yexianyi
