Is Spark slow when querying MySQL? Spark query runs very slowly

Latest recommended article published 2022-09-04 21:37:58

weixin_39649405

I have a cluster on AWS with 2 slaves and 1 master. All instances are of type m1.large. I'm running Spark version 1.4. I'm benchmarking the performance of Spark over 4m data coming from Redshift. I fired one query through the pyspark shell:

df = sqlContext.load(source="jdbc", url="connection_string", dbtable="table_name", user='user', password="pass")
df.registerTempTable('test')
d = sqlContext.sql("""
    select user_id from (
        select -- (i1)
            sum(total),
            user_id
        from
            (select -- (i2)
                avg(total) as total,
                user_id
            from
                test
            group by
                order_id,
                user_id) as a
        group by
            user_id
        having sum(total) > 0
    ) as b
""")
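To make the intent of the query concrete, here is a minimal sketch that runs the same SQL on a handful of made-up rows. It uses Python's built-in sqlite3 module purely as a stand-in for Spark SQL so that it runs without a cluster. The sample rows and the column types are assumptions, not the original data. The inner query (i2) averages total per (order_id, user_id), the outer query (i1) sums those averages per user and keeps only users whose sum is positive.

```python
import sqlite3

# Hypothetical sample rows: (order_id, user_id, total). Not from the original data.
rows = [
    (1, "u1", 10.0),
    (1, "u1", 20.0),   # order 1 averages to 15
    (2, "u1", -5.0),   # u1 sums to 15 + (-5) = 10, so u1 is kept
    (3, "u2", -4.0),
    (3, "u2", -6.0),   # order 3 averages to -5, so u2 is dropped
    (4, "u3", 0.0),    # sum is 0, which is not > 0, so u3 is dropped
]

# In-memory table named 'test', matching the temp table name in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (order_id INTEGER, user_id TEXT, total REAL)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

query = """
select user_id from (
    select sum(total), user_id
    from (
        select avg(total) as total, user_id
        from test
        group by order_id, user_id
    ) as a
    group by user_id
    having sum(total) > 0
) as b
"""

kept_users = sorted(r[0] for r in conn.execute(query))
print(kept_users)
```

With these sample rows only u1 survives the having clause.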

When I do d.count(), the above query takes 30 sec when df is not cached and 17 sec when df is cached in memory.
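When comparing cached and uncached timings like these, it helps to measure the action itself and to remember that cache() in Spark is lazy: the data is only stored once an action has run over it. The following is a minimal sketch of a timing helper. The name timed is hypothetical, and the workload here is a plain Python stand-in so the snippet runs without Spark. The commented lines show how it might be used in the pyspark shell from the question.

```python
from timeit import default_timer


def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = default_timer()
    result = action()
    return result, default_timer() - start


# Stand-in workload so the sketch runs without a Spark cluster.
result, seconds = timed(lambda: sum(range(1000)))
print(result, seconds)

# Possible use in the pyspark shell (df and d are the objects from the question):
#   df.cache()                 # marks df for caching; nothing is stored yet
#   df.count()                 # first action materializes the cache
#   n, secs = timed(d.count)   # now measures the query against cached data
```

Timing the first action after cache() would include the cost of loading the data over JDBC, so a warm-up action before the measured run gives a fairer number.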

I'm expecting these timings to be closer to 1-2s.

These are my spark configurations:

spark.executor.memory 6154m
spark.driver.memory 3g
spark.shuffle.spill false
spark.default.parallelism 8

The rest is set to its default values. Can anyone see what I'm missing here?

Solution

This is normal; don't expect Spark to answer in a few milliseconds like MySQL or Postgres do. Spark is low latency compared to other big data solutions like Hive, Impala... but you cannot compare it with a classic database: Spark is not a database where data is indexed!

They clearly put Spark here:

Did you try Apache Drill? I found it a bit faster (I use it for small HDFS JSON files, 2-3 GB, and it is much faster than Spark for SQL queries).
