select
    mid_id,
    dt,
    date_diff,
    count(*) over (partition by mid_id, date_diff) diff_count
from (
    select
        mid_id,
        dt,
        date_add(dt, -rk) date_diff
    from (
        select
            mid_id,
            dt,
            rank() over (partition by mid_id order by dt) rk
        from (
            select
                mid_id,
                dt
            from dws_uv_detail_daycount
            where dt >= date_add('2020-06-20', -6)
            group by mid_id, dt -- deduplicate: one row per device per day
        ) t1
    ) t2
) t3
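The query relies on the date-minus-rank trick: within each device, subtracting the rank (in days) from each active date maps every date in one consecutive run to the same anchor date, so the window count over that anchor is the run length. The same idea can be checked outside Hive with a minimal Python sketch (function and variable names here are my own, not from the query):

```python
from datetime import date, timedelta

def devices_with_consecutive_run(active_days, min_run=3):
    """Simulate the SQL's date-minus-rank trick.

    For each device, rank its distinct active dates, subtract the rank
    (in days) from each date, and group by the resulting anchor date.
    Dates in the same consecutive run share one anchor, so the group
    size equals the run length. Returns the set of device ids with at
    least one run of >= min_run consecutive active days.
    """
    qualified = set()
    for mid_id, days in active_days.items():
        anchors = {}
        for rk, d in enumerate(sorted(set(days)), start=1):
            anchor = d - timedelta(days=rk)  # mirrors date_add(dt, -rk)
            anchors[anchor] = anchors.get(anchor, 0) + 1
        if any(run_len >= min_run for run_len in anchors.values()):
            qualified.add(mid_id)
    return qualified

# device A is active three days in a row; device B only on alternating days
activity = {
    "A": [date(2020, 6, 18), date(2020, 6, 19), date(2020, 6, 20)],
    "B": [date(2020, 6, 16), date(2020, 6, 18), date(2020, 6, 20)],
}
print(sorted(devices_with_consecutive_run(activity)))  # prints ['A']
```

With this helper, finishing the original requirement is just taking the size of the returned set, which corresponds to a `count(distinct mid_id)` over the rows where `diff_count >= 3` in the SQL.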
When computing the number of users active for three consecutive days within the last seven days, the query fails with the following error:
Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime. Please check stacktrace for the root cause.
Likely causes:
① We replaced Hive's execution engine with Spark, and the job may be running out of memory.
② There may be a source-level incompatibility between the Hive and Spark versions in use.
The same query runs without error on Hive with the MapReduce execution engine.
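Based on the causes above, a session-level workaround is to fall back to MapReduce, or to give Spark more executor memory. A sketch (the property names are standard Hive/Hive-on-Spark settings; the memory value is illustrative, not tuned):

```sql
-- fall back to the MapReduce engine for the current session
set hive.execution.engine=mr;

-- or, if staying on Spark, try raising executor memory first
set spark.executor.memory=4g;
```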