The job script is written in Python. Spark provides an API for Python developers, PySpark, which makes it easy to connect to Hive.
Below is the HiveSQL query to run:
SELECT
    SUM(o.sale_price)
    ,SUM(CASE WHEN cate_id2 IN (16,18) THEN o.sale_price ELSE 0 END)
    ,SUM(CASE WHEN cate_id2 IN (13,15,17,19,20,21,22,156) THEN o.sale_price ELSE 0 END)
FROM dw.or_order_item_total o
JOIN dw.cd_item_total i ON o.item_id = i.item_id AND i.ds = '2018-03-31'
WHERE o.ds = '2018-03-31' AND substr(o.ord_tm,1,7) = '2018-03'
;
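To make the three aggregates concrete, here is a small plain-Python sketch of what the `SUM(CASE WHEN ...)` columns compute. The sample rows, the names `GROUP_A` and `GROUP_B`, and the helper `conditional_sums` are all hypothetical and used only for illustration; the real query runs in Hive over the joined tables.

```python
# Hypothetical sample rows standing in for the joined result:
# each tuple is (sale_price, cate_id2).
rows = [
    (100.0, 16),
    (50.0, 13),
    (25.0, 99),
    (10.0, 18),
]

# Category sets copied from the two IN (...) lists in the query.
GROUP_A = {16, 18}
GROUP_B = {13, 15, 17, 19, 20, 21, 22, 156}


def conditional_sums(rows):
    """Return (total, group_a_total, group_b_total), mirroring the three SUMs."""
    total = sum(price for price, _ in rows)
    # CASE WHEN ... THEN price ELSE 0 END: rows outside the set contribute 0.
    group_a = sum(price if cate in GROUP_A else 0 for price, cate in rows)
    group_b = sum(price if cate in GROUP_B else 0 for price, cate in rows)
    return total, group_a, group_b


print(conditional_sums(rows))  # (185.0, 110.0, 50.0)
```

A row whose category is in neither list (such as 99 here) still counts toward the first total but adds 0 to the other two.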
Below is the Python script to submit:
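#!/usr/bin/python
# -*- coding: utf-8 -*-
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
import sys


def test():
    # Python 2 only: force UTF-8 as the default string encoding
    reload(sys)
    sys.setdefaultencoding("utf-8")
    # Run on YARN in client mode under the application name "My App"
    conf = SparkConf().setMaster("yarn-client").setAppName("My App")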
sc &