Notes:
1. Remember to delete everything under META-INF in the jar except MANIFEST.MF (leftover signature files break jar verification), then re-upload and run.
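If the jar is built with sbt-assembly rather than edited by hand, the same cleanup can be done at build time. A minimal build.sbt sketch, assuming the sbt-assembly plugin (which is not part of the original notes):

// build.sbt -- assumes the sbt-assembly plugin is installed
assemblyMergeStrategy in assembly := {
  // drop META-INF signature files (.SF/.DSA/.RSA) that break jar verification;
  // sbt-assembly writes its own MANIFEST.MF, so nothing needed is lost
  case PathList("META-INF", _*) => MergeStrategy.discard
  // keep the first copy of everything else
  case _                        => MergeStrategy.first
}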
2. Spark's built-in web UI on port 4040 shows the running job's progress and state.
3. In YARN mode, view the job on the YARN ResourceManager UI instead (port 8088 by default).
4. When Spark works with Hive, remember to copy hive-site.xml into /spark-2.0.0-preview-bin-hadoop2.6/conf on every node, since executors on any node may need to query Hive.
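With hive-site.xml in place on each node, Hive support just needs to be switched on when the session is built. A minimal sketch (the app name is illustrative, not from the original notes):

import org.apache.spark.sql.SparkSession

// picks up hive-site.xml from Spark's conf/ directory on the node
val spark = SparkSession.builder()
  .appName("daily-visit-stats") // hypothetical name
  .enableHiveSupport()
  .getOrCreate()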
5. A column named in the PARTITION clause of a Hive partitioned table must not appear again in the SELECT list. For example (the partition column date is grouped on but never selected):

// date is normally passed in as a job parameter
val sqlStr = s"""
  |insert overwrite table daily_visit partition (date='$date')
  |select count(distinct guid) uv,
  |       sum(pv) pv,
  |       count(case when pv >= 2 then sessionid else null end) second_num,
  |       count(sessionid) visits
  |from
  |  (select ds date, sessionid, max(guid) guid, count(url) pv
  |   from tracklog
  |   where ds = '$date' and hour = '18' and length(url) > 0
  |   group by ds, sessionid) a
  |group by date
""".stripMargin

6. There is no need to copy the MySQL driver jar into any Spark directory: with a remote metastore (note 7), only the metastore service itself talks to MySQL.
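A hedged sketch of how the statement in note 5 might be wired up end to end, assuming date arrives as the first command-line argument and spark is the Hive-enabled session from note 4:

// date must be defined before sqlStr is interpolated
val date = args(0) // hypothetical: the scheduler passes the date in
// ... build sqlStr exactly as in note 5 ...
spark.sql(sqlStr)  // runs the INSERT OVERWRITE against Hive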
7. Remember to add the following to hive-site.xml:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://master:9083</value>
</property>
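A quick sanity check that Spark is actually talking to the remote metastore, reusing the session sketched in note 4 (daily_visit is the table from note 5):

spark.sql("show tables").show()          // table names come from the remote metastore
spark.table("daily_visit").printSchema() // resolving the table also goes through the metastore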