Spark & Spark Streaming
那年夏天110
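Spark SQL + Hive ETL
Drawback of spark-sql: running a statement such as insert overwrite table xx… leaves a large number of small files in the result directory, and the job is prone to running out of memory and failing.
Spark SQL + Hive:
create table if not exists db.res(sum_id string,cnt_id string,dis_id string,cnt_uid string,dis_uid string) partitioned by (dt string comment 'partition date yyyy-MM-dd') stored as o…
Original · 2020-08-31 13:28:40 · 617 views · 0 comments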
Monitoring the running state of apps on a Spark cluster
Deployed and run on machine: bdp-40
crontab entry: */1 * * * * /disk4/bd/spark_job_monitor/spark-job-monitor.sh
spark-job-monitor.sh script code:
#!/bin/bash

# spark job name array
job_name_array=(
'app_sparkToMg-1114'
'sparkToDB-1016'
'sparkToMg-0709'
'sparkToMg_0822_online'
)
…
Original · 2020-08-31 13:00:27 · 378 views · 0 comments
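The script excerpt above is cut off after the job name array, so the rest of its logic is unknown. As a rough illustration of one check a monitor of this kind might perform, the sketch below reports which expected job names are missing from a listing of running applications. The helper name `find_missing_jobs`, the canned `sample_listing`, and the use of YARN as the source of the listing are all assumptions made for the example, not details from the original post.

```shell
#!/bin/bash
# Sketch only: not the original spark-job-monitor.sh, whose body is truncated.

# Job names copied from the excerpt.
job_name_array=(
    'app_sparkToMg-1114'
    'sparkToDB-1016'
    'sparkToMg-0709'
    'sparkToMg_0822_online'
)

# Hypothetical helper: print each expected job name that does not
# appear in the given listing text, one name per line.
#   $1      listing of running applications (plain text)
#   $2...   expected job names
find_missing_jobs() {
    local listing="$1"
    shift
    local name
    for name in "$@"; do
        # -F: fixed string, -w: whole-word match, -q: no output
        if ! printf '%s\n' "$listing" | grep -qFw -- "$name"; then
            printf '%s\n' "$name"
        fi
    done
}

# Assumption: on a YARN-managed cluster the listing could come from
#   listing=$(yarn application -list -appStates RUNNING 2>/dev/null)
# A canned listing is used here so the sketch runs without a cluster.
sample_listing='application_0001 sparkToDB-1016 SPARK RUNNING
application_0002 sparkToMg-0709 SPARK RUNNING'

find_missing_jobs "$sample_listing" "${job_name_array[@]}"
```

Taking the listing as plain text keeps the matching logic separate from however the cluster is queried, so the same function could be fed output from a different resource manager. What to do with the missing names (restart, alert, log) is left open, since the original post does not show it in this excerpt.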
CDH + Apache Spark cluster
Download spark-1.6.3-bin-hadoop2.6.tgz from the official website. Environment variable configuration in /etc/profile on node bdp-40:
# set java env
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$P...
Original · 2020-08-31 12:57:43 · 303 views · 0 comments
Building Spark 2.4.2 against Hadoop 3
wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
Changes to the Spark 2.4 pom.xml:
<!-- Add vendor maven repositories -->
<!-- Cloudera -->
<repository>
<...
Original · 2020-08-31 10:58:36 · 864 views · 2 comments