Spark Development
spark-core Development
Developing a Spark program with IDEA
Create a Scala project
Configure Maven dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>SparkDemo</artifactId>
        <groupId>com.realsee.aidata</groupId>
        <version>1.0.0</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>spark-core</artifactId>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- This plugin compiles the Scala sources into class files -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <!-- Bind to Maven's compile and test-compile phases;
                             without the compile goal the main Scala sources
                             would not be compiled -->
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <!-- This plugin bundles the dependencies into a single fat jar -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Scala code development
package com.realsee.aidata

import org.apache.log4j.Logger
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
 * @author weizhaolin
 * @date 2022/7/29 16:45
 */
object Demo {
  val logger: Logger = Logger.getLogger("Demo")

  def main(args: Array[String]): Unit = {
    // Create the connection.
    // AppName, Master, etc. are not set here; they are configured in spark-submit.sh
    val sparkConf = new SparkConf()
    val sc: SparkContext = new SparkContext(sparkConf)
    // args receives the parameters passed in from spark-submit.sh
    println(args(0))
    // Application logic goes below
    val rdd: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4))
    rdd.collect().foreach(println)
    // Close the connection
    sc.stop()
  }
}
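Before submitting to the cluster, the assembled jar can be smoke-tested on a single machine. A sketch, assuming spark-submit is on the PATH, the jar from the Maven packaging step is named spark-core.jar, and "dev" stands in for the flag argument the job expects:

```shell
# Hypothetical local debug run: local[*] uses all local cores,
# so no YARN cluster is needed. Jar name and argument are examples.
spark-submit \
    --class com.realsee.aidata.Demo \
    --master "local[*]" \
    spark-core.jar \
    dev
```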
Package with Maven
After packaging completes, the jar is generated in the module's target directory (in this example the jar-with-dependencies assembly jar, referred to below as spark-core.jar).
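The packaging step can also be run from the command line. A sketch, assuming the standard Maven lifecycle and the assembly plugin configured in the pom above:

```shell
# Build from the module directory; the package phase triggers the
# assembly plugin's jar-with-dependencies descriptor.
mvn clean package
# The fat jar lands in target/, e.g. (name derived from artifactId + version):
# target/spark-core-1.0.0-jar-with-dependencies.jar
```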
Spark submit script development
#!/bin/bash
if [ $# -ne 1 ]; then
    echo "missing argument flag: (dev or online)"
    echo "usage: sh $0 flag"
    exit 1
fi
PT=$1
echo "start spark-core.sh $PT"
spark-submit \
    --class com.realsee.aidata.Demo \
    --name weizhaolin \
    --master yarn \
    --deploy-mode cluster \
    --executor-memory 10g \
    --num-executors 20 \
    --executor-cores 2 \
    --driver-memory 6g \
    --conf spark.default.parallelism=100 \
    --conf spark.storage.memoryFraction=0.6 \
    --conf spark.sql.shuffle.partitions=1000 \
    --conf spark.shuffle.memoryFraction=0.3 \
    --conf spark.yarn.executor.memoryOverhead=2048 \
    spark-core.jar \
    "$PT"
if [ $? -ne 0 ]; then
    echo "spark submit failed"
    exit 1
fi
--class: the fully qualified name of the Scala main class
--name: the application name
--master: the execution environment, here YARN mode
--num-executors: the number of executors
--executor-cores: the number of cores per executor
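The memory flags determine the size of each YARN container: executor memory plus the off-heap overhead (spark.yarn.executor.memoryOverhead, in MB) must fit within YARN's per-container maximum. A quick arithmetic check using the values from the script above:

```shell
# Per-executor container request = executor-memory + memoryOverhead
EXECUTOR_MEM_MB=$(( 10 * 1024 ))   # --executor-memory 10g
OVERHEAD_MB=2048                   # spark.yarn.executor.memoryOverhead
PER_EXECUTOR_MB=$(( EXECUTOR_MEM_MB + OVERHEAD_MB ))
echo "per-executor container: ${PER_EXECUTOR_MB} MB"
# 20 executors in total
echo "total executor memory:  $(( PER_EXECUTOR_MB * 20 )) MB"
```

So each executor requests a 12 GB container, and the job as a whole asks YARN for roughly 240 GB of executor memory plus the 6 GB driver.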
spark-sql Development
Coming soon…