Previously I had always worked with Hive tables interactively; this time I compiled a Scala project into a jar and submitted it to YARN with spark-submit.
First, since Maven is used to manage the Scala project, a pom.xml configuration file has to be added. Note that the Scala version must match the Scala version bundled with the Spark jars; otherwise the compiled jar fails at runtime with:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime
Reference: https://blog.csdn.net/u013054888/article/details/54600229
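A quick way to confirm which Scala runtime actually ends up on the classpath (the version string should line up with the Scala version Spark was built against, e.g. 2.11.x for the pom below) is a small sketch like this:

```scala
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // versionNumberString reports the scala-library version on the classpath,
    // e.g. "2.11.12"; it must match the Scala version inside Spark's own jars
    println(util.Properties.versionNumberString)
  }
}
```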
In the interactive shell the SparkContext is already prepared, so none of this is an issue, but inside a jar you have to initialize the SparkSession yourself. I had assumed all the configuration would need to be loaded by hand, but it turns out that when Spark submits a job to YARN, the relevant conf files are shipped along with it:
Reference: https://blog.csdn.net/piduzi/article/details/81636253
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets the session read the cluster's Hive metastore
    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
    spark.stop()
  }
}
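For reference, a typical YARN submission of the jar built from this project might look like the following (the jar path and resource settings are placeholders, not taken from the original post; only the class name `Main` comes from the code above):

```shell
# Submit in cluster mode; HADOOP_CONF_DIR must point at the cluster's conf
# directory so spark-submit can ship the Hadoop/Hive configuration along.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class Main \
  target/my-spark-job-1.0.jar
```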
Scala-related configuration in pom.xml
<properties>
<scala.version>2.11.12</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-reflect</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-actors</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<!-- this compiles the Scala code -->
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.1.6</version>
<executions>
<execution>
<id>scala-compile-first</id>
<phase>process-resources</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
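Note that the pom above only lists the Scala toolchain; to compile code that uses SparkSession, the project also needs the Spark artifacts on the compile classpath. A sketch of such a dependency (the artifact and version here are assumptions; match them to your cluster's Spark and Scala versions):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <!-- assumed version; use the one deployed on your cluster -->
  <version>2.4.0</version>
  <!-- provided: spark-submit supplies these jars at runtime on YARN -->
  <scope>provided</scope>
</dependency>
```

Marking the scope as provided keeps the Spark jars out of your packaged artifact, which avoids version clashes with the runtime the cluster already ships.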