This guide assumes the Java JDK is already installed and configured on Windows 10; that setup is not covered here.
1. Download and install Spark
1.1 First extract spark-3.3.0-bin-hadoop3.tgz to spark-3.3.0-bin-hadoop3.tar, then extract the tar file to a target path such as C:\Apps\spark-3.3.0-bin-hadoop3. This package bundles Scala 2.12.
1.2 Add C:\Apps\spark-3.3.0-bin-hadoop3\bin to the system Path environment variable. Create two new variables, SPARK_HOME and HADOOP_HOME, both set to C:\Apps\spark-3.3.0-bin-hadoop3.
2. Download and install winutils.exe
winutils/winutils.exe at master · steveloughran/winutils · GitHub
Running Spark on Windows requires winutils.exe, because Spark performs file-access operations in a POSIX-like way through the Windows API. winutils.exe lets Spark use Windows-specific services, including running shell commands in a Windows environment. Place winutils.exe in C:\Apps\spark-3.3.0-bin-hadoop3\bin, i.e. the bin directory under HADOOP_HOME.
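If setting HADOOP_HOME system-wide is not an option, a commonly used programmatic workaround (not part of the original steps) is to point the hadoop.home.dir system property at the same directory before any Spark code runs:

```scala
// Alternative to the HADOOP_HOME environment variable: hadoop.home.dir must
// point at a directory whose bin\ subdirectory contains winutils.exe.
// Must be set before the first SparkSession/SparkContext is created.
System.setProperty("hadoop.home.dir", "C:\\Apps\\spark-3.3.0-bin-hadoop3")
```

Hadoop's shell utilities check this property first and fall back to the HADOOP_HOME environment variable, so either approach works.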
3. The Spark shell
Run spark-shell from the command line.
Inside the shell, spark.version prints the installed version; you can also run other Spark code, such as creating an RDD.
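For example, a short session might look like the following (the spark and sc objects are pre-created by the shell; res numbers will vary):

```scala
scala> spark.version
res0: String = 3.3.0

scala> val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0]

scala> rdd.map(_ * 2).collect()
res1: Array[Int] = Array(2, 4, 6, 8, 10)
```

If spark-shell starts and these commands run without errors, the Path, SPARK_HOME, and winutils.exe setup from the previous steps is working.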
4. Configure IntelliJ
4.1 Install the Scala plugin
File > Settings... > Plugins, search for Scala and click Install; restart IntelliJ when the installation finishes. "Installed" means the plugin is already present.
4.2 Set up the Scala SDK
File > Project Structure... > Platform Settings > Global Libraries > + (Add) > Scala SDK > Download. Download and install the matching version, 2.12.15.
4.3 Download the sample project: https://github.com/spark-examples/spark-hello-world-example
4.3.1 Edit pom.xml and set the matching Scala and Spark version numbers:
<properties>
<scala.version>2.12.15</scala.version>
<spark.version>3.3.0</spark.version>
</properties>
4.3.2 Plugin error
Failure to find org.apache.maven.plugins:maven-eclipse-plugin:pom
Fix: delete the maven-eclipse-plugin section from pom.xml.
4.3.3 Fixing a Scala signature error
java.lang.RuntimeException: error reading Scala signature of package.class: Scala signature package has wrong version
expected: 5.0
found: 5.2 in package.class
This is caused by a mismatch between the Scala SDK version configured in IntelliJ and the version declared in the pom. Update the dependencies in pom.xml:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
The final pom.xml looks like this:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>spark-hello-world-example</artifactId>
<version>1.0-SNAPSHOT</version>
<inceptionYear>2008</inceptionYear>
<packaging>jar</packaging>
<properties>
<scala.version>2.12.15</scala.version>
<spark.version>3.3.0</spark.version>
</properties>
<repositories>
<repository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</pluginRepository>
</pluginRepositories>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.specs</groupId>
<artifactId>specs</artifactId>
<version>1.2.5</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<resources><resource><directory>src/main/resources</directory></resource></resources>
<plugins>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
</plugins>
</reporting>
</project>
4.3.4 Create the SparkSessionTest application, or select SparkSessionTest and right-click Run 'SparkSessionTest'.
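A minimal SparkSessionTest along these lines will do (a sketch; the exact code in the sample project may differ):

```scala
import org.apache.spark.sql.SparkSession

object SparkSessionTest {
  def main(args: Array[String]): Unit = {
    // Run locally on all available cores; no cluster required.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SparkSessionTest")
      .getOrCreate()

    println("Spark version: " + spark.version)

    // A trivial DataFrame to confirm the session works end to end.
    import spark.implicits._
    val df = Seq(("hello", 1), ("world", 2)).toDF("word", "count")
    df.show()

    spark.stop()
  }
}
```

master("local[*]") keeps everything in-process, which is what makes the example runnable straight from IntelliJ without any cluster setup.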
The run output appears in IntelliJ's console.
5. References
How to Run Spark Hello World Example in IntelliJ - Spark by {Examples}