Creating a Scala Project
Create the project in IDEA, specifying <groupId>cn.ac.iie.spark</groupId> and <artifactId>sql</artifactId>.
The full pom.xml, including dependencies and build plugins, is as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>cn.ac.iie.spark</groupId>
  <artifactId>sql</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <scala.version>2.11.12</scala.version>
    <spark.version>2.4.5</spark.version>
  </properties>
  <dependencies>
    <!-- Scala -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <!-- Spark -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>2.15.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.6</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <!-- If you have a classpath issue like NoClassDefFoundError, ... -->
          <!-- <useManifestOnlyJar>false</useManifestOnlyJar> -->
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass></mainClass>
            </manifest>
          </archive>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
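Since the build section points Maven at src/main/scala and src/test/scala, the project layout should look roughly like this (a sketch; the package directories mirror the groupId):

sql/
├── pom.xml
└── src/
    ├── main/scala/cn/ac/iie/spark/SQLContextApp.scala
    └── test/scala/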
Creating the Object
Create an object with the following content:
package cn.ac.iie.spark

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

/**
 * Using SQLContextApp.
 * Note: IDEA runs locally while the test data lives on the server.
 * Can we still develop and test locally? Yes.
 */
object SQLContextApp {
  def main(args: Array[String]): Unit = {
    val path = args(0)
    // 1. Create the contexts
    val sparkConf = new SparkConf()
    // In production or test environments, the app name and master
    // are specified by the submit script
    sparkConf.setAppName("SQLContextApp")
    sparkConf.setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    // 2. Processing: read the JSON file
    val people = sqlContext.read.format("json").load(path)
    people.printSchema()
    people.show()
    // 3. Release resources
    sc.stop()
  }
}
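As a side note, in Spark 2.x SQLContext is kept mainly for backward compatibility, and SparkSession is the recommended entry point. A minimal equivalent sketch (SparkSessionApp is a hypothetical name, not part of this project):

package cn.ac.iie.spark

import org.apache.spark.sql.SparkSession

// Hypothetical SparkSession-based variant of the same program
object SparkSessionApp {
  def main(args: Array[String]): Unit = {
    // builder() replaces the SparkConf/SparkContext/SQLContext trio
    val spark = SparkSession.builder()
      .appName("SparkSessionApp")
      .master("local[2]")
      .getOrCreate()
    val people = spark.read.format("json").load(args(0))
    people.printSchema()
    people.show()
    spark.stop()
  }
}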
Because the program reads its input path from args(0), open Run → Edit Configurations and put the path of the JSON file in Program arguments:
The raw data is as follows:
{"name":"Michael", "salary":3000}
{"name":"Andy", "salary":4500}
{"name":"Justin", "salary":3500}
{"name":"Berta", "salary":4000}
{"name":"vincent", "salary":90000}
The console output is as follows:
root
|-- name: string (nullable = true)
|-- salary: long (nullable = true)
+-------+------+
| name|salary|
+-------+------+
|Michael| 3000|
| Andy| 4500|
| Justin| 3500|
| Berta| 4000|
|vincent| 90000|
+-------+------+
As you can see, both the schema and the data are printed correctly.
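Since we now have a SQLContext, we can also run real SQL against the loaded DataFrame. A minimal sketch extending the example above (the view name people is our own choice):

// Register the DataFrame as a temporary view so SQL can reference it
people.createOrReplaceTempView("people")
// Run a query through the SQLContext; show() prints the result table
sqlContext.sql("SELECT name, salary FROM people WHERE salary > 4000").show()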
Package the project with Maven from the project root: mvn clean package -DskipTests. This produces target/sql-1.0-SNAPSHOT.jar.
Submitting to the Environment
In production or test environments, the app name and master are usually specified by the submit script rather than hard-coded, so comment out sparkConf.setAppName("SQLContextApp") and sparkConf.setMaster("local[2]") before packaging, as shown below.
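After commenting them out, the beginning of main() looks like this (--name and --master are then supplied by spark-submit):

val path = args(0)
// 1. Create the contexts; app name and master now come from the submit script
val sparkConf = new SparkConf()
// sparkConf.setAppName("SQLContextApp")
// sparkConf.setMaster("local[2]")
val sc = new SparkContext(sparkConf)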
Then submit on the server:
./spark-submit \
  --name SQLContextApp \
  --class cn.ac.iie.spark.SQLContextApp \
  --master local[2] \
  /home/iie4bu/lib/sql-1.0-SNAPSHOT.jar \
  /home/iie4bu/app/spark-2.4.5-bin-2.6.0-cdh5.15.1/examples/src/main/resources/people.json