Reading a JSON File with Spark in Java
Environment
CentOS Linux release 8.1.1911 (Core)
Spark version 3.0.0
java 14.0.2
Apache Maven 3.6.3
Directory structure
The ./target directory is generated by mvn during compilation and packaging.
.
├── pom.xml
├── src
│   └── main
│       └── java
│           └── com
│               └── data
│                   └── process
│                       └── JSON.java
├── target
│   ├── classes
│   │   └── com
│   │       └── data
│   │           └── process
│   │               └── JSON.class
│   ├── data-process-example-0.0.1.jar
│   ├── generated-sources
│   │   └── annotations
│   ├── maven-archiver
│   │   └── pom.properties
│   └── maven-status
│       └── maven-compiler-plugin
│           └── compile
│               └── default-compile
│                   ├── createdFiles.lst
│                   └── inputFiles.lst
└── test.json
Implementation
Create JSON.java under ./src/main/java/com/data/process/
vim ./src/main/java/com/data/process/JSON.java
Configure the ./pom.xml file
vim ./pom.xml
Create the test JSON file ./test.json
vim ./test.json
Java code
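package com.data.process;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JSON {
    public static void main(String[] args) {
        // Run Spark locally, using all available cores.
        SparkSession sparkSession = SparkSession.builder().appName("JSON").master("local[*]").getOrCreate();
        // Read the JSON file into a DataFrame; the schema is inferred from the data.
        Dataset<Row> dataset = sparkSession.read().json("./test.json");
        // Print the rows as a table.
        dataset.show();
        // Release Spark resources before exiting.
        sparkSession.stop();
    }
}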
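As a variation on the program above, the following is a minimal sketch that declares the column types up front instead of letting Spark infer them from the data. The class name JsonWithSchema and the choice of LongType for every column are assumptions made for illustration; they are not part of the original project.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical class, not part of the original project.
public class JsonWithSchema {
    // Assumption: all five columns of test.json hold 64-bit integers.
    public static StructType schema() {
        return new StructType()
                .add("a", DataTypes.LongType)
                .add("b", DataTypes.LongType)
                .add("c", DataTypes.LongType)
                .add("d", DataTypes.LongType)
                .add("e", DataTypes.LongType);
    }

    // Read the file with the explicit schema, skipping schema inference.
    public static Dataset<Row> read(SparkSession spark, String path) {
        return spark.read().schema(schema()).json(path);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("JsonWithSchema")
                .master("local[*]")
                .getOrCreate();
        Dataset<Row> dataset = read(spark, "./test.json");
        // Show the declared column types, then only two of the columns.
        dataset.printSchema();
        dataset.select("a", "e").show();
        spark.stop();
    }
}
```

An explicit schema can be useful when the input is large or when a column's type should not depend on whatever values happen to appear in the file.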
Maven configuration file
<project>
  <groupId>com.data.process</groupId>
  <artifactId>data-process-example</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>example</name>
  <packaging>jar</packaging>
  <version>0.0.1</version>
  <dependencies>
    <!-- Keep every Spark artifact on the same Scala suffix (2.12) and Spark version (3.0.0) -->
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.0.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.12</artifactId>
      <version>3.0.0</version>
    </dependency>
    <dependency> <!-- Pulls in spark-sql, which provides SparkSession -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.12</artifactId>
      <version>3.0.0</version>
    </dependency>
  </dependencies>
  <properties>
    <java.version>1.9</java.version>
  </properties>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.1</version>
          <configuration>
            <source>${java.version}</source>
            <target>${java.version}</target>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
JSON file
[{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}]
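By default, Spark's JSON reader treats each line of the input as one complete JSON value, which is why the array above sits on a single line. As an alternative to typing the file in vim, the sketch below writes the same content from Java using only the standard library. The class name WriteTestJson is hypothetical and not part of the original project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of the original project.
public class WriteTestJson {
    // Same content as test.json: one JSON array on a single line.
    public static final String CONTENT =
            "[{\"a\": 1, \"b\": 2, \"c\": 3, \"d\": 4, \"e\": 5}]";

    // Write the content as exactly one line, followed by a line separator.
    public static void write(Path path) throws IOException {
        Files.write(path, List.of(CONTENT), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Default to ./test.json, the path the Spark program reads.
        Path path = Paths.get(args.length > 0 ? args[0] : "./test.json");
        write(path);
        System.out.println("wrote " + path);
    }
}
```

If the JSON were pretty-printed across several lines instead, the reader's multiLine option would need to be enabled for Spark to parse it.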
Compile, package, and run
mvn clean && mvn compile && mvn package && spark-submit --class com.data.process.JSON ./target/data-process-example-0.0.1.jar
Test result
+---+---+---+---+---+
|  a|  b|  c|  d|  e|
+---+---+---+---+---+
|  1|  2|  3|  4|  5|
+---+---+---+---+---+
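Finally
- The author's knowledge is limited, so omissions are inevitable. Readers are welcome to point out mistakes at any time to avoid unnecessary misunderstandings!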