Environment (the parts of the Lin Ziyu tutorial that no longer work have been updated):
Hadoop 2.7.1
Java JDK 11
Spark 3.1.1
1. Install Maven
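A minimal installation sketch (the Maven version number, the download location, and the /usr/local/maven install path are assumptions; adjust them to your environment):

sudo unzip ~/Downloads/apache-maven-3.6.3-bin.zip -d /usr/local   # binary zip from maven.apache.org (version assumed)
cd /usr/local
sudo mv apache-maven-3.6.3/ ./maven
sudo chown -R hadoop ./maven          # assumes the working user is named hadoop
/usr/local/maven/bin/mvn -v           # verify the installation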
2. Write the Java application
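Maven expects the standard src/main/java source layout. A small sketch of creating the project directory (the ~/sparkapp2 location mirrors the ./sparkapp2 folder used for pom.xml below) and opening the source file that will hold the code that follows:

cd ~
mkdir -p ./sparkapp2/src/main/java                # standard Maven source layout
vim ./sparkapp2/src/main/java/SimpleApp.java      # paste the code below into this file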
/*** SimpleApp.java ***/
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    public static void main(String[] args) {
        String logFile = "file:///usr/local/spark/README.md"; // Should be some file on your system
        JavaSparkContext sc = new JavaSparkContext("local", "Simple App",
                "file:///usr/local/spark/", new String[]{"target/simple-project-1.0.jar"});
        JavaRDD<String> logData = sc.textFile(logFile).cache();
        // Count the lines containing "a" and the lines containing "b"
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
This program depends on the Spark Java API, so we need to compile and package it with Maven. Create a new file pom.xml under ./sparkapp2 (vim ./sparkapp2/pom.xml) and add the following content, which declares the standalone application's information and its dependency on Spark:
<project>
    <groupId>edu.berkeley</groupId>
    <artifactId>simple-project</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>Simple Project</name>
    <packaging>jar</packaging>
    <version>1.0</version>
    <repositories>
        <repository>
            <id>Akka repository</id>
            <url>http://repo.akka.io/releases</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.1.1</version>
        </dependency>
    </dependencies>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.encoding>UTF-8</maven.compiler.encoding>
        <java.version>11</java.version>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
    </properties>
</project>
For the Spark dependency coordinates, you can visit The Central Repository and search for spark-core to find the relevant dependency information.
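Before packaging, it can help to confirm that the project tree matches the layout Maven expects (paths as assumed in step 2):

cd ~/sparkapp2
find .
# Expected output, roughly:
# .
# ./pom.xml
# ./src
# ./src/main
# ./src/main/java
# ./src/main/java/SimpleApp.java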
3. Package the Java program with Maven
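A sketch of the packaging and run commands, assuming Maven was installed to /usr/local/maven (step 1) and Spark lives at /usr/local/spark as in the code above:

cd ~/sparkapp2
/usr/local/maven/bin/mvn package      # the first run downloads dependencies and may take a while
# On success the jar is written to target/simple-project-1.0.jar; submit it to Spark:
/usr/local/spark/bin/spark-submit --class "SimpleApp" ~/sparkapp2/target/simple-project-1.0.jar 2>&1 | grep "Lines with a"
# The grep simply filters the program's result line out of Spark's log output.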