Spark version: 1.6.1
Project directory structure:
Contents of pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.baidu.ghawk</groupId>
    <artifactId>ghawk-spark</artifactId>
    <version>1.0.0-SNAPSHOT</version>

    <dependencies>
        <!-- "provided" scope: spark-core is supplied by the cluster at runtime,
             so it is not bundled into the shaded jar -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- The shade plugin builds an uber-jar and writes the Main-Class
                 entry into its manifest -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>1.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.baidu.ghawk.spark.SparkMain</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>
Contents of SparkMain.java (note the package declaration must match the mainClass configured in the pom, com.baidu.ghawk.spark.SparkMain):
package com.baidu.ghawk.spark;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SparkMain {

    public static void main(String[] args) {
        // Relative path, resolved against the working directory at submit time
        String logfile = "README.md";
        SparkConf sparkConf = new SparkConf().setAppName("first java spark program");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        JavaRDD<String> lines = jsc.textFile(logfile);
        lines.cache();

        // Count the lines containing "spark" (case-sensitive)
        long numSpark = lines.filter(
                new Function<String, Boolean>() {
                    public Boolean call(String s) throws Exception {
                        return s.contains("spark");
                    }
                }
        ).count();

        System.out.println("result: " + numSpark);
        jsc.stop();
    }
}
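Note that the filter predicate is case-sensitive: `s.contains("spark")` does not match lines that only say "Spark". A minimal Spark-free sketch exercising the same check (the class name `FilterCheck` is ours, purely for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class FilterCheck {
    // Same predicate as in the Spark job; case-sensitive,
    // so "Spark" with a capital S does not match.
    static boolean matches(String s) {
        return s.contains("spark");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Apache Spark is fast",   // no match: capital S
                "run with spark-submit",  // match
                "plain text line");       // no match
        long count = lines.stream().filter(FilterCheck::matches).count();
        System.out.println("result: " + count); // prints "result: 1"
    }
}
```

If you want to count lines mentioning the word in any case, lowercase first: `s.toLowerCase().contains("spark")`.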
Build the project:
$ mvn clean install
Copy the generated target/ghawk-spark-1.0.0-SNAPSHOT.jar into the Spark installation directory.
Run the program:
$ bin/spark-submit ghawk-spark-1.0.0-SNAPSHOT.jar

No --class flag is needed here because the shade plugin already wrote the Main-Class entry into the jar's manifest; with no --master flag, spark-submit runs the job in local mode. To be explicit, you could also run:

$ bin/spark-submit --class com.baidu.ghawk.spark.SparkMain --master local ghawk-spark-1.0.0-SNAPSHOT.jar