pom.xml
SparkTest.java
It can be run standalone or submitted to a Spark cluster: spark-submit.cmd --class SparkTest D:\workspace\spark-test\target\spark-test-0.0.1-SNAPSHOT.jar
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>active</groupId>
    <artifactId>spark-test</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <!-- SparkTest uses Java 8 lambdas, so compile for Java 8 -->
    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
</project>
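import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkTest {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Test");
        // Fall back to a local master only when none is supplied (e.g. by spark-submit --master);
        // a master set unconditionally in code would override the spark-submit setting
        if (!conf.contains("spark.master")) {
            conf.setMaster("local");
        }

        // JavaSparkContext implements Closeable, so try-with-resources always stops it
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.parallelize(Arrays.asList("Hello test", "Hello test2", "dds"));
            // Split each line on runs of whitespace (spaces, tabs, line breaks)
            JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(s.split("\\s+")).iterator());
            // Pair every word with an initial count of 1
            JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new Tuple2<>(s, 1));
            // Sum the counts for each distinct word
            JavaPairRDD<String, Integer> counts = ones.reduceByKey(Integer::sum);

            // collect() pulls every result back to the driver; only suitable for small outputs
            System.out.println(counts.collect());
        }
    }
}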
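To see what flatMap, mapToPair and reduceByKey compute without a Spark installation, here is a minimal plain-Java sketch. The class name LocalWordCount and the list-of-lists "partitions" input are hypothetical stand-ins for an RDD, not Spark APIs. Spark's reduceByKey also merges values locally within each partition before the shuffle (like a MapReduce combiner), which is why its function should be associative and commutative; the sketch imitates that two-stage merge with HashMap.merge.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalWordCount {

    // Hypothetical helper: word count over in-memory "partitions",
    // mimicking flatMap -> mapToPair -> reduceByKey
    public static Map<String, Integer> wordCount(List<List<String>> partitions) {
        Map<String, Integer> result = new HashMap<>();
        for (List<String> partition : partitions) {
            // Stage 1: per-partition combine (like reduceByKey's map-side merge)
            Map<String, Integer> local = new HashMap<>();
            for (String line : partition) {
                // flatMap step: split on runs of whitespace, skipping empty tokens
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        // mapToPair + local reduce: add 1 for each occurrence
                        local.merge(word, 1, Integer::sum);
                    }
                }
            }
            // Stage 2: merge partial counts (like the reduce after the shuffle)
            local.forEach((word, count) -> result.merge(word, count, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("Hello test", "Hello test2"),
                Arrays.asList("dds"));
        System.out.println(wordCount(partitions));
    }
}
```

HashMap iteration order is unspecified, just as the order of the pairs returned by counts.collect() in SparkTest is, so compare results by key rather than by printed order.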
From the ITPUB blog, link: http://blog.itpub.net/10742815/viewspace-2134860/. Please credit the source when reposting; otherwise legal liability will be pursued.