The Spark download ships with an example program called JavaSparkPi, which estimates pi using Spark's map and reduce operations.
The code is as follows:
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark.examples;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

import java.util.ArrayList;
import java.util.List;

/**
 * Computes an approximation to pi
 * Usage: JavaSparkPi [slices]
 */
public final class JavaSparkPi {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
        sparkConf.setMaster("local");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
        int n = 3000000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

        int count = dataSet.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer integer) {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        }).reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer integer, Integer integer2) {
                return integer + integer2;
            }
        });

        System.out.println("Pi is roughly " + 4.0 * count / n);
        jsc.stop();
    }
}
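The code first builds a very large collection, then iterates over it with map, sampling one random coordinate point per element. The element values themselves are ignored; the list only determines how many samples are drawn.
The geometry behind it is roughly as follows.
Take the square centered on the origin with x and y both ranging over [-1, 1]. Its side is 2, so its area is 4.
Take the circle of radius 1 centered at (0, 0). Its area is pi * 1^2 = 3.141..., which is pi itself.
The code then samples random points and checks whether each one falls inside the circle.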
double x = Math.random() * 2 - 1;
double y = Math.random() * 2 - 1;
return (x * x + y * y < 1) ? 1 : 0;
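x and y are generated randomly. Math.random() only returns values in [0, 1), so multiplying by 2 and subtracting 1 yields a value in [-1, 1), which always lies inside the square.
x*x + y*y = 1 describes the points on the circle itself, so the test < 1 checks whether the point is inside the circle.
If the point is inside the circle the function returns 1, otherwise 0. Combined with the reduce that follows, this gives the total number of points that landed inside the circle.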
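To see the same map-then-reduce shape without a Spark cluster, here is a minimal sketch that uses plain Java streams instead of an RDD. It is not part of the Spark example: the class name LocalPiSketch, the method countInside, and the use of a seeded java.util.Random (in place of Math.random(), to make runs repeatable) are all assumptions made for illustration.

```java
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical local sketch: same map/reduce structure as JavaSparkPi,
// but on a sequential IntStream rather than a Spark RDD.
final class LocalPiSketch {

    /** Maps each of n indices to 1 (point inside the unit circle) or 0, then sums the results. */
    static int countInside(int n, long seed) {
        Random rng = new Random(seed); // seeded so that repeated runs give the same count
        return IntStream.range(0, n)
                .map(i -> {
                    // The index i is ignored, just as the RDD elements are in the original.
                    double x = rng.nextDouble() * 2 - 1; // in [-1, 1)
                    double y = rng.nextDouble() * 2 - 1; // in [-1, 1)
                    return (x * x + y * y < 1) ? 1 : 0;
                })
                .reduce(0, Integer::sum); // plays the role of the Function2 passed to reduce
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int count = countInside(n, 42L);
        System.out.println("Pi is roughly " + 4.0 * count / n);
    }
}
```

The stream is kept sequential on purpose, because a single java.util.Random shared across parallel workers would no longer give repeatable results.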
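We now have n sample points in total, count of which are inside the circle.
So count/n is the fraction of sample points that fall inside the circle. Because the points are spread uniformly over the square, this fraction approximates the ratio of the circle's area to the square's area, and multiplying it by the square's area (4) gives an approximation of pi.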
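Written out as a formula, the reasoning above looks like this (using the side length 2 and radius 1 from the geometry described earlier):

```latex
\[
\frac{\mathrm{count}}{n}
\;\approx\;
\frac{\text{area of circle}}{\text{area of square}}
= \frac{\pi \cdot 1^{2}}{2 \cdot 2}
= \frac{\pi}{4}
\qquad\Longrightarrow\qquad
\pi \;\approx\; 4 \cdot \frac{\mathrm{count}}{n}
\]
```

This is the expression `4.0 * count / n` printed at the end of the program.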
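Conclusion: the more samples you take, the more accurate the estimate of pi tends to be. The error is random and shrinks on the order of 1/sqrt(n), so it does not necessarily improve with every additional sample.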
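One way to watch this effect is to repeat the estimate for growing sample counts. The following is a rough sketch in plain Java, not Spark code. The class name PiConvergenceSketch, the method estimatePi, and the fixed seed are assumptions chosen for illustration, and the exact numbers printed depend on the seed.

```java
import java.util.Random;

// Hypothetical sketch: prints the Monte Carlo estimate of pi and its
// absolute error for sample counts from 1,000 up to 10,000,000.
final class PiConvergenceSketch {

    /** Estimates pi from n uniformly sampled points in the square [-1, 1) x [-1, 1). */
    static double estimatePi(int n, long seed) {
        Random rng = new Random(seed); // seeded for repeatable output
        int count = 0;
        for (int i = 0; i < n; i++) {
            double x = rng.nextDouble() * 2 - 1;
            double y = rng.nextDouble() * 2 - 1;
            if (x * x + y * y < 1) {
                count++; // point fell inside the unit circle
            }
        }
        return 4.0 * count / n;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 10_000_000; n *= 10) {
            double estimate = estimatePi(n, 42L);
            System.out.printf("n=%d estimate=%.6f error=%.6f%n",
                    n, estimate, Math.abs(estimate - Math.PI));
        }
    }
}
```

Since the error is random, a single run may show one row that is slightly worse than the row before it. The overall downward trend is what matters.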