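Workflow:
mapToPair + reduceByKey: count how many times each url occurs;
mapToPair + sortByKey: swap the key and value of the PairRDD so that the count becomes the key, then sort by count in descending order;
take(): take the first ten urls.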
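The three steps above can be sketched without a cluster. The following is a minimal plain-Java analog using streams, not the Spark job itself: the class name `TopUrls`, the method `topN`, and the sample input are hypothetical, and the alphabetical tie-break is an assumption added only so the output is deterministic.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopUrls {
    // Returns the n most frequent urls, most frequent first.
    static List<String> topN(List<String> urls, int n) {
        // Step 1 (analog of mapToPair + reduceByKey): count occurrences per url.
        Map<String, Long> counts = urls.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                // Step 2a (analog of the second mapToPair): swap to (count, url).
                .map(e -> new SimpleEntry<Long, String>(e.getValue(), e.getKey()))
                // Step 2b (analog of a descending sortByKey): highest count first.
                // Assumption: ties are broken alphabetically by url.
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getKey(), a.getKey());
                    return byCount != 0 ? byCount : a.getValue().compareTo(b.getValue());
                })
                // Step 3 (analog of take(n)): keep only the first n entries.
                .limit(n)
                .map(SimpleEntry::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("/a", "/b", "/a", "/c", "/a", "/b");
        System.out.println(topN(urls, 2)); // prints [/a, /b]
    }
}
```

In the Spark version the same shape applies, except that each step runs as a distributed transformation and only take() brings results back to the driver.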
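// elasticsearch-hadoop connector settings
sparkConf.put("es.resource", indexName + "/" + indexType); // index/type to read from
sparkConf.put("es.nodes", hosts);                          // Elasticsearch node(s) to connect to
sparkConf.put("es.port", "8080");                          // HTTP port; the connector's default is 9200
sparkConf.put("es.query", queryStr);                       // query that selects the documents to load
sparkConf.put("es.scroll.size", "1000");                   // documents fetched per scroll request
// Spark setting: memory per executor
sparkConf.put("spark.executor.memory", "3072m");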
String appName = "findTopNURLFromES";
int cores = SparkUtil.getSparkCore(start, end);
JavaSparkContext jsc = SparkUtil.createSparkContext(this.getClass(), false, appName, cores, sparkConf);
JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc).pe