Input file: spark.txt
hadoop hive spark flume
hdfs spark zookeeper storm
flume hue flume hdfs
spark hive hdfs spark
map
Scala version:
scala> val lines = sc.textFile("/spark.txt")
scala> val words = lines.map(line => line.split(" "))
scala> words.collect
Result:
Array[Array[String]] = Array(Array(hadoop, hive, spark, flume), Array(hdfs, spark, zookeeper, storm), Array(flume, hue, flume, hdfs), Array(spark, hive, hdfs, spark))
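The nested `Array[Array[String]]` is the key point of `map`: each input line becomes exactly one output element (here a whole `String[]`), so the structure stays nested. A minimal plain-Java analogy using `java.util.stream` (no Spark on the classpath; the class and method names are only for this sketch) contrasts that behavior with `flatMap`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapVsFlatMap {
    // map: one String[] per input line, so the result stays nested
    static List<String[]> mapWords(List<String> lines) {
        return lines.stream()
                    .map(line -> line.split(" "))
                    .collect(Collectors.toList());
    }

    // flatMap: the words of every line are spliced into one flat list
    static List<String> flatMapWords(List<String> lines) {
        return lines.stream()
                    .flatMap(line -> Arrays.stream(line.split(" ")))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hadoop hive spark flume",
                "hdfs spark zookeeper storm");
        System.out.println(mapWords(lines).size());  // 2 (one array per line)
        System.out.println(flatMapWords(lines));     // 8 individual words
    }
}
```

The same distinction holds for Spark RDDs: `map` over `split` yields an RDD of arrays, while `flatMap` yields an RDD of individual words.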
Java version:
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

JavaRDD<String> lines = sc.textFile("C:\\Users\\chenhaolin\\Desktop\\spark.txt");
// map: each line becomes one String[], so the result is a nested JavaRDD<String[]>
JavaRDD<String[]> words = lines.map(new Function<String, String[]>() {
    @Override
    public String[] call(String v1) throws Exception {
        return v1.split(" ");
    }
});
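The anonymous inner class above is the pre-Java-8 idiom. Because Spark's `Function` has a single abstract method (`call`), on Java 8+ a lambda such as `lines.map(v1 -> v1.split(" "))` is equivalent. A minimal sketch of that equivalence, using a stand-in `Function` interface since Spark itself isn't assumed on the classpath here:

```java
public class LambdaSketch {
    // Stand-in for org.apache.spark.api.java.function.Function:
    // one abstract method, so a lambda can implement it.
    interface Function<T, R> {
        R call(T v1) throws Exception;
    }

    static String[] apply(Function<String, String[]> f, String line) throws Exception {
        return f.call(line);
    }

    public static void main(String[] args) throws Exception {
        // Anonymous-class form, as in the snippet above
        Function<String, String[]> anon = new Function<String, String[]>() {
            @Override
            public String[] call(String v1) {
                return v1.split(" ");
            }
        };
        // Equivalent Java 8 lambda form
        Function<String, String[]> lambda = v1 -> v1.split(" ");

        System.out.println(apply(anon, "hadoop hive").length);   // 2
        System.out.println(apply(lambda, "hadoop hive").length); // 2
    }
}
```

Both forms split a line into the same word array; the lambda just removes the boilerplate.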
flatMap
Scala version:
scala> val lines = sc.textFile("