Several ways to load a file in Spark:
1. Load a local file directly (rather than from HDFS) with sc.textFile("file:///path to the file/"),
e.g. sc.textFile("file:///home/spark/Desktop/README.md")
Note:
When HADOOP_CONF_DIR is set (i.e. a cluster environment is configured), a bare call such as sc.textFile("path/README.md")
is resolved against the default filesystem, so the path automatically becomes hdfs://master:9000/user/spark/README.md.
If that file does not exist in HDFS, Spark fails with "input path does not exist".
2. Passing an HDFS path works as well.
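The resolution rule described in the note above can be sketched in plain Python. This is only an illustration of the behavior, not Spark's actual implementation; the function name `resolve_read_path` and the `master:9000` / `/user/spark` defaults are hypothetical stand-ins for whatever the cluster's configuration contains.

```python
from urllib.parse import urlparse

def resolve_read_path(path,
                      default_fs="hdfs://master:9000",
                      working_dir="/user/spark"):
    """Sketch of how a path passed to sc.textFile() gets qualified:
    an explicit scheme (file://, hdfs://) is used as-is; otherwise the
    path is resolved against the configured default filesystem."""
    if urlparse(path).scheme:
        # Explicit prefix such as file:// or hdfs://: taken literally.
        return path
    if path.startswith("/"):
        # Absolute path: rooted at the default filesystem.
        return default_fs + path
    # Relative path: resolved under the user's HDFS working directory.
    return f"{default_fs}{working_dir}/{path}"

print(resolve_read_path("file:///home/spark/Desktop/README.md"))
# -> file:///home/spark/Desktop/README.md (local file, scheme kept)
print(resolve_read_path("README.md"))
# -> hdfs://master:9000/user/spark/README.md (default FS applies)
```

This mirrors why the bare `"path/README.md"` call in the note ends up pointing at HDFS once HADOOP_CONF_DIR is set.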
Related material:
1. Spark Quick Start - call to open README.md needs explicit fs prefix
Good catch; the Spark cluster on EC2 is configured to use HDFS as its default filesystem, so
it can’t find this file. The quick start was written to run on a single machine with an
out-of-the-box install. If you’d like to upload this file to the HDFS cluster on EC2, use
the following command:
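The command itself is cut off in this excerpt. A typical way to copy a local file into HDFS (assuming the `hadoop` CLI is on the PATH and the target directory exists; the paths here are illustrative, not necessarily what the original answer showed) is:

```shell
# Hypothetical example: upload the local README.md into the user's
# HDFS home directory so sc.textFile("README.md") can find it.
hadoop fs -put README.md /user/spark/README.md
```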
2. This has