Try the following Scala code:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
val hadoopConf = new Configuration()
val hdfs = FileSystem.get(hadoopConf)

// srcFilePath is the path of the local source file,
// destFilePath is the destination path on HDFS (both plain path strings)
val srcPath = new Path(srcFilePath)
val destPath = new Path(destFilePath)

// Copy the local file into HDFS
hdfs.copyFromLocalFile(srcPath, destPath)
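
To confirm the copy succeeded, you can ask the same FileSystem handle whether the destination now exists (just a quick sanity check reusing the hdfs and destPath values defined above):

// FileSystem.exists returns true once the file is visible on HDFS
if (hdfs.exists(destPath)) println(s"Copied to $destPath")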
You should also check whether the HADOOP_CONF_DIR variable is set in Spark's conf/spark-env.sh file. This ensures that Spark can find the Hadoop configuration settings.
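
One way to verify this from a Spark shell is to check that the Hadoop configuration resolves to your cluster rather than the local file system (a minimal sketch; fs.defaultFS is the standard Hadoop property for the default file system URI):

import org.apache.hadoop.conf.Configuration

// If HADOOP_CONF_DIR is not being picked up, fs.defaultFS usually falls back to file:///
println(sys.env.getOrElse("HADOOP_CONF_DIR", "HADOOP_CONF_DIR is not set"))
println(new Configuration().get("fs.defaultFS"))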
Dependencies for the build.sbt file:
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"
libraryDependencies += "org.apache.commons" % "commons-io" % "1.3.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"
OR
You can use IOUtils from Apache Commons to copy the data from an InputStream to an OutputStream:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.commons.io.IOUtils
import java.io.{BufferedInputStream, FileInputStream}

val hadoopconf = new Configuration()
val fs = FileSystem.get(hadoopconf)

// Create an output stream to the destination HDFS file
val outFileStream = fs.create(new Path("hdfs://<namenode>:<port>/output_path"))

// Create an input stream from the local source file
// (srcFilePath is the local source path, as in the first example)
val inStream = new BufferedInputStream(new FileInputStream(srcFilePath))

// Copy the bytes from the local file to HDFS
IOUtils.copy(inStream, outFileStream)

// Close both streams
inStream.close()
outFileStream.close()
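
If you prefer not to close the streams by hand, Hadoop's own IOUtils (org.apache.hadoop.io.IOUtils, not the commons-io class) has a copyBytes overload that can close both streams when it finishes. A minimal sketch reusing the streams and configuration from above:

import org.apache.hadoop.io.{IOUtils => HadoopIOUtils}

// The trailing 'true' tells Hadoop to close inStream and outFileStream after copying
HadoopIOUtils.copyBytes(inStream, outFileStream, hadoopconf, true)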