Use the HDFS API to read files in Python
03/23/2020
There may be times when you want to read files directly without using third-party libraries. This is useful for reading small files when your regular storage blobs are not available as local DBFS mounts.
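Reading a blob this way relies on two strings: the Hadoop configuration key that holds the storage account access key, and the `wasbs://` URI of the file. As a minimal sketch, the helpers below assemble both from a storage account name, a container name, and a file path. The function names and the sample values are hypothetical and are not part of any library.

```python
def account_key_property(account_name: str) -> str:
    """Return the Hadoop configuration key that stores the account access key."""
    return f"fs.azure.account.key.{account_name}.blob.core.windows.net"


def wasbs_uri(container: str, account_name: str, file_path: str) -> str:
    """Return the wasbs:// URI of a file in an Azure Blob storage container."""
    # Drop any leading slash so the URI has exactly one slash before the path.
    return (
        f"wasbs://{container}@{account_name}.blob.core.windows.net/"
        f"{file_path.lstrip('/')}"
    )


# Hypothetical values, for illustration only.
print(account_key_property("myaccount"))
print(wasbs_uri("mycontainer", "myaccount", "/data/small.csv"))
```

Building the strings in one place keeps the account name consistent between the configuration key and the URI, which must refer to the same storage account.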
Use the following example code for Azure Blob storage.
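# `sc` is the SparkContext that the notebook provides.
# Get handles to the JVM classes through the Py4J gateway.
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
# Hadoop configuration used by this Spark context.
conf = sc._jsc.hadoopConfiguration()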
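conf.set(
  "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
  "<storage-account-access-key>")
fs = Path('wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<file-path>/').getFileSystem(sc._jsc.hadoopConfiguration())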
istream = fs.open(Path('