Reference: http://arrow.apache.org/docs/python/filesystems.html#hadoop-file-system-hdfs
Environment:
Python 2.7.14 + pyarrow + Hadoop 2.7
System configuration
File System Interfaces
In this section, we discuss filesystem-like interfaces in PyArrow.
Hadoop File System (HDFS) syntax
PyArrow comes with bindings to a C++-based interface to the Hadoop File System. You connect like so:
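import pyarrow as pa

# host, port, user and ticket_cache_path are placeholders for your cluster's values
fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path)
with fs.open(path, 'rb') as f:
    # Do something with f, for example read its contents
    data = f.read()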
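As a rough illustration of the "do something with f" step, the sketch below wraps the open-and-read pattern in a small function. `read_head` and its `nbytes` parameter are hypothetical names introduced here, not part of pyarrow; the only assumption about `fs` is that it offers `open(path, 'rb')` returning a file-like object usable as a context manager, as in the snippet above.

```python
def read_head(fs, path, nbytes=1024):
    """Return up to `nbytes` bytes from the start of `path`.

    `fs` is assumed to be a filesystem object such as the one returned by
    pa.hdfs.connect(...); this helper itself is hypothetical.
    """
    # Open in binary mode; the with-block closes the file handle on exit.
    with fs.open(path, 'rb') as f:
        return f.read(nbytes)
```

With a live connection this might be called as `read_head(fs, path, 100)`, where `path` is whatever HDFS file you want to inspect.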
By default, pyarrow.hdfs.HadoopFileSystem uses libhdfs, a JNI-based interface to the Java Hadoop client. This library is loaded at runtime (rather than at link / library load time, since the library may not be in your LD_LIBRARY_PATH), and relies on some environment variables.
Environment variable configuration:
HADOOP_HOME: the root of your installed Hadoop distribution. Often has lib/native/libhdfs.so.
JAVA_HOME: the location of your Java SDK installation.
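Before connecting, it can help to confirm that these variables are actually set. The following is a minimal sketch using only the standard library; `check_hdfs_env` is a hypothetical helper, not a pyarrow function, and it assumes libhdfs.so sits under lib/native in HADOOP_HOME, which is common but not guaranteed.

```python
import os


def check_hdfs_env(environ=None):
    """Return (missing, libhdfs_path) for the variables listed above.

    `missing` is a list of required variable names that are unset or empty.
    `libhdfs_path` is the path to libhdfs.so if it exists at the assumed
    location, otherwise None. This helper is hypothetical.
    """
    if environ is None:
        environ = os.environ

    # Variables that libhdfs relies on, per the list above.
    required = ["HADOOP_HOME", "JAVA_HOME"]
    missing = [name for name in required if not environ.get(name)]

    # Assumption: libhdfs.so is located at $HADOOP_HOME/lib/native/libhdfs.so.
    libhdfs_path = None
    hadoop_home = environ.get("HADOOP_HOME")
    if hadoop_home:
        candidate = os.path.join(hadoop_home, "lib", "native", "libhdfs.so")
        if os.path.isfile(candidate):
            libhdfs_path = candidate

    return missing, libhdfs_path
```

Calling `check_hdfs_env()` with no arguments inspects the current process environment; a non-empty `missing` list or a `libhdfs_path` of None suggests the configuration should be checked before calling `pa.hdfs.connect`.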