Note: the Hadoop cluster runs on CentOS, and Python runs on Windows.
Connecting Python to the HDFS file system
1. Install the pyhdfs and hdfs modules
PS C:\Program Files\Python35\Scripts> pip3 install pyhdfs
Install the hdfs module:
PS C:\Program Files\Python35\Scripts> pip3 install hdfs
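To confirm that both packages are importable by the interpreter that pip3 installed into, a quick check like the one below can help. It is a minimal sketch: `missing_modules` is a hypothetical helper, and it only checks that the import names `hdfs` and `pyhdfs` resolve. It does not test whether the cluster is reachable.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that this interpreter cannot find."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(["hdfs", "pyhdfs"])
    print("missing:", missing or "none")
```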
2. Import the module and connect to HDFS
>>> from hdfs.client import Client
>>> client = Client("http://master:50070")
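`Client` talks to the NameNode's WebHDFS REST interface over HTTP. Port 50070 is the default NameNode web port in Hadoop 2.x (Hadoop 3.x moved it to 9870). The sketch below only builds request URLs, to illustrate roughly what `read()` and `write()` map to (the WebHDFS `OPEN` and `CREATE` operations). `webhdfs_url` is a hypothetical helper, not part of the `hdfs` package, and the real client may add further query parameters.

```python
from urllib.parse import urlencode

def webhdfs_url(namenode, hdfs_path, op, **params):
    """Build a WebHDFS REST URL of the form <namenode>/webhdfs/v1<path>?op=<OP>."""
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /tmp/py.txt")
    # op comes first; extra query parameters (e.g. overwrite) follow in order.
    query = urlencode({"op": op, **params})
    return f"{namenode.rstrip('/')}/webhdfs/v1{hdfs_path}?{query}"

# Roughly what client.read("/tmp/py.txt") asks the NameNode for:
print(webhdfs_url("http://master:50070", "/tmp/py.txt", "OPEN"))
# http://master:50070/webhdfs/v1/tmp/py.txt?op=OPEN
```

For both operations the NameNode redirects the client to a DataNode, addressed by hostname (for example `slave04`). If reads or writes fail from Windows, it is worth checking that `master` and every DataNode hostname resolve on that machine, for example through its hosts file.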
3. Read data from HDFS with read()
>>> with client.read("/tmp/py.txt") as fs:
...     content = fs.read()
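`fs.read()` above returns `bytes`. The client's `read()` also accepts an `encoding` argument, in which case the reader it yields returns `str`. A small wrapper, assuming the file is UTF-8 text; `read_text` is a hypothetical helper name:

```python
def read_text(client, hdfs_path, encoding="utf-8"):
    """Read a whole HDFS file and return it as str rather than bytes."""
    # With encoding set, the object yielded by client.read() decodes for us.
    with client.read(hdfs_path, encoding=encoding) as reader:
        return reader.read()
```

Usage: `content = read_text(client, "/tmp/py.txt")`. Reading the whole file into memory is fine for small files like this one; for larger files, `read()` also takes `offset` and `length` arguments to fetch only a slice.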
4. Write data to HDFS with write(). When data is passed, write() uploads it immediately and returns None, so it must not be used in a with block:
>>> client.write("/tmp/py.log", data="this is python conn hdfs", encoding="utf-8")
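When `data` is omitted, `client.write()` instead returns a writer meant for a `with` block, which suits content produced piece by piece. The sketch below assumes UTF-8 text; `write_lines` is a hypothetical helper. It passes `overwrite=True` because, by default, writing to a path that already exists (for example when rerunning step 4) raises an error.

```python
def write_lines(client, hdfs_path, lines, overwrite=True):
    """Stream an iterable of text lines into one HDFS file."""
    # Without data=, client.write() returns a context manager; with encoding
    # set, the writer it yields accepts str and encodes it before upload.
    with client.write(hdfs_path, encoding="utf-8", overwrite=overwrite) as writer:
        for line in lines:
            writer.write(line + "\n")
```

Usage: `write_lines(client, "/tmp/py.log", ["line 1", "line 2"])`.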
5. Check on the cluster that the data was written
[hadoop@slave04 ~]$ hdfs dfs -cat /tmp/py.log
this is python conn hdfs
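A similar check can be made from the Windows side without logging in to a cluster node. `client.status()` returns the WebHDFS `FileStatus` of a path as a dict, whose `length` field is the file size in bytes; it raises an error if the path does not exist (pass `strict=False` to get `None` instead). A minimal sketch, assuming the file was written as UTF-8; `size_matches` is a hypothetical helper:

```python
def size_matches(client, hdfs_path, expected_text, encoding="utf-8"):
    """Compare the HDFS file size with the byte length of the text we wrote."""
    status = client.status(hdfs_path)  # FileStatus dict returned by WebHDFS
    return status["length"] == len(expected_text.encode(encoding))
```

Usage: `size_matches(client, "/tmp/py.log", "this is python conn hdfs")` should return `True` when all 24 bytes arrived. A matching size does not prove the contents match, so reading the file back is still the stronger test.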
6. Use read() again to check that py.log contains the data
>>> with client.read("/tmp/p