Accessing Hadoop from Python via Thrift: fixing java.lang.IllegalArgumentException: Wrong FS: hdfs:/ expected file:///

I have recently been looking into manipulating files on an HDFS cluster from Python. Since I did not want to pull in a third-party library for now, I went with the Thrift approach.
Borrowing from an example I found, I wrote a short script, hdfs-test.py:

import sys
sys.path.append('gen-py')  # Thrift-generated Python bindings
from hdfs import hadoopthrift_cli

host = '10.33.28.200'
port = 10086

fs_con = hadoopthrift_cli(host, port)
fs_con.connect()
fs_con.do_ls(r'hdfs://10.33.28.200:9000/')

I then edited the server-side script start_thrift_server.sh, mainly to point it at the correct jar locations, and started the server:

[root@hadoop1 scripts]# sh start_thrift_server.sh 10086
Starting the hadoop thrift server on port [10086]...
15/04/18 21:30:52 INFO hadoop.thrift: Starting the hadoop thrift server on port [10086]...
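For reference, the edited launcher looked roughly like this. This is a sketch, not the exact original: the install paths below are assumptions for a typical Hadoop 1.x tarball layout, and only the main class name is confirmed (it appears in the stack trace later).

```shell
#!/bin/sh
# Sketch of start_thrift_server.sh after editing the jar paths.
# HADOOP_HOME and the jar locations are assumptions for a Hadoop 1.x install.
HADOOP_HOME=/usr/local/hadoop
CLASSPATH=$HADOOP_HOME/conf
for f in "$HADOOP_HOME"/hadoop-core-*.jar \
         "$HADOOP_HOME"/lib/*.jar \
         "$HADOOP_HOME"/contrib/thriftfs/*.jar; do
  CLASSPATH=$CLASSPATH:$f
done
# $1 is the port to listen on, e.g. 10086
java -cp "$CLASSPATH" org.apache.hadoop.thriftfs.HadoopThriftServer "$1"
```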

Then I started the client:

python hdfs-test.py

To my surprise, this is what came back:

[root@test py-hdfs]# python hdfs-test.py 
Traceback (most recent call last):
  File "hdfs-test.py", line 11, in <module>
    fs_con.do_ls(r'hdfs://10.33.28.200:9000/')
  File "/root/py-hdfs/hdfs.py", line 297, in do_ls
    status = self.client.stat(path)
  File "gen-py/hadoopfs/ThriftHadoopFileSystem.py", line 452, in stat
    return self.recv_stat()
  File "gen-py/hadoopfs/ThriftHadoopFileSystem.py", line 463, in recv_stat
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "build/bdist.linux-i686/egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
  File "build/bdist.linux-i686/egg/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
  File "build/bdist.linux-i686/egg/thrift/transport/TTransport.py", line 58, in readAll
  File "build/bdist.linux-i686/egg/thrift/transport/TTransport.py", line 159, in read
  File "build/bdist.linux-i686/egg/thrift/transport/TSocket.py", line 120, in read
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

Meanwhile, the server side reported an error as well:

[root@hadoop1 scripts]# sh start_thrift_server.sh 10086
Starting the hadoop thrift server on port [10086]...
15/04/18 22:49:39 INFO hadoop.thrift: Starting the hadoop thrift server on port [10086]...
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.33.28.200:9000/, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:390)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:398)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
        at org.apache.hadoop.thriftfs.HadoopThriftServer$HadoopThriftHandler.stat(HadoopThriftServer.java:425)
        at org.apache.hadoop.thriftfs.api.ThriftHadoopFileSystem$Processor$stat.process(Unknown Source)
        at org.apache.hadoop.thriftfs.api.ThriftHadoopFileSystem$Processor.process(Unknown Source)
        at com.facebook.thrift.server.TThreadPoolServer$WorkerProcess.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:701)

However, if I changed the path in the client script to

fs_con.do_ls(r'/')

then it printed the server's local filesystem layout instead:

[root@test py-hdfs]# python hdfs-test.py 
1       4096    1429308924000   rwxrwxrwx       root    root    file:/tmp
1       12288   1400066227000   r-xr-xr-x       root    root    file:/sbin
1       4096    1401269647000   rwxrwxrwx       root    root    file:/share
1       4096    1316778468000   rwxr-xr-x       root    root    file:/mnt
1       0       1429300092000   rw-r--r--       root    root    file:/.autofsck
1       12288   1400066213000   r-xr-xr-x       root    root    file:/lib
1       16384   1396620511000   rwx------       root    root    file:/lost+found
1       4096    1400066199000   rwxr-xr-x       root    root    file:/var
1       4096    1429311134000   r-xr-x---       root    root    file:/root
1       4096    1316778468000   rwxr-xr-x       root    root    file:/srv
1       4096    1396620633000   rwxr-xr-x       root    root    file:/selinux
1       1024    1396620854000   r-xr-xr-x       root    root    file:/boot
1       0       1429300085000   rwxr-xr-x       root    root    file:/sys
1       0       1400066782000   rw-r--r--       root    root    file:/.autorelabel
1       4096    1316778468000   rwxr-xr-x       root    root    file:/home
1       4096    1401446050000   rwxr-xr-x       root    root    file:/media
1       0       1429300085000   r-xr-xr-x       root    root    file:/proc
1       4096    1429364692000   rwxr-xr-x       root    root    file:/etc
1       4096    1416433067000   rwxr-xr-x       root    root    file:/usr
1       4096    1316778468000   rwxr-xr-x       root    root    file:/opt
1       3720    1429300104000   rwxr-xr-x       root    root    file:/dev
1       4096    1400066213000   r-xr-xr-x       root    root    file:/bin

My guess was that the server could not find the filesystem configuration, i.e. the HDFS address set in core-site.xml, and so was falling back to the local file:/// filesystem. But adding the conf directory to the path did not help.
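In hindsight, a quick way to test this guess is to check whether the core-site.xml the server actually sees declares an hdfs:// default filesystem: if fs.default.name is missing, Hadoop 1.x falls back to file:///, which is exactly the "Wrong FS" symptom above. A small standalone helper (not part of the original setup) can parse the file:

```python
# Check the fs.default.name property in a core-site.xml document.
# SAMPLE mirrors the standard Hadoop 1.x core-site.xml layout.
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.33.28.200:9000</value>
  </property>
</configuration>"""

def default_fs(xml_text):
    """Return the value of fs.default.name, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter('property'):
        if prop.findtext('name') == 'fs.default.name':
            return prop.findtext('value')
    return None

print(default_fs(SAMPLE))              # hdfs://10.33.28.200:9000
print(default_fs('<configuration/>'))  # None -> Hadoop would use file:///
```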

I then read through various bits of code, including HadoopThriftServer.java, checking how the path and pathname fields were handled, and still found nothing wrong.

As the saying goes, ask Google about foreign affairs and Baidu about domestic ones. Perhaps my search skills were lacking: a whole day on Google turned up nothing, while Baidu actually surfaced the solution. See here for details.

So it really was a matter of the configuration file not being found: the config file needs to sit in the project directory. For this project, that means placing core-site.xml in the same directory as start_thrift_server.sh. After restarting the server, the listing finally works:

[root@test py-hdfs]# python hdfs-test.py 
0       0       1413390909861   rwxr-xr-x       root    supergroup      hdfs://10.33.28.200:9000/root
0       0       1413412534130   rwxr-xr-x       root    supergroup      hdfs://10.33.28.200:9000/user
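For reference, a minimal core-site.xml pointing this server at the cluster would look like the following; the value is taken from the URIs above, and fs.default.name is the standard Hadoop 1.x property for the default filesystem:

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.33.28.200:9000</value>
  </property>
</configuration>
```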
crookie
