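There are two ways to access HDFS from a non-Java language: one goes through libhdfs.so, the other uses the Thrift communication framework. This post covers libhdfs first.
1. Install libhdfs
# Prerequisite: JDK 6 and JRE 6 are installed, and hadoop-0.20 is installed from cloudera.repo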
sudo yum -y install libhdfs*
2. Install python-devel (2.6+) and gcc
sudo yum -y install python-devel gcc
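Before building, it can help to confirm that the compiler and the Python headers from this step are actually present. The following is a minimal sketch, assuming a modern Python where the `sysconfig` module and `shutil.which` are available (on Python 2.6, `distutils.sysconfig.get_python_inc()` plays the same role for the header path). The function name `check_build_prereqs` is hypothetical.

```python
import os
import shutil
import sysconfig


def check_build_prereqs(compiler="gcc"):
    """Return a dict saying whether each build prerequisite was found."""
    # Directory where python-devel installs Python.h for this interpreter.
    include_dir = sysconfig.get_paths()["include"]
    return {
        "compiler": shutil.which(compiler) is not None,
        "python_header": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }


if __name__ == "__main__":
    for name, ok in check_build_prereqs().items():
        print("%s: %s" % (name, "found" if ok else "MISSING"))
```

If either entry is reported as missing, rerunning the yum command from this step is the first thing to try.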
3. Download the libpyhdfs source and prepare the dependencies
svn checkout http://libpyhdfs.googlecode.com/svn/trunk/ libpyhdfs
cd libpyhdfs
cp /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u0.jar lib/hadoop-0.20.1-core.jar
cp /usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar lib/
cp /usr/lib/libhdfs.so.0 lib/
ln -s libhdfs.so.0 lib/libhdfs.so
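A symlink's target is resolved relative to the directory that contains the link, so it is worth checking that `lib/libhdfs.so` actually resolves before building. The following is a minimal sketch that verifies the files prepared in this step. The helper name `missing_deps` is hypothetical, and the file list is taken from the copy and link commands above.

```python
import os

# File names taken from the copy and link commands in this step.
REQUIRED = [
    "hadoop-0.20.1-core.jar",
    "commons-logging-1.0.4.jar",
    "libhdfs.so.0",
    "libhdfs.so",
]


def missing_deps(lib_dir="lib", required=REQUIRED):
    """Return the required files that are absent or are dangling symlinks."""
    # os.path.exists follows symlinks, so a broken link counts as missing.
    return [name for name in required
            if not os.path.exists(os.path.join(lib_dir, name))]


if __name__ == "__main__":
    missing = missing_deps()
    print("all dependencies present" if not missing
          else "missing: " + ", ".join(missing))
```

Run it from the libpyhdfs checkout directory. If `libhdfs.so` is listed as missing while `libhdfs.so.0` is not, the link most likely points at a path that does not exist relative to `lib/`.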
4. Configure setup.py: change the Java include path
vim setup.py
include_dirs = ['/usr/lib/jvm/java-6-sun/include/']
-> include_dirs = ['/usr/java/jdk1.6.0_24/include/']
runtime_library_dirs = ['/usr/local/lib/pyhdfs', '/