Overview
Environment
Operating system: CentOS 7.2
Hadoop: 2.7.2
Java: 1.8
Preparing the environment for RHadoop
Required packages
install.packages("rJava")
install.packages("reshape2")
install.packages("Rcpp")
install.packages("iterators")
install.packages("itertools")
install.packages("digest")
install.packages("RJSONIO")
install.packages("functional")
install.packages("caTools")
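The nine calls above can also be issued as one non-interactive command from the shell. The following is only a sketch: `build_install_expr` is a hypothetical helper name, and the CRAN mirror URL is an assumption that you may want to replace with a mirror closer to you.

```shell
# Package names are taken from the list above.
RHADOOP_DEPS="rJava reshape2 Rcpp iterators itertools digest RJSONIO functional caTools"

# Build a single R expression of the form
#   install.packages(c("a", "b"), repos = "...")
# from the package names passed as arguments.
build_install_expr() {
    local quoted="" pkg
    for pkg in "$@"; do
        # Append each name in double quotes, comma-separated after the first.
        quoted="${quoted:+$quoted, }\"$pkg\""
    done
    printf 'install.packages(c(%s), repos = "https://cloud.r-project.org")\n' "$quoted"
}

# To actually install, pass the expression to Rscript:
#   Rscript -e "$(build_install_expr $RHADOOP_DEPS)"
```

Printing the expression first makes it easy to check what will be run before R starts downloading anything.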
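Setting environment variables
Setting the HADOOP_CMD environment variable (easy to get wrong):
HADOOP_CMD must point to the hadoop executable inside Hadoop's bin directory, not to the directory itself
export HADOOP_CMD=/opt/hadoop-2.7.2/bin/hadoop
Setting the HADOOP_STREAMING environment variable (required by rmr2):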
export HADOOP_STREAMING=/opt/hadoop-2.7.2/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar
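Since HADOOP_CMD is the setting that most often goes wrong, it can help to check both variables before starting R. A minimal sketch, assuming a bash-compatible shell; `check_hadoop_env` is a hypothetical helper name and is not part of Hadoop or RHadoop.

```shell
# Verify that HADOOP_CMD and HADOOP_STREAMING look usable.
# Prints one message per problem and returns non-zero if any is found.
check_hadoop_env() {
    local rc=0
    # HADOOP_CMD must be the hadoop launcher itself, not the bin directory.
    if [ -z "${HADOOP_CMD:-}" ]; then
        echo "HADOOP_CMD is not set"
        rc=1
    elif [ -d "$HADOOP_CMD" ]; then
        echo "HADOOP_CMD points to a directory; it should be the bin/hadoop file"
        rc=1
    elif [ ! -x "$HADOOP_CMD" ]; then
        echo "HADOOP_CMD is not an executable file: $HADOOP_CMD"
        rc=1
    fi
    # HADOOP_STREAMING must be the path of the streaming jar.
    if [ -z "${HADOOP_STREAMING:-}" ]; then
        echo "HADOOP_STREAMING is not set"
        rc=1
    elif [ ! -f "$HADOOP_STREAMING" ]; then
        echo "HADOOP_STREAMING is not a file: $HADOOP_STREAMING"
        rc=1
    fi
    [ "$rc" -eq 0 ] && echo "Hadoop environment looks OK"
    return "$rc"
}
```

Run `check_hadoop_env` in the same shell session from which R will be launched, because R only sees variables that were exported there.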
Installing RHadoop
Installing rhdfs (only needed on the user-client node)
R CMD INSTALL rhdfs_1.0.8.tar.gz
Installing rmr2 (needed on every node)
R CMD INSTALL rmr2_3.3.1.tar.gz
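Because rmr2 has to be installed on every node, the same command must be repeated across the cluster. The sketch below only prints the commands so they can be reviewed first. It assumes passwordless ssh, R already installed on each node, and `/tmp` as a staging directory; the node names and the helper name `print_rmr2_install_cmds` are hypothetical.

```shell
# Hypothetical node names; replace with the hosts of your own cluster.
NODES="node1 node2 node3"
PKG="rmr2_3.3.1.tar.gz"

# Print (rather than execute) the copy and install commands for each node.
# First argument: package tarball; remaining arguments: node names.
print_rmr2_install_cmds() {
    local pkg="$1" node
    shift
    for node in "$@"; do
        echo "scp $pkg $node:/tmp/$pkg"
        echo "ssh $node R CMD INSTALL /tmp/$pkg"
    done
}

# Review the output of:  print_rmr2_install_cmds "$PKG" $NODES
# then pipe it to sh to run it.
```

The packages from the required list also need to be present on each node before rmr2 will install there.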
Testing
rhdfs
> library("rhdfs")
> hdfs.init()
16/08/01 15:55:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> hdfs.ls("/")
permission owner group size modtime file
1 drwxr-xr-x root supergroup 0 2016-07-31 14:53 /library
2 drwxr-xr-x Administrator supergroup 0 2016-07-31 16:37 /user
rmr2
Where each RHadoop package needs to be installed in the cluster
Package | Where to Install |
---|---|
plyrmr | On every node in the cluster |
ravro | Only on the node that runs the R client |
rhbase | Only on the node that runs the R client |
rhdfs | Only on the node that runs the R client |
rmr2 | On every node in the cluster |
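Additional notes
The documentation on the official GitHub repository is a useful reference; in my opinion it does a good job of flagging the important points.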