Overview: SparkR wraps R methods so that computation runs distributed across a Spark cluster. In practice, Spark datasets are exchanged between SparkDataFrame and R data.frame, and statistical algorithms can still be run single-machine in R; some very complex algorithms are not wrapped by SparkR at all.
SparkR exists so that users who know only R, not Scala, can still use Spark.
-
Environment: Spark 2.4.5, R 3.6, CDH Spark 2.4.0 (R integration must be set up).
-
E.g., a data.table / data.frame dataset such as dt.score can be converted into a SparkR DataFrame using the conversion methods SparkR provides.
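A minimal sketch of that conversion, assuming a local Spark session; dt.score is the data.frame named in the note, recreated here with made-up columns:

```r
library(SparkR)
sparkR.session(master = "local[*]", appName = "sparkr-convert")

# dt.score stands in for the data.table/data.frame from the note;
# the columns here are invented for illustration.
dt.score <- data.frame(id = 1:3, score = c(88, 92, 75))

# Convert the local R data.frame into a distributed SparkDataFrame
sdf <- as.DataFrame(dt.score)      # createDataFrame(dt.score) also works
printSchema(sdf)

# collect() brings the distributed data back as a local R data.frame
local_df <- collect(sdf)

sparkR.session.stop()
```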
-
By default, loading the SparkR package masks R functions of the same name.
-
To call the original R function instead, the call must be qualified with its package name.
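A sketch of such qualified calls, assuming SparkR is attached and its masking has taken effect:

```r
library(SparkR)
sparkR.session(master = "local[*]")

sdf <- as.DataFrame(faithful)   # faithful is a built-in R dataset

# After library(SparkR), an unqualified filter() resolves to SparkR's
# distributed version; qualifying it makes the intent explicit:
long_wait <- SparkR::filter(sdf, sdf$waiting > 70)
head(long_wait)

# To reach the masked single-machine function, qualify it with its package:
smoothed <- stats::filter(1:10, rep(1/3, 3))   # base R linear filter

sparkR.session.stop()
```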
-
The DataFrame is the bridge between R and SparkR; the difference is that a SparkR DataFrame is computed distributedly on the cluster.
-
In SparkR, passing a SparkDataFrame directly to a plain R function (one that has no SparkDataFrame method) fails with an S4 method-dispatch error.
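A sketch contrasting the failure mode with a working SparkR call (function names assumed from the SparkR API):

```r
library(SparkR)
sparkR.session(master = "local[*]")

sdf <- as.DataFrame(mtcars)

# Plain R functions that lack an S4 method for the SparkDataFrame class
# fail at dispatch with an error like
# "unable to find an inherited method for function ... for signature
# 'SparkDataFrame'".

# The SparkR API dispatches correctly on the SparkDataFrame class:
by_cyl <- summarize(groupBy(sdf, sdf$cyl), avg_mpg = avg(sdf$mpg))
showDF(by_cyl)

sparkR.session.stop()
```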
7. '::' calls an exported function of a package, e.g. SparkR::filter for distributed computation; ':::' accesses a package's internal, non-exported functions (in SparkR these include the low-level helpers that call into the Scala/JVM API).
The masking messages printed when SparkR is loaded via library(SparkR):
The following objects are masked from ‘package:jsonlite’:
flatten, toJSON
The following objects are masked from ‘package:plyr’:
arrange, count, desc, join, mutate, rename, summarize, take
The following object is masked from ‘package:MASS’:
select
The following object is masked from ‘package:nlme’:
gapply
The following objects are masked from ‘package:data.table’:
between, cube, first, hour, last, like, minute, month, quarter, rollup, second, tables, year
The following objects are masked from ‘package:dplyr’:
arrange, between, coalesce, collect, contains, count, cume_dist, dense_rank, desc, distinct, explain, expr, filter, first, group_by,
intersect, lag, last, lead, mutate, n, n_distinct, ntile, percent_rank, rename, row_number, sample_frac, select, slice, sql,
summarize, union
The following object is masked from ‘package:RMySQL’:
summary
The following objects are masked from ‘package:stats’:
cov, filter, lag, na.omit, predict, sd, var, window
The following objects are masked from ‘package:base’:
as.data.frame, colnames, colnames<-, drop, endsWith, intersect, rank, rbind, sample, startsWith, subset, summary, transform, union