Lately I've been hooked on learning sparklyr. As the title suggests, today's post is about hand-wrangling data.

This time we'll use a sparklyr extension package (function reference: mitre.github.io).

First, read in the data:
library(sparklyr)

# Spark configuration: 16 GB of executor memory, 4 cores per executor,
# a higher memory fraction, and dynamic allocation disabled
conf <- spark_config()
conf$spark.executor.memory <- "16GB"
conf$spark.memory.fraction <- 0.9
conf$spark.executor.cores <- 4
conf$spark.dynamicAllocation.enabled <- "false"

# Connect to a standalone Spark 2.3.2 cluster
sc <- spark_connect(master = "spark://127.0.1.1:7077",
                    version = "2.3.2",
                    config = conf,
                    spark_home = "/home/spark/spark-2.3.2-bin-hadoop2.7/")
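Before reading anything, it's worth a quick sanity check that the session is actually up. A minimal sketch, assuming the standalone cluster above is reachable (this needs a live connection, so it won't run offline):

```r
# Check the Spark session created above (requires the running cluster)
spark_version(sc)             # should report the connected Spark version, here 2.3.2
spark_connection_is_open(sc)  # TRUE while the session is alive
```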
# Read a Parquet file as a lazy Spark table;
# memory = FALSE skips caching the data into cluster memory
test <- spark_read_parquet(sc,
                           path = "/home/spark_test_file",
                           memory = FALSE)
> test
# Source: spark<?> [?? x 2]
V17 V18
<int> <chr>
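With the lazy table handle in place, ordinary dplyr verbs are translated to Spark SQL and executed on the cluster. A small sketch of what that looks like; the column names V17 (int) and V18 (chr) come from the preview above, everything else is illustrative and assumes the connection from earlier:

```r
library(dplyr)

# dplyr verbs on a Spark tbl run lazily in Spark, not in R
test %>%
  filter(V17 > 0) %>%                             # filter is pushed down to Spark
  group_by(V18) %>%
  summarise(n = n(),
            avg = mean(V17, na.rm = TRUE)) %>%    # aggregation happens on the cluster
  arrange(desc(n)) %>%
  collect()                                       # pull the (small) result back into R
```

Nothing is computed until a verb like collect() forces execution, which is what makes this style cheap to chain on large tables.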