背景
由于谷歌的bigquery不支持直接上传大于1G的数据,因此需要通过先上传到GCS(google cloud storage),再从GCS将数据load进bigquery。
年初谷歌的验证规则作了更新,基于googleAuthR的gar_auth()函数已经失效,再也不能通过点一下进行验证了,会报如下错误:
但是hadley大神开发的bigrquery包,却能够通过验证,
可能tidyverse API用的是Rstudio的官方邮箱,官方认证,所以能够用来验证。
但是,用bigrquery包有两个局限性:
- 不支持直接上传大数据集;
- 该验证只能使用bigquery服务。
下面介绍下GCS、bigquery通用的验证方法,
依赖
- bigquery、GCS服务已开通;
- BigQuery API、Google Cloud Storage JSON API已启用:
- 服务账号已创建,且为owner权限:
失败的尝试
- 使用gar_set_client()无法通过验证;
- 创建客户端id,并使用该凭据,无法通过验证。
成功的尝试
- 点击服务账号;
- 创建新密钥;
- 创建json私钥;
创建完后会自动下载新密钥的json文件。
- 使用私钥json文件完成GCS、bigquery验证。
- 官方例子测试*:
可自行测试文件 -> GCS -> bigquery的数据流,
## Not run:
library(googleCloudStorageR)
library(bigQueryR)
gcs_global_bucket("your-project")
## custom upload function to ignore quotes and column headers
f <- function(input, output) {
write.table(input, sep = ",", col.names = FALSE, row.names = FALSE,
quote = FALSE, file = output, qmethod = "double")}
## upload files to Google Cloud Storage
gcs_upload(mtcars, name = "mtcars_test1.csv", object_function = f)
gcs_upload(mtcars, name = "mtcars_test2.csv", object_function = f)
## create the schema of the files you just uploaded
user_schema <- schema_fields(mtcars)
## load files from Google Cloud Storage into BigQuery
bqr_upload_data(projectId = "your-project",
datasetId = "test",
tableId = "from_gcs_mtcars",
upload_data = c("gs://your-project/mtcars_test1.csv",
"gs://your-project/mtcars_test2.csv"),
schema = user_schema)
## for big files, its helpful to create your schema on a small sample
## a quick way to do this on the command line is:
# "head bigfile.csv > head_bigfile.csv"
## upload nested lists as JSON
the_list <- list(list(col1 = "yes", col2 = "no",
col3 = list(nest1 = 1, nest2 = 3), col4 = "oh"),
list(col1 = "yes2",
col2 = "n2o", col3 = list(nest1 = 5, nest2 = 7),
col4 = "oh2"),
list(col1 = "yes3", col2 = "no3",
col3 = list(nest1 = 7, nest2 = 55), col4 = "oh3"))
bqr_upload_data(datasetId = "test",
tableId = "nested_list_json",
upload_data = the_list,
autodetect = TRUE)
## End(Not run)
对R与数据科学感兴趣的小伙伴欢迎关注同名公众号:R语言工程化