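Transferring data between two buckets - Google Storage Transfer Service

In some business scenarios we do not want to expose our data storage directly to upstream systems. Instead, we set up a landing path that upstream systems send their data to.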

Figure: upstream system, landing bucket, and storage bucket (image not available)
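We only need to grant the upstream system access to the landing bucket. The upstream system cannot access the storage bucket, which keeps the data isolated.
The remaining question is how to automatically move files that arrive in the landing bucket into the storage bucket.

We could of course write code and build an ETL service for this: the service would periodically check the landing bucket for new files and, when it finds any, put them into the storage bucket.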



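Introducing Storage Transfer Service

Google already offers a product called Storage Transfer Service that can transfer files between two buckets (it also supports AWS S3 buckets, which are outside the scope of this article).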

Figure: rough architecture of the pipeline (image not available)



Components



src bucket

This is the source bucket of the Storage Transfer Service job.

storage notification

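To make Storage Transfer Service event driven, we must create a bucket notification on the src bucket. Whenever a file is added or changed (the event types are configurable), the bucket notification sends a message containing the file's metadata to a Pub/Sub topic.

Reference: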
https://cloud.google.com/storage/docs/pubsub-notifications

pubsub topic

Pub/Sub is the key to making the pipeline event driven. The topic's main job is to receive the messages sent by the storage notification.

pubsub pull subscription

The subscription exists so that the transfer streaming job further down the pipeline can consume the messages sent by the storage notification.

Storage Transfer streaming job

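This is the core of the whole pipeline. The job runs continuously and monitors the Pub/Sub subscription; whenever a new message arrives, it moves the corresponding file from the src bucket to the target bucket.

None of this is hard to follow.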



A concrete example

Create the buckets

First, we create two buckets:
jason-hsbc-demo-src
jason-hsbc-demo-target

// define src bucket
resource "google_storage_bucket" "bucket-jason-hsbc-demo-src" {
  name     = "jason-hsbc-demo-src"
  project  = var.project_id
  location = var.region_id
}


//define target bucket
resource "google_storage_bucket" "bucket-jason-hsbc-demo-target" {
  name     = "jason-hsbc-demo-target"
  project  = var.project_id
  location = var.region_id
}
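
The snippets in this post reference a few input variables whose declarations are not shown. A minimal sketch of what they might look like follows; the variable names come from the code in this post, while the types and descriptions are assumptions.

```hcl
// Assumed declarations for the variables referenced throughout this post.
variable "project_id" {
  type        = string
  description = "ID of the GCP project that owns the buckets and Pub/Sub resources"
}

variable "region_id" {
  type        = string
  description = "Location used for both buckets"
}

variable "gcs_sa" {
  type        = string
  description = "Email of the project's Cloud Storage service agent"
}

variable "sts_sa" {
  type        = string
  description = "Email of the project's Storage Transfer Service agent"
}
```

The values can then be supplied through a tfvars file or -var flags when running terraform apply.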

Create the Pub/Sub topic and subscription

topic: topic-sts-demo
subscription: subscription-sts-demo


//define a pubsub topic
resource "google_pubsub_topic" "topic_sts_demo" {
  name    = "topic-sts-demo"
  project = var.project_id
}


//define a pubsub subscription
resource "google_pubsub_subscription" "subscription_sts_demo" {
  name     = "subscription-sts-demo"
  topic    = google_pubsub_topic.topic_sts_demo.name
  project  = var.project_id
}
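
Pub/Sub's default acknowledgement deadline is 10 seconds, which can be short when the consumer needs time to copy a file before acknowledging the message. The variant below is a sketch of the same subscription with a longer deadline. It is meant as a replacement for the block above, not an addition, and the value 300 is an assumption rather than something taken from the original setup.

```hcl
// Variant of the subscription above with a longer acknowledgement deadline.
resource "google_pubsub_subscription" "subscription_sts_demo" {
  name    = "subscription-sts-demo"
  topic   = google_pubsub_topic.topic_sts_demo.name
  project = var.project_id

  // assumed value; Pub/Sub accepts 10 to 600 seconds
  ack_deadline_seconds = 300
}
```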

Grant permissions
  1. Grant the publish permission on the topic to the GCS service agent; otherwise the storage notification is not allowed to send messages to Pub/Sub.
resource "google_pubsub_topic_iam_binding" "topic_sts_demo_binding" {
  topic   = google_pubsub_topic.topic_sts_demo.id
  role    = "roles/pubsub.publisher"
  members = ["serviceAccount:${var.gcs_sa}"]
}

To find the GCS service agent of the current GCP project, run the following command:

[gateman@manjaro-x13 chapter-01]$ gcloud storage service-agent
service-912156613264@gs-project-accounts.iam.gserviceaccount.com

It is also listed in the following document:
https://cloud.google.com/iam/docs/service-agents

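  2. Grant read/edit permission on the subscription to the Storage Transfer Service agent. Note that this is not the GCS service agent used above; they are two different accounts.
resource "google_pubsub_subscription_iam_binding" "subscription_sts_demo_binding" {
  subscription = google_pubsub_subscription.subscription_sts_demo.name
  // roles/pubsub.subscriber is a narrower alternative that should be enough for pulling and acknowledging messages
  role         = "roles/editor"
  members      = ["serviceAccount:${var.sts_sa}"]
}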
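  3. Grant read and write permission on both buckets to the Storage Transfer Service agent.

Note that the transfer service agent also needs the storage.buckets.get permission on the src bucket. Object-level roles such as objectUser do not include it, so the example below grants roles/storage.admin. With only the objectUser role, the job fails with the following error: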

 Error: googleapi: Error 400: Failed to obtain the location of the GCS bucket jason-hsbc-demo-src Additional details: project-912156613264@storage-transfer-service.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist)., failedPrecondition

resource "google_storage_bucket_iam_binding" "bucket-jason-hsbc-demo-target-binding" {
  bucket  = google_storage_bucket.bucket-jason-hsbc-demo-target.name
  role    = "roles/storage.admin"
  members = ["serviceAccount:${var.sts_sa}"]
}

resource "google_storage_bucket_iam_binding" "bucket-jason-hsbc-demo-src-binding" {
  bucket  = google_storage_bucket.bucket-jason-hsbc-demo-src.name
  role    = "roles/storage.admin"
  members = ["serviceAccount:${var.sts_sa}"]
}
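
Instead of looking up the two service agents by hand and passing them in as var.gcs_sa and var.sts_sa, the Google provider has data sources that return them. The following is a sketch, assuming a reasonably recent hashicorp/google provider; the names gcs_agent, sts_agent, and the two locals are made up for illustration.

```hcl
// Cloud Storage service agent (the account that publishes bucket notifications)
data "google_storage_project_service_account" "gcs_agent" {
  project = var.project_id
}

// Storage Transfer Service agent (the account that runs transfer jobs)
data "google_storage_transfer_project_service_account" "sts_agent" {
  project = var.project_id
}

locals {
  // hypothetical locals that could stand in for var.gcs_sa and var.sts_sa
  gcs_sa = data.google_storage_project_service_account.gcs_agent.email_address
  sts_sa = data.google_storage_transfer_project_service_account.sts_agent.email
}
```

With this in place, the IAM bindings above could use "serviceAccount:${local.gcs_sa}" and "serviceAccount:${local.sts_sa}" in their members lists.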



Create the storage notification on the src bucket

Make sure it points at the Pub/Sub topic created above.

// https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_notification.html
// define a bucket notification
resource "google_storage_notification" "notification" {
  bucket         = google_storage_bucket.bucket-jason-hsbc-demo-src.name
  payload_format = "JSON_API_V1"
  topic          = google_pubsub_topic.topic_sts_demo.id
  event_types    = ["OBJECT_FINALIZE", "OBJECT_METADATA_UPDATE"]
  custom_attributes = {
    new-attribute = "new-attribute-value"
  }
  depends_on = [google_pubsub_topic_iam_binding.topic_sts_demo_binding]
}
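
Since the event types are configurable, the notification can be narrowed when only new uploads matter. The sketch below is an alternative to the block above, not an addition to it: it fires only on OBJECT_FINALIZE and only for objects under a prefix. The resource name notification_uploads_only and the prefix "incoming/" are hypothetical.

```hcl
// Narrower variant: notify only when an object is created or overwritten
resource "google_storage_notification" "notification_uploads_only" {
  bucket         = google_storage_bucket.bucket-jason-hsbc-demo-src.name
  payload_format = "JSON_API_V1"
  topic          = google_pubsub_topic.topic_sts_demo.id
  event_types    = ["OBJECT_FINALIZE"]

  // hypothetical prefix; objects outside it produce no notification
  object_name_prefix = "incoming/"

  // the GCS service agent must be able to publish before the notification is created
  depends_on = [google_pubsub_topic_iam_binding.topic_sts_demo_binding]
}
```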

Last step: create a Storage Transfer streaming job based on the Pub/Sub subscription

Reference: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_transfer_job.html

resource "google_storage_transfer_job" "transfer-job-sts-demo" {
  description = "transfer-job-sts-demo"
  project     = var.project_id
  transfer_spec {

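    transfer_options {
      // always overwrite objects that already exist in the target bucket
      overwrite_objects_already_existing_in_sink = true
      overwrite_when                             = "ALWAYS"
      // delete the source object after it has been copied, which turns the copy into a move
      delete_objects_from_source_after_transfer  = true
    }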

    gcs_data_source {
      bucket_name = google_storage_bucket.bucket-jason-hsbc-demo-src.name

    }
    gcs_data_sink {
      bucket_name = google_storage_bucket.bucket-jason-hsbc-demo-target.name
    }
   
  }

  event_stream {
    name = format("projects/%s/subscriptions/%s", var.project_id, google_pubsub_subscription.subscription_sts_demo.name)
  }

  
  depends_on = [google_storage_notification.notification, 
                google_pubsub_subscription_iam_binding.subscription_sts_demo_binding,
                google_storage_bucket_iam_binding.bucket-jason-hsbc-demo-target-binding,
                google_storage_bucket_iam_binding.bucket-jason-hsbc-demo-src-binding]
}
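
The job above sets only a description, so its name is generated by the service. To find the job later in the console or on the command line, it can help to export that name. A small sketch; the output name sts_job_name is an assumption.

```hcl
// Expose the generated job name (of the form transferJobs/...)
output "sts_job_name" {
  value = google_storage_transfer_job.transfer-job-sts-demo.name
}
```

After terraform apply, running terraform output sts_job_name prints the value.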

Test

First, upload a file to the src bucket


[gateman@manjaro-x13 chapter-01]$ gsutil cp *csv  gs://jason-hsbc-demo-src
Copying file://supermarket_sales.csv [Content-Type=text/csv]...
- [1 files][128.4 KiB/128.4 KiB]                                                
Operation completed over 1 objects/128.4 KiB. 

Check the files in the buckets

[gateman@manjaro-x13 chapter-01]$ gsutil ls gs://jason-hsbc-demo-src
gs://jason-hsbc-demo-src/chapter-01-steps.sql
[gateman@manjaro-x13 chapter-01]$ gsutil ls gs://jason-hsbc-demo-target
gs://jason-hsbc-demo-target/Untitled-1.mak
gs://jason-hsbc-demo-target/chapter-01-steps.sql
gs://jason-hsbc-demo-target/supermarket_sales.csv

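The CSV file has been transferred to the target bucket and is no longer present in the src bucket.

We can also check the status of the transfer job in the UI.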