Securing a Data Analytics Platform on GCP with VPC Service Controls
There are many Data Analytics use cases on GCP these days. One of the most common concerns about building a Data Analytics platform on GCP is security. Google Cloud offers many security products to protect customer data, and one of them is VPC Service Controls (VPC SC).
VPC SC provides an additional layer of security defense for Google Cloud services that is independent of Identity and Access Management (IAM). While IAM enables granular identity-based access control, VPC SC enables broader context-based perimeter security, including controlling data egress across the perimeter.
This blog post describes an example of how to build a data platform using Cloud Functions, Dataflow, Google Cloud Storage, and BigQuery with VPC Service Controls.
Example Architecture
Let’s start with a customer who wants to create a data pipeline from GCS to BigQuery with Cloud Dataflow. The Dataflow job is triggered by Cloud Functions once data is stored in GCS.
Set up VPC Service Controls
In order for developers to be able to view and edit components inside the VPC Service Controls perimeter from the Google Cloud Console or a terminal, VPC SC has to include access level settings. The first step is to create an access policy and access levels to configure on VPC Service Controls.
After that, you will configure VPC SC for your projects using the access policy and access levels you created.
To configure access policies, access levels, and VPC SC, the Resource Manager Organization Viewer role is required.
Creating Access Policy and Access Levels
Access levels can be set for IP addresses, devices, users, and service accounts. In this project, we’ll create an access level that includes IP addresses, users, and service accounts. Setting access levels for users and service accounts is available only via the command line. Please see Creating an access policy.
Here is how to create an access level with the CLI.
1. Create the following `CONDITIONS.yaml`

- ipSubnetworks:
  - 252.0.2.0/24
  - 2001:db8::/32
  members:
  - serviceAccount:<serviceaccount>
  - user:<user>
…
2. Create an access level with the following command
gcloud access-context-manager levels create <NAME> --title <TITLE> --basic-level-spec CONDITIONS.yaml --combine-function=OR --policy=<POLICY_NAME>
3. You can view the access level you created in the Google Cloud Console by selecting the organization that contains the project, under Security -> Access Context Manager
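The access level can also be inspected from the CLI; here is a quick sketch (the `<NAME>` and `<POLICY_NAME>` placeholders match those used in the command above):

```shell
# List all access levels defined under your access policy. If you don't know
# the policy name, find it with `gcloud access-context-manager policies list`.
gcloud access-context-manager levels list --policy=<POLICY_NAME>

# Show the details of the access level created in step 2.
gcloud access-context-manager levels describe <NAME> --policy=<POLICY_NAME>
```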
Note that you cannot see the code preview of the script in the Cloud Functions console view if you configure VPC SC with an access level that restricts access to users and service accounts only. To resolve this issue, add IP address or device access restrictions to the access level. For more information on how to create an access level, see Creating a basic access level.
Creating a Service Perimeter
Let’s create a VPC SC perimeter that restricts the following services.
- BigQuery API
- Cloud Functions API
- Google Cloud Dataflow API
- Google Cloud Storage API
The detailed documentation for creating a VPC SC perimeter is here. You should use the access level you created at `Ingress Policies: Access Levels`.
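For reference, the perimeter can also be sketched from the CLI. This is a minimal sketch, assuming the standard API service names for the four services above; the placeholders match those used earlier:

```shell
# Create a service perimeter covering the project and restricting the four
# services used by this pipeline. <PROJECT_NUMBER> is the numeric project
# number, not the project ID.
gcloud access-context-manager perimeters create <PERIMETER_NAME> \
  --title=<TITLE> \
  --resources=projects/<PROJECT_NUMBER> \
  --restricted-services=bigquery.googleapis.com,cloudfunctions.googleapis.com,dataflow.googleapis.com,storage.googleapis.com \
  --access-levels=<NAME> \
  --policy=<POLICY_NAME>
```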
While BigQuery and GCS work with the settings above, additional configuration is necessary for Cloud Functions and Dataflow. The following sections describe it.
Configuration for Cloud Functions with VPC SC
To use Cloud Functions with VPC Service Controls, you have to configure organization policies and Serverless VPC Access.
Set up Organization Policies
To use VPC SC with Cloud Functions, you have to set up the following organization policies.
Mandatory:
Optional:
To manage organization policies, you need the Organization Policy Administrator role.
Please see Using VPC Service Controls for more details.
Set up Serverless VPC Access
Create a connector according to Creating a connector. Here is an example.
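A minimal sketch of creating such a connector; the connector name, network, and IP range are illustrative values, and the region matches the one used later in this post:

```shell
# Create a Serverless VPC Access connector so Cloud Functions can reach
# resources inside the VPC. The --range must be an unused /28 CIDR block.
gcloud compute networks vpc-access connectors create my-connector \
  --region=asia-northeast1 \
  --network=default \
  --range=10.8.0.0/28
```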
Update the Access Level after Deploying Cloud Functions
You have to add the Cloud Build service account to the access level you created, according to this manual. The first time you deploy Cloud Functions, you'll get an error like this.
You should add the Cloud Build service account identified here to the `CONDITIONS.yaml` you created in the previous section, and update the access level with the following command.
gcloud access-context-manager levels update <NAME> --title <TITLE> --basic-level-spec CONDITIONS.yaml --combine-function=OR --policy=<POLICY_NAME>
Custom Dataflow Templates with VPC SC
When running Dataflow templates with VPC SC, the worker instances must be created on a subnetwork in which Private Google Access is enabled. We will describe the configuration for this and the command line arguments required for template staging and execution.
Set up Private Google Access
The subnetwork used by Dataflow worker instances must have Private Google Access enabled, as described earlier. Private Google Access can be configured by an IAM user who has the project owner, editor, or network administrator role.
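A sketch of enabling it on an existing subnet (the subnet name is a placeholder; the region matches the one used elsewhere in this post):

```shell
# Turn on Private Google Access so VM instances without external IPs can
# still reach Google APIs and services.
gcloud compute networks subnets update <SUBNETWORK_NAME> \
  --region=asia-northeast1 \
  --enable-private-ip-google-access
```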
Stage Custom Dataflow Templates for the VPC SC Environment
If you’d like to stage your template, you have to use a service account that is included in the access level you set on VPC SC. Please set the environment variable as below.
export GOOGLE_APPLICATION_CREDENTIALS=<credential path>
To ensure that the Dataflow worker nodes used during template staging use a subnetwork with Private Google Access configured, the `--subnetwork=<SUBNETWORK_NAME> --usePublicIps=false` options are necessary in the command line arguments.
The entire command line will look like this.
mvn compile exec:java \
  -Dexec.mainClass=com.example.myclass \
  -Dexec.args="--runner=DataflowRunner \
  --project=YOUR_PROJECT_ID \
  --stagingLocation=gs://YOUR_BUCKET_NAME/staging \
  --templateLocation=gs://YOUR_BUCKET_NAME/templates/YOUR_TEMPLATE_NAME \
  --subnetwork=SUBNETWORK_NAME \
  --usePublicIps=false"
Execute Custom Dataflow Templates with VPC SC
You need to specify `subnetwork` and `ip_configuration` as Dataflow API arguments in the script that is called by Cloud Functions.
Here is a sample script in Python.
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials


def run(event, context):
    bucket = 'my_bucket'
    projectId = 'my_project'
    location = 'asia-northeast1'
    jobName = 'my-job'
    tableSpec = 'my_project:test_dataset.test_table'
    gcsPath = f'gs://{bucket}/templates/my-template'
    textFilePath = f'gs://{bucket}/' + event['name']

    credentials = GoogleCredentials.get_application_default()
    service = build('dataflow', 'v1b3', credentials=credentials)

    body = {
        "jobName": jobName,
        "parameters": {
            "textFilePath": textFilePath,
            "tableSpec": tableSpec
        },
        "environment": {
            "subnetwork": "mysubnetwork",
            "ip_configuration": "WORKER_IP_PRIVATE"
        }
    }

    res = service.projects().locations().templates().launch(
        projectId=projectId,
        gcsPath=gcsPath,
        location=location,
        body=body,
    ).execute()
Conclusion
In this blog post, I introduced how to build a secure data pipeline in GCP using VPC SC. Enjoy a good data engineer life with GCP!