A Step-by-Step Guide to Loading Data into BigQuery

In Part 6 of the series “Modernisation of a Data Platform”, we focus a little more on the key BigQuery concepts that are essential for designing a data warehouse (DWH).

In this part, we will see how to approach table design in BigQuery using different methods, load a covid19_tweets dataset, and run a query to analyse the data.

Creating a Schema:

We can create a schema in BigQuery either while migrating data from an existing data warehouse or while ingesting data into BigQuery from various data sources, whether in the cloud or on-premises.

Besides creating the schema manually, BigQuery also offers an option to auto-detect it.

How does this auto-detect work?

BigQuery compares the first row of an input file with a representative sample of up to 100 records from row 2 onwards. If the data types in the sample differ from those in the first row (for example, the first row contains only strings while the later rows contain numbers), BigQuery treats the first row as a header and uses its values as column names. The user only has to enable auto-detect for the schema to be created automatically during the load.
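
The header check described above can be illustrated with a small, self-contained sketch. This is not BigQuery's actual implementation: the type rules, the `field_N` fallback names, and the sample column names are assumptions made for illustration only.

```python
import csv
import io


def infer_type(value):
    """Classify one CSV cell as BOOLEAN, INTEGER, FLOAT, or STRING."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "BOOLEAN"
    for caster, name in ((int, "INTEGER"), (float, "FLOAT")):
        try:
            caster(text)
            return name
        except ValueError:
            continue
    return "STRING"


def column_type(values):
    """Pick one type for a whole column; mixed columns fall back to STRING."""
    types = {infer_type(v) for v in values}
    if len(types) == 1:
        return types.pop()
    if types == {"INTEGER", "FLOAT"}:
        return "FLOAT"
    return "STRING"


def detect_schema(csv_text, sample_size=100):
    """Return (has_header, [(column_name, type), ...]) for a CSV string."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    first, rest = rows[0], rows[1 : 1 + sample_size]
    first_types = [infer_type(cell) for cell in first]
    rest_types = [column_type(col) for col in zip(*rest)]
    # Header heuristic: the first row is all strings but the sampled rows are not.
    has_header = all(t == "STRING" for t in first_types) and first_types != rest_types
    if has_header:
        names, types = first, rest_types
    else:
        # Hypothetical placeholder names; the first row is treated as data.
        names = [f"field_{i}" for i in range(len(first))]
        types = [column_type(col) for col in zip(*rows[:sample_size])]
    return has_header, list(zip(names, types))


if __name__ == "__main__":
    sample = "user_name,followers,verified\nalice,120,true\nbob,45,false\n"
    print(detect_schema(sample))
```

Note that a file whose columns are all strings gives this heuristic nothing to compare, so in such cases it is safer to state the header and schema explicitly.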

Datatypes in BigQuery:

While most of the data types are standard ones such as INTEGER, FLOAT, NUMERIC and BOOLEAN, one special data type that we need to discuss is STRUCT.

This data type is used in particular for nested and repeated fields. The best example of a STRUCT is an address. Normally, addresses have multiple sub-fields such as is_active, address line 1, address line 2, town, city, post code and number of years at the address.

All these fields can be nested under the parent field ‘Addresses’. Like any other column, a STRUCT can have the mode NULLABLE or REQUIRED; to hold several values per row, such as multiple addresses for one customer, its mode is set to REPEATED.
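
As a rough sketch, the nested address field could be written in BigQuery's JSON schema format, where a STRUCT appears as type `RECORD` with its sub-fields listed under `fields`. The table layout, the `customer_id` column, and the exact sub-field names below are hypothetical, chosen to mirror the examples in the text.

```python
import json

# Hypothetical nested field: one row can carry several addresses (mode REPEATED).
address_field = {
    "name": "addresses",
    "type": "RECORD",  # RECORD is how a STRUCT is written in a JSON schema
    "mode": "REPEATED",
    "fields": [
        {"name": "is_active", "type": "BOOLEAN", "mode": "NULLABLE"},
        {"name": "address_line_1", "type": "STRING", "mode": "NULLABLE"},
        {"name": "address_line_2", "type": "STRING", "mode": "NULLABLE"},
        {"name": "town", "type": "STRING", "mode": "NULLABLE"},
        {"name": "city", "type": "STRING", "mode": "NULLABLE"},
        {"name": "post_code", "type": "STRING", "mode": "NULLABLE"},
        {"name": "years_at_address", "type": "INTEGER", "mode": "NULLABLE"},
    ],
}

# Hypothetical table schema: a required key column plus the nested field.
schema = [
    {"name": "customer_id", "type": "STRING", "mode": "REQUIRED"},
    address_field,
]


def leaf_paths(fields, prefix=""):
    """Yield dotted paths of all non-RECORD fields, e.g. 'addresses.city'."""
    for field in fields:
        path = prefix + field["name"]
        if field["type"] == "RECORD":
            yield from leaf_paths(field["fields"], path + ".")
        else:
            yield path


if __name__ == "__main__":
    print(json.dumps(schema, indent=2))
    print(list(leaf_paths(schema)))
```

The dotted paths printed at the end correspond to how nested sub-fields are usually referenced in a query, for example `addresses.city`.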

Creating Tables & Managing Accesses:

The easiest way to create a table in BigQuery (BQ) is through the Cloud Console. The UI is very friendly: users can navigate to the BigQuery console and create tables in a dataset from there.

Alternatively, there is a REST API service that can be used to insert tables with a specific schema into the dataset.
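
To make the REST route more concrete, here is a minimal sketch that only builds the URL and JSON body for a `tables.insert` call, without sending it. The project, dataset, table, and column names are placeholders, and authentication (an OAuth 2.0 bearer token on the POST request) is deliberately left out.

```python
import json

API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_table_insert_request(project_id, dataset_id, table_id, fields):
    """Return (url, body) for a POST that creates a table with the given schema."""
    url = f"{API_ROOT}/projects/{project_id}/datasets/{dataset_id}/tables"
    body = {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "schema": {"fields": fields},
    }
    return url, body


if __name__ == "__main__":
    # Placeholder identifiers and columns, for illustration only.
    url, body = build_table_insert_request(
        "my-project",
        "my_dataset",
        "covid19_tweets",
        [
            {"name": "user_name", "type": "STRING", "mode": "NULLABLE"},
            {"name": "text", "type": "STRING", "mode": "NULLABLE"},
        ],
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

In practice, a client library would normally wrap this call; the sketch only shows what the request carries.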

BigQuery provides an option to restrict access at the dataset level. There is also a beta feature (as of the time this article was published) for granting access at the table or view level. Access can be granted through roles such as data viewer, data editor, data owner and admin.
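
At the dataset level, access is held as a list of entries on the dataset resource, each pairing a role with a grantee. The helper below is a small sketch that adds a read-only entry to such a list; the helper name and email addresses are hypothetical, and applying the result to a real dataset (a dataset update call) is assumed to happen elsewhere.

```python
def grant_reader(access_entries, user_email):
    """Return a new access list that includes a READER entry for user_email."""
    entry = {"role": "READER", "userByEmail": user_email}
    if entry in access_entries:
        # Already granted: return an unchanged copy rather than a duplicate.
        return list(access_entries)
    return list(access_entries) + [entry]


if __name__ == "__main__":
    # Hypothetical existing entries on a dataset.
    current = [{"role": "OWNER", "userByEmail": "owner@example.com"}]
    updated = grant_reader(current, "analyst@example.com")
    print(updated)
```

The function returns a new list instead of mutating its input, so the original entries can still be compared with the updated ones before anything is written back.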

BigQuery allows users to copy table, delete table, alter the expiration time of the
