Flink SQL: Getting Started

Getting Started

Flink SQL makes it simple to develop streaming applications using standard SQL. Because it remains ANSI SQL 2011 compliant, Flink is easy to learn if you have ever worked with a database or a SQL-like system. This tutorial will help you get started quickly with a Flink SQL development environment.

Prerequisites

You only need to have basic knowledge of SQL to follow along. No other programming experience is assumed.

Installation

There are multiple ways to install Flink. For experimentation, the most common option is to download the binaries and run them locally. You can follow the steps in local installation to set up an environment for the rest of the tutorial.

Once you’re all set, use the following command to start a local cluster from the installation folder:

./bin/start-cluster.sh

Once started, the Flink WebUI is available locally at localhost:8081, where you can monitor the different jobs.

SQL Client

The SQL Client is an interactive client to submit SQL queries to Flink and visualize the results. To start the SQL client, run the sql-client script from the installation folder.

./bin/sql-client.sh

Hello World

Once the SQL client, our query editor, is up and running, it’s time to start writing queries. Let’s start with printing ‘Hello World’, using the following simple query:

SELECT 'Hello World';

Running the HELP command lists the full set of supported SQL statements. Let’s run one such command, SHOW, to see a full list of Flink built-in functions.

SHOW FUNCTIONS;

These functions provide users with a powerful toolbox of functionality when developing SQL queries. For example, CURRENT_TIMESTAMP returns the current system time of the machine where it is executed.

SELECT CURRENT_TIMESTAMP;

Source Tables

As with all SQL engines, Flink queries operate on top of tables. It differs from a traditional database because Flink does not manage data at rest locally; instead, its queries operate continuously over external tables.

Flink data processing pipelines begin with source tables. Source tables produce rows operated over during the query’s execution; they are the tables referenced in the FROM clause of a query. These could be Kafka topics, databases, filesystems, or any other system that Flink knows how to consume.

Tables can be defined through the SQL client or using an environment config file. The SQL client supports SQL DDL commands similar to traditional SQL. Standard SQL DDL is used to create, alter, and drop tables.
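As a sketch of that DDL surface, a table can be created, renamed, and dropped like this (the table name, schema, and connector here are illustrative, not part of the tutorial's running example):

```sql
-- Create a simple table (illustrative schema; datagen produces random rows)
CREATE TABLE orders (
    order_id INT,
    amount DOUBLE
) WITH (
    'connector' = 'datagen'
);

-- Rename it with ALTER
ALTER TABLE orders RENAME TO orders_raw;

-- Remove it when no longer needed
DROP TABLE orders_raw;
```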

Flink supports different connectors and formats that can be used with tables. The following example defines a source table backed by a CSV file, with emp_id, name, and dept_id as columns, in a CREATE TABLE statement.

CREATE TABLE employee_information (
    emp_id INT,
    name VARCHAR,
    dept_id INT
) WITH ( 
    'connector' = 'filesystem',
    'path' = '/path/to/something.csv',
    'format' = 'csv'
);

A continuous query can be defined from this table that reads new rows as they are made available and immediately outputs their results. For example, we can filter for just those employees who work in department 1.

SELECT * FROM employee_information WHERE dept_id = 1;

Continuous Queries

While not initially designed with streaming semantics in mind, SQL is a powerful tool for building continuous data pipelines. Where Flink SQL differs from traditional database queries is that it continuously consumes rows as they arrive and produces updates to its results.

A continuous query never terminates and produces a dynamic table as a result. Dynamic tables are the core concept of Flink’s Table API and SQL support for streaming data.

Aggregations on continuous streams need to store aggregated results continuously during the execution of the query. For example, suppose you need to count the number of employees for each department from an incoming data stream. The query needs to maintain the most up to date count for each department to output timely results as new rows are processed.

SELECT 
   dept_id,
   COUNT(*) as emp_count 
FROM employee_information 
GROUP BY dept_id;

Such queries are considered stateful. Flink’s advanced fault-tolerance mechanism will maintain internal state and consistency, so queries always return the correct result, even in the face of hardware failure.
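That fault tolerance relies on checkpointing, which is not enabled by default in a fresh local installation. One way to turn it on from the SQL client is shown below; the option name comes from Flink's configuration, and the 10-second interval is an arbitrary illustrative choice:

```sql
-- Enable periodic checkpoints so stateful query results can be
-- recovered consistently after a failure
SET 'execution.checkpointing.interval' = '10s';
```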

Sink Tables

When running this query, the SQL client provides output in real time, but in a read-only fashion. Storing results, to power a report or dashboard, requires writing them out to another table. This can be achieved using an INSERT INTO statement. The table referenced in this clause is known as a sink table. An INSERT INTO statement is submitted as a detached query to the Flink cluster.

INSERT INTO department_counts
SELECT 
   dept_id,
   COUNT(*) as emp_count 
FROM employee_information
GROUP BY dept_id;
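For this statement to run, the sink table must already exist with a schema matching the SELECT list. A minimal sketch is shown below; it uses Flink's print connector purely for illustration, since an updating aggregate like this cannot be written to a plain append-only CSV file:

```sql
-- Sink table whose columns match the INSERT's SELECT list.
-- The 'print' connector accepts updating results and writes
-- them to the task logs; a real pipeline would use a connector
-- such as a database or an upsert-capable system instead.
CREATE TABLE department_counts (
    dept_id INT,
    emp_count BIGINT
) WITH (
    'connector' = 'print'
);
```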

Once submitted, this will run and store the results into the sink table directly, instead of loading them into the client's memory.

Looking for Help!

If you get stuck, check out the community support resources. In particular, Apache Flink’s user mailing list consistently ranks as one of the most active of any Apache project and a great way to get help quickly.

Resources to Learn more

  • SQL: Supported operations and syntax for SQL.
  • SQL Client: Play around with Flink SQL and submit a table program to a cluster without programming knowledge.
  • Concepts & Common API: Shared concepts and APIs of the Table API and SQL.
  • Streaming Concepts: Streaming-specific documentation for the Table API or SQL, such as configuration of time attributes and handling of updating results.
  • Built-in Functions: Supported functions in Table API and SQL.
  • Connect to External Systems: Available connectors and formats for reading and writing data to external systems.