Trino concepts#
Overview#
To understand Trino, you must first understand the terms and concepts used throughout the Trino documentation.
While it is easy to understand statements and queries, as an end-user you should have familiarity with concepts such as stages and splits to take full advantage of Trino and execute efficient queries. As a Trino administrator or a Trino contributor you should understand how Trino's concept of stages maps to tasks, and how tasks contain a set of drivers which process data.
This section provides definitions for the core concepts referenced throughout Trino, sorted from most general to most specific.
Note
The book Trino: The Definitive Guide and the research paper Presto: SQL on Everything can provide further information about Trino and the concepts in use.
Architecture#
Trino is a distributed query engine that processes data in parallel across multiple servers. There are two types of Trino servers, coordinators and workers. The following sections describe these servers and other components of Trino’s architecture.
Cluster#
A Trino cluster consists of a coordinator and many workers. Users connect to the coordinator with their SQL query tool. The coordinator collaborates with the workers. The coordinator and the workers access the connected data sources. This access is configured in catalogs.
Processing each query is a stateful operation. The workload is orchestrated by the coordinator and spread in parallel across all workers in the cluster. Each node runs Trino in one JVM instance, and processing is parallelized further using threads.
Coordinator#
The Trino coordinator is the server that is responsible for parsing statements, planning queries, and managing Trino worker nodes. It is the “brain” of a Trino installation and is also the node to which a client connects to submit statements for execution. Every Trino installation must have a Trino coordinator alongside one or more Trino workers. For development or testing purposes, a single instance of Trino can be configured to perform both roles.
The coordinator keeps track of the activity on each worker and coordinates the execution of a query. The coordinator creates a logical model of a query involving a series of stages, which is then translated into a series of connected tasks running on a cluster of Trino workers.
Coordinators communicate with workers and clients using a REST API.
Worker#
A Trino worker is a server in a Trino installation, which is responsible for executing tasks and processing data. Worker nodes fetch data from connectors and exchange intermediate data with each other. The coordinator is responsible for fetching results from the workers and returning the final results to the client.
When a Trino worker process starts up, it advertises itself to the discovery server in the coordinator, which makes it available to the Trino coordinator for task execution.
Workers communicate with other workers and Trino coordinators using a REST API.
Data sources#
Throughout this documentation, you’ll read terms such as connector, catalog, schema, and table. These fundamental concepts cover Trino’s model of a particular data source and are described in the following section.
Connector#
A connector adapts Trino to a data source such as Hive or a relational database. You can think of a connector the same way you think of a driver for a database. It is an implementation of Trino’s SPI, which allows Trino to interact with a resource using a standard API.
Trino contains several built-in connectors: a connector for JMX, a System connector which provides access to built-in system tables, a Hive connector, and a TPCH connector designed to serve TPC-H benchmark data. Many third-party developers have contributed connectors so that Trino can access data in a variety of data sources.
Every catalog is associated with a specific connector. If you examine a catalog configuration file, you see that each contains a mandatory property `connector.name`, which is used by the catalog manager to create a connector for a given catalog. It is possible to have more than one catalog use the same connector to access two different instances of a similar database. For example, if you have two Hive clusters, you can configure two catalogs in a single Trino cluster that both use the Hive connector, allowing you to query data from both Hive clusters, even within the same SQL query.
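For example, two catalogs that both use the Hive connector might be defined in two separate properties files. This is a sketch: the file names and metastore URIs below are illustrative, while `connector.name` and `hive.metastore.uri` are the standard property names.

```properties
# etc/catalog/hive_a.properties
connector.name=hive
hive.metastore.uri=thrift://metastore-a.example.com:9083

# etc/catalog/hive_b.properties
connector.name=hive
hive.metastore.uri=thrift://metastore-b.example.com:9083
```

With both files in place, tables from either Hive cluster can be addressed through the `hive_a` or `hive_b` catalog in the same query.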
Catalog#
A Trino catalog contains schemas and references a data source via a connector. For example, you can configure a JMX catalog to provide access to JMX information via the JMX connector. When you run SQL statements in Trino, you are running them against one or more catalogs. Other examples of catalogs include the Hive catalog to connect to a Hive data source.
When addressing a table in Trino, the fully-qualified table name is always rooted in a catalog. For example, a fully-qualified table name of `hive.test_data.test` refers to the `test` table in the `test_data` schema in the `hive` catalog.
Catalogs are defined in properties files stored in the Trino configuration directory.
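A statement can address a table directly by its fully-qualified name, assuming a `hive` catalog containing a `test_data` schema is configured; a minimal illustration:

```sql
-- Fully-qualified form: catalog.schema.table
SELECT count(*)
FROM hive.test_data.test;
```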
Schema#
Schemas are a way to organize tables. Together, a catalog and schema define a set of tables that can be queried. When accessing Hive or a relational database such as MySQL with Trino, a schema translates to the same concept in the target database. Other types of connectors may choose to organize tables into schemas in a way that makes sense for the underlying data source.
Table#
A table is a set of unordered rows, which are organized into named columns with types. This is the same as in any relational database. The mapping from source data to tables is defined by the connector.
Query execution model#
Trino executes SQL statements and turns these statements into queries that are executed across a distributed cluster of a coordinator and workers.
Statement#
Trino executes ANSI-compatible SQL statements. When the Trino documentation refers to a statement, it is referring to statements as defined in the ANSI SQL standard, which consists of clauses, expressions, and predicates.
Some readers might be curious why this section lists separate concepts for statements and queries. This is necessary because, in Trino, statements simply refer to the textual representation of a statement written in SQL. When a statement is executed, Trino creates a query along with a query plan that is then distributed across a series of Trino workers.
Query#
When Trino parses a statement, it converts it into a query and creates a distributed query plan, which is then realized as a series of interconnected stages running on Trino workers. When you retrieve information about a query in Trino, you receive a snapshot of every component that is involved in producing a result set in response to a statement.
The difference between a statement and a query is simple. A statement can be thought of as the SQL text that is passed to Trino, while a query refers to the configuration and components instantiated to execute that statement. A query encompasses stages, tasks, splits, connectors, and other components and data sources working in concert to produce a result.
Stage#
When Trino executes a query, it does so by breaking up the execution into a hierarchy of stages. For example, if Trino needs to aggregate data from one billion rows stored in Hive, it does so by creating a root stage to aggregate the output of several other stages, all of which are designed to implement different sections of a distributed query plan.
The hierarchy of stages that comprises a query resembles a tree. Every query has a root stage, which is responsible for aggregating the output from other stages. Stages are what the coordinator uses to model a distributed query plan, but stages themselves don’t run on Trino workers.
Task#
As mentioned in the previous section, stages model a particular section of a distributed query plan, but stages themselves don’t execute on Trino workers. To understand how a stage is executed, you need to understand that a stage is implemented as a series of tasks distributed over a network of Trino workers.
Tasks are the "workhorse" in the Trino architecture: a distributed query plan is deconstructed into a series of stages, which are then translated to tasks, which then act upon or process splits. A Trino task has inputs and outputs, and just as a stage can be executed in parallel by a series of tasks, a task is executed in parallel by a series of drivers.
Split#
Tasks operate on splits, which are sections of a larger data set. Stages at the lowest level of a distributed query plan retrieve data via splits from connectors, and intermediate stages at a higher level of a distributed query plan retrieve data from other stages.
When Trino is scheduling a query, the coordinator queries a connector for a list of all splits that are available for a table. The coordinator keeps track of which machines are running which tasks, and what splits are being processed by which tasks.
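The scheduling described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Trino's actual SPI: `list_splits`, the split naming, and the round-robin assignment policy are all stand-ins for the real coordinator logic.

```python
from itertools import cycle

def list_splits(table, num_splits=6):
    """Stand-in for a connector enumerating the splits of a table."""
    return [f"{table}:split-{i}" for i in range(num_splits)]

def schedule(table, workers):
    """Assign each split to a worker and record which worker processes what,
    mirroring how the coordinator tracks split-to-task assignments."""
    assignments = {w: [] for w in workers}
    for split, worker in zip(list_splits(table), cycle(workers)):
        assignments[worker].append(split)
    return assignments

assignments = schedule("hive.test_data.test", ["worker-1", "worker-2"])
# Each worker receives an alternating share of the table's splits.
```

The real coordinator uses far richer placement policies (locality, load), but the bookkeeping idea is the same: the coordinator asks the connector for splits, then tracks which task on which worker is processing each one.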
Driver#
Tasks contain one or more parallel drivers. Drivers act upon data and combine operators to produce output that is then aggregated by a task and then delivered to another task in another stage. A driver is a sequence of operator instances, or you can think of a driver as a physical set of operators in memory. It is the lowest level of parallelism in the Trino architecture. A driver has one input and one output.
Operator#
An operator consumes, transforms and produces data. For example, a table scan fetches data from a connector and produces data that can be consumed by other operators, and a filter operator consumes data and produces a subset by applying a predicate over the input data.
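A driver as a chain of operators can be sketched with Python generators. This is an illustrative analogy, not Trino's operator SPI: the operator names, the row format, and the use of generators are all assumptions made for the example.

```python
def table_scan(rows):
    """Source operator: produce rows fetched from a (simulated) connector."""
    yield from rows

def filter_op(source, predicate):
    """Filter operator: consume rows and produce the subset matching the predicate."""
    for row in source:
        if predicate(row):
            yield row

# A driver is one physical pipeline of operator instances in memory,
# with one input (the scanned rows) and one output (the filtered rows).
rows = [{"id": 1, "price": 5}, {"id": 2, "price": 50}, {"id": 3, "price": 500}]
driver = filter_op(table_scan(rows), lambda r: r["price"] > 10)
result = list(driver)
```

Chaining generators this way mirrors the key property of a driver: each operator consumes the output of the one before it, and the whole sequence behaves as a single unit with one input and one output.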
Exchange#
Exchanges transfer data between Trino nodes for different stages of a query. Tasks produce data into an output buffer and consume data from other tasks using an exchange client.
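The buffer-and-client relationship can be sketched as follows. This is a simplified single-process analogy, not Trino's implementation: in a real cluster the output buffer lives on one node and the exchange client pulls pages over the network.

```python
from collections import deque

# Output buffer of an upstream task: pages wait here until consumed.
output_buffer = deque()

def produce(pages):
    """Upstream task: place result pages into its output buffer."""
    for page in pages:
        output_buffer.append(page)

def exchange_client():
    """Downstream task: pull pages from the upstream buffer until it is drained."""
    while output_buffer:
        yield output_buffer.popleft()

produce(["page-1", "page-2", "page-3"])
received = list(exchange_client())
```

The ordering of the queue preserves the ordering of pages, and draining the buffer models the downstream stage consuming everything its upstream stage produced.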