如何在CentOS 7上安装和使用ClickHouse

介绍 (Introduction)

ClickHouse is an open-source, column-oriented analytics database created by Yandex for OLAP and big data use cases. ClickHouse’s support for real-time query processing makes it suitable for applications that require sub-second analytical results. ClickHouse’s query language is a dialect of SQL that enables powerful declarative querying capabilities while offering familiarity and a smaller learning curve for the end user.

ClickHouseYandexOLAP和大数据使用案例创建的开源,面向列的分析数据库。 ClickHouse对实时查询处理的支持使其适合需要亚秒级分析结果的应用程序。 ClickHouse的查询语言是SQL的一种方言,可启用强大的声明性查询功能,同时为最终用户提供熟悉程度和较小的学习曲线。

Column-oriented databases store records in blocks grouped by columns instead of rows. By not loading data for columns absent in the query, column-oriented databases spend less time reading data while completing queries. As a result, these databases can compute and return results much faster than traditional row-based systems for certain workloads, such as OLAP.

面向列的数据库将记录存储在按列而不是行分组的块中。 通过不加载查询中不存在的列的数据,面向列的数据库在完成查询时花费更少的时间读取数据。 因此,对于某些工作负载(例如OLAP),这些数据库可以比传统的基于行的系统更快地计算和返回结果。

Online Analytics Processing (OLAP) systems allow for organizing large amounts of data and performing complex queries. They are capable of managing petabytes of data and returning query results quickly. In this way, OLAP is useful for work in areas like data science and business analytics.

在线分析处理 (OLAP)系统允许组织大量数据并执行复杂的查询。 它们能够管理PB级数据并快速返回查询结果。 这样,OLAP可用于数据科学和业务分析等领域的工作。

In this tutorial, you’ll install the ClickHouse database server and client on your machine. You’ll use the DBMS for typical tasks and optionally enable remote access from another server so that you’ll be able to connect to the database from another machine. Then you’ll test ClickHouse by modeling and querying example website-visit data.

在本教程中,您将在计算机上安装ClickHouse数据库服务器和客户端。 您将使用DBMS执行典型的任务,并有选择地启用来自另一台服务器的远程访问,以便能够从另一台计算机连接到数据库。 然后,您将通过建模和查询示例网站访问数据来测试ClickHouse。

先决条件 (Prerequisites)

  • One CentOS 7 server with a sudo enabled non-root user and firewall setup. You can follow the initial server setup tutorial to create the user and this tutorial to set up the firewall.

    一台具有启用了sudo CentOS 7服务器的非root用户和防火墙设置。 您可以按照初始服务器设置教程来创建用户,并按照本教程来设置防火墙。

  • (Optional) A secondary CentOS 7 server with a sudo enabled non-root user and firewall setup. You can follow the initial server setup tutorial and the additional setup tutorial for the firewall.

    (可选)具有启用了sudo非root用户和防火墙设置的辅助CentOS 7服务器。 您可以遵循初始服务器设置教程和防火墙的其他设置教程

第1步-安装ClickHouse (Step 1 — Installing ClickHouse)

In this section, you will install the ClickHouse server and client programs using yum.

在本节中,您将使用yum安装ClickHouse服务器和客户端程序。

First, SSH into your server by running:

首先,通过运行SSH进入服务器:

  • ssh sammy@your_server_ip

    ssh sammy @ your_server_ip

Install the base dependencies by executing:

通过执行以下命令安装基本依赖项:

  • sudo yum install -y pygpgme yum-utils

    须藤yum install -y pygpgme yum-utils

The pygpgme packages is used for adding and verifying GPG signatures while the yum-utils allows easy management of source RPMs.

pygpgme软件包用于添加和验证GPG签名,而yum-utils允许轻松管理源RPM。

Altinity, a ClickHouse consulting firm, maintains a YUM repository that has the latest version of ClickHouse. You’ll add the repository’s details to securely download validated ClickHouse packages by creating the file. To check the package contents, you can inspect the sources from which they are built at this Github project.

ClickHouse咨询公司Altinity维护着一个YUM存储库,该存储库具有最新版本的ClickHouse。 您将添加存储库的详细信息,以通过创建文件来安全地下载经过验证的ClickHouse软件包。 要检查包装箱中的物品,您可以在Github项目中检查其来源。

Create the repository details file by executing:

通过执行以下操作来创建存储库详细信息文件:

  • sudo vi /etc/yum.repos.d/altinity_clickhouse.repo

    须藤vi /etc/yum.repos.d/altinity_clickhouse.repo

Next, add the following contents to the file:

接下来,将以下内容添加到文件中:

/etc/yum.repos.d/altinity_clickhouse.repo
/etc/yum.repos.d/altinity_clickhouse.repo
[altinity_clickhouse]
name=altinity_clickhouse
baseurl=https://packagecloud.io/altinity/clickhouse/el/7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300

[altinity_clickhouse-source]
name=altinity_clickhouse-source
baseurl=https://packagecloud.io/altinity/clickhouse/el/7/SRPMS
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300

Now that you’ve added the repositories, enable them with the following command:

现在,您已经添加了存储库,使用以下命令启用它们:

  • sudo yum -q makecache -y --disablerepo='*' --enablerepo='altinity_clickhouse'

    sudo yum -q makecache -y --disablerepo ='*'--enablerepo ='altinity_clickhouse'

The -q flag tells the command to run in quiet mode. The makecache command makes available the packages specified in the --enablerepo flag.

-q标志告诉命令在安静模式下运行。 makecache命令使--enablerepo标志中指定的软件包可用。

On execution, you’ll see output similar to the following:

执行后,您将看到类似于以下内容的输出:


   
   
Output
Importing GPG key 0x0F6E36F6: Userid : "https://packagecloud.io/altinity/clickhouse (https://packagecloud.io/docs#gpg_signing) <support@packagecloud.io>" Fingerprint: 7001 38a9 6a20 6b22 bf28 3c06 ed26 58f3 0f6e 36f6 From : https://packagecloud.io/altinity/clickhouse/gpgkey

The output confirms it has successfully verified and added the GPG key.

输出确认已成功验证并添加了GPG密钥。

The clickhouse-server and clickhouse-client packages will now be available for installation. Install them with:

clickhouse-serverclickhouse-client软件包现在将可用于安装。 通过以下方式安装它们:

  • sudo yum install -y clickhouse-server clickhouse-client

    须藤yum install -y clickhouse-server clickhouse-client

You’ve installed the ClickHouse server and client successfully. You’re now ready to start the database service and ensure that it’s running correctly.

您已经成功安装了ClickHouse服务器和客户端。 现在,您可以启动数据库服务并确保其正常运行。

步骤2 —启动服务 (Step 2 — Starting the Service)

The clickhouse-server package that you installed in the previous section creates a systemd service, which performs actions such as starting, stopping, and restarting the database server. systemd is an init system for Linux to initialize and manage services. In this section you’ll start the service and verify that it is running successfully.

您在上一节中安装的clickhouse-server软件包将创建一个systemd服务,该服务执行诸如启动,停止和重新启动数据库服务器的操作。 systemd是Linux用来初始化和管理服务的初始化系统。 在本部分中,您将启动服务并验证它是否已成功运行。

Start the clickhouse-server service by running:

通过运行以下命令来启动clickhouse-server服务:

  • sudo service clickhouse-server start

    sudo服务clickhouse-服务器启动

You will see output similar to the following:

您将看到类似于以下内容的输出:


   
   
Output
Start clickhouse-server service: Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/ DONE

To verify that the service is running successfully, execute:

要验证服务是否成功运行,请执行:

  • sudo service clickhouse-server status

    sudo服务clickhouse-服务器状态

It will print an output similar to the following which denotes that the server is running properly:

它将输出类似于以下内容的输出,表示服务器正常运行:


   
   
Output
clickhouse-server service is running

You have successfully started the ClickHouse server and will now be able to use the clickhouse-client CLI program to connect to the server.

您已经成功启动了ClickHouse服务器,现在将能够使用clickhouse-client CLI程序连接到服务器。

第3步-创建数据库和表 (Step 3 — Creating Databases and Tables)

In ClickHouse, you can create and delete databases by executing SQL statements directly in the interactive database prompt. Statements consist of commands following a particular syntax that tell the database server to perform a requested operation along with any data required. You create databases by using the CREATE DATABASE table_name syntax. To create a database, first start a client session by running the following command:

在ClickHouse中,可以通过在交互式数据库提示中直接执行SQL语句来创建和删除数据库。 语句由遵循特定语法的命令组成,这些命令告诉数据库服务器执行请求的操作以及所需的任何数据。 您可以使用CREATE DATABASE table_name语法创建数据库。 要创建数据库,请首先通过运行以下命令启动客户端会话:

  • clickhouse-client --multiline

    clickhouse-client --multiline

This command will log you into the client prompt where you can run ClickHouse SQL statements to perform actions such as:

此命令将使您登录到客户端提示符,您可以在其中运行ClickHouse SQL语句以执行诸如以下操作:

  • Creating, updating, and deleting databases, tables, indexes, partitions, and views.

    创建,更新和删除数据库,表,索引,分区和视图。

  • Executing queries to retrieve data that is optionally filtered and grouped using various conditions.

    执行查询以检索使用各种条件过滤和分组的数据。

The --multiline flag tells the CLI to allow entering queries that span multiple lines.

--multiline标志告诉CLI允许输入跨越多行的查询。

In this step, with the ClickHouse client ready for inserting data, you’re going to create a database and table. For the purposes of this tutorial, you’ll create a database named test, and inside that you’ll create a table named visits that tracks website-visit durations.

在这一步中,准备好ClickHouse客户端插入数据后,您将创建一个数据库和表。 在本教程中,您将创建一个名为test的数据库,并在其中创建一个名为visits的表,该表可跟踪网站访问的持续时间。

Now that you’re inside the ClickHouse command prompt, create your test database by executing:

现在,您位于ClickHouse命令提示符下,通过执行以下操作创建test数据库:

  • CREATE DATABASE test;

    创建数据库测试 ;

You’ll see the following output that shows that you have created the database:

您将看到以下输出,显示您已创建数据库:


   
   
Output
CREATE DATABASE test Ok. 0 rows in set. Elapsed: 0.003 sec.

A ClickHouse table is similar to tables in other relational databases; it holds a collection of related data in a structured format. You can specify columns along with their types, add rows of data, and execute different kinds of queries on tables.

ClickHouse表类似于其他关系数据库中的表。 它以结构化格式保存相关数据的集合。 您可以指定列及其类型,添加数据行,并在表上执行各种查询。

The syntax for creating tables in ClickHouse follows this example structure:

在ClickHouse中创建表的语法遵循以下示例结构:

CREATE TABLE table_name
(
    column_name1 column_type [options],
    column_name2 column_type [options],
    ...
) ENGINE = engine

The table_name and column_name values can be any valid ASCII identifiers. ClickHouse supports a wide range of column types; some of the most popular are:

table_namecolumn_name值可以是任何有效的ASCII标识符。 ClickHouse支持多种列类型; 一些最受欢迎的是:

  • UInt64: used for storing integer values in the range 0 to 18446744073709551615.

    UInt64 :用于存储0到18446744073709551615之间的整数值。

  • Float64: used for storing floating point numbers such as 2039.23, 10.5, etc.

    Float64 :用于存储浮点数,例如2039.23、10.5等。

  • String: used for storing variable length characters. It does not require a max length attribute since it can store arbitrary lengths.

    String :用于存储可变长度字符。 它不需要最大长度属性,因为它可以存储任意长度。

  • Date: used for storing dates that follow the YYYY-MM-DD format.

    Date :用于存储遵循YYYY-MM-DD格式的日期。

  • DateTime: used for storing dates coupled with time and follows the YYYY-MM-DD HH:MM:SS format.

    DateTime :用于存储带有时间的日期,并遵循YYYY-MM-DD HH:MM:SS格式。

After the column definitions, you specify the engine used for the table. In ClickHouse, Engines determine the physical structure of the underlying data, the table’s querying capabilities, its concurrent access modes, and support for indexes. Different engine types are suitable for different application requirements. The most commonly used and widely applicable engine type is MergeTree.

在列定义之后,您可以指定用于表的引擎。 在ClickHouse中, 引擎确定基础数据的物理结构,表的查询功能,并发访问模式以及对索引的支持。 不同的发动机类型适合于不同的应用要求。 最常用和广泛适用的引擎类型是MergeTree

Now that you have an overview of table creation, you’ll create a table. Start by confirming the database you’ll be modifying:

现在,您已经对表创建有了概述,接下来将创建一个表。 首先确认要修改的数据库:

  • USE test;

    USE 测试 ;

You will see the following output showing that you have switched to the test database from the default database:

您将看到以下输出,显示您已从default数据库切换到test数据库:


   
   
Output
USE test Ok. 0 rows in set. Elapsed: 0.001 sec.

The remainder of this guide will assume that you are executing statements within this database’s context.

本指南的其余部分将假定您正在该数据库的上下文中执行语句。

Create your visits table by running this command:

通过运行以下命令来创建您的visits表:

  • CREATE TABLE visits (

    创建表访问 (

  • id UInt64,

    ID UInt64 ,

  • duration Float64,

    持续时间Float64 ,

  • url String,

    网址字串 ,

  • created DateTime

    创建的DateTime

  • ) ENGINE = MergeTree()

    )引擎= MergeTree()

  • PRIMARY KEY id

    主键编号

  • ORDER BY id;

    按id订购;

Here’s a breakdown of what the command does. You create a table named visits that has four columns:

这是该命令的功能细分。 您创建一个名为visits的表,该表具有四列:

  • id: The primary key column. Similarly to other RDBMS systems, a primary key column in ClickHouse uniquely identifies a row; each row should have a unique value for this column.

    id :主键列。 与其他RDBMS系统类似,ClickHouse中的主键列唯一标识一行; 每行对于此列应具有唯一的值。

  • duration: A float column used to store the duration of each visit in seconds. float columns can store decimal values such as 12.50.

    duration :一个float列,用于存储每次访问的持续时间(以秒为单位)。 float列可以存储十进制值,例如12.50。

  • url: A string column that stores the URL visited, such as http://example.com.

    url :一个字符串列,用于存储访问的URL,例如http://example.com

  • created: A date and time column that tracks when the visit occurred.

    created :一个日期和时间列,用于跟踪访问发生的时间。

After the column definitions, you specify MergeTree as the storage engine for the table. The MergeTree family of engines is recommended for production databases due to its optimized support for large real-time inserts, overall robustness, and query support. Additionally, MergeTree engines support sorting of rows by primary key, partitioning of rows, and replicating and sampling data.

在列定义之后,将MergeTree指定为表的存储引擎。 建议将MergeTree引擎家族用于生产数据库,因为它对大型实时插入,总体鲁棒性和查询支持进行了优化支持。 此外,MergeTree引擎支持按主键对行进行排序,对行进行分区以及复制和采样数据。

If you intend to use ClickHouse for archiving data that is not queried often or for storing temporary data, you can use the Log family of engines to optimize for that use-case.

如果打算使用ClickHouse归档不经常查询的数据或存储临时数据,则可以使用Log引擎引擎针对该用例进行优化。

After the column definitions, you’ll define other table-level options. The PRIMARY KEY clause sets id as the primary key column and the ORDER BY clause will store values sorted by the id column. A primary key uniquely identifies a row and is used for efficiently accessing a single row and efficient colocation of rows.

在列定义之后,您将定义其他表级选项。 PRIMARY KEY子句将id设置为主键列,而ORDER BY子句将存储按id列排序的值。 主键唯一地标识一行,并用于有效访问单行和行的有效并置。

On executing the create statement, you will see the following output:

执行create语句时,您将看到以下输出:


   
   
Output
CREATE TABLE visits ( id UInt64, duration Float64, url String, created DateTime ) ENGINE = MergeTree() PRIMARY KEY id ORDER BY id Ok. 0 rows in set. Elapsed: 0.010 sec.

In this section, you’ve created a database and a table to track website-visits data. In the next step, you’ll insert data into the table, update existing data, and delete that data.

在本部分中,您已经创建了一个数据库和一个表来跟踪网站访问数据。 在下一步中,您将数据插入表中,更新现有数据,然后删除该数据。

步骤4 —插入,更新和删除数据和列 (Step 4 — Inserting, Updating, and Deleting Data and Columns)

In this step, you’ll use your visits table to insert, update, and delete data. The following command is an example of the syntax for inserting rows into a ClickHouse table:

在此步骤中,您将使用visits表来插入,更新和删除数据。 以下命令是在ClickHouse表中插入行的语法示例:

INSERT INTO table_name VALUES (column_1_value, column_2_value, ....);

Now, insert a few rows of example website-visit data into your visits table by running each of the following statements:

现在,通过运行以下每个语句,将几行示例网站访问数据插入到您的visits表中:

  • INSERT INTO visits VALUES (1, 10.5, 'http://example.com', '2019-01-01 00:01:01');

    INSERT INTO 访问 VALUES( 1,10.5,'http://example.com','2019-01-01 00:01:01' );

  • INSERT INTO visits VALUES (2, 40.2, 'http://example1.com', '2019-01-03 10:01:01');

    INSERT INTO 访问 VALUES( 2,40.2,'http://example1.com','2019-01-03 10:01:01' );

  • INSERT INTO visits VALUES (3, 13, 'http://example2.com', '2019-01-03 12:01:01');

    INSERT INTO 访问 VALUES( 3,13,'http://example2.com','2019-01-03 12:01:01' );

  • INSERT INTO visits VALUES (4, 2, 'http://example3.com', '2019-01-04 02:01:01');

    INSERT INTO 访问 VALUES( 4,2,'http://example3.com','2019-01-04 02:01:01' );

You’ll see the following output repeated for each insert statement:

您将看到每个插入语句重复以下输出:


   
   
Output
INSERT INTO visits VALUES Ok. 1 rows in set. Elapsed: 0.004 sec.

The output for each row shows that you’ve inserted it successfully into the visits table.

每行的输出表明您已成功将其插入到visits表中。

Now you’ll add an additional column to the visits table. When adding or deleting columns from existing tables, ClickHouse supports the ALTER syntax.

现在,您将在visits表中添加一列。 在现有表中添加或删除列时,ClickHouse支持ALTER语法。

For example, the basic syntax for adding a column to a table is as follows:

例如,将列添加到表的基本语法如下:

ALTER TABLE table_name ADD COLUMN column_name column_type;

Add a column named location that will store the location of the visits to a website by running the following statement:

通过运行以下语句,添加一个名为location的列,该列将存储访问网站的位置:

  • ALTER TABLE visits ADD COLUMN location String;

    ALTER TABLE 访问 ADD COLUMN 位置String ;

You’ll see output similar to the following:

您将看到类似于以下内容的输出:


   
   
Output
ALTER TABLE visits ADD COLUMN location String Ok. 0 rows in set. Elapsed: 0.014 sec.

The output shows that you have added the location column successfully.

输出显示您已经成功添加了location列。

As of version 19.4.3, ClickHouse doesn’t support updating and deleting individual rows of data due to implementation constraints. ClickHouse has support for bulk updates and deletes, however, and has a distinct SQL syntax for these operations to highlight their non-standard usage.

从19.4.3版开始,由于实施限制,ClickHouse不支持更新和删除单独的数据行。 ClickHouse支持批量更新和删除,并且为这些操作提供了独特SQL语法,以突出它们的非标准用法。

The following syntax is an example for bulk updating rows:

以下语法是批量更新行的示例:

ALTER TABLE table_name UPDATE  column_1 = value_1, column_2 = value_2 ...  WHERE  filter_conditions;

You’ll run the following statement to update the url column of all rows that have a duration of less than 15. Enter it into the database prompt to execute:

您将运行以下语句来更新duration小于15的所有行的url列。将其输入数据库提示以执行:

  • ALTER TABLE visits UPDATE url = 'http://example2.com' WHERE duration < 15;

    ALTER TABLE 访问 UPDATE url =' http://example2.com ' 持续时间<15 ;

The output of the bulk update statement will be as follows:

批量更新语句的输出如下:


   
   
Output
ALTER TABLE visits UPDATE url = 'http://example2.com' WHERE duration < 15 Ok. 0 rows in set. Elapsed: 0.003 sec.

The output shows that your update query completed successfully. The 0 rows in set in the output denotes that the query did not return any rows; this will be the case for any update and delete queries.

输出显示更新查询成功完成。 输出0 rows in set0 rows in set表示查询未返回任何行; 任何更新和删除查询都会如此。

The example syntax for bulk deleting rows is similar to updating rows and has the following structure:

批量删除行的示例语法类似于更新行,并具有以下结构:

ALTER TABLE table_name DELETE WHERE filter_conditions;

To test deleting data, run the following statement to remove all rows that have a duration of less than 5:

要测试删除数据,请运行以下语句以删除duration小于5的所有行:

  • ALTER TABLE visits DELETE WHERE duration < 5;

    ALTER TABLE 访问 DELETE WHERE 持续时间<5 ;

The output of the bulk delete statement will be similar to:

批量删除语句的输出将类似于:


   
   
Output
ALTER TABLE visits DELETE WHERE duration < 5 Ok. 0 rows in set. Elapsed: 0.003 sec.

The output confirms that you have deleted the rows with a duration of less than five seconds.

输出确认您已删除持续时间少于五秒的行。

To delete columns from your table, the syntax would follow this example structure:

要从表中删除列,语法应遵循以下示例结构:

ALTER TABLE table_name DROP COLUMN column_name;

Delete the location column you added previously by running the following:

通过运行以下命令删除先前添加的location列:

  • ALTER TABLE visits DROP COLUMN location;

    ALTER TABLE 访问 DROP COLUMN的位置 ;

The DROP COLUMN output confirming that you have deleted the column will be as follows:

确认已删除该列的DROP COLUMN输出如下:


   
   
Output
ALTER TABLE visits DROP COLUMN location String Ok. 0 rows in set. Elapsed: 0.010 sec.

Now that you’ve successfully inserted, updated, and deleted rows and columns in your visits table, you’ll move on to query data in the next step.

现在,您已经成功地在visits表中插入,更新和删除了行和列,接下来将继续查询数据。

第5步-查询数据 (Step 5 — Querying Data)

ClickHouse’s query language is a custom dialect of SQL with extensions and functions suited for analytics workloads. In this step, you’ll run selection and aggregation queries to retrieve data and results from your visits table.

ClickHouse的查询语言是SQL的自定义方言,具有适用于分析工作负载的扩展名和功能。 在此步骤中,您将运行选择查询和汇总查询以从visits表中检索数据和结果。

Selection queries allow you to retrieve rows and columns of data filtered by conditions that you specify, along with options such as the number of rows to return. You can select rows and columns of data using the SELECT syntax. The basic syntax for SELECT queries is:

选择查询使您可以检索按指定条件过滤的数据的行和列,以及诸如要返回的行数之类的选项。 您可以使用SELECT语法SELECT数据的行和列。 SELECT查询的基本语法为:

SELECT func_1(column_1), func_2(column_2) FROM table_name WHERE filter_conditions row_options;

Execute the following statement to retrieve url and duration values for rows where the url is http://example.com:

执行以下语句以检索urlhttp://example.com行的urlduration值:

  • SELECT url, duration FROM visits WHERE url = 'http://example2.com' LIMIT 2;

    SELECT网址,持续时间FROM 访问次数,其中 URL ='http://example2.com'LIMIT 2;

You will see the following output:

您将看到以下输出:


   
   
Output
SELECT url, duration FROM visits WHERE url = 'http://example2.com' LIMIT 2 ┌─url─────────────────┬─duration─┐ │ http://example2.com │ 10.5 │ └─────────────────────┴──────────┘ ┌─url─────────────────┬─duration─┐ │ http://example2.com │ 13 │ └─────────────────────┴──────────┘ 2 rows in set. Elapsed: 0.013 sec.

The output has returned two rows that match the conditions you specified. Now that you’ve selected values, you can move to executing aggregation queries.

输出已返回符合您指定条件的两行。 现在,您已经选择了值,您可以转到执行聚合查询。

Aggregation queries are queries that operate on a set of values and return single output values. In analytics databases, these queries are run frequently and are well optimized by the database. Some aggregate functions supported by ClickHouse are:

聚合查询是对一组值进行操作并返回单个输出值的查询。 在分析数据库中,这些查询会经常运行,并由数据库很好地优化。 ClickHouse支持的一些汇总功能是:

  • count: returns the count of rows matching the conditions specified.

    count :返回符合指定条件的行数。

  • sum: returns the sum of selected column values.

    sum :返回选定列值的总和。

  • avg: returns the average of selected column values.

    avg :返回所选列值的平均值。

Some ClickHouse-specific aggregate functions include:

一些特定于ClickHouse的聚合函数包括:

  • uniq: returns an approximate number of distinct rows matched.

    uniq :返回匹配的不同行的大约数量。

  • topK: returns an array of the most frequent values of a specific column using an approximation algorithm.

    topK :使用近似算法返回特定列的最频繁值的数组。

To demonstrate the execution of aggregation queries, you’ll calculate the total duration of visits by running the sum query:

为了演示聚合查询的执行,您将通过运行sum查询来计算访问的总持续时间:

  • SELECT SUM(duration) FROM visits;

    从访问中选择总和(持续时间) ;

You will see output similar to the following:

您将看到类似于以下内容的输出:


   
   
Output
SELECT SUM(duration) FROM visits ┌─SUM(duration)─┐ │ 63.7 │ └───────────────┘ 1 rows in set. Elapsed: 0.010 sec.

Now, calculate the top two URLs by executing:

现在,通过执行以下命令来计算前两个URL:

  • SELECT topK(2)(url) FROM visits;

    从访问中选择topK(2)(url) ;

You will see output similar to the following:

您将看到类似于以下内容的输出:


   
   
Output
SELECT topK(2)(url) FROM visits ┌─topK(2)(url)──────────────────────────────────┐ │ ['http://example2.com','http://example1.com'] │ └───────────────────────────────────────────────┘ 1 rows in set. Elapsed: 0.010 sec.

Now that you have successfully queried your visits table, you’ll delete tables and databases in the next step.

现在您已经成功查询了visits表,接下来将删除表和数据库。

第6步-删除表和数据库 (Step 6 — Deleting Tables and Databases)

In this section, you’ll delete your visits table and test database.

在本部分中,您将删除您的visits表并test数据库。

The syntax for deleting tables follows this example:

删除表的语法遵循以下示例:

DROP TABLE table_name;

To delete the visits table, run the following statement:

要删除visits表,请运行以下语句:

  • DROP TABLE visits;

    DROP TABLE 访问 ;

You will see the following output declaring that you’ve deleted the table successfully:

您将看到以下输出,声明您已成功删除该表:


   
   
Output
DROP TABLE visits Ok. 0 rows in set. Elapsed: 0.005 sec.

You can delete databases using the DROP database table_name syntax. To delete the test database, execute the following statement:

您可以使用DROP database table_name语法删除数据库。 要删除test数据库,请执行以下语句:

  • DROP DATABASE test;

    DROP DATABASE 测试 ;

The resulting output shows that you’ve deleted the database successfully:

结果显示,您已成功删除数据库:


   
   
Output
DROP DATABASE test Ok. 0 rows in set. Elapsed: 0.003 sec.

You’ve deleted tables and databases in this step. Now that you’ve created, updated, and deleted databases, tables, and data in your ClickHouse instance, you’ll enable remote access to your database server in the next section.

您已在此步骤中删除了表和数据库。 现在,您已经在ClickHouse实例中创建,更新和删除了数据库,表和数据,接下来的部分将使您能够远程访问数据库服务器。

步骤7 —设置防火墙规则(可选) (Step 7 — Setting Up Firewall Rules (Optional))

If you intend to only use ClickHouse locally with applications running on the same server, or do not have a firewall enabled on your server, you don’t need to complete this section. If instead, you’ll be connecting to the ClickHouse database server remotely, you should follow this step.

如果打算仅将ClickHouse与在同一服务器上运行的应用程序一起在本地使用,或者在服务器上未启用防火墙,则无需完成本节。 如果相反,您将远程连接到ClickHouse数据库服务器,则应遵循此步骤。

Currently your server has a firewall enabled that disables your public IP address accessing all ports. You’ll complete the following two steps to allow remote access:

当前,您的服务器启用了防火墙,该防火墙禁止您的公共IP地址访问所有端口。 您将完成以下两个步骤以允许远程访问:

  • Add a firewall rule allowing incoming connections to port 8123, which is the HTTP port that ClickHouse server runs.

    添加一个防火墙规则,以允许进入端口8123传入连接,端口8123是ClickHouse服务器运行的HTTP端口。

If you are inside the database prompt, exit it by typing CTRL+D.

如果您在数据库提示中,请输入CTRL+D退出它。

Edit the configuration file by executing:

通过执行以下操作来编辑配置文件:

  • sudo vi /etc/clickhouse-server/config.xml

    须藤vi /etc/clickhouse-server/config.xml

Then uncomment the line containing <!-- <listen_host>0.0.0.0</listen_host> -->, like the following file:

然后取消注释包含<!-- <listen_host>0.0.0.0</listen_host> --> ,例如以下文件:

/etc/clickhouse-server/config.xml
/etc/clickhouse-server/config.xml
...
 <interserver_http_host>example.yandex.ru</interserver_http_host>
    -->

    <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
    <!-- <listen_host>::</listen_host> -->
    <!-- Same for hosts with disabled ipv6: -->
    <listen_host>0.0.0.0</listen_host>

    <!-- Default values - try listen localhost on ipv4 and ipv6: -->
    <!--
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
    -->
...

Save the file and exit vi. For the new configuration to apply restart the service by running:

保存文件并退出vi 。 要应用新配置,请运行以下命令重新启动服务:

  • sudo service clickhouse-server restart

    sudo服务clickhouse-服务器重启

You will see the following output from this command:

您将从此命令中看到以下输出:


   
   
Output
Stop clickhouse-server service: DONE Start clickhouse-server service: Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/ DONE

Add the remote server’s IP to zone called public:

将远程服务器的IP添加到名为public的区域中:

  • sudo firewall-cmd --permanent --zone=public --add-source=second_server_ip/32

    sudo firewall-cmd-永久--zone = public --add-source = second_server_ip / 32

ClickHouse’s server listens on port 8123 for HTTP connections and port 9000 for connections from clickhouse-client. Allow access to both ports for your second server’s IP address with the following command:

ClickHouse的服务器在端口8123上侦听HTTP连接,在端口9000上侦听来自clickhouse-client连接。 使用以下命令允许访问第二个服务器的IP地址的两个端口:

  • sudo firewall-cmd --permanent --zone=public --add-port=8123/tcp

    须藤防火墙cmd-永久-区域=公共-添加端口= 8123 / tcp
  • sudo firewall-cmd --permanent --zone=public --add-port=9000/tcp

    须藤防火墙cmd-永久-区域=公共-添加端口= 9000 / tcp

You will see the following output for both commands that shows that you’ve enabled access to both ports:

这两个命令都将看到以下输出,表明您已启用对两个端口的访问:


   
   
Output
success

Now that you have added the rules, reload the firewall for the changes to take effect:

现在,您已经添加了规则,请重新加载防火墙以使更改生效:

  • sudo firewall-cmd --reload

    sudo firewall-cmd-重新加载

This command will output a success message as well. ClickHouse will now be accessible from the IP that you added. Feel free to add additional IPs such as your local machine’s address if required.

此命令也会输出一条success消息。 现在可以从您添加的IP访问ClickHouse。 如果需要,可以随意添加其他IP,例如本地计算机的地址。

To verify that you can connect to the ClickHouse server from the remote machine, first follow the steps in Step 1 of this tutorial on the second server and ensure that you have the clickhouse-client installed on it.

要验证您可以从远程计算机连接到ClickHouse服务器,请首先在第二台服务器上按照本教程的步骤1中的步骤进行操作,并确保已在其上安装clickhouse-client

Now that you have logged into the second server, start a client session by executing:

现在,您已经登录到第二台服务器,通过执行以下命令启动客户端会话:

  • clickhouse-client --host your_server_ip --multiline

    clickhouse-client --host your_server_ip --multiline

You will see the following output that shows that you have connected successfully to the server:

您将看到以下输出,显示您已成功连接到服务器:


   
   
Output
ClickHouse client version 19.4.3. Connecting to your_server_ip:9000 as user default. Connected to ClickHouse server version 19.4.3 revision 54416. hostname :)

In this step, you’ve enabled remote access to your ClickHouse database server by adjusting your firewall rules.

在此步骤中,您已通过调整防火墙规则启用了对ClickHouse数据库服务器的远程访问。

结论 (Conclusion)

You have successfully set up a ClickHouse database instance on your server and created a database and table, added data, performed queries, and deleted the database. Within ClickHouse’s documentation you can read about their benchmarks against other open-source and commercial analytics databases and general reference documents. Further features ClickHouse offers includes distributed query processing across multiple servers to improve performance and protect against data loss by storing data over different shards.

您已经在服务器上成功设置了ClickHouse数据库实例,并创建了数据库和表,添加了数据,执行了查询并删除了该数据库。 在ClickHouse的文档,你可以了解他们的基准,对其他的开源和商业分析数据库和一般的参考文献 。 ClickHouse提供的其他功能包括跨多个服务器进行分布式查询处理,以通过将数据存储在不同的分上来提高性能并防止数据丢失。

翻译自: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-clickhouse-on-centos-7

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值