The author selected the COVID-19 Relief Fund to receive a donation as part of the Write for DOnations program.
Introduction
Many applications, such as monitoring systems and data collection systems, accumulate data for further analysis. These analyses often look at the way a piece of data or a system changes over time. In these instances, data is represented as a time series, with every data point accompanied by a timestamp. An example would look like this:
2020-06-01 09:00:00 server.cpu.1 0.9
2020-06-01 09:00:00 server.cpu.15 0.8
2020-06-01 09:01:00 server.cpu.1 0.9
2020-06-01 09:01:00 server.cpu.15 0.8
...
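To make the shape of this data concrete, here is a minimal sketch that averages the value of each metric across timestamps. It assumes every record has exactly four whitespace-separated fields (date, time, metric name, value), as in the sample above; the function name avg_per_metric is hypothetical and not part of any tool used in this tutorial.

```shell
# Hypothetical helper: average value per metric from lines shaped like
# "<date> <time> <metric> <value>". Lines with a different field count
# (such as a trailing "...") are ignored.
avg_per_metric() {
  awk '
    NF == 4 { sum[$3] += $4; count[$3]++ }
    END { for (m in sum) printf "%s %.2f\n", m, sum[m] / count[m] }
  ' | LC_ALL=C sort
}

# Sample records copied from the example above.
sample='2020-06-01 09:00:00 server.cpu.1 0.9
2020-06-01 09:00:00 server.cpu.15 0.8
2020-06-01 09:01:00 server.cpu.1 0.9
2020-06-01 09:01:00 server.cpu.15 0.8'

printf '%s\n' "$sample" | avg_per_metric
```

A flat-file scan like this is fine for a handful of lines, but it rereads every record on every query, which is the kind of cost a time-series database is meant to avoid.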
The relevance of time series data has recently grown thanks to the new deployments of the Internet of Things (IoT) and Industrial Internet of Things. There are more and more devices that collect various time-series information: fitness trackers, smart watches, home weather stations, and various sensors, to name a few. These devices collect a lot of information, and all this data must be stored somewhere.
Classic relational databases are most often used to store data, but they don’t always fit when it comes to the huge data volumes of time series. When you need to process a large amount of time series data, relational databases can be too slow. Because of this, specially optimized databases, called NoSQL databases, have been created to avoid the problems of relational databases.
TimescaleDB is an open-source database optimized for storing time-series data. It is implemented as an extension of PostgreSQL and combines the ease of use of relational databases with the speed of NoSQL databases. As a result, it allows you to use PostgreSQL to store both business data and time-series data in one place.
By following this tutorial, you’ll set up TimescaleDB on Ubuntu 20.04, configure it, and learn how to work with it. You’ll run through creating time-series databases and making simple queries. Finally, you’ll see how to get rid of unnecessary data.
Prerequisites
To follow this tutorial, you will need:
One Ubuntu 20.04 server set up by following our Initial Server Setup Guide for Ubuntu 20.04, including a non-root user with sudo privileges and a firewall.
PostgreSQL installed on your server. Follow Step 1 of How To Install and Use PostgreSQL on Ubuntu 20.04 to install it.
Step 1 - Installing TimescaleDB
TimescaleDB is not available in Ubuntu’s default package repositories, so in this step you will install it from the TimescaleDB Personal Package Archive (PPA).
First, add Timescale’s APT repository:
sudo add-apt-repository ppa:timescale/timescaledb-ppa
Confirm this action by hitting the ENTER key.
Next, refresh your APT cache to update your package lists:
sudo apt update
You can now proceed with the installation. This tutorial uses PostgreSQL version 12; if you are using a different version of PostgreSQL (11 or 10, for example), replace the value in the following command and run it:
sudo apt install timescaledb-postgresql-12
Note: Support for PostgreSQL versions 9.6.3+ and 10.9+ is deprecated and will be removed in a future release.
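If you are unsure which value to substitute, the following sketch derives the package name from a PostgreSQL version string. It assumes PostgreSQL 10 or later, where the major version is the first number in the string, and that the package follows the timescaledb-postgresql-<version> pattern used in the command above. The function name timescaledb_package_name is hypothetical.

```shell
# Hypothetical helper: build the TimescaleDB package name from a version
# string such as "12.2" or the output of `psql --version`.
# Assumption: the first number in the string is the PostgreSQL major
# version (true for PostgreSQL 10 and later).
timescaledb_package_name() {
  local major
  major="$(printf '%s\n' "$1" | grep -oE '[0-9]+' | head -n 1)"
  # Fail if the string contains no digits at all.
  [ -n "$major" ] || return 1
  printf 'timescaledb-postgresql-%s\n' "$major"
}

timescaledb_package_name "psql (PostgreSQL) 12.2"   # prints timescaledb-postgresql-12
```

You could then pass the result to apt, for example sudo apt install "$(timescaledb_package_name "$(psql --version)")". Note that psql --version reports the client version, so this assumes the client and server share the same major version.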
TimescaleDB is now installed and ready to be used. Next, you will turn it on and adjust some of the settings associated with it in the PostgreSQL configuration file to optimize the database.
Step 2 - Configuring TimescaleDB
The TimescaleDB module works fine with the default PostgreSQL configuration settings, but to improve performance and make better use of processor, memory, and disk resources, developers of TimescaleDB suggest configuring some individual parameters. This can be done automatically with the timescaledb-tune tool or by manually editing your server’s postgresql.conf file.
In this tutorial, you will use the timescaledb-tune tool. It reads the postgresql.conf file and interactively suggests changes.
Run the following command to start the configuration wizard:
sudo timescaledb-tune
First, you will be asked to confirm the path to the PostgreSQL configuration file:
Output
Using postgresql.conf at this path:
/etc/postgresql/12/main/postgresql.conf
Is this correct? [(y)es/(n)o]:
The utility automatically detects the path to the configuration file, so confirm this by entering y:
Output
...
Is this correct? [(y)es/(n)o]: y
Writing backup to:
/tmp/timescaledb_tune.backup202005300523
Next, you will be prompted to change the shared_preload_libraries variable to preload the TimescaleDB module upon starting the PostgreSQL server:
Output
shared_preload_libraries needs to be updated
Current:
#shared_preload_libraries = ''
Recommended:
shared_preload_libraries = 'timescaledb'
Is this okay? [(y)es/(n)o]:
shared_preload_libraries accepts a comma-separated list of modules as a value, designating which modules PostgreSQL should load before starting the database server. Making this change will add the timescaledb module to that list.
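To illustrate what adding a module to a comma-separated list means when the list is not empty, here is a minimal sketch. It is not how timescaledb-tune is implemented; the function name add_preload_library is hypothetical, and pg_stat_statements appears only as an example of a module that might already be listed.

```shell
# Hypothetical helper: append a module to a comma-separated list unless
# it is already present. Spaces are ignored when checking for duplicates.
add_preload_library() {
  local current="$1" module="$2" normalized
  normalized="$(printf '%s' "$current" | tr -d ' ')"
  case ",${normalized}," in
    *",${module},"*) printf '%s\n' "$current" ;;               # already listed
    ",,")            printf '%s\n' "$module" ;;                # list was empty
    *)               printf '%s,%s\n' "$current" "$module" ;;  # append
  esac
}

add_preload_library "" timescaledb                     # prints timescaledb
add_preload_library "pg_stat_statements" timescaledb   # prints pg_stat_statements,timescaledb
add_preload_library "timescaledb" timescaledb          # prints timescaledb
```

The first call corresponds to the prompt above, where the current value is an empty, commented-out setting and the recommended value is 'timescaledb' alone.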