Measuring Every Thing at Scale: An Introduction to Time-Series with M3

weixin_26728245

Original article: https://medium.com/streamthoughts/measuring-every-thing-at-scale-an-introduction-to-time-series-with-m3-a1e8d81465c

Today, it's easy to say that almost everything we do, everything we use, and even everything around us is capable of producing data. What is even more true is that this data is produced in real time to describe something that is happening.

Therefore, it's logical to think that data must also be harnessed in real time to extract the most value from it. In addition, and perhaps most importantly, data must be stored and processed with a temporal context to retain its full significance. This is the condition necessary to fully understand the context in which something exists or occurred.

So, let’s take some real-life examples where the temporal context (i.e., time) is an essential part of the meaning of your data:

Recording sports performance metrics (e.g., speed, position, heart rate) during a sporting activity through a connected watch.
Measuring atmospheric conditions to provide data for weather forecasts (wind speed, temperature, atmospheric pressure, etc.).
Monitoring a server's system resource usage.
Monitoring a home's energy consumption.
Monitoring stock prices, etc.

All of these examples have one thing in common: they all involve data that we want to measure over time in order to monitor its evolution, to detect or predict trends (possibly in correlation with other events), or to alert on thresholds. We commonly refer to this kind of data as time-series.

The explosion of the IoT (Internet of Things) in recent years has greatly accelerated the need to be able to efficiently store and analyze this data, which most often means millions of new metrics produced every second.

What is a time-series and what is a time-series database (TSDB)?

Time-series are sequences of numeric data points that are generated in successive order. Each data point represents a measure (also called a metric). Each metric has a name, a timestamp, and usually one or more labels that describe the actual object being measured.
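
To make this structure concrete, here is a minimal Python sketch of a data point with a name, a timestamp, a value, and labels. The `DataPoint` class, the `series_key` helper, and the sample values are hypothetical names chosen for illustration; grouping points by name plus labels is an assumption borrowed from how label-based TSDBs commonly identify a series, not something the text above prescribes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataPoint:
    """One measurement: a metric name, a timestamp, a numeric value, and labels."""
    name: str
    timestamp: datetime
    value: float
    labels: dict = field(default_factory=dict)


def series_key(point):
    """Assumed convention: a series is identified by its name plus its sorted labels."""
    return (point.name, tuple(sorted(point.labels.items())))


start = datetime(2024, 1, 1, 9, 0, 0, tzinfo=timezone.utc)

# Three successive heart-rate readings from the same (hypothetical) device.
points = [
    DataPoint("heart_rate_bpm", start + timedelta(seconds=10 * i), bpm, {"device": "watch-1"})
    for i, bpm in enumerate([72.0, 75.0, 79.0])
]

# All three points share the same key, so they belong to the same series.
keys = {series_key(p) for p in points}
```

Because every point carries its own timestamp, the sequence can always be re-sorted or windowed by time, which is what the queries later in this article rely on.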

To store such data, we could perfectly well use a traditional relational database (such as PostgreSQL) and create a simple SQL table like this:

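CREATE TABLE timeseries (
  metric_name TEXT NOT NULL,                                   -- name of the metric, e.g. heart_rate_bpm
  metric_ts   timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- when the point was measured
  value       double precision NOT NULL,                       -- the measured value
  labels      json,                                            -- labels describing the measured object
  -- note: this key allows one row per (name, timestamp), whatever the labels
  PRIMARY KEY (metric_name, metric_ts)
);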

And, for example, to aggregate every point recorded over the last 10 minutes, we could use a SQL query similar to:

SELECT avg(value) FROM timeseries WHERE metric_name = 'heart_rate_bpm' AND metric_ts >= NOW() - INTERVAL '10 minutes';
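
The same idea can be tried end to end without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, which is an assumption made purely for convenience: SQLite has no `timestamptz` or `INTERVAL`, so timestamps are stored as ISO-8601 UTC strings and the 10-minute cutoff is computed in Python. The `average_last_minutes` helper and the sample values are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def average_last_minutes(conn, metric_name, now, minutes=10):
    """Average all values of a metric recorded within the last `minutes` before `now`."""
    cutoff = (now - timedelta(minutes=minutes)).isoformat()
    row = conn.execute(
        "SELECT avg(value) FROM timeseries WHERE metric_name = ? AND metric_ts >= ?",
        (metric_name, cutoff),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE timeseries (
        metric_name TEXT NOT NULL,
        metric_ts   TEXT NOT NULL,  -- ISO-8601 UTC string (SQLite stand-in for timestamptz)
        value       REAL NOT NULL,
        labels      TEXT,           -- JSON document stored as text
        PRIMARY KEY (metric_name, metric_ts)
    )
    """
)

# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
samples = [
    (now - timedelta(minutes=15), 90.0),  # older than 10 minutes: excluded
    (now - timedelta(minutes=8), 70.0),
    (now - timedelta(minutes=2), 80.0),
]
conn.executemany(
    "INSERT INTO timeseries (metric_name, metric_ts, value, labels) VALUES (?, ?, ?, ?)",
    [("heart_rate_bpm", ts.isoformat(), value, '{"device": "watch-1"}') for ts, value in samples],
)

recent_avg = average_last_minutes(conn, "heart_rate_bpm", now)
```

Comparing the timestamps as strings only works here because every value is written in the same ISO-8601 format with the same UTC offset; a real PostgreSQL deployment would compare `timestamptz` values directly, as in the query above.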

However, this solution would not be very effective for data-intensive applications and long-term use. Sooner or later, we would probably be limited by:

Horizontal scalability, whether for long-term storage, resiliency, or multi-region deployment needs.
The ability to ingest millions of metrics per second (most relational databases rely on B-tree index structures, which become costly to maintain under sustained heavy writes).
The ability to automatically roll up data over time (for example, aggregating all metrics from the previous month into 5-minute points).
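
To illustrate what a roll-up means in practice, here is a minimal Python sketch that averages raw points into fixed 5-minute buckets. The `roll_up` function and the sample readings are hypothetical, and averaging is only one possible aggregation (a real TSDB may also keep the min, max, sum, or last value per bucket).

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def roll_up(points, bucket=timedelta(minutes=5)):
    """Average (timestamp, value) pairs into fixed-width buckets aligned to the Unix epoch."""
    width = int(bucket.total_seconds())
    buckets = defaultdict(list)
    for ts, value in points:
        # Round the timestamp down to the start of its bucket.
        bucket_start = int(ts.timestamp()) // width * width
        buckets[bucket_start].append(value)
    return [
        (datetime.fromtimestamp(bucket_start, tz=timezone.utc), sum(values) / len(values))
        for bucket_start, values in sorted(buckets.items())
    ]


base = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Five raw readings spread over ten minutes (minute offset, value).
raw = [
    (base + timedelta(minutes=offset), float(value))
    for offset, value in [(0, 60), (1, 62), (4, 64), (5, 70), (9, 74)]
]

rolled = roll_up(raw)  # two points: one for 12:00-12:05, one for 12:05-12:10
```

The five raw points collapse into two rolled-up points, which is the trade-off a roll-up makes: less storage and faster queries over long ranges, at the cost of losing per-point detail.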

Also, hotspots are likely when inserting measurements at very high throughput. Depending on the type of index used by the database, concurrent accesses can then lead to poor performance.

For all of these reasons, it's usually preferable to use solutions that are specifically designed to enable efficient storage and querying of this kind of data. These solutions are called time-series databases (TSDB).

Below are some of the best-known TSDBs:

InfluxDB
