t–sql pl–sql_SQL Server –在T-SQL中使用最低的度量单位

t–sql pl–sql

A client recently discovered a discrepancy on one of our reports that showed an improvement in performance metrics but was inaccurate. Our reports came from a software tool, which showed the average performance throughout the day. It derived this number from periodic checks and the frequency changed, which affected our report. When we showed an improvement in the metrics on a report, the client showed us that the frequency change may have impacted this, not necessarily any improvement in performance. In situations where we’re measuring values and comparing them to other values, how can we prevent a change in measurement from impacting our reports?

一位客户最近发现我们的一份报告存在差异,该差异表明绩效指标有所提高,但不准确。 我们的报告来自一个软件工具,该工具显示了一天的平均效果。 它是通过定期检查得出此数字的,并且更改的频率也影响了我们的报告。 当我们在报告中显示度量指标有所改善时,客户向我们展示了频率变化可能会对此产生影响,而未必会对性能有所改善。 在我们正在测量值并将其与其他值进行比较的情况下,如何防止测量值的变化影响我们的报告?

总览 (Overview)

Data providers may change their measurements for a variety of reasons and these changes can impact reports. Consider a data provider showing an oil well producing an average amount of oil every hour, then changing the report to every day (but consider the well may not be active around the clock). These may not be comparable time frames with derived measures, or we may be able to use a 24-hour cycle to drive the per-hour production.

数据提供者可能出于各种原因而更改其度量,而这些更改可能会影响报告。 考虑一个数据提供者,该数据提供者显示一口油井每小时生产平均量的油,然后将报告更改为每天(但要考虑该油井可能不会全天候运行)。 这些时间可能无法与衍生指标相媲美的时间范围,或者我们可以使用24小时周期来驱动每小时的产量。

In some of these situations where metrics change, the data provider will send an algorithm which automatically adjusts or calculates the change so that we can compare past data to new data. In this article, we’ll look at the situations where we must derive this for ourselves and how we can prevent this. We’ll look at two examples where we can see the impact of this on a report:

在某些情况下,指标会发生变化,数据提供者将发送一种算法,该算法会自动调整或计算变化,以便我们可以将过去的数据与新的数据进行比较。 在本文中,我们将探讨必须自己得出这一点的情况以及如何防止这种情况。 我们将看两个示例,从中可以看出这对报告的影响:

  • Metric example

    指标示例
  • Inflation example

    通货膨胀的例子

识别单元的一小部分 (Identifying a smaller part of the unit)

Our best technique to prevent this will involve identifying the smallest part of the whole unit for measurement. ETL developers may use the unit as a whole, such as total performance metric per day or price per house in an area and these measurements may be impacted if the underlying data changes formats or if other measurements are impacted. In the case of homes, if a person adds to a living space, the price may be impacted, but other measurements like rooms, bathrooms, etc may not be other than square foot. Along with total price, we may want to include the price-per-square-foot. In the same manner, any total performance metrics per day could easily be impacted if anything is removed or added to the system being monitored.

防止这种情况的最佳技术将涉及确定要测量的整个单元的最小部分。 ETL开发人员可以整体使用该单元,例如每天的总性能指标或某个区域中每所房屋的价格,并且如果基础数据更改格式或其他度量受到影响,则这些度量可能会受到影响。 就房屋而言,如果一个人增加了居住空间,则价格可能会受到影响,但其他度量标准(例如房间,浴室等)可能不是平方英尺。 除了总价格外,我们可能还希望包括每平方英尺的价格。 以相同的方式,如果将任何内容删除或添加到要监视的系统中,则每天的总性能指标很容易受到影响。

In the below two examples, we see the first table with four inserts (two inserts begin newer data for the same product) provides us with no way to obtain the cost per smallest part of the unit (in this case, fluid ounces). With this type of architecture, we may not identify that a cost increase has happened since the price per item is the same, even if the underlying amount is less. In the second table, we add two columns – the unit term (fluid ounce) and the total units.

在下面的两个示例中,我们看到第一个表包含四个插入物(两个插入物开始显示同一产品的新数据)使我们无法获得单位最小部分的成本(在这种情况下为流体盎司)。 对于这种类型的体系结构,即使基础数量更少,我们也可能无法确定成本增加是因为每件产品的价格相同。 在第二个表中,我们添加了两列–单位项(流体盎司)和总单位。

We could have used other unit terms, provided that they allowed us to measure the total amount by a small unit which allows us to compare when data changes. Using a unit like 32 fluid ounce bottle is not a smaller unit for comparison if a company changes the bottle size (and this happens). A fluid ounce, by contrast, can be derived by any size, regardless of the measuring system. The same applies to a gallon versus two liters; if the measuring unit changes, we can still use a smaller part of the total for price comparison. In the second table, we then run a query to look at the price per fluid ounce – the smaller unit.

我们可以使用其他单位术语,只要它们允许我们用一个小的单位来衡量总额,这可以让我们在数据更改时进行比较。 如果公司更改了瓶的大小(这种情况会发生),则使用32盎司液体瓶这样的设备进行比较并不是一个较小的单位。 相比之下,无论采用何种测量系统,都可以通过任何大小得出液体盎司。 加仑与两升的对比也一样; 如果计量单位发生变化,我们仍然可以使用总数的一小部分进行价格比较。 在第二张表中,我们然后运行查询以查看每液体盎司的价格-较小的单位。

CREATE TABLE tbStageProcessB_ExA(
	ItemID SMALLINT,
	Item VARCHAR(25),
	Details VARCHAR(50),
	ItemPrice DECIMAL(13,5),
)
 
--- Old values:
INSERT INTO tbStageProcessB_ExA
VALUES (1,'Bottled Water','32 count 20floz bottled water',4.59)
	, (2,'Water','1 gallon water',0.99)
 
--- New values:
INSERT INTO tbStageProcessB_ExA
VALUES (1,'Bottled Water','32 count 12floz bottled water',4.59) --- bottled water changes from 20 fl oz to 12 fl oz
	, (2,'Water','2 litre water',0.99) --- unit changes completely
 
 
CREATE TABLE tbStageProcessB_ExB(
	ItemID SMALLINT,
	Item VARCHAR(25),
	Details VARCHAR(50),
	ItemPrice DECIMAL(13,5),
	UnitTerm VARCHAR(25),
	Units DECIMAL(9,2)
)
 
--- Old values:
INSERT INTO tbStageProcessB_ExB
VALUES (1,'Bottled Water','32 count 20floz bottled water',4.59,'fl oz',640) -- units: (32x20)
	, (2,'Water','1 gallon water',0.99,'fl oz',128)
 
--- New values:
INSERT INTO tbStageProcessB_ExB
VALUES (1,'Bottled Water','32 count 12floz bottled water',4.59,'fl oz',384) -- units: (32x12)
	, (2,'Water','2 litre water',0.99,'fl oz',67.6)
 
SELECT 
	Item
	, ItemPrice/Units PricePerUnit
FROM tbStageProcessB_ExB

In some cases, a smaller unit is provided while in other cases, we must derive it. In dealing with real estate data, as an example, I’ve often seen price per square foot included as a measurement, but there are times where this must be added. Using a similar example as the above example, if a homeowner expands a room (not adds a new room) and raises the price based on the price per square foot, we would see why in the total square footage, not a number of rooms.

在某些情况下,提供的单位较小,而在其他情况下,我们必须导出它。 举例来说,在处理房地产数据时,我经常看到每平方英尺的价格作为衡量标准,但有时必须加价。 使用与上述示例类似的示例,如果房主扩展房间(不添加新房间)并基于每平方英尺的价格提高价格,我们将看到为什么在总平方英尺中而不是房间数中。

What about situations where we have a daily aggregate without details and the measurement timeframe changes? In the below example, we aggregate disk usage over an eleven-hour cycle, yet in the new data, we see only a daily aggregate of disk usage where daily would be considered the ten hours only. In these cases, we have a few options – one of which is to track our own measurements. In the below example, we’ll look at the percent of usage by each hour, then re-derive this from the total.

如果我们每天都没有详细信息,并且测量时间框架发生变化,该怎么办? 在下面的示例中,我们汇总了一个11小时周期内的磁盘使用情况,但是在新数据中,我们仅看到了每日的磁盘使用情况汇总,其中每天被视为仅十个小时。 在这些情况下,我们有几种选择–其中一种是跟踪我们自己的测量。 在下面的示例中,我们将查看每小时的使用百分比,然后从总数中重新得出该百分比。

--- Get the numeric daily average
DECLARE @ndavg BIGINT
;WITH GetAvg AS(
	SELECT DATEPART(DD,OthnDate) OthnDay, SUM(OthnMetric) OthnDayTotal 
	FROM tblDrive 
	GROUP BY DATEPART(DD,OthnDate)
)
SELECT @ndavg = AVG(OthnDayTotal)
FROM GetAvg
 
---- Report on our percentages (we'll save this in the next step):
SELECT
	OthnHour
	, (AVG(OthnMetric)/CAST(@ndavg AS DECIMAL(17,8))) PercentDiskUse
FROM tblDrive
GROUP BY OthnHour

DECLARE @ndavg BIGINT
;WITH GetAvg AS(
	SELECT DATEPART(DD,OthnDate) OthnDay, SUM(OthnMetric) OthnDayTotal 
	FROM tblDrive 
	GROUP BY DATEPART(DD,OthnDate)
)
SELECT @ndavg = AVG(OthnDayTotal)
FROM GetAvg
 
--- Save template:
SELECT
	OthnHour
	, (AVG(OthnMetric)/CAST(@ndavg AS DECIMAL(17,8))) PercentDiskUse
INTO tbPercentFormat
FROM tblDrive
GROUP BY OthnHour

We create this table format for deriving the new data that we now receive daily instead of hourly. In the below example of a variable, we have a new daily measurement that we derive hourly data from our format table:

我们创建此表格格式是为了导出现在每天而不是每小时接收的新数据。 在下面的变量示例中,我们有一个新的每日度量,我们从格式表中获取每小时的数据:

--- Measurement changes with daily value (new value):
DECLARE @newmetricmeasurement BIGINT = 2348713
SELECT 
	OthnHour
	, CAST((PercentDiskUse*@newmetricmeasurement) AS BIGINT) OthnMetric
FROM tbPercentFormat

If nothing has changed in the system we’re measuring and we have a history of metrics that allow us to get an accurate measure of the past use, this may be a helpful measure to compare older to newer data. We’re assuming in the above example that the percent used in each hour will be the same for the new totals, which measure the total usage by day (a day is assumed to be 11 hours here). We can only assume this because nothing has changed in our system and we have enough aggregate data for each hour of usage in our old data. Even with these two points being true, these data by hour may not be correct.

如果我们要测量的系统没有任何变化,并且我们拥有可以使我们对过去使用情况进行准确度量的度量标准历史记录,那么这可能是比较旧数据与新数据的有用度量。 在上面的示例中,我们假设每小时使用的百分比对于新的总计而言是相同的,新的总计将按天衡量总使用量(此处假设一天为11个小时)。 我们只能假设这是因为系统中没有任何变化,而且对于旧数据的每个小时使用,我们都有足够的聚合数据。 即使这两个点是正确的,按小时计算的这些数据可能也不正确。

In the cases where a tool changed its measuring system, I contacted the company and discussed the appropriate way to compare old to new data and they were able to provide an algorithm for this. These situations also proved the value of sometimes developing an independent tool since measurements can change. If I have to change a tool’s measurements which will affect new and old data for comparisons, I always begin the tool with how to derive comparable data from new to old.

在工具更改其测量系统的情况下,我与公司联系并讨论了将旧数据与新数据进行比较的合适方法,他们能够为此提供一种算法。 这些情况也证明了有时开发独立工具的价值,因为测量可以改变。 如果必须更改工具的度量标准以影响新旧数据进行比较,则我总是以如何从新旧数据中得出可比较数据的方式来开始使用该工具。

Unfortunately, metrics measurements can change by time, amount or another measure. Sometimes, we do not receive an algorithm which allows us to compare old data to new. While this gives us an idea of a possible solution, we would want to ensure that it’s compatible with our system, such as no new application changes, ETL loaders being added, etc.

不幸的是,度量标准度量可以随时间,数量或其他度量而改变。 有时,我们没有收到允许我们将旧数据与新数据进行比较的算法。 尽管这为我们提供了一个可能的解决方案的想法,但我们希望确保它与我们的系统兼容,例如不进行任何新的应用程序更改,添加ETL加载程序等。

结论 (Conclusion)

We will see this challenge in data periodically and if we use small measures, or add small measures to our calculations, we may be able to avoid complications with clients identifying errors. We should approach using dimensions carefully with this as well, especially if the unit system changes (such as metric to English). We can add a dimension for conversions in these cases, while still using small measurement units. In addition to using a small unit of measurement so we can accurately compare if a change occurs, we should make sure that other factors also remain the same. In the example of metrics, if the change of metric measurements also happened with a change in processes on the server or servers being measured, this would mean we don’t have an accurate comparison.

我们会定期在数据中看到这一挑战,如果我们使用较小的度量标准,或者在计算中添加较小的度量标准,我们也许可以避免客户识别错误带来的麻烦。 我们也应该谨慎使用尺寸,特别是如果单位制发生变化(例如公制变为英制)。 在这种情况下,我们可以为转化添加维度,同时仍然使用较小的计量单位。 除了使用较小的度量单位,以便我们可以准确比较是否发生更改外,我们还应确保其他因素也保持不变。 在指标示例中,如果指标测量值的变化也与被测服务器上的进程变化同时发生,则这意味着我们没有准确的比较。

参考资料 (References)

翻译自: https://www.sqlshack.com/sql-server-using-lowest-unit-of-measurement-in-t-sql/

t–sql pl–sql

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值