Time Is on My Side

If you do much data analysis it won’t be long before you work with data measured over a range of times. When you do see time-series data, you’ll find that time scales and time units have some very quirky properties.

Time after Time

You might think that time is measured on a ratio scale given its ever finer divisions (i.e., hours, minutes, seconds). Yet it makes no more sense to refer to a ratio of two times than to a ratio of two location coordinates. The starting point is also arbitrary. So time clearly isn’t measured on a ratio scale, but it can be measured on interval or ordinal scales. Time units are also used for durations, however, and durations can be measured on a ratio scale. Durations can be used in ratios, and they have a starting point of zero and no negative values.
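
The distinction between points in time and durations shows up directly in most date libraries. The sketch below uses Python's standard `datetime` module with two made-up timestamps (the values are assumptions chosen only for illustration): subtracting two times yields a duration, two durations can be divided to give a ratio, and a ratio of two times is rejected outright.

```python
from datetime import datetime, timedelta

# Two arbitrary example timestamps (hypothetical values for illustration).
start = datetime(2020, 1, 1, 8, 0)
end = datetime(2020, 1, 1, 20, 0)

# The difference between two points in time is a duration.
elapsed = end - start

# Durations have a true zero, so a ratio of two durations is meaningful.
ratio = elapsed / timedelta(hours=4)

# A ratio of two points in time is undefined; Python raises TypeError.
try:
    end / start
    ratio_of_times_allowed = True
except TypeError:
    ratio_of_times_allowed = False
```

Here `elapsed` is a twelve-hour duration and `ratio` says it is three times as long as a four-hour duration, while `ratio_of_times_allowed` ends up `False`.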

Time measurements can be linear or cyclic. Year is linear, and can be measured on either an interval scale or an ordinal scale. For example, the year 1953 can be expressed as an integer (ordinal scale) or a decimal (interval scale). Furthermore, all values of linear time are unique. The year 1953 happened once and will never recur. Linear time is like a river. You start at some point and go with the flow. You can’t get back to your starting point, but it still exists somewhere in time.

Some time scales repeat. If day one is a Monday, then so is day eight. Likewise, month one is the same as month thirteen. So time can also be treated as being measured on a repeating ordinal scale. Durations don’t repeat; one day isn’t the same as eight days.
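
A repeating ordinal scale is just modular arithmetic. As a minimal sketch (the function names `weekday_of` and `month_of` are hypothetical helpers, not part of any library), day and month numbers can be wrapped onto their cycles like this:

```python
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_of(day_number, first_day="Monday"):
    """Name of the weekday for a 1-based day count, given the name of day one."""
    offset = DAY_NAMES.index(first_day)
    return DAY_NAMES[(offset + day_number - 1) % 7]

def month_of(month_number):
    """Wrap a 1-based running month count onto the 1..12 calendar cycle."""
    return (month_number - 1) % 12 + 1
```

With these helpers, `weekday_of(1)` and `weekday_of(8)` give the same weekday, and `month_of(13)` wraps back to month 1. A duration of eight days, by contrast, is never reduced this way.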

Image credit: StarsApart, licensed under CC BY 2.0

Does Anybody Really Know What Time It Is?

Most measurement scales are based on factors of ten. With time, though, there are 60 seconds per minute, 60 minutes per hour, and 24 hours per day. Blame the Babylonians for starting this craziness and every civilization for the next 4,000 years for being content with the status quo. In contrast, calendars have evolved from the Hellenic calendar (~850 BC), the Roman calendar (~750 BC), the Julian calendar (46 BC), to the Gregorian calendar (1582).

Everybody knows about seconds, minutes, hours, days, months, years, and even decades, centuries, and millennia, but there are many other units used for time. A jiffy is either one tick of a computer’s system clock (about 0.01 second) or the time required for light to travel one centimeter (about 33.3564 picoseconds). A New York second is the time between when a traffic signal turns from red to green and when the driver behind you honks his horn, about a second and a half. An inna minute is the time between when you ask a teenager to do something and the time he or she complies, usually about ten to thirty minutes. A warhol is being famous for fifteen minutes; a kilowarhol is being famous for approximately ten days. A moment is a medieval unit of time equal to about a minute and a half. A fortnight is two weeks. A platonic year is an astronomical unit measuring the time required for one full cycle of the precession of the equinoxes (about 26,000 calendar years).

There have been several systems in which time units were based on factors of ten, most notably in China (before the 17th century) and in France (during the 18th century). Decimal time divided a day (i.e., one rotation of the earth) into 10 metric hours, each hour into 100 metric minutes, and each minute into 100 metric seconds. A metric second, sometimes termed a blink, is 0.864 standard seconds, which is about twice the time it takes for you to blink your eye.
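
The arithmetic behind decimal time is simple enough to sketch. A day holds 86,400 standard seconds and 10 × 100 × 100 = 100,000 metric seconds, so one metric second is 86,400 / 100,000 = 0.864 standard seconds. The function name `to_decimal_time` below is a hypothetical helper written for this illustration:

```python
STANDARD_SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
METRIC_SECONDS_PER_DAY = 10 * 100 * 100    # 100,000

# Length of one metric second ("blink") in standard seconds.
BLINK_IN_SECONDS = STANDARD_SECONDS_PER_DAY / METRIC_SECONDS_PER_DAY

def to_decimal_time(hours, minutes, seconds):
    """Convert a standard time of day to (metric hours, minutes, seconds)."""
    standard_seconds = hours * 3600 + minutes * 60 + seconds
    metric_seconds = round(
        standard_seconds * METRIC_SECONDS_PER_DAY / STANDARD_SECONDS_PER_DAY
    )
    metric_hours, remainder = divmod(metric_seconds, 100 * 100)
    metric_minutes, metric_secs = divmod(remainder, 100)
    return metric_hours, metric_minutes, metric_secs
```

Noon falls at metric hour 5 exactly, and 18:00 becomes 7 metric hours and 50 metric minutes.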

Image credit: iris[ux], licensed under CC BY-NC-ND 2.0

Then there’s geologic time, which is subdivided into eons, eras, periods, epochs, and ages. The divisions are based on the rocks that were formed at the time and the fossils that occur within them. Consequently, the divisions aren’t all the same length and there aren’t the same numbers of subdivisions in each division. For example, the Paleozoic era is roughly one and a half times as long as the Mesozoic era and more than four times as long as the Cenozoic era (which, admittedly, is still in progress). Likewise, some periods are four times longer than others. Moreover, the lengths of the divisions can change as more is learned about the history of the Earth. The units of the scale are also different in different parts of the world. Geologic time is an ordinal scale devised because measurements on the interval scale on which it is based (i.e., years) lack accuracy and precision. Geologic time uses perhaps the most unusual scale of any measurement. Geologists have a penchant for unusual scales, like mineral hardness, diamond grading, and rock structures.

Astronomical time is confusing, relatively, and it’s different if you’re on board the Enterprise or the Galactica. So the point is this — measuring time is complicated, not to mention time-consuming. But there’s even more to it than that.

Time Of The Season

Selecting an appropriate time scale is especially important because the scale can dictate the resolution and types of analyses that can be done. Resolution is an important matter. Select an interval that is too small and your database may become unmanageably large. Select an interval that is too large and you may not have enough resolution to investigate the time unit you are interested in. A good rule of thumb is to select an interval that is at least one time unit smaller than your unit of interest. For example, if you are interested in yearly trends, collect measurements every month. If you only collect measurements yearly, you won’t be able to assess the variability that occurs within a year. If you collect measurements more often than daily, you may have to roll up the data to make it manageable.
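
Rolling up fine-grained data can be sketched with the standard library alone. The example below assumes readings arrive as `(timestamp, value)` pairs and averages them by calendar day; the function name `roll_up_daily` and the synthetic hourly readings are made up for illustration, and a real project would more likely use a data-frame library's resampling tools.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def roll_up_daily(readings):
    """Average (timestamp, value) readings by calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}

# Hypothetical hourly readings covering two full days (48 values).
first_reading = datetime(2021, 3, 1)
hourly = [(first_reading + timedelta(hours=h), float(h % 24)) for h in range(48)]

daily = roll_up_daily(hourly)   # 48 hourly values collapse to 2 daily means
```

The rollup trades resolution for manageability: once the hourly values are averaged, any variability within a day can no longer be assessed from `daily`.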

Take Your Time

Time formats can be difficult to deal with. Most data analysis software offers a dozen or more different display formats. Behind the displayed format, though, the software stores a number, which is the distance of the time from an arbitrary starting point, in an arbitrary unit of time, almost always days. Convert a date-time format to a number format, and you’ll see what I mean. The formatting allows you to recognize values as times, while the numbers allow the software to calculate statistics. This quirk of time formatting also presents a potential for disaster if you use more than one piece of software and they use different starting points or time units. Always check that the formatted dates are the same between applications.
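
To make the risk concrete, here is a small sketch of how the same stored number becomes two different dates under two starting points. It assumes the two epochs commonly associated with spreadsheet serial dates: 1899-12-30 as the conventional offset for the 1900 date system (valid for dates from March 1900 onward) and 1904-01-01 for the 1904 date system. The helper `serial_to_date` and the sample serial number are illustrative only.

```python
from datetime import date, timedelta

# Assumed epochs for two spreadsheet date systems (see the note above).
EPOCH_1900_SYSTEM = date(1899, 12, 30)
EPOCH_1904_SYSTEM = date(1904, 1, 1)

def serial_to_date(serial, epoch):
    """Interpret a day-count serial number relative to the given epoch."""
    return epoch + timedelta(days=serial)

serial = 44197                                        # the stored number
as_1900 = serial_to_date(serial, EPOCH_1900_SYSTEM)   # one application's reading
as_1904 = serial_to_date(serial, EPOCH_1904_SYSTEM)   # another application's reading
gap = as_1904 - as_1900                               # how far apart the readings are
```

The two readings of the same number differ by 1,462 days, roughly four years, which is why comparing the formatted dates across applications is a worthwhile check.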

Time Will Tell

Time-series data are probably the most difficult type of data to analyze. Measurements involving time are usually autocorrelated, so using conventional statistical procedures can produce biased results. Besides their scale of measurement, there are several other aspects of temporal variables that add to the confusion.

  • Ch-Ch-Ch-Ch-Changes: Time-series data can exhibit a variety of patterns, including step changes, linear and nonlinear trends, and cyclic fluctuations. The effects may be superimposed on each other within a given time period or spread over many different time periods. For example, a change in the discharge of a river may be attributable to abrupt and ephemeral causes such as failure of a dam or a sudden downpour (shocks), abrupt and long-term causes such as natural changes in a drainageway or a man-made diversion (step changes), long-term causes such as drought or changes in water consumption (trends), repetitive changes such as seasonal cycles related to rainfall or irrigation (cyclic fluctuations), as well as random variations. Confounded effects are often impossible to separate, especially if the data record is short or the sampling intervals are irregular or too large.

  • One Day at a Time: Time-series measurements may not all be collected at a single instant in time. Some measurements are composites over time. For example, a flow measurement (e.g., stream, air) may be an instantaneous discharge or a total discharge over a selected time period. A sample may be collected at one time or be a composite of several samples collected at discrete time intervals and combined into a single sample container. The period over which each measurement is averaged is called the support. Obviously, you can’t evaluate a given time interval if your support is the same as or larger than the interval.

  • For the Times They Are a Changing: There is a dilemma involving time series that are measured over many years. It goes like this. As knowledge and technology improve, the chance grows that improvements in sampling and analysis procedures will reduce the overall variability of more recent measurements. That leads to violations of one of the fundamental assumptions of parametric statistical procedures, equality of variances (also called homoscedasticity). Sometimes, you just can’t win.

  • In the Year 2525 …: With most types of analysis, both statistical and deterministic, data analysts collect data over the entire range of the area of interest. If you want to analyze a chemical reaction at 100 degrees, you might analyze the reaction at temperatures between 80 degrees and 120 degrees. You wouldn’t, however, test the reaction at 40 to 80 degrees and extrapolate to what might happen at 100 degrees. In fact, scientists are taught never to extrapolate outside the range of their data. With time-series data, though, you have to extrapolate because you almost always want to know what will happen in the future. If you wait to see what actually happens, then it’s no longer interesting because it’s the past. And in the ultimate irony, you often can extrapolate time-series data because they are … autocorrelated. So the same property that makes time-series data difficult to analyze is what allows them to be extrapolated to future times, a process called forecasting. Mother Nature has a wicked sense of humor.

  • Time Keeps on Slipping into the Future: With other types of data, even autocorrelated spatial data, you can verify predictions whenever the need arises. With predictions for a time series (forecasts), you have to wait until the time in question arrives. Then you have just one chance. You can’t go back if something goes wrong and you miss collecting the verification data. Hence, you can’t control verification.
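
The link between autocorrelation and forecasting mentioned in the points above can be illustrated with a short sketch. It computes the lag-1 autocorrelation of a series and then makes a deliberately crude one-step forecast that pulls the last value toward the series mean in proportion to that autocorrelation. Both function names and both sample series are assumptions made up for this example; this is not a full time-series model and it ignores trends and seasonal cycles.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Correlation between each value and the value immediately before it."""
    center = mean(series)
    numerator = sum((a - center) * (b - center)
                    for a, b in zip(series, series[1:]))
    denominator = sum((x - center) ** 2 for x in series)
    return numerator / denominator

def forecast_next(series):
    """Crude one-step forecast: mean + r1 * (last value - mean)."""
    center = mean(series)
    return center + lag1_autocorrelation(series) * (series[-1] - center)

# Hypothetical series: one where neighbours are alike, one where they alternate.
persistent = [float(t) for t in range(1, 21)]
alternating = [1.0, -1.0] * 10

r1_persistent = lag1_autocorrelation(persistent)     # strongly positive
r1_alternating = lag1_autocorrelation(alternating)   # strongly negative
next_value = forecast_next(persistent)
```

When neighbouring values resemble each other, the autocorrelation is strongly positive and the last observation carries real information about the next one. That dependence is what violates the independence assumption of conventional statistics, and it is also what makes the forecast possible.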

So those are a few points about how time is measured and analyzed. There’s more to it than that, but I’ll save those thoughts for another time.

Translated from: https://medium.com/swlh/time-is-on-my-side-5828e68b4e00
