0604 Notes 2: pandas time series documentation

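These notes walk through how pandas handles time series data: parsing dates and times in many formats, generating fixed-frequency date sequences, converting time zones, resampling time series, and working efficiently with period data.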


Time series / date functionality
pandas contains extensive capabilities and features for working with time series data in all domains. Using the NumPy datetime64 and timedelta64 dtypes, pandas has consolidated a large number of features from other Python libraries such as scikits.timeseries and has added a great deal of new functionality for manipulating time series data.

For example, pandas supports the following. The sessions below assume the usual imports, import numpy as np and import pandas as pd.

Parsing time series information from various sources and formats

In [1]: import datetime

In [2]: dti = pd.to_datetime(
   ...:     ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
   ...: )
   ...:

In [3]: dti
Out[3]: DatetimeIndex(['2018-01-01', '2018-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)

Generating sequences of fixed-frequency dates and time spans

In [4]: dti = pd.date_range("2018-01-01", periods=3, freq="H")

In [5]: dti
Out[5]:
DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
               '2018-01-01 02:00:00'],
              dtype='datetime64[ns]', freq='H')

Manipulating and converting date times with timezone information

In [6]: dti = dti.tz_localize("UTC")

In [7]: dti
Out[7]:
DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 01:00:00+00:00',
               '2018-01-01 02:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='H')

In [8]: dti.tz_convert("US/Pacific")
Out[8]:
DatetimeIndex(['2017-12-31 16:00:00-08:00', '2017-12-31 17:00:00-08:00',
               '2017-12-31 18:00:00-08:00'],
              dtype='datetime64[ns, US/Pacific]', freq='H')

Resampling or converting a time series to a particular frequency

In [9]: idx = pd.date_range("2018-01-01", periods=5, freq="H")

In [10]: ts = pd.Series(range(len(idx)), index=idx)

In [11]: ts
Out[11]:
2018-01-01 00:00:00    0
2018-01-01 01:00:00    1
2018-01-01 02:00:00    2
2018-01-01 03:00:00    3
2018-01-01 04:00:00    4
Freq: H, dtype: int64

In [12]: ts.resample("2H").mean()
Out[12]:
2018-01-01 00:00:00    0.5
2018-01-01 02:00:00    2.5
2018-01-01 04:00:00    4.0
Freq: 2H, dtype: float64

Performing date and time arithmetic with absolute or relative time increments

In [13]: friday = pd.Timestamp("2018-01-05")

In [14]: friday.day_name()
Out[14]: 'Friday'

# Add 1 day
In [15]: saturday = friday + pd.Timedelta("1 day")

In [16]: saturday.day_name()
Out[16]: 'Saturday'

# Add 1 business day (Friday --> Monday)
In [17]: monday = friday + pd.offsets.BDay()

In [18]: monday.day_name()
Out[18]: 'Monday'

pandas provides a relatively compact and self-contained set of tools for performing the above tasks and more.

Overview
pandas captures 4 general time-related concepts:

Date times: A specific date and time with timezone support. Similar to datetime.datetime from the standard library.

Time deltas: An absolute time duration. Similar to datetime.timedelta from the standard library.

Time spans: A span of time defined by a point in time and its associated frequency.

Date offsets: A relative time duration that respects calendar arithmetic. Similar to dateutil.relativedelta.relativedelta from the dateutil package.

Concept        Scalar Class   Array Class      pandas Data Type                        Primary Creation Method
Date times     Timestamp      DatetimeIndex    datetime64[ns] or datetime64[ns, tz]    to_datetime or date_range
Time deltas    Timedelta      TimedeltaIndex   timedelta64[ns]                         to_timedelta or timedelta_range
Time spans     Period         PeriodIndex      period[freq]                            Period or period_range
Date offsets   DateOffset     None             None                                    DateOffset

For time series data, it is conventional to put the time component in the index of a Series or DataFrame so that manipulations can be performed with respect to the time element.

In [19]: pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
Out[19]:
2000-01-01    0
2000-01-02    1
2000-01-03    2
Freq: D, dtype: int64

However, Series and DataFrame can also hold the time component directly as data.

In [20]: pd.Series(pd.date_range("2000", freq="D", periods=3))
Out[20]:
0   2000-01-01
1   2000-01-02
2   2000-01-03
dtype: datetime64[ns]

Series and DataFrame have extended data type support and functionality for datetime, timedelta and Period data when it is passed into their constructors. DateOffset data, however, is stored as object data.

In [21]: pd.Series(pd.period_range("1/1/2011", freq="M", periods=3))
Out[21]:
0    2011-01
1    2011-02
2    2011-03
dtype: period[M]

In [22]: pd.Series([pd.DateOffset(1), pd.DateOffset(2)])
Out[22]:
0         <DateOffset>
1    <2 * DateOffsets>
dtype: object

In [23]: pd.Series(pd.date_range("1/1/2011", freq="M", periods=3))
Out[23]:
0   2011-01-31
1   2011-02-28
2   2011-03-31
dtype: datetime64[ns]

Lastly, pandas represents null date times, time deltas, and time spans as NaT, which is useful for representing missing or null date-like values and behaves similarly to np.nan for float data.

In [24]: pd.Timestamp(pd.NaT)
Out[24]: NaT

In [25]: pd.Timedelta(pd.NaT)
Out[25]: NaT

In [26]: pd.Period(pd.NaT)
Out[26]: NaT

# Equality acts as np.nan would
In [27]: pd.NaT == pd.NaT
Out[27]: False
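
Because NaT never compares equal to anything, including itself, an equality test cannot locate missing timestamps. The sketch below shows the usual alternative, Series.isna(); the sample values are made up for illustration and are not part of the original notes.

```python
import pandas as pd

# Illustrative data: the middle entry is missing and becomes NaT.
values = pd.Series(pd.to_datetime(["2021-01-01", None, "2021-01-03"]))

# Comparing against NaT yields False everywhere, even for the missing row.
naive_mask = values == pd.NaT

# isna() is the reliable way to detect NaT.
missing_mask = values.isna()

print(naive_mask.tolist())
print(missing_mask.tolist())
```

The same check is available as the top-level function pd.isna for scalars and array-likes.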
Timestamps vs. time spans
Timestamped data is the most basic type of time series data: it associates values with points in time. For pandas objects this means using Timestamp values as the points in time.

In [28]: pd.Timestamp(datetime.datetime(2012, 5, 1))
Out[28]: Timestamp('2012-05-01 00:00:00')

In [29]: pd.Timestamp("2012-05-01")
Out[29]: Timestamp('2012-05-01 00:00:00')

In [30]: pd.Timestamp(2012, 5, 1)
Out[30]: Timestamp('2012-05-01 00:00:00')

However, in many cases it is more natural to associate things like change variables with a time span instead. The span represented by Period can be specified explicitly or inferred from the datetime string format.

For example:

In [31]: pd.Period("2011-01")
Out[31]: Period('2011-01', 'M')

In [32]: pd.Period("2012-05", freq="D")
Out[32]: Period('2012-05-01', 'D')

Timestamp and Period can serve as an index. Lists of Timestamp and Period are automatically coerced to DatetimeIndex and PeriodIndex respectively.

In [33]: dates = [
   ....:     pd.Timestamp("2012-05-01"),
   ....:     pd.Timestamp("2012-05-02"),
   ....:     pd.Timestamp("2012-05-03"),
   ....: ]
   ....:

In [34]: ts = pd.Series(np.random.randn(3), dates)

In [35]: type(ts.index)
Out[35]: pandas.core.indexes.datetimes.DatetimeIndex

In [36]: ts.index
Out[36]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

In [37]: ts
Out[37]:
2012-05-01    0.469112
2012-05-02   -0.282863
2012-05-03   -1.509059
dtype: float64

In [38]: periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")]

In [39]: ts = pd.Series(np.random.randn(3), periods)

In [40]: type(ts.index)
Out[40]: pandas.core.indexes.period.PeriodIndex

In [41]: ts.index
Out[41]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

In [42]: ts
Out[42]:
2012-01   -1.135632
2012-02    1.212112
2012-03   -0.173215
Freq: M, dtype: float64

pandas allows you to capture both representations and convert between them. Under the hood, pandas represents timestamps using instances of Timestamp and sequences of timestamps using instances of DatetimeIndex. For regular time spans, pandas uses Period objects for scalar values and PeriodIndex for sequences of spans. Better support for irregular intervals with arbitrary start and end points is forthcoming in future releases.
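
As a small illustration of converting between the two representations, the sketch below turns a Timestamp into the monthly Period that contains it and back again. The sample timestamp is an arbitrary value chosen for the example.

```python
import pandas as pd

# An arbitrary point in time, chosen only for illustration.
stamp = pd.Timestamp("2012-05-17 10:30")

# Timestamp -> Period: the calendar month that contains the timestamp.
month = stamp.to_period("M")

# Period -> Timestamp: by default this is the start of the span.
month_start = month.to_timestamp()

print(month)
print(month_start)
print(month.end_time)
```

The round trip is lossy by design: the Period only remembers which month the timestamp fell in, so converting back yields the start of that month rather than the original instant.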

Converting to timestamps
To convert a Series or list-like object of date-like objects (for example strings, epochs, or a mixture), use the to_datetime function. When passed a Series, it returns a Series with the same index, while a list-like is converted to a DatetimeIndex:

In [43]: pd.to_datetime(pd.Series(["Jul 31, 2009", "2010-01-10", None]))
Out[43]:
0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]

In [44]: pd.to_datetime(["2005/11/23", "2010.12.31"])
Out[44]: DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None)

If your dates are written day first (i.e. European style), you can pass the dayfirst flag:

In [45]: pd.to_datetime(["04-01-2012 10:00"], dayfirst=True)
Out[45]: DatetimeIndex(['2012-01-04 10:00:00'], dtype='datetime64[ns]', freq=None)

In [46]: pd.to_datetime(["14-01-2012", "01-14-2012"], dayfirst=True)
Out[46]: DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None)

Warning

The example above shows that dayfirst is not strict: if a date cannot be parsed with the day first, it is parsed as if dayfirst were False.

If you pass a single string to to_datetime, it returns a single Timestamp. Timestamp can also accept string input, but it does not accept string parsing options like dayfirst or format, so use to_datetime if these are required.

In [47]: pd.to_datetime("2010/11/12")
Out[47]: Timestamp('2010-11-12 00:00:00')

In [48]: pd.Timestamp("2010/11/12")
Out[48]: Timestamp('2010-11-12 00:00:00')

You can also use the DatetimeIndex constructor directly:

In [49]: pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-05"])
Out[49]: DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], dtype='datetime64[ns]', freq=None)

The string 'infer' can be passed as freq to set the frequency of the index to the frequency inferred upon creation:

In [50]: pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-05"], freq="infer")
Out[50]: DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], dtype='datetime64[ns]', freq='2D')

Providing a format argument
In addition to the required datetime string, a format argument can be passed to ensure specific parsing. This can also speed up the conversion considerably.

In [51]: pd.to_datetime("2010/11/12", format="%Y/%m/%d")
Out[51]: Timestamp('2010-11-12 00:00:00')

In [52]: pd.to_datetime("12-11-2010 00:00", format="%d-%m-%Y %H:%M")
Out[52]: Timestamp('2010-11-12 00:00:00')

For more information on the choices available when specifying the format option, see the Python datetime documentation.
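
An explicit format is also a way to get strict parsing where the dayfirst flag described earlier is lenient. The sketch below reuses the two strings from the dayfirst example; combining format with errors="coerce" is a choice made for this illustration, so that the entry that does not fit the day-first layout becomes NaT instead of being silently reinterpreted.

```python
import pandas as pd

# Same inputs as the dayfirst example: the second string is month-first.
raw = ["14-01-2012", "01-14-2012"]

# With an explicit day-month-year format, "01-14-2012" would need month 14,
# which is invalid, so errors="coerce" turns it into NaT.
strict = pd.to_datetime(raw, format="%d-%m-%Y", errors="coerce")

print(strict)
```

Leaving errors at its default of "raise" would instead stop at the first string that does not match the format.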

Assembling datetime from multiple DataFrame columns
You can also pass a DataFrame of integer or string columns to assemble into a Series of Timestamps.

In [53]: df = pd.DataFrame(
   ....:     {"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]}
   ....: )
   ....:

In [54]: pd.to_datetime(df)
Out[54]:
0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
dtype: datetime64[ns]

You can pass just the columns that you need to assemble.

In [55]: pd.to_datetime(df[["year", "month", "day"]])
Out[55]:
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

pd.to_datetime looks for standard designations of the datetime component in the column names, including:

required: year, month, day

optional: hour, minute, second, millisecond, microsecond, nanosecond
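
When the source columns use other names, one option is to rename them to the standard designations before assembling. The column names yr, mo and dy below are hypothetical, invented for this sketch.

```python
import pandas as pd

# Hypothetical column names that to_datetime would not recognize as-is.
df = pd.DataFrame({"yr": [2015, 2016], "mo": [2, 3], "dy": [4, 5]})

# Map them onto the required designations: year, month, day.
standard = df.rename(columns={"yr": "year", "mo": "month", "dy": "day"})
assembled = pd.to_datetime(standard)

print(assembled)
```

rename returns a new DataFrame, so the original column names are left untouched.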

Invalid data
The default behavior, errors='raise', is to raise when the input is unparsable:

In [2]: pd.to_datetime(['2009/07/31', 'asd'], errors='raise')
ValueError: Unknown string format

Pass errors='ignore' to return the original input when unparsable:

In [56]: pd.to_datetime(["2009/07/31", "asd"], errors="ignore")
Out[56]: Index(['2009/07/31', 'asd'], dtype='object')

Pass errors='coerce' to convert unparsable data to NaT (not a time):

In [57]: pd.to_datetime(["2009/07/31", "asd"], errors="coerce")
Out[57]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)
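
Coercing is often paired with a follow-up check so that bad inputs are reported rather than lost. The sketch below is one way to do that, assuming the raw strings are kept alongside the parsed result; the sample data and the explicit format are illustrative choices.

```python
import pandas as pd

# Illustrative raw input with one unparsable entry.
raw = pd.Series(["2009/07/31", "asd", "2010/01/10"])

# Unparsable entries become NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y/%m/%d", errors="coerce")

# Use the NaT positions to pull out the offending original strings.
bad_inputs = raw[parsed.isna()]

print(bad_inputs.tolist())
```

Note that this treats genuinely missing inputs and unparsable inputs the same way; if the raw data can contain nulls, compare against raw.isna() as well.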
Epoch timestamps
pandas supports converting integer or float epoch times to Timestamp and DatetimeIndex. The default unit is nanoseconds, since that is how Timestamp objects are stored internally. However, epochs are often stored in another unit, which can be specified with the unit parameter. The values are counted from the starting point given by the origin parameter.

In [58]: pd.to_datetime(
   ....:     [1349720105, 1349806505, 1349892905, 1349979305, 1350065705], unit="s"
   ....: )
   ....:
Out[58]:
DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
               '2012-10-10 18:15:05', '2012-10-11 18:15:05',
               '2012-10-12 18:15:05'],
              dtype='datetime64[ns]', freq=None)

In [59]: pd.to_datetime(
   ....:     [1349720105100, 1349720105200, 1349720105300, 1349720105400, 1349720105500],
   ....:     unit="ms",
   ....: )
   ....:
Out[59]:
DatetimeIndex(['2012-10-08 18:15:05.100000', '2012-10-08 18:15:05.200000',
               '2012-10-08 18:15:05.300000', '2012-10-08 18:15:05.400000',
               '2012-10-08 18:15:05.500000'],
              dtype='datetime64[ns]', freq=None)

Note

The unit parameter does not use the same strings as the format parameter discussed above. The available units are listed in the documentation for pandas.to_datetime().

Changed in version 1.0.0.

Constructing a Timestamp or DatetimeIndex from an epoch timestamp with the tz argument specified will raise a ValueError. If you have epochs in wall time in another timezone, you can read the epochs as timezone-naive timestamps and then localize them to the appropriate timezone:

In [60]: pd.Timestamp(1262347200000000000).tz_localize("US/Pacific")
Out[60]: Timestamp('2010-01-01 12:00:00-0800', tz='US/Pacific')

In [61]: pd.DatetimeIndex([1262347200000000000]).tz_localize("US/Pacific")
Out[61]: DatetimeIndex(['2010-01-01 12:00:00-08:00'], dtype='datetime64[ns, US/Pacific]', freq=None)

Note

Epoch times will be rounded to the nearest nanosecond.
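
The localize approach applies when the epochs encode wall time. When they are true UTC-based epochs, the more common case, a different sequence gives the local time: parse as UTC, then convert. The sketch below contrasts the two on the same integer; the zone name America/Los_Angeles is used here as the canonical name for the same Pacific zone.

```python
import pandas as pd

epoch_seconds = [1262347200]  # same instant as the example above, in seconds

# Interpretation 1: the epoch counts from 1970-01-01 UTC.
as_utc = pd.to_datetime(epoch_seconds, unit="s", utc=True)
local_from_utc = as_utc.tz_convert("America/Los_Angeles")

# Interpretation 2: the epoch encodes local wall time (naive, then localized).
wall_time = pd.to_datetime(epoch_seconds, unit="s").tz_localize("America/Los_Angeles")

print(local_from_utc[0])
print(wall_time[0])
```

The two results differ by the zone's UTC offset (eight hours on this date), so it matters which convention the data source actually follows.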

Warning

Conversion of float epoch times can lead to inaccurate and unexpected results. Python floats have about 15 digits of decimal precision, so rounding during conversion from float to a high-precision Timestamp is unavoidable. The only way to achieve exact precision is to use a fixed-width type (e.g. an int64).

In [62]: pd.to_datetime([1490195805.433, 1490195805.433502912], unit="s")
Out[62]: DatetimeIndex(['2017-03-22 15:16:45.433000088', '2017-03-22 15:16:45.433502913'], dtype='datetime64[ns]', freq=None)

In [63]: pd.to_datetime(1490195805433502912, unit="ns")
Out[63]: Timestamp('2017-03-22 15:16:45.433502912')
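
If a source delivers whole seconds and a fractional part separately, the two can be combined with integer arithmetic so that no float is ever involved. This is a sketch under that assumption; the variable names are illustrative.

```python
import pandas as pd

# Assumed inputs: whole seconds since the epoch plus leftover nanoseconds.
seconds = 1490195805
nanos = 433502912

# Integer arithmetic keeps every digit; Python ints do not lose precision.
total_ns = seconds * 10**9 + nanos
exact = pd.to_datetime(total_ns, unit="ns")

print(exact)
```

The result carries the full nanosecond component, unlike the float-based conversion shown above.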
See also

Using the origin Parameter

From timestamps to epoch
To invert the operation above, namely to convert from a Timestamp to a 'unix' epoch:

In [64]: stamps = pd.date_range("2012-10-08 18:15:05", periods=4, freq="D")

In [65]: stamps
Out[65]:
DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
               '2012-10-10 18:15:05', '2012-10-11 18:15:05'],
              dtype='datetime64[ns]', freq='D')

We subtract the epoch (midnight on January 1, 1970 UTC) and then floor-divide by the "unit" (1 second).

In [66]: (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
Out[66]: Int64Index([1349720105, 1349806505, 1349892905, 1349979305], dtype='int64')
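
The same subtract-and-divide pattern extends to other units and to timezone-aware data. The sketch below produces millisecond epochs and repeats the calculation for a UTC-aware index, where the epoch being subtracted must be timezone-aware too.

```python
import pandas as pd

stamps = pd.date_range("2012-10-08 18:15:05", periods=2, freq="D")

# Milliseconds since the epoch: only the divisor changes.
epoch = pd.Timestamp("1970-01-01")
millis = (stamps - epoch) // pd.Timedelta("1ms")

# For timezone-aware stamps, subtract a timezone-aware epoch.
aware = stamps.tz_localize("UTC")
seconds_aware = (aware - pd.Timestamp("1970-01-01", tz="UTC")) // pd.Timedelta("1s")

print(list(millis))
print(list(seconds_aware))
```

Mixing a naive epoch with aware timestamps (or the reverse) raises a TypeError, which guards against silently mixing conventions.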
Using the origin parameter
Using the origin parameter, one can specify an alternative starting point for the creation of a DatetimeIndex. For example, to use 1960-01-01 as the starting date:

In [67]: pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01"))
Out[67]: DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)

The default is origin='unix', which corresponds to 1970-01-01 00:00:00, commonly called the 'unix epoch' or POSIX time.

In [68]: pd.to_datetime([1, 2, 3], unit="D")
Out[68]: DatetimeIndex(['1970-01-02', '1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None)
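
One way to read the origin parameter is as shorthand for adding time deltas to a fixed starting Timestamp. The sketch below checks that reading for the 1960-01-01 example; it illustrates the equivalence and makes no claim about how pandas implements it internally.

```python
import pandas as pd

day_counts = [1, 2, 3]
origin = pd.Timestamp("1960-01-01")

# Conversion using the origin parameter.
with_origin = pd.to_datetime(day_counts, unit="D", origin=origin)

# The same dates built by hand: origin plus a number of days.
by_hand = origin + pd.to_timedelta(day_counts, unit="D")

print(list(with_origin) == list(by_hand))
```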
Generating ranges of timestamps
To generate an index with timestamps, you can use either the DatetimeIndex or Index constructor and pass in a list of datetime objects:

In [69]: dates = [
   ....:     datetime.datetime(2012, 5, 1),
   ....:     datetime.datetime(2012, 5, 2),
   ....:     datetime.datetime(2012, 5, 3),
   ....: ]
   ....:

# Note the frequency information
In [70]: index = pd.DatetimeIndex(dates)

In [71]: index
Out[71]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

# Automatically converted to DatetimeIndex
In [72]: index = pd.Index(dates)

In [73]: index
Out[73]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

In practice this becomes very cumbersome because we often need a very long index with a large number of timestamps. If we need timestamps at a regular frequency, we can use the date_range() and bdate_range() functions to create a DatetimeIndex. The default frequency for date_range is a calendar day, while the default for bdate_range is a business day:

In [74]: start = datetime.datetime(2011, 1, 1)

In [75]: end = datetime.datetime(2012, 1, 1)

In [76]: index = pd.date_range(start, end)

In [77]: index
Out[77]:
DatetimeIndex([‘2011-01-01’, ‘2011-01-02’, ‘2011-01-03’, ‘2011-01-04’,
‘2011-01-05’, ‘2011-01-06’, ‘2011-01-07’, ‘2011-01-08’,
‘2011-01-09’, ‘2011-01-10’,

‘2011-12-23’, ‘2011-12-24’, ‘2011-12-25’, ‘2011-12-26’,
‘2011-12-27’, ‘2011-12-28’, ‘2011-12-29’, ‘2011-12-30’,
‘2011-12-31’, ‘2012-01-01’],
dtype=‘datetime64[ns]’, length=366, freq=‘D’)

In [78]: index = pd.bdate_range(start, end)

In [79]: index
Out[79]:
DatetimeIndex([‘2011-01-03’, ‘2011-01-04’, ‘2011-01-05’, ‘2011-01-06’,
‘2011-01-07’, ‘2011-01-10’, ‘2011-01-11’, ‘2011-01-12’,
‘2011-01-13’, ‘2011-01-14’,

‘2011-12-19’, ‘2011-12-20’, ‘2011-12-21’, ‘2011-12-22’,
‘2011-12-23’, ‘2011-12-26’, ‘2011-12-27’, ‘2011-12-28’,
‘2011-12-29’, ‘2011-12-30’],
dtype=‘datetime64[ns]’, length=260, freq=‘B’)
Convenience functions like date_range and bdate_range can utilize a variety of frequency aliases:

In [80]: pd.date_range(start, periods=1000, freq="M")
Out[80]:
DatetimeIndex([‘2011-01-31’, ‘2011-02-28’, ‘2011-03-31’, ‘2011-04-30’,
‘2011-05-31’, ‘2011-06-30’, ‘2011-07-31’, ‘2011-08-31’,
‘2011-09-30’, ‘2011-10-31’,

‘2093-07-31’, ‘2093-08-31’, ‘2093-09-30’, ‘2093-10-31’,
‘2093-11-30’, ‘2093-12-31’, ‘2094-01-31’, ‘2094-02-28’,
‘2094-03-31’, ‘2094-04-30’],
dtype=‘datetime64[ns]’, length=1000, freq=‘M’)

In [81]: pd.bdate_range(start, periods=250, freq="BQS")
Out[81]:
DatetimeIndex([‘2011-01-03’, ‘2011-04-01’, ‘2011-07-01’, ‘2011-10-03’,
‘2012-01-02’, ‘2012-04-02’, ‘2012-07-02’, ‘2012-10-01’,
‘2013-01-01’, ‘2013-04-01’,

‘2071-01-01’, ‘2071-04-01’, ‘2071-07-01’, ‘2071-10-01’,
‘2072-01-01’, ‘2072-04-01’, ‘2072-07-01’, ‘2072-10-03’,
‘2073-01-02’, ‘2073-04-03’],
dtype=‘datetime64[ns]’, length=250, freq=‘BQS-JAN’)
date_range and bdate_range make it easy to generate a range of dates using various combinations of parameters like start, end, periods, and freq. The start and end dates are strictly inclusive, so dates outside of those specified will not be generated:

In [82]: pd.date_range(start, end, freq="BM")
Out[82]:
DatetimeIndex([‘2011-01-31’, ‘2011-02-28’, ‘2011-03-31’, ‘2011-04-29’,
‘2011-05-31’, ‘2011-06-30’, ‘2011-07-29’, ‘2011-08-31’,
‘2011-09-30’, ‘2011-10-31’, ‘2011-11-30’, ‘2011-12-30’],
dtype=‘datetime64[ns]’, freq=‘BM’)

In [83]: pd.date_range(start, end, freq="W")
Out[83]:
DatetimeIndex([‘2011-01-02’, ‘2011-01-09’, ‘2011-01-16’, ‘2011-01-23’,
‘2011-01-30’, ‘2011-02-06’, ‘2011-02-13’, ‘2011-02-20’,
‘2011-02-27’, ‘2011-03-06’, ‘2011-03-13’, ‘2011-03-20’,
‘2011-03-27’, ‘2011-04-03’, ‘2011-04-10’, ‘2011-04-17’,
‘2011-04-24’, ‘2011-05-01’, ‘2011-05-08’, ‘2011-05-15’,
‘2011-05-22’, ‘2011-05-29’, ‘2011-06-05’, ‘2011-06-12’,
‘2011-06-19’, ‘2011-06-26’, ‘2011-07-03’, ‘2011-07-10’,
‘2011-07-17’, ‘2011-07-24’, ‘2011-07-31’, ‘2011-08-07’,
‘2011-08-14’, ‘2011-08-21’, ‘2011-08-28’, ‘2011-09-04’,
‘2011-09-11’, ‘2011-09-18’, ‘2011-09-25’, ‘2011-10-02’,
‘2011-10-09’, ‘2011-10-16’, ‘2011-10-23’, ‘2011-10-30’,
‘2011-11-06’, ‘2011-11-13’, ‘2011-11-20’, ‘2011-11-27’,
‘2011-12-04’, ‘2011-12-11’, ‘2011-12-18’, ‘2011-12-25’,
‘2012-01-01’],
dtype=‘datetime64[ns]’, freq=‘W-SUN’)

In [84]: pd.bdate_range(end=end, periods=20)
Out[84]:
DatetimeIndex([‘2011-12-05’, ‘2011-12-06’, ‘2011-12-07’, ‘2011-12-08’,
‘2011-12-09’, ‘2011-12-12’, ‘2011-12-13’, ‘2011-12-14’,
‘2011-12-15’, ‘2011-12-16’, ‘2011-12-19’, ‘2011-12-20’,
‘2011-12-21’, ‘2011-12-22’, ‘2011-12-23’, ‘2011-12-26’,
‘2011-12-27’, ‘2011-12-28’, ‘2011-12-29’, ‘2011-12-30’],
dtype=‘datetime64[ns]’, freq=‘B’)

In [85]: pd.bdate_range(start=start, periods=20)
Out[85]:
DatetimeIndex([‘2011-01-03’, ‘2011-01-04’, ‘2011-01-05’, ‘2011-01-06’,
‘2011-01-07’, ‘2011-01-10’, ‘2011-01-11’, ‘2011-01-12’,
‘2011-01-13’, ‘2011-01-14’, ‘2011-01-17’, ‘2011-01-18’,
‘2011-01-19’, ‘2011-01-20’, ‘2011-01-21’, ‘2011-01-24’,
‘2011-01-25’, ‘2011-01-26’, ‘2011-01-27’, ‘2011-01-28’],
dtype=‘datetime64[ns]’, freq=‘B’)
Specifying start, end, and periods generates a range of evenly spaced dates from start to end inclusive, with periods elements in the resulting DatetimeIndex:

In [86]: pd.date_range("2018-01-01", "2018-01-05", periods=5)
Out[86]:
DatetimeIndex([‘2018-01-01’, ‘2018-01-02’, ‘2018-01-03’, ‘2018-01-04’,
‘2018-01-05’],
dtype=‘datetime64[ns]’, freq=None)

In [87]: pd.date_range("2018-01-01", "2018-01-05", periods=10)
Out[87]:
DatetimeIndex([‘2018-01-01 00:00:00’, ‘2018-01-01 10:40:00’,
‘2018-01-01 21:20:00’, ‘2018-01-02 08:00:00’,
‘2018-01-02 18:40:00’, ‘2018-01-03 05:20:00’,
‘2018-01-03 16:00:00’, ‘2018-01-04 02:40:00’,
‘2018-01-04 13:20:00’, ‘2018-01-05 00:00:00’],
dtype=‘datetime64[ns]’, freq=None)
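
The combinations of start, end, periods, and freq can be summarized with a short sketch: date_range needs exactly three of the four, and passing all four is rejected. The specific dates below are arbitrary illustrations.

```python
import pandas as pd

# end + periods + freq: count backwards from the end date.
from_end = pd.date_range(end="2018-01-05", periods=3, freq="D")

# start + end + periods: evenly spaced points, no freq needed.
linear = pd.date_range("2018-01-01", "2018-01-02", periods=3)

# All four at once over-specifies the range and raises ValueError.
try:
    pd.date_range("2018-01-01", "2018-01-05", periods=5, freq="D")
    over_specified_error = None
except ValueError as exc:
    over_specified_error = exc

print(from_end)
print(linear)
print(type(over_specified_error).__name__)
```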
Custom frequency ranges
bdate_range can also generate a range of custom frequency dates by using the weekmask and holidays parameters. These parameters will only be used if a custom frequency string is passed.

In [88]: weekmask = "Mon Wed Fri"

In [89]: holidays = [datetime.datetime(2011, 1, 5), datetime.datetime(2011, 3, 14)]

In [90]: pd.bdate_range(start, end, freq="C", weekmask=weekmask, holidays=holidays)
Out[90]:
DatetimeIndex([‘2011-01-03’, ‘2011-01-07’, ‘2011-01-10’, ‘2011-01-12’,
‘2011-01-14’, ‘2011-01-17’, ‘2011-01-19’, ‘2011-01-21’,
‘2011-01-24’, ‘2011-01-26’,

‘2011-12-09’, ‘2011-12-12’, ‘2011-12-14’, ‘2011-12-16’,
‘2011-12-19’, ‘2011-12-21’, ‘2011-12-23’, ‘2011-12-26’,
‘2011-12-28’, ‘2011-12-30’],
dtype=‘datetime64[ns]’, length=154, freq=‘C’)

In [91]: pd.bdate_range(start, end, freq="CBMS", weekmask=weekmask)
Out[91]:
DatetimeIndex([‘2011-01-03’, ‘2011-02-02’, ‘2011-03-02’, ‘2011-04-01’,
‘2011-05-02’, ‘2011-06-01’, ‘2011-07-01’, ‘2011-08-01’,
‘2011-09-02’, ‘2011-10-03’, ‘2011-11-02’, ‘2011-12-02’],
dtype=‘datetime64[ns]’, freq=‘CBMS’)
See also

Custom business days
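
The same weekmask and holidays can be packaged into an offset object and passed to date_range, which is handy when the custom calendar is reused for arithmetic as well. The sketch below uses pd.offsets.CustomBusinessDay over the first two weeks of January 2011; the shortened date window is chosen only to keep the output small.

```python
import datetime

import pandas as pd

weekmask = "Mon Wed Fri"
holidays = [datetime.datetime(2011, 1, 5)]

# A reusable offset: business days are Mon/Wed/Fri, minus the holiday.
custom_day = pd.offsets.CustomBusinessDay(weekmask=weekmask, holidays=holidays)

rng = pd.date_range("2011-01-01", "2011-01-14", freq=custom_day)

# The same offset also drives date arithmetic: skip ahead one custom day.
next_day = pd.Timestamp("2011-01-03") + custom_day

print(rng.strftime("%Y-%m-%d").tolist())
print(next_day)
```

Wednesday 2011-01-05 is skipped because it is listed as a holiday, so the day after Monday the 3rd is Friday the 7th.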

Timestamp limitations
Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years:

In [92]: pd.Timestamp.min
Out[92]: Timestamp('1677-09-21 00:12:43.145225')

In [93]: pd.Timestamp.max
Out[93]: Timestamp('2262-04-11 23:47:16.854775807')
See also

Representing out-of-bounds spans
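
The 584-year figure follows directly from the storage format: 2^64 distinct nanosecond values. The sketch below redoes that arithmetic and shows that a Period, which counts in units of its own frequency rather than nanoseconds, can hold a date well outside the Timestamp range. The year 1500 is an arbitrary example.

```python
import pandas as pd

# Nanoseconds in an average (Julian) year of 365.25 days.
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9

# A 64-bit integer has 2**64 distinct values, one per nanosecond.
span_years = 2**64 / NS_PER_YEAR

# A daily Period is not bound by the nanosecond limits.
old_day = pd.Period("1500-01-01", freq="D")

print(round(span_years, 1))
print(old_day)
```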

Indexing
One of the main uses for DatetimeIndex is as an index for pandas objects. The DatetimeIndex class contains many time series related optimizations:

A large range of dates for various offsets are pre-computed and cached under the hood in order to make generating subsequent date ranges very fast (just have to grab a slice).

Fast shifting using the shift method on pandas objects.

Unioning of overlapping DatetimeIndex objects with the same frequency is very fast (important for fast data alignment).

Quick access to date fields via properties such as year, month, etc.

Regularization functions like snap and very fast asof logic.

DatetimeIndex objects have all the basic functionality of regular Index objects, plus a wide assortment of advanced time series specific methods for easy frequency processing.

See also

Reindexing methods

Note

While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted.

DatetimeIndex can be used like a regular index and offers all of its intelligent functionality like selection, slicing, etc.

In [94]: rng = pd.date_range(start, end, freq="BM")

In [95]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [96]: ts.index
Out[96]:
DatetimeIndex([‘2011-01-31’, ‘2011-02-28’, ‘2011-03-31’, ‘2011-04-29’,
‘2011-05-31’, ‘2011-06-30’, ‘2011-07-29’, ‘2011-08-31’,
‘2011-09-30’, ‘2011-10-31’, ‘2011-11-30’, ‘2011-12-30’],
dtype=‘datetime64[ns]’, freq=‘BM’)

In [97]: ts[:5].index
Out[97]:
DatetimeIndex([‘2011-01-31’, ‘2011-02-28’, ‘2011-03-31’, ‘2011-04-29’,
‘2011-05-31’],
dtype=‘datetime64[ns]’, freq=‘BM’)

In [98]: ts[::2].index
Out[98]:
DatetimeIndex([‘2011-01-31’, ‘2011-03-31’, ‘2011-05-31’, ‘2011-07-29’,
‘2011-09-30’, ‘2011-11-30’],
dtype=‘datetime64[ns]’, freq=‘2BM’)
Partial string indexing
Dates and strings that parse to timestamps can be passed as indexing parameters:

In [99]: ts["1/31/2011"]
Out[99]: 0.11920871129693428

In [100]: ts[datetime.datetime(2011, 12, 25):]
Out[100]:
2011-12-30 0.56702
Freq: BM, dtype: float64

In [101]: ts["10/31/2011":"12/31/2011"]
Out[101]:
2011-10-31 0.271860
2011-11-30 -0.424972
2011-12-30 0.567020
Freq: BM, dtype: float64
For convenience when accessing longer time series, you can also pass in the year, or the year and month, as strings:

In [102]: ts["2011"]
Out[102]:
2011-01-31 0.119209
2011-02-28 -1.044236
2011-03-31 -0.861849
2011-04-29 -2.104569
2011-05-31 -0.494929
2011-06-30 1.071804
2011-07-29 0.721555
2011-08-31 -0.706771
2011-09-30 -1.039575
2011-10-31 0.271860
2011-11-30 -0.424972
2011-12-30 0.567020
Freq: BM, dtype: float64

In [103]: ts["2011-6"]
Out[103]:
2011-06-30 1.071804
Freq: BM, dtype: float64
This type of slicing works on a DataFrame with a DatetimeIndex as well. Since partial string selection is a form of label slicing, the endpoints are included. This includes all matching times on an included date:

Warning

Indexing DataFrame rows with a single string with getitem (e.g. frame[dtstring]) is deprecated starting with pandas 1.2.0 (given the ambiguity whether it is indexing the rows or selecting a column) and will be removed in a future version. The equivalent with .loc (e.g. frame.loc[dtstring]) is still supported.

In [104]: dft = pd.DataFrame(
   .....:     np.random.randn(100000, 1),
   .....:     columns=["A"],
   .....:     index=pd.date_range("20130101", periods=100000, freq="T"),
   .....: )
   .....:

In [105]: dft
Out[105]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
… …
2013-03-11 10:35:00 -0.747967
2013-03-11 10:36:00 -0.034523
2013-03-11 10:37:00 -0.201754
2013-03-11 10:38:00 -1.509067
2013-03-11 10:39:00 -1.693043

[100000 rows x 1 columns]

In [106]: dft.loc["2013"]
Out[106]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
… …
2013-03-11 10:35:00 -0.747967
2013-03-11 10:36:00 -0.034523
2013-03-11 10:37:00 -0.201754
2013-03-11 10:38:00 -1.509067
2013-03-11 10:39:00 -1.693043

[100000 rows x 1 columns]
This starts at the very first time in the month and includes the last date and time of the month:

In [107]: dft["2013-1":"2013-2"]
Out[107]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
… …
2013-02-28 23:55:00 0.850929
2013-02-28 23:56:00 0.976712
2013-02-28 23:57:00 -2.693884
2013-02-28 23:58:00 -1.575535
2013-02-28 23:59:00 -1.573517

[84960 rows x 1 columns]
This specifies a stop time that includes all of the times on the last day:

In [108]: dft["2013-1":"2013-2-28"]
Out[108]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
… …
2013-02-28 23:55:00 0.850929
2013-02-28 23:56:00 0.976712
2013-02-28 23:57:00 -2.693884
2013-02-28 23:58:00 -1.575535
2013-02-28 23:59:00 -1.573517

[84960 rows x 1 columns]
This specifies an exact stop time (and is not the same as the above):

In [109]: dft["2013-1":"2013-2-28 00:00:00"]
Out[109]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
… …
2013-02-27 23:56:00 1.197749
2013-02-27 23:57:00 0.720521
2013-02-27 23:58:00 -0.072718
2013-02-27 23:59:00 -0.681192
2013-02-28 00:00:00 -0.557501

[83521 rows x 1 columns]
We stop on the included end-point because it is part of the index:

In [110]: dft["2013-1-15":"2013-1-15 12:30:00"]
Out[110]:
A
2013-01-15 00:00:00 -0.984810
2013-01-15 00:01:00 0.941451
2013-01-15 00:02:00 1.559365
2013-01-15 00:03:00 1.034374
2013-01-15 00:04:00 -1.480656
… …
2013-01-15 12:26:00 0.371454
2013-01-15 12:27:00 -0.930806
2013-01-15 12:28:00 -0.069177
2013-01-15 12:29:00 0.066510
2013-01-15 12:30:00 -0.003945

[751 rows x 1 columns]
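The three stop labels above differ only in how precisely they are written. The following is a minimal sketch of that behaviour, assuming a minute-frequency frame over the same span as dft; the name frame and its values are illustrative and not part of the original notes.

```python
import numpy as np
import pandas as pd

# Minute-frequency frame covering the same span as dft in the text
# (the values are arbitrary; only the index matters here).
idx = pd.date_range("2013-01-01", periods=100_000, freq="min")
frame = pd.DataFrame({"A": np.arange(len(idx))}, index=idx)

# A month-resolution stop label expands to the end of that month.
by_month = frame.loc["2013-1":"2013-2"]

# A day-resolution stop label expands to the end of that day.
by_day = frame.loc["2013-1":"2013-2-28"]

# A full timestamp is an exact, inclusive end-point.
exact = frame.loc["2013-1":"2013-2-28 00:00:00"]

print(len(by_month), len(by_day), len(exact))
print(by_month.index[-1], exact.index[-1])
```

The sketch uses .loc rather than plain [] so that row selection is explicit.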
DatetimeIndex partial string indexing also works on a DataFrame with a MultiIndex:

In [111]: dft2 = pd.DataFrame(
   .....:     np.random.randn(20, 1),
   .....:     columns=["A"],
   .....:     index=pd.MultiIndex.from_product(
   .....:         [pd.date_range("20130101", periods=10, freq="12H"), ["a", "b"]]
   .....:     ),
   .....: )
   .....:

In [112]: dft2
Out[112]:
A
2013-01-01 00:00:00 a -0.298694
b 0.823553
2013-01-01 12:00:00 a 0.943285
b -1.479399
2013-01-02 00:00:00 a -1.643342
… …
2013-01-04 12:00:00 b 0.069036
2013-01-05 00:00:00 a 0.122297
b 1.422060
2013-01-05 12:00:00 a 0.370079
b 1.016331

[20 rows x 1 columns]

In [113]: dft2.loc["2013-01-05"]
Out[113]:
A
2013-01-05 00:00:00 a 0.122297
b 1.422060
2013-01-05 12:00:00 a 0.370079
b 1.016331

In [114]: idx = pd.IndexSlice

In [115]: dft2 = dft2.swaplevel(0, 1).sort_index()

In [116]: dft2.loc[idx[:, "2013-01-05"], :]
Out[116]:
A
a 2013-01-05 00:00:00 0.122297
2013-01-05 12:00:00 0.370079
b 2013-01-05 00:00:00 1.422060
2013-01-05 12:00:00 1.016331
New in version 0.25.0.

Slicing with string indexing also honors UTC offset.

In [117]: df = pd.DataFrame([0], index=pd.DatetimeIndex(["2019-01-01"], tz="US/Pacific"))

In [118]: df
Out[118]:
0
2019-01-01 00:00:00-08:00 0

In [119]: df["2019-01-01 12:00:00+04:00":"2019-01-01 13:00:00+04:00"]
Out[119]:
0
2019-01-01 00:00:00-08:00 0
Slice vs. exact match
The same string used as an indexing parameter can be treated either as a slice or as an exact match, depending on the resolution of the index. If the string is less precise than the index, it is treated as a slice; otherwise it is treated as an exact match.

Consider a Series object with a minute resolution index:

In [120]: series_minute = pd.Series(
   .....:     [1, 2, 3],
   .....:     pd.DatetimeIndex(
   .....:         ["2011-12-31 23:59:00", "2012-01-01 00:00:00", "2012-01-01 00:02:00"]
   .....:     ),
   .....: )
   .....:

In [121]: series_minute.index.resolution
Out[121]: 'minute'
A timestamp string less precise than a minute gives a Series object.

In [122]: series_minute["2011-12-31 23"]
Out[122]:
2011-12-31 23:59:00 1
dtype: int64
A timestamp string with minute resolution (or more precise) gives a scalar instead, i.e. it is not cast to a slice.

In [123]: series_minute["2011-12-31 23:59"]
Out[123]: 1

In [124]: series_minute["2011-12-31 23:59:00"]
Out[124]: 1
If the index resolution is second, then a minute-precision timestamp string gives a Series.

In [125]: series_second = pd.Series(
   .....:     [1, 2, 3],
   .....:     pd.DatetimeIndex(
   .....:         ["2011-12-31 23:59:59", "2012-01-01 00:00:00", "2012-01-01 00:00:01"]
   .....:     ),
   .....: )
   .....:

In [126]: series_second.index.resolution
Out[126]: 'second'

In [127]: series_second["2011-12-31 23:59"]
Out[127]:
2011-12-31 23:59:59 1
dtype: int64
If the timestamp string is treated as a slice, it can also be used to index a DataFrame with .loc[].

In [128]: dft_minute = pd.DataFrame(
   .....:     {"a": [1, 2, 3], "b": [4, 5, 6]}, index=series_minute.index
   .....: )
   .....:

In [129]: dft_minute.loc["2011-12-31 23"]
Out[129]:
a b
2011-12-31 23:59:00 1 4
Warning

However, if the string is treated as an exact match, the selection in DataFrame's [] will be column-wise and not row-wise; see Indexing Basics. For example, dft_minute['2011-12-31 23:59'] will raise KeyError, because '2011-12-31 23:59' has the same resolution as the index and there is no column with that name.

To always have unambiguous selection, whether the row is treated as a slice or a single selection, use .loc.

In [130]: dft_minute.loc["2011-12-31 23:59"]
Out[130]:
a 1
b 4
Name: 2011-12-31 23:59:00, dtype: int64
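The slice versus exact-match rule, including the KeyError case described in the warning, can be sketched as follows. This is only an illustration rebuilt from the objects in the text; frame_minute stands in for dft_minute, and the flag column_lookup_failed is a hypothetical helper name.

```python
import pandas as pd

index = pd.DatetimeIndex(
    ["2011-12-31 23:59:00", "2012-01-01 00:00:00", "2012-01-01 00:02:00"]
)
series_minute = pd.Series([1, 2, 3], index=index)
frame_minute = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=index)

# An hour-resolution string is coarser than the minute index: treated as a slice.
as_slice = series_minute.loc["2011-12-31 23"]

# A minute-resolution string matches the index resolution: exact match, scalar.
as_scalar = series_minute.loc["2011-12-31 23:59"]

# DataFrame [] looks the exact-match string up among the columns and fails.
try:
    frame_minute["2011-12-31 23:59"]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# .loc always selects rows, so the same string returns the matching row.
row = frame_minute.loc["2011-12-31 23:59"]
```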
Note also that DatetimeIndex resolution cannot be less precise than day.

In [131]: series_monthly = pd.Series(
   .....:     [1, 2, 3], pd.DatetimeIndex(["2011-12", "2012-01", "2012-02"])
   .....: )
   .....:

In [132]: series_monthly.index.resolution
Out[132]: 'day'

In [133]: series_monthly["2011-12"]  # returns Series
Out[133]:
2011-12-01 1
dtype: int64
Exact indexing
As discussed in the previous section, indexing a DatetimeIndex with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the resolution of the index. In contrast, indexing with Timestamp or datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.

These Timestamp and datetime objects have exact hours, minutes, and seconds, even though they were not explicitly specified (they are 0).

In [134]: dft[datetime.datetime(2013, 1, 1): datetime.datetime(2013, 2, 28)]
Out[134]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
… …
2013-02-27 23:56:00 1.197749
2013-02-27 23:57:00 0.720521
2013-02-27 23:58:00 -0.072718
2013-02-27 23:59:00 -0.681192
2013-02-28 00:00:00 -0.557501

[83521 rows x 1 columns]
With no defaults.

In [135]: dft[
   .....:     datetime.datetime(2013, 1, 1, 10, 12, 0) : datetime.datetime(
   .....:         2013, 2, 28, 10, 12, 0
   .....:     )
   .....: ]
   .....:
Out[135]:
A
2013-01-01 10:12:00 0.565375
2013-01-01 10:13:00 0.068184
2013-01-01 10:14:00 0.788871
2013-01-01 10:15:00 -0.280343
2013-01-01 10:16:00 0.931536
… …
2013-02-28 10:08:00 0.148098
2013-02-28 10:09:00 -0.388138
2013-02-28 10:10:00 0.139348
2013-02-28 10:11:00 0.085288
2013-02-28 10:12:00 0.950146

[83521 rows x 1 columns]
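To make the contrast with partial strings concrete, here is a small sketch that slices the same kind of frame once with datetime objects and once with day-resolution strings. The frame is rebuilt with arbitrary values, and the names frame, start and stop are illustrative.

```python
import datetime

import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=100_000, freq="min")
frame = pd.DataFrame({"A": np.arange(len(idx))}, index=idx)

start = datetime.datetime(2013, 1, 1)
stop = datetime.datetime(2013, 2, 28)  # means exactly 2013-02-28 00:00:00

# datetime end-points are exact: the slice stops at midnight on the 28th.
exact = frame.loc[start:stop]

# The string "2013-02-28" is a day-resolution label and covers the whole day.
partial = frame.loc["2013-01-01":"2013-02-28"]

print(len(exact), len(partial))
```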
Truncating & fancy indexing
A truncate() convenience function is provided that is similar to slicing. Note that truncate assumes a 0 value for any unspecified date component in a DatetimeIndex, in contrast to slicing, which returns any partially matching dates:

In [136]: rng2 = pd.date_range("2011-01-01", "2012-01-01", freq="W")

In [137]: ts2 = pd.Series(np.random.randn(len(rng2)), index=rng2)

In [138]: ts2.truncate(before="2011-11", after="2011-12")
Out[138]:
2011-11-06 0.437823
2011-11-13 -0.293083
2011-11-20 -0.059881
2011-11-27 1.252450
Freq: W-SUN, dtype: float64

In [139]: ts2["2011-11":"2011-12"]
Out[139]:
2011-11-06 0.437823
2011-11-13 -0.293083
2011-11-20 -0.059881
2011-11-27 1.252450
2011-12-04 0.046611
2011-12-11 0.059478
2011-12-18 -0.286539
2011-12-25 0.841669
Freq: W-SUN, dtype: float64
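The difference between the two outputs above comes down to how the label "2011-12" is read. A minimal sketch, using an integer-valued weekly series (the name weekly is illustrative) so the result does not depend on random data:

```python
import pandas as pd

rng = pd.date_range("2011-01-01", "2012-01-01", freq="W")
weekly = pd.Series(range(len(rng)), index=rng)

# truncate reads "2011-12" as the single instant 2011-12-01 00:00:00,
# so no December Sunday is kept.
truncated = weekly.truncate(before="2011-11", after="2011-12")

# Slicing reads "2011-12" as the whole month, so December is included.
sliced = weekly.loc["2011-11":"2011-12"]

print(len(truncated), len(sliced))
```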
Even complicated fancy indexing that breaks the DatetimeIndex frequency regularity will result in a DatetimeIndex, although frequency is lost:

In [140]: ts2[[0, 2, 6]].index
Out[140]: DatetimeIndex(['2011-01-02', '2011-01-16', '2011-02-13'], dtype='datetime64[ns]', freq=None)
Time/date components
There are several time/date properties that one can access from Timestamp or a collection of timestamps like a DatetimeIndex.

Property          Description
----------------  -----------------------------------------------------------------
year              The year of the datetime
month             The month of the datetime
day               The days of the datetime
hour              The hour of the datetime
minute            The minutes of the datetime
second            The seconds of the datetime
microsecond       The microseconds of the datetime
nanosecond        The nanoseconds of the datetime
date              Returns datetime.date (does not contain timezone information)
time              Returns datetime.time (does not contain timezone information)
timetz            Returns datetime.time as local time with timezone information
dayofyear         The ordinal day of year
day_of_year       The ordinal day of year
weekofyear        The week ordinal of the year
week              The week ordinal of the year
dayofweek         The number of the day of the week with Monday=0, Sunday=6
day_of_week       The number of the day of the week with Monday=0, Sunday=6
weekday           The number of the day of the week with Monday=0, Sunday=6
quarter           Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc.
days_in_month     The number of days in the month of the datetime
is_month_start    Logical indicating if first day of month (defined by frequency)
is_month_end      Logical indicating if last day of month (defined by frequency)
is_quarter_start  Logical indicating if first day of quarter (defined by frequency)
is_quarter_end    Logical indicating if last day of quarter (defined by frequency)
is_year_start     Logical indicating if first day of year (defined by frequency)
is_year_end       Logical indicating if last day of year (defined by frequency)
is_leap_year      Logical indicating if the date belongs to a leap year

Furthermore, if you have a Series with datetimelike values, then you can access these properties via the .dt accessor, as detailed in the section on .dt accessors.
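As a short illustration of the two access paths, the sketch below reads a few of the listed properties first from a DatetimeIndex and then through .dt on a Series. The dates are chosen only because they straddle a year boundary; the variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range("2019-12-29", periods=4, freq="D")

# Properties on a DatetimeIndex return one value per timestamp.
print(idx.year.tolist())
print(idx.dayofweek.tolist())
print(idx.is_leap_year.tolist())

# The same properties are reached through .dt on a Series of datetimes.
s = pd.Series(idx)
quarters = s.dt.quarter
month_starts = s.dt.is_month_start
print(quarters.tolist(), month_starts.tolist())
```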

New in version 1.1.0.

You may obtain the year, week and day components of the ISO year from the ISO 8601 standard:

In [141]: idx = pd.date_range(start="2019-12-29", freq="D", periods=4)

In [142]: idx.isocalendar()
Out[142]:
year week day
2019-12-29 2019 52 7
2019-12-30 2020 1 1
2019-12-31 2020 1 2
2020-01-01 2020 1 3

In [143]: idx.to_series().dt.isocalendar()
Out[143]:
year week day
2019-12-29 2019 52 7
2019-12-30 2020 1 1
2019-12-31 2020 1 2
2020-01-01 2020 1 3
DateOffset objects
In the preceding examples, frequency strings (e.g. 'D') were used to specify a frequency that defined:

how the date times in DatetimeIndex were spaced when using date_range()

the frequency of a Period or PeriodIndex

These frequency strings map to a DateOffset object and its subclasses. A DateOffset is similar to a Timedelta that represents a duration of time but follows specific calendar duration rules. For example, a Timedelta day will always increment datetimes by 24 hours, while a DateOffset day will increment datetimes to the same time the next day whether a day represents 23, 24 or 25 hours due to daylight saving time. However, all DateOffset subclasses that are an hour or smaller (Hour, Minute, Second, Milli, Micro, Nano) behave like Timedelta and respect absolute time.

The basic DateOffset acts similarly to dateutil.relativedelta (relativedelta documentation), shifting a date time by the corresponding calendar duration specified. The arithmetic operator (+) or the apply method can be used to perform the shift.

# This particular day contains a daylight saving time transition
In [144]: ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

# Respects absolute time
In [145]: ts + pd.Timedelta(days=1)
Out[145]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')

# Respects calendar time
In [146]: ts + pd.DateOffset(days=1)
Out[146]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')

In [147]: friday = pd.Timestamp("2018-01-05")

In [148]: friday.day_name()
Out[148]: 'Friday'

# Add 2 business days (Friday --> Tuesday)
In [149]: two_business_days = 2 * pd.offsets.BDay()

In [150]: two_business_days.apply(friday)
Out[150]: Timestamp('2018-01-09 00:00:00')

In [151]: friday + two_business_days
Out[151]: Timestamp('2018-01-09 00:00:00')

In [152]: (friday + two_business_days).day_name()
Out[152]: 'Tuesday'
Most DateOffsets have associated frequency strings, or offset aliases, that can be passed into freq keyword arguments. The available date offsets and associated frequency strings can be found below:

Date Offset                               Frequency String  Description
----------------------------------------  ----------------  ----------------------------------------------------
DateOffset                                None              Generic offset class, defaults to absolute 24 hours
BDay or BusinessDay                       'B'               business day (weekday)
CDay or CustomBusinessDay                 'C'               custom business day
Week                                      'W'               one week, optionally anchored on a day of the week
WeekOfMonth                               'WOM'             the x-th day of the y-th week of each month
LastWeekOfMonth                           'LWOM'            the x-th day of the last week of each month
MonthEnd                                  'M'               calendar month end
MonthBegin                                'MS'              calendar month begin
BMonthEnd or BusinessMonthEnd             'BM'              business month end
BMonthBegin or BusinessMonthBegin         'BMS'             business month begin
CBMonthEnd or CustomBusinessMonthEnd      'CBM'             custom business month end
CBMonthBegin or CustomBusinessMonthBegin  'CBMS'            custom business month begin
SemiMonthEnd                              'SM'              15th (or other day_of_month) and calendar month end
SemiMonthBegin                            'SMS'             15th (or other day_of_month) and calendar month begin
QuarterEnd                                'Q'               calendar quarter end
QuarterBegin                              'QS'              calendar quarter begin
BQuarterEnd                               'BQ'              business quarter end
BQuarterBegin                             'BQS'             business quarter begin
FY5253Quarter                             'REQ'             retail (aka 52-53 week) quarter
YearEnd                                   'A'               calendar year end
YearBegin                                 'AS' or 'YS'      calendar year begin
BYearEnd                                  'BA'              business year end
BYearBegin                                'BAS'             business year begin
FY5253                                    'RE'              retail (aka 52-53 week) year
Easter                                    None              Easter holiday
BusinessHour                              'BH'              business hour
CustomBusinessHour                        'CBH'             custom business hour
Day                                       'D'               one absolute day
Hour                                      'H'               one hour
Minute                                    'T' or 'min'      one minute
Second                                    'S'               one second
Milli                                     'L' or 'ms'       one millisecond
Micro                                     'U' or 'us'       one microsecond
Nano                                      'N'               one nanosecond

DateOffsets additionally have rollforward() and rollback() methods for moving a date forward or backward respectively to a valid offset date relative to the offset. For example, business offsets will roll dates that land on the weekends (Saturday and Sunday) forward to Monday since business offsets operate on the weekdays.

In [153]: ts = pd.Timestamp("2018-01-06 00:00:00")

In [154]: ts.day_name()
Out[154]: 'Saturday'

# BusinessHour's valid offset dates are Monday through Friday
In [155]: offset = pd.offsets.BusinessHour(start="09:00")

# Bring the date to the closest offset date (Monday)
In [156]: offset.rollforward(ts)
Out[156]: Timestamp('2018-01-08 09:00:00')

# Date is brought to the closest offset date first and then the hour is added
In [157]: ts + offset
Out[157]: Timestamp('2018-01-08 10:00:00')
These operations preserve time (hour, minute, etc.) information by default. To reset time to midnight, use normalize() before or after applying the operation (depending on whether you want the time information included in the operation).

In [158]: ts = pd.Timestamp("2014-01-01 09:00")

In [159]: day = pd.offsets.Day()

In [160]: day.apply(ts)
Out[160]: Timestamp('2014-01-02 09:00:00')

In [161]: day.apply(ts).normalize()
Out[161]: Timestamp('2014-01-02 00:00:00')

In [162]: ts = pd.Timestamp("2014-01-01 22:00")

In [163]: hour = pd.offsets.Hour()

In [164]: hour.apply(ts)
Out[164]: Timestamp('2014-01-01 23:00:00')

In [165]: hour.apply(ts).normalize()
Out[165]: Timestamp('2014-01-01 00:00:00')

In [166]: hour.apply(pd.Timestamp("2014-01-01 23:30")).normalize()
Out[166]: Timestamp('2014-01-02 00:00:00')
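Whether normalize() comes before or after the offset changes the result when the offset crosses midnight. The sketch below shows both orders; it uses the + operator instead of apply, and the names before and after are illustrative.

```python
import pandas as pd

ts = pd.Timestamp("2014-01-01 23:30")
hour = pd.offsets.Hour()

# Offset first, then normalize: the added hour crosses midnight,
# so the result lands on the next day.
after = (ts + hour).normalize()

# Normalize first, then offset: the time of day is discarded
# before the hour is added.
before = ts.normalize() + hour

print(after, before)
```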
Parametric offsets
Some of the offsets can be "parameterized" when created to result in different behaviors. For example, the Week offset for generating weekly data accepts a weekday parameter which results in the generated dates always lying on a particular day of the week:

In [167]: d = datetime.datetime(2008, 8, 18, 9, 0)

In [168]: d
Out[168]: datetime.datetime(2008, 8, 18, 9, 0)

In [169]: d + pd.offsets.Week()
Out[169]: Timestamp('2008-08-25 09:00:00')

In [170]: d + pd.offsets.Week(weekday=4)
Out[170]: Timestamp('2008-08-22 09:00:00')

In [171]: (d + pd.offsets.Week(weekday=4)).weekday()
Out[171]: 4

In [172]: d - pd.offsets.Week()
Out[172]: Timestamp('2008-08-11 09:00:00')
The normalize option will be effective for addition and subtraction.

In [173]: d + pd.offsets.Week(normalize=True)
Out[173]: Timestamp('2008-08-25 00:00:00')

In [174]: d - pd.offsets.Week(normalize=True)
Out[174]: Timestamp('2008-08-11 00:00:00')
Another example is parameterizing YearEnd with the specific ending month:

In [175]: d + pd.offsets.YearEnd()
Out[175]: Timestamp('2008-12-31 09:00:00')

In [176]: d + pd.offsets.YearEnd(month=6)
Out[176]: Timestamp('2009-06-30 09:00:00')
Using offsets with Series / DatetimeIndex
Offsets can be used with either a Series or DatetimeIndex to apply the offset to each element.

In [177]: rng = pd.date_range("2012-01-01", "2012-01-03")

In [178]: s = pd.Series(rng)

In [179]: rng
Out[179]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D')

In [180]: rng + pd.DateOffset(months=2)
Out[180]: DatetimeIndex(['2012-03-01', '2012-03-02', '2012-03-03'], dtype='datetime64[ns]', freq=None)

In [181]: s + pd.DateOffset(months=2)
Out[181]:
0 2012-03-01
1 2012-03-02
2 2012-03-03
dtype: datetime64[ns]

In [182]: s - pd.DateOffset(months=2)
Out[182]:
0 2011-11-01
1 2011-11-02
2 2011-11-03
dtype: datetime64[ns]
If the offset class maps directly to a Timedelta (Day, Hour, Minute, Second, Micro, Milli, Nano), it can be used exactly like a Timedelta; see the Timedelta section for more examples.

In [183]: s - pd.offsets.Day(2)
Out[183]:
0 2011-12-30
1 2011-12-31
2 2012-01-01
dtype: datetime64[ns]

In [184]: td = s - pd.Series(pd.date_range("2011-12-29", "2011-12-31"))

In [185]: td
Out[185]:
0 3 days
1 3 days
2 3 days
dtype: timedelta64[ns]

In [186]: td + pd.offsets.Minute(15)
Out[186]:
0 3 days 00:15:00
1 3 days 00:15:00
2 3 days 00:15:00
dtype: timedelta64[ns]
Note that some offsets (such as BQuarterEnd) do not have a vectorized implementation. They can still be used but may calculate significantly slower and will show a PerformanceWarning.

In [187]: rng + pd.offsets.BQuarterEnd()
Out[187]: DatetimeIndex(['2012-03-30', '2012-03-30', '2012-03-30'], dtype='datetime64[ns]', freq=None)
Custom business days
The CDay or CustomBusinessDay class provides a parametric BusinessDay class which can be used to create customized business day calendars which account for local holidays and local weekend conventions.

As an interesting example, let's look at Egypt where a Friday-Saturday weekend is observed.

In [188]: weekmask_egypt = "Sun Mon Tue Wed Thu"

# They also observe International Workers' Day so let's
# add that for a couple of years
In [189]: holidays = [
   .....:     "2012-05-01",
   .....:     datetime.datetime(2013, 5, 1),
   .....:     np.datetime64("2014-05-01"),
   .....: ]
   .....:

In [190]: bday_egypt = pd.offsets.CustomBusinessDay(
   .....:     holidays=holidays,
   .....:     weekmask=weekmask_egypt,
   .....: )
   .....:

In [191]: dt = datetime.datetime(2013, 4, 30)

In [192]: dt + 2 * bday_egypt
Out[192]: Timestamp('2013-05-05 00:00:00')
Let's map to the weekday names:

In [193]: dts = pd.date_range(dt, periods=5, freq=bday_egypt)

In [194]: pd.Series(dts.weekday, dts).map(pd.Series("Mon Tue Wed Thu Fri Sat Sun".split()))
Out[194]:
2013-04-30 Tue
2013-05-02 Thu
2013-05-05 Sun
2013-05-06 Mon
2013-05-07 Tue
Freq: C, dtype: object
Holiday calendars can be used to provide the list of holidays. See the holiday calendar section for more information.

In [195]: from pandas.tseries.holiday import USFederalHolidayCalendar

In [196]: bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Friday before MLK Day
In [197]: dt = datetime.datetime(2014, 1, 17)

# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In [198]: dt + bday_us
Out[198]: Timestamp('2014-01-21 00:00:00')
Monthly offsets that respect a certain holiday calendar can be defined in the usual way.

In [199]: bmth_us = pd.offsets.CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())

# Skip new years
In [200]: dt = datetime.datetime(2013, 12, 17)

In [201]: dt + bmth_us
Out[201]: Timestamp('2014-01-02 00:00:00')

# Define date index with custom offset
In [202]: pd.date_range(start="20100101", end="20120101", freq=bmth_us)
Out[202]:
DatetimeIndex(['2010-01-04', '2010-02-01', '2010-03-01', '2010-04-01',
               '2010-05-03', '2010-06-01', '2010-07-01', '2010-08-02',
               '2010-09-01', '2010-10-01', '2010-11-01', '2010-12-01',
               '2011-01-03', '2011-02-01', '2011-03-01', '2011-04-01',
               '2011-05-02', '2011-06-01', '2011-07-01', '2011-08-01',
               '2011-09-01', '2011-10-03', '2011-11-01', '2011-12-01'],
              dtype='datetime64[ns]', freq='CBMS')
Note

The frequency string 'C' is used to indicate that a CustomBusinessDay DateOffset is used. It is important to note that since CustomBusinessDay is a parameterised type, instances of CustomBusinessDay may differ, and this is not detectable from the 'C' frequency string. The user therefore needs to ensure that the 'C' frequency string is used consistently within the user's application.
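A small sketch of why the 'C' string alone is ambiguous: two differently parameterised CustomBusinessDay offsets report the same frequency string yet move the same date to different days. The names bday_default and thursday are illustrative, and the freqstr values are what I would expect from the note above rather than something shown in the original.

```python
import pandas as pd

bday_default = pd.offsets.CustomBusinessDay()
bday_egypt = pd.offsets.CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu")

thursday = pd.Timestamp("2013-05-02")

# Both offsets report the same frequency string...
print(bday_default.freqstr, bday_egypt.freqstr)

# ...but they move the same date differently.
next_default = thursday + bday_default  # Friday under a Mon-Fri week
next_egypt = thursday + bday_egypt      # Sunday under a Sun-Thu week

print(next_default, next_egypt)
```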

Business hour
The BusinessHour class provides a business hour representation on BusinessDay, allowing you to use specific start and end times.

By default, BusinessHour uses 9:00 - 17:00 as business hours. Adding BusinessHour will increment Timestamp by hourly frequency. If the target Timestamp is outside business hours, it is first moved to the next business hour and then incremented. If the result exceeds the business hours end, the remaining hours are added to the next business day.

In [203]: bh = pd.offsets.BusinessHour()

In [204]: bh
Out[204]: <BusinessHour: BH=09:00-17:00>

# 2014-08-01 is Friday
In [205]: pd.Timestamp("2014-08-01 10:00").weekday()
Out[205]: 4

In [206]: pd.Timestamp("2014-08-01 10:00") + bh
Out[206]: Timestamp('2014-08-01 11:00:00')

# Below example is the same as: pd.Timestamp('2014-08-01 09:00') + bh
In [207]: pd.Timestamp("2014-08-01 08:00") + bh
Out[207]: Timestamp('2014-08-01 10:00:00')

# If the result is on the end time, move to the next business day
In [208]: pd.Timestamp("2014-08-01 16:00") + bh
Out[208]: Timestamp('2014-08-04 09:00:00')

# The remainder is added to the next day
In [209]: pd.Timestamp("2014-08-01 16:30") + bh
Out[209]: Timestamp('2014-08-04 09:30:00')

# Adding 2 business hours
In [210]: pd.Timestamp("2014-08-01 10:00") + pd.offsets.BusinessHour(2)
Out[210]: Timestamp('2014-08-01 12:00:00')

# Subtracting 3 business hours
In [211]: pd.Timestamp("2014-08-01 10:00") + pd.offsets.BusinessHour(-3)
Out[211]: Timestamp('2014-07-31 15:00:00')
You can also specify start and end time by keywords. The argument must be a str with an hour:minute representation or a datetime.time instance. Specifying seconds, microseconds and nanoseconds as business hour results in ValueError.

In [212]: bh = pd.offsets.BusinessHour(start="11:00", end=datetime.time(20, 0))

In [213]: bh
Out[213]: <BusinessHour: BH=11:00-20:00>

In [214]: pd.Timestamp("2014-08-01 13:00") + bh
Out[214]: Timestamp('2014-08-01 14:00:00')

In [215]: pd.Timestamp("2014-08-01 09:00") + bh
Out[215]: Timestamp('2014-08-01 12:00:00')

In [216]: pd.Timestamp("2014-08-01 18:00") + bh
Out[216]: Timestamp('2014-08-01 19:00:00')
Passing a start time later than the end time represents a business hour that spans midnight. In this case, the business hour runs past midnight and overlaps into the next day. Whether a business hour is valid is determined by whether it started on a valid BusinessDay.

In [217]: bh = pd.offsets.BusinessHour(start="17:00", end="09:00")

In [218]: bh
Out[218]: <BusinessHour: BH=17:00-09:00>

In [219]: pd.Timestamp("2014-08-01 17:00") + bh
Out[219]: Timestamp('2014-08-01 18:00:00')

In [220]: pd.Timestamp("2014-08-01 23:00") + bh
Out[220]: Timestamp('2014-08-02 00:00:00')

# Although 2014-08-02 is Saturday,
# it is valid because it starts from 08-01 (Friday).
In [221]: pd.Timestamp("2014-08-02 04:00") + bh
Out[221]: Timestamp('2014-08-02 05:00:00')

# Although 2014-08-04 is Monday,
# it is out of business hours because it starts from 08-03 (Sunday).
In [222]: pd.Timestamp("2014-08-04 04:00") + bh
Out[222]: Timestamp('2014-08-04 18:00:00')
Applying BusinessHour.rollforward and rollback to a timestamp outside business hours results in the next business hour start or the previous business day's end. Unlike other offsets, BusinessHour.rollforward may by definition produce a different result from apply.

This is because one day's business hour end is equal to the next day's business hour start. For example, under the default business hours (9:00 - 17:00), there is no gap (0 minutes) between 2014-08-01 17:00 and 2014-08-04 09:00.

# This adjusts a Timestamp to business hour edge
In [223]: pd.offsets.BusinessHour().rollback(pd.Timestamp("2014-08-02 15:00"))
Out[223]: Timestamp('2014-08-01 17:00:00')

In [224]: pd.offsets.BusinessHour().rollforward(pd.Timestamp("2014-08-02 15:00"))
Out[224]: Timestamp('2014-08-04 09:00:00')

# It is the same as BusinessHour().apply(pd.Timestamp('2014-08-01 17:00')).
# And it is the same as BusinessHour().apply(pd.Timestamp('2014-08-04 09:00'))
In [225]: pd.offsets.BusinessHour().apply(pd.Timestamp("2014-08-02 15:00"))
Out[225]: Timestamp('2014-08-04 10:00:00')

# BusinessDay results (for reference)
In [226]: pd.offsets.BusinessHour().rollforward(pd.Timestamp("2014-08-02"))
Out[226]: Timestamp('2014-08-04 09:00:00')

# It is the same as BusinessDay().apply(pd.Timestamp('2014-08-01'))
# The result is the same as rollforward because BusinessDay never overlaps.
In [227]: pd.offsets.BusinessHour().apply(pd.Timestamp("2014-08-02"))
Out[227]: Timestamp('2014-08-04 10:00:00')
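The distinction between snapping to a business-hour edge and actually shifting by one business hour can be sketched with the + operator in place of apply. The variable names below are illustrative, and the values mirror the outputs shown above.

```python
import pandas as pd

bh = pd.offsets.BusinessHour()  # default 09:00-17:00, Monday to Friday
saturday = pd.Timestamp("2014-08-02 15:00")

# rollback / rollforward only snap to the nearest business-hour edge.
previous_close = bh.rollback(saturday)
next_open = bh.rollforward(saturday)

# Addition snaps forward and then also adds one business hour.
shifted = saturday + bh

print(previous_close, next_open, shifted)
```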
BusinessHour regards Saturday and Sunday as holidays. To use arbitrary holidays, you can use the CustomBusinessHour offset, as explained in the following subsection.

Custom business hour
The CustomBusinessHour is a mixture of BusinessHour and CustomBusinessDay which allows you to specify arbitrary holidays. CustomBusinessHour works the same as BusinessHour except that it skips specified custom holidays.

In [228]: from pandas.tseries.holiday import USFederalHolidayCalendar

In [229]: bhour_us = pd.offsets.CustomBusinessHour(calendar=USFederalHolidayCalendar())

# Friday before MLK Day
In [230]: dt = datetime.datetime(2014, 1, 17, 15)

In [231]: dt + bhour_us
Out[231]: Timestamp('2014-01-17 16:00:00')

# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In [232]: dt + bhour_us * 2
Out[232]: Timestamp('2014-01-21 09:00:00')
You can use keyword arguments supported by both BusinessHour and CustomBusinessDay.

In [233]: bhour_mon = pd.offsets.CustomBusinessHour(start="10:00", weekmask="Tue Wed Thu Fri")

# Monday is skipped because it's a holiday, business hour starts from 10:00
In [234]: dt + bhour_mon * 2
Out[234]: Timestamp('2014-01-21 10:00:00')
Offset aliases
A number of string aliases are given to useful common time series frequencies. We will refer to these aliases as offset aliases.

Alias     Description
--------  --------------------------------------------------
B         business day frequency
C         custom business day frequency
D         calendar day frequency
W         weekly frequency
M         month end frequency
SM        semi-month end frequency (15th and end of month)
BM        business month end frequency
CBM       custom business month end frequency
MS        month start frequency
SMS       semi-month start frequency (1st and 15th)
BMS       business month start frequency
CBMS      custom business month start frequency
Q         quarter end frequency
BQ        business quarter end frequency
QS        quarter start frequency
BQS       business quarter start frequency
A, Y      year end frequency
BA, BY    business year end frequency
AS, YS    year start frequency
BAS, BYS  business year start frequency
BH        business hour frequency
H         hourly frequency
T, min    minutely frequency
S         secondly frequency
L, ms     milliseconds
U, us     microseconds
N         nanoseconds

Combining aliases
As we have seen previously, the alias and the offset instance are fungible in most functions:

In [235]: pd.date_range(start, periods=5, freq="B")
Out[235]:
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07'],
              dtype='datetime64[ns]', freq='B')

In [236]: pd.date_range(start, periods=5, freq=pd.offsets.BDay())
Out[236]:
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07'],
              dtype='datetime64[ns]', freq='B')
You can combine day and intraday offsets:

In [237]: pd.date_range(start, periods=10, freq=“2h20min”)
Out[237]:
DatetimeIndex([‘2011-01-01 00:00:00’, ‘2011-01-01 02:20:00’,
‘2011-01-01 04:40:00’, ‘2011-01-01 07:00:00’,
‘2011-01-01 09:20:00’, ‘2011-01-01 11:40:00’,
‘2011-01-01 14:00:00’, ‘2011-01-01 16:20:00’,
‘2011-01-01 18:40:00’, ‘2011-01-01 21:00:00’],
dtype=‘datetime64[ns]’, freq=‘140T’)

In [238]: pd.date_range(start, periods=10, freq=“1D10U”)
Out[238]:
DatetimeIndex([ ‘2011-01-01 00:00:00’, ‘2011-01-02 00:00:00.000010’,
‘2011-01-03 00:00:00.000020’, ‘2011-01-04 00:00:00.000030’,
‘2011-01-05 00:00:00.000040’, ‘2011-01-06 00:00:00.000050’,
‘2011-01-07 00:00:00.000060’, ‘2011-01-08 00:00:00.000070’,
‘2011-01-09 00:00:00.000080’, ‘2011-01-10 00:00:00.000090’],
dtype=‘datetime64[ns]’, freq=‘86400000010U’)
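
A combined alias is simply the sum of its parts. As a small check, assuming the same start date as the output above, the spacing produced by "2h20min" can be compared against a 140-minute Timedelta:

```python
import pandas as pd

start = "2011-01-01"  # assumed start date, matching the output shown above
idx = pd.date_range(start, periods=3, freq="2h20min")

# The step between consecutive entries is 2 hours + 20 minutes = 140 minutes.
step = idx[1] - idx[0]
print(step == pd.Timedelta(minutes=140))
```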
Anchored offsets
For some frequencies you can specify an anchoring suffix:

Alias          Description
W-SUN          weekly frequency (Sundays). Same as ‘W’
W-MON          weekly frequency (Mondays)
W-TUE          weekly frequency (Tuesdays)
W-WED          weekly frequency (Wednesdays)
W-THU          weekly frequency (Thursdays)
W-FRI          weekly frequency (Fridays)
W-SAT          weekly frequency (Saturdays)
(B)Q(S)-DEC    quarterly frequency, year ends in December. Same as ‘Q’
(B)Q(S)-JAN    quarterly frequency, year ends in January
(B)Q(S)-FEB    quarterly frequency, year ends in February
(B)Q(S)-MAR    quarterly frequency, year ends in March
(B)Q(S)-APR    quarterly frequency, year ends in April
(B)Q(S)-MAY    quarterly frequency, year ends in May
(B)Q(S)-JUN    quarterly frequency, year ends in June
(B)Q(S)-JUL    quarterly frequency, year ends in July
(B)Q(S)-AUG    quarterly frequency, year ends in August
(B)Q(S)-SEP    quarterly frequency, year ends in September
(B)Q(S)-OCT    quarterly frequency, year ends in October
(B)Q(S)-NOV    quarterly frequency, year ends in November
(B)A(S)-DEC    annual frequency, anchored end of December. Same as ‘A’
(B)A(S)-JAN    annual frequency, anchored end of January
(B)A(S)-FEB    annual frequency, anchored end of February
(B)A(S)-MAR    annual frequency, anchored end of March
(B)A(S)-APR    annual frequency, anchored end of April
(B)A(S)-MAY    annual frequency, anchored end of May
(B)A(S)-JUN    annual frequency, anchored end of June
(B)A(S)-JUL    annual frequency, anchored end of July
(B)A(S)-AUG    annual frequency, anchored end of August
(B)A(S)-SEP    annual frequency, anchored end of September
(B)A(S)-OCT    annual frequency, anchored end of October
(B)A(S)-NOV    annual frequency, anchored end of November

These can be used as arguments to date_range, bdate_range, and the DatetimeIndex constructor, as well as various other time series-related functions in pandas.
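
As a brief illustration of anchored frequencies in date_range: the sketch below uses the "W-WED" alias for the weekly case and, for the quarterly case, the offset object QuarterEnd(startingMonth=1), which is what an alias anchored on January resolves to. The dates are arbitrary and chosen only for the example.

```python
import pandas as pd

# Weekly frequency anchored on Wednesdays.
wednesdays = pd.date_range("2011-01-01", periods=3, freq="W-WED")

# Quarterly frequency for a year ending in January:
# quarter ends fall in January, April, July and October.
quarter_ends = pd.date_range(
    "2011-01-01", periods=4, freq=pd.offsets.QuarterEnd(startingMonth=1)
)

print(wednesdays)
print(quarter_ends)
```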

Anchored offset semantics
For those offsets that are anchored to the start or end of a specific frequency (MonthEnd, MonthBegin, WeekEnd, etc.), the following rules apply to rolling forward and backwards.

When n is not 0, if the given date is not on an anchor point, it is snapped to the next (previous) anchor point and moved |n|-1 additional steps forwards or backwards.

In [239]: pd.Timestamp(“2014-01-02”) + pd.offsets.MonthBegin(n=1)
Out[239]: Timestamp(‘2014-02-01 00:00:00’)

In [240]: pd.Timestamp(“2014-01-02”) + pd.offsets.MonthEnd(n=1)
Out[240]: Timestamp(‘2014-01-31 00:00:00’)

In [241]: pd.Timestamp(“2014-01-02”) - pd.offsets.MonthBegin(n=1)
Out[241]: Timestamp(‘2014-01-01 00:00:00’)

In [242]: pd.Timestamp(“2014-01-02”) - pd.offsets.MonthEnd(n=1)
Out[242]: Timestamp(‘2013-12-31 00:00:00’)

In [243]: pd.Timestamp(“2014-01-02”) + pd.offsets.MonthBegin(n=4)
Out[243]: Timestamp(‘2014-05-01 00:00:00’)

In [244]: pd.Timestamp(“2014-01-02”) - pd.offsets.MonthBegin(n=4)
Out[244]: Timestamp(‘2013-10-01 00:00:00’)
If the given date is on an anchor point, it is moved |n| points forwards or backwards.

In [245]: pd.Timestamp(“2014-01-01”) + pd.offsets.MonthBegin(n=1)
Out[245]: Timestamp(‘2014-02-01 00:00:00’)

In [246]: pd.Timestamp(“2014-01-31”) + pd.offsets.MonthEnd(n=1)
Out[246]: Timestamp(‘2014-02-28 00:00:00’)

In [247]: pd.Timestamp(“2014-01-01”) - pd.offsets.MonthBegin(n=1)
Out[247]: Timestamp(‘2013-12-01 00:00:00’)

In [248]: pd.Timestamp(“2014-01-31”) - pd.offsets.MonthEnd(n=1)
Out[248]: Timestamp(‘2013-12-31 00:00:00’)

In [249]: pd.Timestamp(“2014-01-01”) + pd.offsets.MonthBegin(n=4)
Out[249]: Timestamp(‘2014-05-01 00:00:00’)

In [250]: pd.Timestamp(“2014-01-31”) - pd.offsets.MonthBegin(n=4)
Out[250]: Timestamp(‘2013-10-01 00:00:00’)
For the case when n=0, the date is not moved if it is on an anchor point; otherwise it is rolled forward to the next anchor point.

In [251]: pd.Timestamp(“2014-01-02”) + pd.offsets.MonthBegin(n=0)
Out[251]: Timestamp(‘2014-02-01 00:00:00’)

In [252]: pd.Timestamp(“2014-01-02”) + pd.offsets.MonthEnd(n=0)
Out[252]: Timestamp(‘2014-01-31 00:00:00’)

In [253]: pd.Timestamp(“2014-01-01”) + pd.offsets.MonthBegin(n=0)
Out[253]: Timestamp(‘2014-01-01 00:00:00’)

In [254]: pd.Timestamp(“2014-01-31”) + pd.offsets.MonthEnd(n=0)
Out[254]: Timestamp(‘2014-01-31 00:00:00’)
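
The n=0 rule matches what the offset methods rollforward and rollback do explicitly. The following sketch, using the same dates as the examples above, checks whether a date sits on an anchor point and rolls it in either direction:

```python
import pandas as pd

offset = pd.offsets.MonthBegin()
on_anchor = pd.Timestamp("2014-01-01")
off_anchor = pd.Timestamp("2014-01-02")

# is_on_offset reports whether a date is an anchor point for the offset.
print(offset.is_on_offset(on_anchor), offset.is_on_offset(off_anchor))

# rollforward / rollback leave anchor points untouched and otherwise
# move to the next / previous anchor point.
rolled_forward = offset.rollforward(off_anchor)   # next month begin
rolled_back = offset.rollback(off_anchor)         # previous month begin
unchanged = offset.rollforward(on_anchor)         # already on an anchor

# Adding an offset with n=0 gives the same result as rollforward.
same_as_n0 = off_anchor + pd.offsets.MonthBegin(n=0)
```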
Holidays / holiday calendars
Holidays and calendars provide a simple way to define holiday rules to be used with CustomBusinessDay or in other analysis that requires a predefined set of holidays. The AbstractHolidayCalendar class provides all the necessary methods to return a list of holidays, so only the rules need to be defined in a specific holiday calendar class. Furthermore, the start_date and end_date class attributes determine over what date range holidays are generated. These should be overwritten on the AbstractHolidayCalendar class to have the range apply to all calendar subclasses. USFederalHolidayCalendar is the only calendar that exists and primarily serves as an example for developing other calendars.

For holidays that occur on fixed dates (e.g., July 4th), an observance rule determines when that holiday is observed if it falls on a weekend or some other non-observed day. Defined observance rules are:

Rule                      Description
nearest_workday           move Saturday to Friday and Sunday to Monday
sunday_to_monday          move Sunday to following Monday
next_monday_or_tuesday    move Saturday to Monday and Sunday/Monday to Tuesday
previous_friday           move Saturday and Sunday to previous Friday
next_monday               move Saturday and Sunday to following Monday
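
The observance rules are plain functions in pandas.tseries.holiday that take a datetime and return the observed datetime, so they can be tried directly. The dates below are picked only because July 4th fell on a Saturday in 2015 and on a Sunday in 2021:

```python
from datetime import datetime

from pandas.tseries.holiday import (
    nearest_workday,
    next_monday,
    previous_friday,
    sunday_to_monday,
)

saturday = datetime(2015, 7, 4)  # July 4th, 2015 was a Saturday
sunday = datetime(2021, 7, 4)    # July 4th, 2021 was a Sunday

print(nearest_workday(saturday))   # moved back to Friday
print(nearest_workday(sunday))     # moved forward to Monday
print(sunday_to_monday(sunday))    # moved forward to Monday
print(previous_friday(sunday))     # moved back to Friday
print(next_monday(saturday))       # moved forward to Monday
```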

An example of how holidays and holiday calendars are defined:

In [255]: from pandas.tseries.holiday import (
…: Holiday,
…: USMemorialDay,
…: AbstractHolidayCalendar,
…: nearest_workday,
…: MO,
…: )
…:

In [256]: class ExampleCalendar(AbstractHolidayCalendar):
…: rules = [
…: USMemorialDay,
…: Holiday(“July 4th”, month=7, day=4, observance=nearest_workday),
…: Holiday(
…: “Columbus Day”,
…: month=10,
…: day=1,
…: offset=pd.DateOffset(weekday=MO(2)),
…: ),
…: ]
…:

In [257]: cal = ExampleCalendar()

In [258]: cal.holidays(datetime.datetime(2012, 1, 1), datetime.datetime(2012, 12, 31))
Out[258]: DatetimeIndex([‘2012-05-28’, ‘2012-07-04’, ‘2012-10-08’], dtype=‘datetime64[ns]’, freq=None)
Hint
weekday=MO(2) selects the second Monday on or after the given date. For a date that is not itself a Monday, this is the same as adding 2 * Week(weekday=0), since Monday is weekday 0.

Using this calendar, creating an index or doing offset arithmetic skips weekends and holidays (i.e., Memorial Day/July 4th). For example, the following defines a custom business day offset using the ExampleCalendar. Like any other offset, it can be used to create a DatetimeIndex or added to datetime or Timestamp objects.

In [259]: pd.date_range(
…: start=“7/1/2012”, end=“7/10/2012”, freq=pd.offsets.CDay(calendar=cal)
…: ).to_pydatetime()
…:
Out[259]:
array([datetime.datetime(2012, 7, 2, 0, 0),
datetime.datetime(2012, 7, 3, 0, 0),
datetime.datetime(2012, 7, 5, 0, 0),
datetime.datetime(2012, 7, 6, 0, 0),
datetime.datetime(2012, 7, 9, 0, 0),
datetime.datetime(2012, 7, 10, 0, 0)], dtype=object)

In [260]: offset = pd.offsets.CustomBusinessDay(calendar=cal)

In [261]: datetime.datetime(2012, 5, 25) + offset
Out[261]: Timestamp(‘2012-05-29 00:00:00’)

In [262]: datetime.datetime(2012, 7, 3) + offset
Out[262]: Timestamp(‘2012-07-05 00:00:00’)

In [263]: datetime.datetime(2012, 7, 3) + 2 * offset
Out[263]: Timestamp(‘2012-07-06 00:00:00’)

In [264]: datetime.datetime(2012, 7, 6) + offset
Out[264]: Timestamp(‘2012-07-09 00:00:00’)
Ranges are defined by the start_date and end_date class attributes of AbstractHolidayCalendar. The defaults are shown below.

In [265]: AbstractHolidayCalendar.start_date
Out[265]: Timestamp(‘1970-01-01 00:00:00’)

In [266]: AbstractHolidayCalendar.end_date
Out[266]: Timestamp(‘2200-12-31 00:00:00’)
These dates can be overwritten by setting the attributes to a datetime, Timestamp, or string.

In [267]: AbstractHolidayCalendar.start_date = datetime.datetime(2012, 1, 1)

In [268]: AbstractHolidayCalendar.end_date = datetime.datetime(2012, 12, 31)

In [269]: cal.holidays()
Out[269]: DatetimeIndex([‘2012-05-28’, ‘2012-07-04’, ‘2012-10-08’], dtype=‘datetime64[ns]’, freq=None)
Every calendar class is accessible by name using the get_calendar function, which returns a holiday calendar instance. Any imported calendar class is automatically available through this function. In addition, HolidayCalendarFactory provides an easy interface for creating calendars that are combinations of calendars, or calendars with additional rules.

In [270]: from pandas.tseries.holiday import get_calendar, HolidayCalendarFactory, USLaborDay

In [271]: cal = get_calendar(“ExampleCalendar”)

In [272]: cal.rules
Out[272]:
[Holiday: Memorial Day (month=5, day=31, offset=<DateOffset: weekday=MO(-1)>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7fc1a60de0d0>),
Holiday: Columbus Day (month=10, day=1, offset=<DateOffset: weekday=MO(+2)>)]

In [273]: new_cal = HolidayCalendarFactory(“NewExampleCalendar”, cal, USLaborDay)

In [274]: new_cal.rules
Out[274]:
[Holiday: Labor Day (month=9, day=1, offset=<DateOffset: weekday=MO(+1)>),
Holiday: Memorial Day (month=5, day=31, offset=<DateOffset: weekday=MO(-1)>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7fc1a60de0d0>),
Holiday: Columbus Day (month=10, day=1, offset=<DateOffset: weekday=MO(+2)>)]
Time series-related instance methods
Shifting / lagging
One may want to shift or lag the values in a time series back and forward in time. The method for this is shift(), which is available on all pandas objects.

In [275]: ts = pd.Series(range(len(rng)), index=rng)

In [276]: ts = ts[:5]

In [277]: ts.shift(1)
Out[277]:
2012-01-01 NaN
2012-01-02 0.0
2012-01-03 1.0
Freq: D, dtype: float64
The shift method accepts a freq argument, which can be a DateOffset class, another timedelta-like object, or an offset alias.

When freq is specified, the shift method changes all the dates in the index rather than changing the alignment of the data and the index:

In [278]: ts.shift(5, freq=“D”)
Out[278]:
2012-01-06 0
2012-01-07 1
2012-01-08 2
Freq: D, dtype: int64

In [279]: ts.shift(5, freq=pd.offsets.BDay())
Out[279]:
2012-01-06 0
2012-01-09 1
2012-01-10 2
dtype: int64

In [280]: ts.shift(5, freq=“BM”)
Out[280]:
2012-05-31 0
2012-05-31 1
2012-05-31 2
dtype: int64
Note that when freq is specified, the leading entry is no longer NaN because the data is not being realigned.

Frequency conversion
The primary function for changing frequencies is the asfreq() method. For a DatetimeIndex, this is basically just a thin but convenient wrapper around reindex() that generates a date_range and calls reindex.
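
The wrapper description can be checked by hand. The sketch below builds the target date_range explicitly and compares reindex against asfreq; the series values are made up for the illustration.

```python
import pandas as pd

# A series observed every third business day (values are illustrative).
dr = pd.date_range("2010-01-01", periods=3, freq=3 * pd.offsets.BDay())
ts = pd.Series([1.0, 2.0, 3.0], index=dr)

# asfreq converts to every business day, leaving NaN in the gaps.
via_asfreq = ts.asfreq(pd.offsets.BDay())

# The same result, spelled out: generate the full range, then reindex.
full_range = pd.date_range(ts.index.min(), ts.index.max(), freq=pd.offsets.BDay())
via_reindex = ts.reindex(full_range)

print(via_asfreq.equals(via_reindex))
```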

In [281]: dr = pd.date_range(“1/1/2010”, periods=3, freq=3 * pd.offsets.BDay())

In [282]: ts = pd.Series(np.random.randn(3), index=dr)

In [283]: ts
Out[283]:
2010-01-01 1.494522
2010-01-06 -0.778425
2010-01-11 -0.253355
Freq: 3B, dtype: float64

In [284]: ts.asfreq(pd.offsets.BDay())
Out[284]:
2010-01-01 1.494522
2010-01-04 NaN
2010-01-05 NaN
2010-01-06 -0.778425
2010-01-07 NaN
2010-01-08 NaN
2010-01-11 -0.253355
Freq: B, dtype: float64
asfreq provides a further convenience: you can specify an interpolation method for any gaps that may appear after the frequency conversion.

In [285]: ts.asfreq(pd.offsets.BDay(), method=“pad”)
Out[285]:
2010-01-01 1.494522
2010-01-04 1.494522
2010-01-05 1.494522
2010-01-06 -0.778425
2010-01-07 -0.778425
2010-01-08 -0.778425
2010-01-11 -0.253355
Freq: B, dtype: float64
Filling forward / backward
Related to asfreq and reindex is fillna(), which is documented in the missing data section.
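
As a minimal sketch of filling after a frequency conversion, the example below uses the ffill() and bfill() methods rather than fillna() itself; the series is invented for the illustration.

```python
import pandas as pd

dr = pd.date_range("2010-01-01", periods=3, freq=3 * pd.offsets.BDay())
ts = pd.Series([1.0, 2.0, 3.0], index=dr)

# Convert to every business day; the new rows start out as NaN.
converted = ts.asfreq(pd.offsets.BDay())

filled_forward = converted.ffill()    # carry the last observation forward
filled_backward = converted.bfill()   # pull the next observation backward

# Forward filling afterwards matches passing a fill method to asfreq.
same = filled_forward.equals(ts.asfreq(pd.offsets.BDay(), method="ffill"))
```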

Converting to Python datetimes
A DatetimeIndex can be converted to an array of Python native datetime.datetime objects using the to_pydatetime method.
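
A short illustration with an arbitrary three-day index:

```python
import datetime

import pandas as pd

idx = pd.date_range("2012-07-01", periods=3, freq="D")

# The result is a NumPy object array holding plain datetime.datetime values.
py_datetimes = idx.to_pydatetime()
print(type(py_datetimes[0]))
```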

Resampling
pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (e.g., converting secondly data into 5-minutely data). This is extremely common in, but not limited to, financial applications.

resample() is a time-based groupby, followed by a reduction method on each of its groups. See the cookbook examples for some advanced strategies.

The resample() method can be used directly on DataFrameGroupBy objects; see the groupby docs.
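
A minimal sketch of resampling within groups, assuming a small frame with a hypothetical "key" column to group on and a "val" column to aggregate:

```python
import pandas as pd

index = pd.to_datetime(
    [
        "2012-01-01 00:00",
        "2012-01-01 12:00",
        "2012-01-02 00:00",
        "2012-01-02 12:00",
    ]
)
# "key" and "val" are illustrative column names.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]}, index=index)

# Group by key first, then resample each group's values to daily sums.
daily = df.groupby("key")["val"].resample("D").sum()
print(daily)
```

The result is indexed by the group key and the resampled timestamp.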

Basics
In [286]: rng = pd.date_range(“1/1/2012”, periods=100, freq=“S”)

In [287]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

In [288]: ts.resample(“5Min”).sum()
Out[288]:
2012-01-01 25103
Freq: 5T, dtype: int64
The resample function is very flexible and allows you to specify many different parameters to control the frequency conversion and resampling operation.

Any function available via dispatching is available as a method of the returned object, including sum, mean, std, sem, max, min, median, first, last, ohlc:

In [289]: ts.resample(“5Min”).mean()
Out[289]:
2012-01-01 251.03
Freq: 5T, dtype: float64

In [290]: ts.resample(“5Min”).ohlc()
Out[290]:
open high low close
2012-01-01 308 460 9 205

In [291]: ts.resample(“5Min”).max()
Out[291]:
2012-01-01 460
Freq: 5T, dtype: int64
For downsampling, closed can be set to ‘left’ or ‘right’ to specify which end of the interval is closed:

In [292]: ts.resample(“5Min”, closed=“right”).mean()
Out[292]:
2011-12-31 23:55:00 308.000000
2012-01-01 00:00:00 250.454545
Freq: 5T, dtype: float64

In [293]: ts.resample(“5Min”, closed=“left”).mean()
Out[293]:
2012-01-01 251.03
Freq: 5T, dtype: float64
Parameters like label are used to manipulate the resulting labels. label specifies whether the result is labeled with the beginning or the end of the interval.

In [294]: ts.resample(“5Min”).mean() # by default label=‘left’
Out[294]:
2012-01-01 251.03
Freq: 5T, dtype: float64

In [295]: ts.resample(“5Min”, label=“left”).mean()
Out[295]:
2012-01-01 251.03
Freq: 5T, dtype: float64
Warning

The default value of label and closed is ‘left’ for all frequency offsets except ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’, which all default to ‘right’.

This might unintentionally lead to looking ahead, where the value for a later time is pulled back to an earlier time, as in the following example with the BusinessDay frequency:

In [296]: s = pd.date_range(“2000-01-01”, “2000-01-05”).to_series()

In [297]: s.iloc[2] = pd.NaT

In [298]: s.dt.day_name()
Out[298]:
2000-01-01 Saturday
2000-01-02 Sunday
2000-01-03 NaN
2000-01-04 Tuesday
2000-01-05 Wednesday
Freq: D, dtype: object

# default: label=‘left’, closed=‘left’

In [299]: s.resample(“B”).last().dt.day_name()
Out[299]:
1999-12-31 Sunday
2000-01-03 NaN
2000-01-04 Tuesday
2000-01-05 Wednesday
Freq: B, dtype: object
Notice how the value for Sunday got pulled back to the previous Friday. To get the behavior where the value for Sunday is pushed to Monday, use instead:

In [300]: s.resample(“B”, label=“right”, closed=“right”).last().dt.day_name()
Out[300]:
2000-01-03 Sunday
2000-01-04 Tuesday
2000-01-05 Wednesday
Freq: B, dtype: object
The axis parameter can be set to 0 or 1 and allows you to resample the specified axis of a DataFrame.

kind can be set to ‘timestamp’ or ‘period’ to convert the resulting index to/from timestamp and time span representations. By default resample retains the input representation.

convention can be set to ‘start’ or ‘end’ when resampling period data (detail below). It specifies how low frequency periods are converted to higher frequency periods.

Upsampling
For upsampling, you can specify a way to upsample and the limit parameter to interpolate over the gaps that are created:

# from secondly to every 250 milliseconds

In [301]: ts[:2].resample(“250L”).asfreq()
Out[301]:
2012-01-01 00:00:00.000 308.0
2012-01-01 00:00:00.250 NaN
2012-01-01 00:00:00.500 NaN
2012-01-01 00:00:00.750 NaN
2012-01-01 00:00:01.000 204.0
Freq: 250L, dtype: float64

In [302]: ts[:2].resample(“250L”).ffill()
Out[302]:
2012-01-01 00:00:00.000 308
2012-01-01 00:00:00.250 308
2012-01-01 00:00:00.500 308
2012-01-01 00:00:00.750 308
2012-01-01 00:00:01.000 204
Freq: 250L, dtype: int64

In [303]: ts[:2].resample(“250L”).ffill(limit=2)
Out[303]:
2012-01-01 00:00:00.000 308.0
2012-01-01 00:00:00.250 308.0
2012-01-01 00:00:00.500 308.0
2012-01-01 00:00:00.750 NaN
2012-01-01 00:00:01.000 204.0
Freq: 250L, dtype: float64
Sparse resampling
Sparse time series are those where you have far fewer points relative to the amount of time you are looking to resample. Naively upsampling a sparse series can potentially generate lots of intermediate values. When you don’t want to use a method to fill these values (e.g., fill_method is None), the intermediate values will be filled with NaN.

Since resample is a time-based groupby, the following is a method to efficiently resample only the groups that are not all NaN.

In [304]: rng = pd.date_range(“2014-1-1”, periods=100, freq=“D”) + pd.Timedelta(“1s”)

In [305]: ts = pd.Series(range(100), index=rng)
If we want to resample to the full range of the series:

In [306]: ts.resample(“3T”).sum()
Out[306]:
2014-01-01 00:00:00 0
2014-01-01 00:03:00 0
2014-01-01 00:06:00 0
2014-01-01 00:09:00 0
2014-01-01 00:12:00 0

2014-04-09 23:48:00 0
2014-04-09 23:51:00 0
2014-04-09 23:54:00 0
2014-04-09 23:57:00 0
2014-04-10 00:00:00 99
Freq: 3T, Length: 47521, dtype: int64
We can instead resample only those groups where we have points, as follows:

In [307]: from functools import partial

In [308]: from pandas.tseries.frequencies import to_offset

In [309]: def round(t, freq):
…: freq = to_offset(freq)
…: return pd.Timestamp((t.value // freq.delta.value) * freq.delta.value)
…:

In [310]: ts.groupby(partial(round, freq=“3T”)).sum()
Out[310]:
2014-01-01 0
2014-01-02 1
2014-01-03 2
2014-01-04 3
2014-01-05 4

2014-04-06 95
2014-04-07 96
2014-04-08 97
2014-04-09 98
2014-04-10 99
Length: 100, dtype: int64
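
An alternative sketch of the same idea, in case the freq.delta attribute used by the helper above is unavailable in your pandas version: floor the index to the target frequency and group on the result. This is a substitute technique, not the helper from the example, and it writes the 3-minute frequency as "3min".

```python
import pandas as pd

rng = pd.date_range("2014-01-01", periods=100, freq="D") + pd.Timedelta("1s")
ts = pd.Series(range(100), index=rng)

# Floor each timestamp to its 3-minute bin and aggregate only the bins
# that actually contain data, instead of materializing every empty bin.
sparse = ts.groupby(ts.index.floor("3min")).sum()
print(len(sparse))
```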
Aggregation
Similar to the aggregating API, groupby API, and the window API, a Resampler can be selectively resampled.

When resampling a DataFrame, the default is to act on all columns with the same function.

In [311]: df = pd.DataFrame(
…: np.random.randn(1000, 3),
…: index=pd.date_range(“1/1/2012”, freq=“S”, periods=1000),
…: columns=[“A”, “B”, “C”],
…: )
…:

In [312]: r = df.resample(“3T”)

In [313]: r.mean()
Out[313]:
A B C
2012-01-01 00:00:00 -0.033823 -0.121514 -0.081447
2012-01-01 00:03:00 0.056909 0.146731 -0.024320
2012-01-01 00:06:00 -0.058837 0.047046 -0.052021
2012-01-01 00:09:00 0.063123 -0.026158 -0.066533
2012-01-01 00:12:00 0.186340 -0.003144 0.074752
2012-01-01 00:15:00 -0.085954 -0.016287 -0.050046
We can select a specific column or columns using standard getitem.

In [314]: r[“A”].mean()
Out[314]:
2012-01-01 00:00:00 -0.033823
2012-01-01 00:03:00 0.056909
2012-01-01 00:06:00 -0.058837
2012-01-01 00:09:00 0.063123
2012-01-01 00:12:00 0.186340
2012-01-01 00:15:00 -0.085954
Freq: 3T, Name: A, dtype: float64

In [315]: r[[“A”, “B”]].mean()
Out[315]:
A B
2012-01-01 00:00:00 -0.033823 -0.121514
2012-01-01 00:03:00 0.056909 0.146731
2012-01-01 00:06:00 -0.058837 0.047046
2012-01-01 00:09:00 0.063123 -0.026158
2012-01-01 00:12:00 0.186340 -0.003144
2012-01-01 00:15:00 -0.085954 -0.016287
You can pass a list or dict of functions to aggregate with, outputting a DataFrame:

In [316]: r[“A”].agg([np.sum, np.mean, np.std])
Out[316]:
sum mean std
2012-01-01 00:00:00 -6.088060 -0.033823 1.043263
2012-01-01 00:03:00 10.243678 0.056909 1.058534
2012-01-01 00:06:00 -10.590584 -0.058837 0.949264
2012-01-01 00:09:00 11.362228 0.063123 1.028096
2012-01-01 00:12:00 33.541257 0.186340 0.884586
2012-01-01 00:15:00 -8.595393 -0.085954 1.035476
On a resampled DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index:

In [317]: r.agg([np.sum, np.mean])
Out[317]:
A B C
sum mean sum mean sum mean
2012-01-01 00:00:00 -6.088060 -0.033823 -21.872530 -0.121514 -14.660515 -0.081447
2012-01-01 00:03:00 10.243678 0.056909 26.411633 0.146731 -4.377642 -0.024320
2012-01-01 00:06:00 -10.590584 -0.058837 8.468289 0.047046 -9.363825 -0.052021
2012-01-01 00:09:00 11.362228 0.063123 -4.708526 -0.026158 -11.975895 -0.066533
2012-01-01 00:12:00 33.541257 0.186340 -0.565895 -0.003144 13.455299 0.074752
2012-01-01 00:15:00 -8.595393 -0.085954 -1.628689 -0.016287 -5.004580 -0.050046
By passing a dict to aggregate you can apply a different aggregation to the columns of a DataFrame:

In [318]: r.agg({“A”: np.sum, “B”: lambda x: np.std(x, ddof=1)})
Out[318]:
A B
2012-01-01 00:00:00 -6.088060 1.001294
2012-01-01 00:03:00 10.243678 1.074597
2012-01-01 00:06:00 -10.590584 0.987309
2012-01-01 00:09:00 11.362228 0.944953
2012-01-01 00:12:00 33.541257 1.095025
2012-01-01 00:15:00 -8.595393 1.035312
The function names can also be strings. For a string to be valid, it must be implemented on the resampled object:

In [319]: r.agg({“A”: “sum”, “B”: “std”})
Out[319]:
A B
2012-01-01 00:00:00 -6.088060 1.001294
2012-01-01 00:03:00 10.243678 1.074597
2012-01-01 00:06:00 -10.590584 0.987309
2012-01-01 00:09:00 11.362228 0.944953
2012-01-01 00:12:00 33.541257 1.095025
2012-01-01 00:15:00 -8.595393 1.035312
Furthermore, you can specify multiple aggregation functions for each column separately.

In [320]: r.agg({“A”: [“sum”, “std”], “B”: [“mean”, “std”]})
Out[320]:
A B
sum std mean std
2012-01-01 00:00:00 -6.088060 1.043263 -0.121514 1.001294
2012-01-01 00:03:00 10.243678 1.058534 0.146731 1.074597
2012-01-01 00:06:00 -10.590584 0.949264 0.047046 0.987309
2012-01-01 00:09:00 11.362228 1.028096 -0.026158 0.944953
2012-01-01 00:12:00 33.541257 0.884586 -0.003144 1.095025
2012-01-01 00:15:00 -8.595393 1.035476 -0.016287 1.035312
If a DataFrame does not have a datetimelike index, but instead you want to resample based on a datetimelike column in the frame, that column can be passed to the on keyword.

In [321]: df = pd.DataFrame(
…: {“date”: pd.date_range(“2015-01-01”, freq=“W”, periods=5), “a”: np.arange(5)},
…: index=pd.MultiIndex.from_arrays(
…: [[1, 2, 3, 4, 5], pd.date_range(“2015-01-01”, freq=“W”, periods=5)],
…: names=[“v”, “d”],
…: ),
…: )
…:

In [322]: df
Out[322]:
date a
v d
1 2015-01-04 2015-01-04 0
2 2015-01-11 2015-01-11 1
3 2015-01-18 2015-01-18 2
4 2015-01-25 2015-01-25 3
5 2015-02-01 2015-02-01 4

In [323]: df.resample(“M”, on=“date”).sum()
Out[323]:
a
date
2015-01-31 6
2015-02-28 4
Similarly, if you instead want to resample by a datetimelike level of a MultiIndex, its name or location can be passed to the level keyword.

In [324]: df.resample(“M”, level=“d”).sum()
Out[324]:
a
d
2015-01-31 6
2015-02-28 4
Iterating through groups
With the Resampler object in hand, iterating through the grouped data is very natural and functions similarly to itertools.groupby():

In [325]: small = pd.Series(
…: range(6),
…: index=pd.to_datetime(
…: [
…: “2017-01-01T00:00:00”,
…: “2017-01-01T00:30:00”,
…: “2017-01-01T00:31:00”,
…: “2017-01-01T01:00:00”,
…: “2017-01-01T03:00:00”,
…: “2017-01-01T03:05:00”,
…: ]
…: ),
…: )
…:

In [326]: resampled = small.resample(“H”)

In [327]: for name, group in resampled:
…: print(“Group: “, name)
…: print(”-” * 27)
…: print(group, end="\n\n")
…:
Group: 2017-01-01 00:00:00

2017-01-01 00:00:00 0
2017-01-01 00:30:00 1
2017-01-01 00:31:00 2
dtype: int64

Group: 2017-01-01 01:00:00

2017-01-01 01:00:00 3
dtype: int64

Group: 2017-01-01 02:00:00

Series([], dtype: int64)

Group: 2017-01-01 03:00:00

2017-01-01 03:00:00 4
2017-01-01 03:05:00 5
dtype: int64
See Iterating through groups or Resampler.__iter__ for more.

Use origin or offset to adjust the start of the bins
New in version 1.1.0.

The bins of the grouping are adjusted based on the beginning of the day of the time series starting point. This works well with frequencies that are multiples of a day (like 30D) or that divide a day evenly (like 90s or 1min). It can create inconsistencies with frequencies that do not meet these criteria. To change this behavior, you can specify a fixed Timestamp with the origin argument.

For example:

In [328]: start, end = “2000-10-01 23:30:00”, “2000-10-02 00:30:00”

In [329]: middle = “2000-10-02 00:00:00”

In [330]: rng = pd.date_range(start, end, freq=“7min”)

In [331]: ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

In [332]: ts
Out[332]:
2000-10-01 23:30:00 0
2000-10-01 23:37:00 3
2000-10-01 23:44:00 6
2000-10-01 23:51:00 9
2000-10-01 23:58:00 12
2000-10-02 00:05:00 15
2000-10-02 00:12:00 18
2000-10-02 00:19:00 21
2000-10-02 00:26:00 24
Freq: 7T, dtype: int64
Here we can see that, when using origin with its default value (‘start_day’), the results after ‘2000-10-02 00:00:00’ differ depending on the start of the time series:

In [333]: ts.resample(“17min”, origin=“start_day”).sum()
Out[333]:
2000-10-01 23:14:00 0
2000-10-01 23:31:00 9
2000-10-01 23:48:00 21
2000-10-02 00:05:00 54
2000-10-02 00:22:00 24
Freq: 17T, dtype: int64

In [334]: ts[middle:end].resample(“17min”, origin=“start_day”).sum()
Out[334]:
2000-10-02 00:00:00 33
2000-10-02 00:17:00 45
Freq: 17T, dtype: int64
Here we can see that, when setting origin to ‘epoch’, the results after ‘2000-10-02 00:00:00’ are identical regardless of the start of the time series:

In [335]: ts.resample(“17min”, origin=“epoch”).sum()
Out[335]:
2000-10-01 23:18:00 0
2000-10-01 23:35:00 18
2000-10-01 23:52:00 27
2000-10-02 00:09:00 39
2000-10-02 00:26:00 24
Freq: 17T, dtype: int64

In [336]: ts[middle:end].resample(“17min”, origin=“epoch”).sum()
Out[336]:
2000-10-01 23:52:00 15
2000-10-02 00:09:00 39
2000-10-02 00:26:00 24
Freq: 17T, dtype: int64
If needed, you can use a custom timestamp for origin:

In [337]: ts.resample(“17min”, origin=“2001-01-01”).sum()
Out[337]:
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64

In [338]: ts[middle:end].resample(“17min”, origin=pd.Timestamp(“2001-01-01”)).sum()
Out[338]:
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
If needed, you can instead adjust the bins with an offset Timedelta that is added to the default origin. The following two examples are equivalent for this time series:

In [339]: ts.resample(“17min”, origin=“start”).sum()
Out[339]:
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64

In [340]: ts.resample(“17min”, offset=“23h30min”).sum()
Out[340]:
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
Note the use of ‘start’ for origin in the first of these two examples. In that case, origin is set to the first value of the time series.

Time span representation 时间跨度表示
Regular intervals of time are represented by Period objects in pandas, while sequences of Period objects are collected in a PeriodIndex, which can be created with the convenience function period_range.

Period
A Period represents a span of time (e.g., a day, a month, a quarter, etc.). You can specify the span via the freq keyword using a frequency alias, as shown below. Because freq represents the span of a Period, it cannot be negative like “-3D”.

In [341]: pd.Period(“2012”, freq=“A-DEC”)
Out[341]: Period(‘2012’, ‘A-DEC’)

In [342]: pd.Period(“2012-1-1”, freq=“D”)
Out[342]: Period(‘2012-01-01’, ‘D’)

In [343]: pd.Period(“2012-1-1 19:00”, freq=“H”)
Out[343]: Period(‘2012-01-01 19:00’, ‘H’)

In [344]: pd.Period(“2012-1-1 19:00”, freq=“5H”)
Out[344]: Period(‘2012-01-01 19:00’, ‘5H’)
Adding an integer to a Period, or subtracting one from it, shifts the Period by that many multiples of its own frequency. Arithmetic is not allowed between Periods with different freq (span).

In [345]: p = pd.Period(“2012”, freq=“A-DEC”)

In [346]: p + 1
Out[346]: Period(‘2013’, ‘A-DEC’)

In [347]: p - 3
Out[347]: Period(‘2009’, ‘A-DEC’)

In [348]: p = pd.Period(“2012-01”, freq=“2M”)

In [349]: p + 2
Out[349]: Period(‘2012-05’, ‘2M’)

In [350]: p - 1
Out[350]: Period(‘2011-11’, ‘2M’)

In [351]: p == pd.Period(“2012-01”, freq=“3M”)

IncompatibleFrequency Traceback (most recent call last)
in
----> 1 p == pd.Period(“2012-01”, freq=“3M”)

/pandas/pandas/_libs/tslibs/period.pyx in pandas._libs.tslibs.period._Period.richcmp()

IncompatibleFrequency: Input has different freq=3M from Period(freq=2M)
If the Period freq is daily or higher (D, H, T, S, L, U, N), offsets and timedelta-like objects can be added, provided the result can have the same freq. Otherwise, ValueError will be raised.

In [352]: p = pd.Period(“2014-07-01 09:00”, freq=“H”)

In [353]: p + pd.offsets.Hour(2)
Out[353]: Period(‘2014-07-01 11:00’, ‘H’)

In [354]: p + datetime.timedelta(minutes=120)
Out[354]: Period(‘2014-07-01 11:00’, ‘H’)

In [355]: p + np.timedelta64(7200, “s”)
Out[355]: Period(‘2014-07-01 11:00’, ‘H’)
In [1]: p + pd.offsets.Minute(5)
Traceback

ValueError: Input has different freq from Period(freq=H)
If the Period has any other frequency, only offsets of that same frequency can be added. Otherwise, ValueError will be raised.

In [356]: p = pd.Period(“2014-07”, freq=“M”)

In [357]: p + pd.offsets.MonthEnd(3)
Out[357]: Period(‘2014-10’, ‘M’)
In [1]: p + pd.offsets.MonthBegin(3)
Traceback

ValueError: Input has different freq from Period(freq=M)
Taking the difference of Period instances with the same frequency will return the number of frequency units between them:

使用相同频率的周期实例的差值将返回它们之间的频率单位数:

In [358]: pd.Period(“2012”, freq=“A-DEC”) - pd.Period(“2002”, freq=“A-DEC”)
Out[358]: <10 * YearEnds: month=12>
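
The arithmetic rules above can be collected in one short sketch. It assumes the lowercase alias “h” is accepted for hourly periods by the installed pandas version (the transcript above uses the older “H” spelling), and the variable names are illustrative only.

```python
import datetime

import pandas as pd

# Hourly period; "h" is assumed to be accepted by the installed pandas.
p = pd.Period("2014-07-01 09:00", freq="h")

# An integer shifts by multiples of the period's own frequency.
plus_int = p + 2

# Offsets and timedeltas work when they are whole multiples of the frequency.
plus_offset = p + pd.offsets.Hour(2)
plus_timedelta = p + datetime.timedelta(minutes=120)

# Five minutes is not a whole number of hours, so this is rejected.
try:
    p + pd.offsets.Minute(5)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# The difference of two same-frequency periods is a multiple of that frequency.
months_apart = pd.Period("2012-05", freq="M") - pd.Period("2012-01", freq="M")

print(plus_int, mismatch_raised, months_apart)
```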
PeriodIndex and period_range 周期指数和周期范围
Regular sequences of Period objects can be collected in a PeriodIndex, which can be constructed using the period_range convenience function:

周期对象的规则序列可以在 PeriodIndex 中收集,它可以使用 Period _ range 便利函数构造:

In [359]: prng = pd.period_range(“1/1/2011”, “1/1/2012”, freq=“M”)

In [360]: prng
Out[360]:
PeriodIndex([‘2011-01’, ‘2011-02’, ‘2011-03’, ‘2011-04’, ‘2011-05’, ‘2011-06’,
‘2011-07’, ‘2011-08’, ‘2011-09’, ‘2011-10’, ‘2011-11’, ‘2011-12’,
‘2012-01’],
dtype=‘period[M]’, freq=‘M’)
The PeriodIndex constructor can also be used directly:

构造函数也可以直接使用:

In [361]: pd.PeriodIndex([“2011-1”, “2011-2”, “2011-3”], freq=“M”)
Out[361]: PeriodIndex([‘2011-01’, ‘2011-02’, ‘2011-03’], dtype=‘period[M]’, freq=‘M’)
Passing a multiplied frequency such as “3M” outputs a sequence of Periods, each with the multiplied span.

In [362]: pd.period_range(start=“2014-01”, freq=“3M”, periods=4)
Out[362]: PeriodIndex([‘2014-01’, ‘2014-04’, ‘2014-07’, ‘2014-10’], dtype=‘period[3M]’, freq=‘3M’)
If start or end are Period objects, they will be used as anchor endpoints for a PeriodIndex with frequency matching that of the PeriodIndex constructor.

如果开始或结束是 Period 对象,它们将被用作 PeriodIndex 的锚定端点,其频率与 PeriodIndex 构造函数的频率相匹配。

In [363]: pd.period_range(
…: start=pd.Period(“2017Q1”, freq=“Q”), end=pd.Period(“2017Q2”, freq=“Q”), freq=“M”
…: )
…:
Out[363]: PeriodIndex([‘2017-03’, ‘2017-04’, ‘2017-05’, ‘2017-06’], dtype=‘period[M]’, freq=‘M’)
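
As a compact recap of the three period_range call styles shown above, here is a small sketch. The variable names are illustrative, and the values simply restate the examples from the text.

```python
import pandas as pd

# Start and end given as strings: both endpoints are included.
monthly = pd.period_range("2011-01", "2012-01", freq="M")

# A multiplied frequency: each element spans three months.
every_third = pd.period_range(start="2014-01", periods=4, freq="3M")

# Period endpoints with a different frequency are converted to the
# requested frequency and used as anchors.
between_quarters = pd.period_range(
    start=pd.Period("2017Q1", freq="Q"),
    end=pd.Period("2017Q2", freq="Q"),
    freq="M",
)

print(len(monthly))
print([str(p) for p in every_third])
print([str(p) for p in between_quarters])
```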
Just like DatetimeIndex, a PeriodIndex can also be used to index pandas objects:

就像 DatetimeIndex 一样,PeriodIndex 也可以用来索引熊猫对象:

In [364]: ps = pd.Series(np.random.randn(len(prng)), prng)

In [365]: ps
Out[365]:
2011-01 -2.916901
2011-02 0.514474
2011-03 1.346470
2011-04 0.816397
2011-05 2.258648
2011-06 0.494789
2011-07 0.301239
2011-08 0.464776
2011-09 -1.393581
2011-10 0.056780
2011-11 0.197035
2011-12 2.261385
2012-01 -0.329583
Freq: M, dtype: float64
PeriodIndex supports addition and subtraction with the same rules as Period.

In [366]: idx = pd.period_range(“2014-07-01 09:00”, periods=5, freq=“H”)

In [367]: idx
Out[367]:
PeriodIndex([‘2014-07-01 09:00’, ‘2014-07-01 10:00’, ‘2014-07-01 11:00’,
‘2014-07-01 12:00’, ‘2014-07-01 13:00’],
dtype=‘period[H]’, freq=‘H’)

In [368]: idx + pd.offsets.Hour(2)
Out[368]:
PeriodIndex([‘2014-07-01 11:00’, ‘2014-07-01 12:00’, ‘2014-07-01 13:00’,
‘2014-07-01 14:00’, ‘2014-07-01 15:00’],
dtype=‘period[H]’, freq=‘H’)

In [369]: idx = pd.period_range(“2014-07”, periods=5, freq=“M”)

In [370]: idx
Out[370]: PeriodIndex([‘2014-07’, ‘2014-08’, ‘2014-09’, ‘2014-10’, ‘2014-11’], dtype=‘period[M]’, freq=‘M’)

In [371]: idx + pd.offsets.MonthEnd(3)
Out[371]: PeriodIndex([‘2014-10’, ‘2014-11’, ‘2014-12’, ‘2015-01’, ‘2015-02’], dtype=‘period[M]’, freq=‘M’)
PeriodIndex has its own dtype named period; refer to Period dtypes.

Period dtypes
PeriodIndex has a custom period dtype. This is a pandas extension dtype similar to the timezone aware dtype (datetime64[ns, tz]).

The period dtype holds the freq attribute and is represented with period[freq], such as period[D] or period[M], using frequency strings.

In [372]: pi = pd.period_range(“2016-01-01”, periods=3, freq=“M”)

In [373]: pi
Out[373]: PeriodIndex([‘2016-01’, ‘2016-02’, ‘2016-03’], dtype=‘period[M]’, freq=‘M’)

In [374]: pi.dtype
Out[374]: period[M]
The period dtype can be used in .astype(…). It allows one to change the freq of a PeriodIndex, like .asfreq(), and to convert a DatetimeIndex to a PeriodIndex, like to_period():

# change monthly freq to daily freq

In [375]: pi.astype(“period[D]”)
Out[375]: PeriodIndex([‘2016-01-31’, ‘2016-02-29’, ‘2016-03-31’], dtype=‘period[D]’, freq=‘D’)

# convert to DatetimeIndex

In [376]: pi.astype(“datetime64[ns]”)
Out[376]: DatetimeIndex([‘2016-01-01’, ‘2016-02-01’, ‘2016-03-01’], dtype=‘datetime64[ns]’, freq=‘MS’)

# convert to PeriodIndex

In [377]: dti = pd.date_range(“2011-01-01”, freq=“M”, periods=3)

In [378]: dti
Out[378]: DatetimeIndex([‘2011-01-31’, ‘2011-02-28’, ‘2011-03-31’], dtype=‘datetime64[ns]’, freq=‘M’)

In [379]: dti.astype(“period[M]”)
Out[379]: PeriodIndex([‘2011-01’, ‘2011-02’, ‘2011-03’], dtype=‘period[M]’, freq=‘M’)
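
The claimed equivalences (astype with a period dtype versus asfreq and to_period) can be checked with a short sketch. The datetime index is built with pd.to_datetime here only to avoid frequency aliases that differ between pandas versions; that choice is an assumption of this example, not something the text prescribes.

```python
import pandas as pd

pi = pd.period_range("2016-01", periods=3, freq="M")

# Changing the frequency through the dtype matches asfreq with its default
# how, which picks the last day of each month.
as_daily = pi.astype("period[D]")
same_as_asfreq = pi.asfreq("D")

# Converting timestamps to periods through the dtype matches to_period.
dti = pd.to_datetime(["2011-01-31", "2011-02-28", "2011-03-31"])
as_monthly = dti.astype("period[M]")
same_as_to_period = dti.to_period("M")

print(as_daily)
print(as_monthly)
```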
PeriodIndex partial string indexing PeriodIndex 部分字符串索引
PeriodIndex now supports partial string slicing with non-monotonic indexes.

New in version 1.1.0.

You can pass dates and strings to a Series or DataFrame with a PeriodIndex, in the same manner as with a DatetimeIndex. For details, refer to DatetimeIndex Partial String Indexing.

In [380]: ps[“2011-01”]
Out[380]: -2.9169013294054507

In [381]: ps[datetime.datetime(2011, 12, 25):]
Out[381]:
2011-12 2.261385
2012-01 -0.329583
Freq: M, dtype: float64

In [382]: ps[“10/31/2011”:“12/31/2011”]
Out[382]:
2011-10 0.056780
2011-11 0.197035
2011-12 2.261385
Freq: M, dtype: float64
Passing a string representing a lower frequency than that of the PeriodIndex returns partially sliced data.

In [383]: ps[“2011”]
Out[383]:
2011-01 -2.916901
2011-02 0.514474
2011-03 1.346470
2011-04 0.816397
2011-05 2.258648
2011-06 0.494789
2011-07 0.301239
2011-08 0.464776
2011-09 -1.393581
2011-10 0.056780
2011-11 0.197035
2011-12 2.261385
Freq: M, dtype: float64

In [384]: dfp = pd.DataFrame(
…: np.random.randn(600, 1),
…: columns=[“A”],
…: index=pd.period_range(“2013-01-01 9:00”, periods=600, freq=“T”),
…: )
…:

In [385]: dfp
Out[385]:
A
2013-01-01 09:00 -0.538468
2013-01-01 09:01 -1.365819
2013-01-01 09:02 -0.969051
2013-01-01 09:03 -0.331152
2013-01-01 09:04 -0.245334
… …
2013-01-01 18:55 0.522460
2013-01-01 18:56 0.118710
2013-01-01 18:57 0.167517
2013-01-01 18:58 0.922883
2013-01-01 18:59 1.721104

[600 rows x 1 columns]

In [386]: dfp.loc[“2013-01-01 10H”]
Out[386]:
A
2013-01-01 10:00 -0.308975
2013-01-01 10:01 0.542520
2013-01-01 10:02 1.061068
2013-01-01 10:03 0.754005
2013-01-01 10:04 0.352933
… …
2013-01-01 10:55 -0.865621
2013-01-01 10:56 -1.167818
2013-01-01 10:57 -2.081748
2013-01-01 10:58 -0.527146
2013-01-01 10:59 0.802298

[60 rows x 1 columns]
As with DatetimeIndex, the endpoints are included in the result. The example below slices data from 10:00 through 11:59.

In [387]: dfp[“2013-01-01 10H”:“2013-01-01 11H”]
Out[387]:
A
2013-01-01 10:00 -0.308975
2013-01-01 10:01 0.542520
2013-01-01 10:02 1.061068
2013-01-01 10:03 0.754005
2013-01-01 10:04 0.352933
… …
2013-01-01 11:55 -0.590204
2013-01-01 11:56 1.539990
2013-01-01 11:57 -1.224826
2013-01-01 11:58 0.578798
2013-01-01 11:59 -0.685496

[120 rows x 1 columns]
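
To make the inclusive-endpoint behaviour easy to verify, here is a small sketch on a monthly PeriodIndex with deterministic values (integers instead of the random data used above, so that the selected rows can be read off directly).

```python
import numpy as np
import pandas as pd

idx = pd.period_range("2011-01", "2012-01", freq="M")
ps = pd.Series(np.arange(len(idx)), index=idx)

# A year string is coarser than the monthly index, so it selects
# every month that falls inside 2011.
year_2011 = ps.loc["2011"]

# String slices include both endpoints.
last_quarter = ps.loc["2011-10":"2011-12"]

print(len(year_2011))
print(last_quarter.tolist())
```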
Frequency conversion and resampling with PeriodIndex 基于 PeriodIndex 的变频和重采样技术
The frequency of Period and PeriodIndex can be converted via the asfreq method. Let’s start with the fiscal year 2011, ending in December:

周期和周期指数的频率可以用 asfreq 方法进行转换。让我们从截至12月的2011财年开始:

In [388]: p = pd.Period(“2011”, freq=“A-DEC”)

In [389]: p
Out[389]: Period(‘2011’, ‘A-DEC’)
We can convert it to a monthly frequency. Using the how parameter, we can specify whether to return the starting or ending month:

我们可以把它转换成每月的频率。使用 how 参数,我们可以指定是返回开始月还是结束月:

In [390]: p.asfreq(“M”, how=“start”)
Out[390]: Period(‘2011-01’, ‘M’)

In [391]: p.asfreq(“M”, how=“end”)
Out[391]: Period(‘2011-12’, ‘M’)
The shorthands ‘s’ and ‘e’ are provided for convenience:

为了方便起见,我们提供了“ s”和“ e”:

In [392]: p.asfreq(“M”, “s”)
Out[392]: Period(‘2011-01’, ‘M’)

In [393]: p.asfreq(“M”, “e”)
Out[393]: Period(‘2011-12’, ‘M’)
Converting to a “super-period” (e.g., annual frequency is a super-period of quarterly frequency) automatically returns the super-period that includes the input period:

转换为“超级周期”(例如,年度频率是一个超级周期的季度频率)自动返回包括输入周期的超级周期:

In [394]: p = pd.Period(“2011-12”, freq=“M”)

In [395]: p.asfreq(“A-NOV”)
Out[395]: Period(‘2012’, ‘A-NOV’)
Note that since we converted to an annual frequency that ends the year in November, the monthly period of December 2011 is actually in the 2012 A-NOV period.

请注意,由于我们转换为年度频率,结束的一年在11月,每月的期间2011年12月实际上是在2012年 A-NOV 期间。

Period conversions with anchored frequencies are particularly useful for working with various quarterly data common to economics, business, and other fields. Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, the first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas supports all quarterly frequencies Q-JAN through Q-DEC.

Q-DEC defines regular calendar quarters:

In [396]: p = pd.Period(“2012Q1”, freq=“Q-DEC”)

In [397]: p.asfreq(“D”, “s”)
Out[397]: Period(‘2012-01-01’, ‘D’)

In [398]: p.asfreq(“D”, “e”)
Out[398]: Period(‘2012-03-31’, ‘D’)
Q-MAR defines a fiscal year ending in March:

In [399]: p = pd.Period(“2011Q4”, freq=“Q-MAR”)

In [400]: p.asfreq(“D”, “s”)
Out[400]: Period(‘2011-01-01’, ‘D’)

In [401]: p.asfreq(“D”, “e”)
Out[401]: Period(‘2011-03-31’, ‘D’)
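
The anchored quarterly examples can be summarised in one sketch. The last line applies the same “super-period” idea described earlier to a quarterly target; using Q-NOV there is an illustration chosen for this example, not one taken from the text.

```python
import pandas as pd

# Calendar quarters: the fiscal year ends in December.
q_cal = pd.Period("2012Q1", freq="Q-DEC")
cal_span = (q_cal.asfreq("D", how="start"), q_cal.asfreq("D", how="end"))

# Fiscal year ending in March: Q4 of fiscal 2011 is January to March 2011.
q_fiscal = pd.Period("2011Q4", freq="Q-MAR")
fiscal_span = (q_fiscal.asfreq("D", how="start"), q_fiscal.asfreq("D", how="end"))

# Converting to a coarser frequency returns the containing period:
# with years ending in November, December 2011 opens fiscal year 2012.
containing = pd.Period("2011-12", freq="M").asfreq("Q-NOV")

print(cal_span, fiscal_span, containing)
```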
Converting between representations 表示之间的转换
Timestamped data can be converted to PeriodIndex-ed data using to_period, and vice versa using to_timestamp:

In [402]: rng = pd.date_range(“1/1/2012”, periods=5, freq=“M”)

In [403]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [404]: ts
Out[404]:
2012-01-31 1.931253
2012-02-29 -0.184594
2012-03-31 0.249656
2012-04-30 -0.978151
2012-05-31 -0.873389
Freq: M, dtype: float64

In [405]: ps = ts.to_period()

In [406]: ps
Out[406]:
2012-01 1.931253
2012-02 -0.184594
2012-03 0.249656
2012-04 -0.978151
2012-05 -0.873389
Freq: M, dtype: float64

In [407]: ps.to_timestamp()
Out[407]:
2012-01-01 1.931253
2012-02-01 -0.184594
2012-03-01 0.249656
2012-04-01 -0.978151
2012-05-01 -0.873389
Freq: MS, dtype: float64
Remember that ‘s’ and ‘e’ can be used to return the timestamps at the start or end of the period:

请记住,“ s”和“ e”可以用于返回周期开始或结束时的时间戳:

In [408]: ps.to_timestamp(“D”, how=“s”)
Out[408]:
2012-01-01 1.931253
2012-02-01 -0.184594
2012-03-01 0.249656
2012-04-01 -0.978151
2012-05-01 -0.873389
Freq: MS, dtype: float64
Converting between period and timestamp enables some convenient arithmetic functions to be used. In the following example, we convert a quarterly frequency with year ending in November to 9am on the first day of the month following the quarter end:

In [409]: prng = pd.period_range(“1990Q1”, “2000Q4”, freq=“Q-NOV”)

In [410]: ts = pd.Series(np.random.randn(len(prng)), prng)

In [411]: ts.index = (prng.asfreq(“M”, “e”) + 1).asfreq(“H”, “s”) + 9

In [412]: ts.head()
Out[412]:
1990-03-01 09:00 -0.109291
1990-06-01 09:00 -0.637235
1990-09-01 09:00 -1.735925
1990-12-01 09:00 2.096946
1991-03-01 09:00 -1.039926
Freq: H, dtype: float64
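
The one-line index expression above packs four steps together. The sketch below spells them out on a shorter range, assuming the lowercase alias “h” is accepted for hourly periods by the installed pandas version; the intermediate names are illustrative.

```python
import pandas as pd

prng = pd.period_range("1990Q1", "1990Q4", freq="Q-NOV")

# 1. Last month of each quarter (1990Q1 with years ending in November ends in February).
quarter_end_month = prng.asfreq("M", how="end")

# 2. Step to the following month.
next_month = quarter_end_month + 1

# 3. First hour of that month.
first_hour = next_month.asfreq("h", how="start")

# 4. Move forward nine hours, to 09:00 on the first day.
nine_am = first_hour + 9

print([str(p) for p in nine_am])
```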
Representing out-of-bounds spans 表示界外范围
If you have data that falls outside the Timestamp bounds (see Timestamp limitations), you can use a PeriodIndex and/or a Series of Periods to do computations.

In [413]: span = pd.period_range(“1215-01-01”, “1381-01-01”, freq=“D”)

In [414]: span
Out[414]:
PeriodIndex([‘1215-01-01’, ‘1215-01-02’, ‘1215-01-03’, ‘1215-01-04’,
‘1215-01-05’, ‘1215-01-06’, ‘1215-01-07’, ‘1215-01-08’,
‘1215-01-09’, ‘1215-01-10’,

‘1380-12-23’, ‘1380-12-24’, ‘1380-12-25’, ‘1380-12-26’,
‘1380-12-27’, ‘1380-12-28’, ‘1380-12-29’, ‘1380-12-30’,
‘1380-12-31’, ‘1381-01-01’],
dtype=‘period[D]’, length=60632, freq=‘D’)
To convert from an int64-based YYYYMMDD representation:

In [415]: s = pd.Series([20121231, 20141130, 99991231])

In [416]: s
Out[416]:
0 20121231
1 20141130
2 99991231
dtype: int64

In [417]: def conv(x):
…:     return pd.Period(year=x // 10000, month=x // 100 % 100, day=x % 100, freq=“D”)
…:

In [418]: s.apply(conv)
Out[418]:
0 2012-12-31
1 2014-11-30
2 9999-12-31
dtype: period[D]

In [419]: s.apply(conv)[2]
Out[419]: Period(‘9999-12-31’, ‘D’)
These can easily be converted to a PeriodIndex:

这些可以很容易地转换为一个 PeriodIndex:

In [420]: span = pd.PeriodIndex(s.apply(conv))

In [421]: span
Out[421]: PeriodIndex([‘2012-12-31’, ‘2014-11-30’, ‘9999-12-31’], dtype=‘period[D]’, freq=‘D’)
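
For reference, the integer arithmetic behind conv can be written out as a standalone sketch. The helper name yyyymmdd_to_period is hypothetical, and the final comparison only illustrates that such periods may lie beyond what a nanosecond Timestamp can hold.

```python
import pandas as pd


def yyyymmdd_to_period(value):
    """Hypothetical helper: split an integer like 20121231 into year, month, day."""
    value = int(value)
    year = value // 10000        # 20121231 -> 2012
    month = value // 100 % 100   # 20121231 -> 201212 -> 12
    day = value % 100            # 20121231 -> 31
    return pd.Period(year=year, month=month, day=day, freq="D")


s = pd.Series([20121231, 20141130, 99991231])
periods = pd.PeriodIndex([yyyymmdd_to_period(v) for v in s])

# The last entry is far later than the largest representable Timestamp.
beyond_timestamp = periods[2].year > pd.Timestamp.max.year

print(periods, beyond_timestamp)
```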
Time zone handling 时区处理
pandas provides rich support for working with timestamps in different time zones using the pytz and dateutil libraries or datetime.timezone objects from the standard library.

Pandas 提供了丰富的支持,可以使用标准库中的 pytz 和 dateutil 库或 datetime.timezone 对象处理不同时区的时间戳。

Working with time zones 与时区打交道
By default, pandas objects are time zone unaware:

默认情况下,熊猫对象是不知道时区的:

In [422]: rng = pd.date_range(“3/6/2012 00:00”, periods=15, freq=“D”)

In [423]: rng.tz is None
Out[423]: True
To localize these dates to a time zone (assign a particular time zone to a naive date), you can use the tz_localize method or the tz keyword argument in date_range(), Timestamp, or DatetimeIndex. You can pass either pytz or dateutil time zone objects or Olson time zone database strings. Olson time zone strings will return pytz time zone objects by default. To return dateutil time zone objects, prepend dateutil/ to the string.

In pytz you can find a list of common (and less common) time zones using from pytz import common_timezones, all_timezones.

dateutil uses the OS time zones, so there isn’t a fixed list available. For common zones, the names are the same as in pytz.

In [424]: import dateutil

# pytz

In [425]: rng_pytz = pd.date_range(“3/6/2012 00:00”, periods=3, freq=“D”, tz=“Europe/London”)

In [426]: rng_pytz.tz
Out[426]: <DstTzInfo ‘Europe/London’ LMT-1 day, 23:59:00 STD>

# dateutil

In [427]: rng_dateutil = pd.date_range(“3/6/2012 00:00”, periods=3, freq=“D”)

In [428]: rng_dateutil = rng_dateutil.tz_localize(“dateutil/Europe/London”)

In [429]: rng_dateutil.tz
Out[429]: tzfile(’/usr/share/zoneinfo/Europe/London’)

# dateutil - utc special case

In [430]: rng_utc = pd.date_range(
…: “3/6/2012 00:00”,
…: periods=3,
…: freq=“D”,
…: tz=dateutil.tz.tzutc(),
…: )
…:

In [431]: rng_utc.tz
Out[431]: tzutc()
New in version 0.25.0.

新版本0.25.0。

# datetime.timezone

In [432]: rng_utc = pd.date_range(
…: “3/6/2012 00:00”,
…: periods=3,
…: freq=“D”,
…: tz=datetime.timezone.utc,
…: )
…:

In [433]: rng_utc.tz
Out[433]: datetime.timezone.utc
Note that the UTC time zone is a special case in dateutil and should be constructed explicitly as an instance of dateutil.tz.tzutc. You can also construct other time zone objects explicitly first.

In [434]: import pytz

# pytz

In [435]: tz_pytz = pytz.timezone(“Europe/London”)

In [436]: rng_pytz = pd.date_range(“3/6/2012 00:00”, periods=3, freq=“D”)

In [437]: rng_pytz = rng_pytz.tz_localize(tz_pytz)

In [438]: rng_pytz.tz == tz_pytz
Out[438]: True

# dateutil

In [439]: tz_dateutil = dateutil.tz.gettz(“Europe/London”)

In [440]: rng_dateutil = pd.date_range(“3/6/2012 00:00”, periods=3, freq=“D”, tz=tz_dateutil)

In [441]: rng_dateutil.tz == tz_dateutil
Out[441]: True
To convert a time zone aware pandas object from one time zone to another, you can use the tz_convert method.

要将时区感知的 pandas 对象从一个时区转换为另一个时区,可以使用 tz _ convert 方法。

In [442]: rng_pytz.tz_convert(“US/Eastern”)
Out[442]:
DatetimeIndex([‘2012-03-05 19:00:00-05:00’, ‘2012-03-06 19:00:00-05:00’,
‘2012-03-07 19:00:00-05:00’],
dtype=‘datetime64[ns, US/Eastern]’, freq=None)
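
The difference between localizing and converting can be seen in a short sketch: tz_localize attaches a zone to naive wall-clock times, while tz_convert re-expresses the same instants in another zone. The zone names are the ones used in the text; the variable names are illustrative.

```python
import pandas as pd

naive = pd.date_range("2012-03-06", periods=3, freq="D")

# Attach a zone: midnight is interpreted as midnight in London.
london = naive.tz_localize("Europe/London")

# Convert: the instants stay the same, only the wall-clock view changes.
eastern = london.tz_convert("US/Eastern")

# Element-wise comparison is done on the underlying instants.
same_instants = (london == eastern).all()

print(eastern)
print(same_instants)
```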
Note

注意

When using pytz time zones, DatetimeIndex will construct a different time zone object than a Timestamp for the same time zone input. A DatetimeIndex can hold a collection of Timestamp objects that may have different UTC offsets and cannot be succinctly represented by one pytz time zone instance while one Timestamp represents one point in time with a specific UTC offset.

在使用 pytz 时区时,对于相同的时区输入,DatetimeIndex 将构造不同于 Timestamp 的时区对象。DatetimeIndex 可以保存一组 Timestamp 对象,这些对象可能具有不同的 UTC 偏移量,不能由一个 pytz 时区实例简洁地表示,而一个 Timestamp 表示具有特定 UTC 偏移量的一个时间点。

In [443]: dti = pd.date_range(“2019-01-01”, periods=3, freq=“D”, tz=“US/Pacific”)

In [444]: dti.tz
Out[444]: <DstTzInfo ‘US/Pacific’ LMT-1 day, 16:07:00 STD>

In [445]: ts = pd.Timestamp(“2019-01-01”, tz=“US/Pacific”)

In [446]: ts.tz
Out[446]: <DstTzInfo ‘US/Pacific’ PST-1 day, 16:00:00 STD>
Warning

警告

Be wary of conversions between libraries. For some time zones, pytz and dateutil have different definitions of the zone. This is more of a problem for unusual time zones than for ‘standard’ zones like US/Eastern.

Warning

警告

Be aware that a time zone definition across versions of time zone libraries may not be considered equal. This may cause problems when working with stored data that is localized using one version and operated on with a different version. See here for how to handle such a situation.

请注意,跨时区库版本的时区定义可能并不相等。当使用一个版本进行本地化并使用另一个版本进行操作的存储数据时,这可能会导致问题。点击这里了解如何处理这种情况。

Warning

警告

For pytz time zones, it is incorrect to pass a time zone object directly into the datetime.datetime constructor (e.g., datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone(‘US/Eastern’))). Instead, the datetime needs to be localized using the localize method on the pytz time zone object.

Warning

警告

Be aware that for times in the future, correct conversion between time zones (and UTC) cannot be guaranteed by any time zone library because a timezone’s offset from UTC may be changed by the respective government.

请注意,任何时区库都不能保证将来时区(和 UTC)之间的正确转换,因为时区与 UTC 之间的时区偏移量可能由相应的政府更改。

Warning

警告

If you are using dates beyond 2038-01-18, daylight saving time (DST) adjustments to timezone aware dates will not be applied, due to current deficiencies in the underlying libraries caused by the year 2038 problem. If and when the underlying libraries are fixed, the DST transitions will be applied.

For example, for two dates that are in British Summer Time (and so would normally be GMT+1), both the following asserts evaluate as true:

例如,对于英国夏令时中的两个日期(通常是 GMT + 1) ,下面两个断言的计算结果都为 true:

In [447]: d_2037 = “2037-03-31T010101”

In [448]: d_2038 = “2038-03-31T010101”

In [449]: DST = “Europe/London”

In [450]: assert pd.Timestamp(d_2037, tz=DST) != pd.Timestamp(d_2037, tz=“GMT”)

In [451]: assert pd.Timestamp(d_2038, tz=DST) == pd.Timestamp(d_2038, tz=“GMT”)
Under the hood, all timestamps are stored in UTC. Values from a time zone aware DatetimeIndex or Timestamp will have their fields (day, hour, minute, etc.) localized to the time zone. However, timestamps with the same UTC value are still considered to be equal even if they are in different time zones:

在引擎盖下,所有时间戳都存储在 UTC 中。来自时区感知的 DatetimeIndex 或 Timestamp 的值将其字段(日、小时、分钟等)本地化到时区。不过,即使位于不同时区,具有相同协调世界时值的时间戳仍被视为相等:

In [452]: rng_eastern = rng_utc.tz_convert(“US/Eastern”)

In [453]: rng_berlin = rng_utc.tz_convert(“Europe/Berlin”)

In [454]: rng_eastern[2]
Out[454]: Timestamp(‘2012-03-07 19:00:00-0500’, tz=‘US/Eastern’, freq=‘D’)

In [455]: rng_berlin[2]
Out[455]: Timestamp(‘2012-03-08 01:00:00+0100’, tz=‘Europe/Berlin’, freq=‘D’)

In [456]: rng_eastern[2] == rng_berlin[2]
Out[456]: True
Operations between Series in different time zones will yield UTC Series, aligning the data on the UTC timestamps:

在不同时区的系列之间的操作将产生 UTC 系列,对齐 UTC 时间戳上的数据:

In [457]: ts_utc = pd.Series(range(3), pd.date_range(“20130101”, periods=3, tz=“UTC”))

In [458]: eastern = ts_utc.tz_convert(“US/Eastern”)

In [459]: berlin = ts_utc.tz_convert(“Europe/Berlin”)

In [460]: result = eastern + berlin

In [461]: result
Out[461]:
2013-01-01 00:00:00+00:00 0
2013-01-02 00:00:00+00:00 2
2013-01-03 00:00:00+00:00 4
Freq: D, dtype: int64

In [462]: result.index
Out[462]:
DatetimeIndex([‘2013-01-01 00:00:00+00:00’, ‘2013-01-02 00:00:00+00:00’,
‘2013-01-03 00:00:00+00:00’],
dtype=‘datetime64[ns, UTC]’, freq=‘D’)
To remove time zone information, use tz_localize(None) or tz_convert(None). tz_localize(None) will remove the time zone yielding the local time representation. tz_convert(None) will remove the time zone after converting to UTC time.

若要删除时区信息,请使用 tz _ localize (None)或 tz _ convert (None)。Tz _ localize (None)将删除产生本地时间表示的时区。在转换到 UTC 时间后,tz _ convert (None)将删除时区。

In [463]: didx = pd.date_range(start=“2014-08-01 09:00”, freq=“H”, periods=3, tz=“US/Eastern”)

In [464]: didx
Out[464]:
DatetimeIndex([‘2014-08-01 09:00:00-04:00’, ‘2014-08-01 10:00:00-04:00’,
‘2014-08-01 11:00:00-04:00’],
dtype=‘datetime64[ns, US/Eastern]’, freq=‘H’)

In [465]: didx.tz_localize(None)
Out[465]:
DatetimeIndex([‘2014-08-01 09:00:00’, ‘2014-08-01 10:00:00’,
‘2014-08-01 11:00:00’],
dtype=‘datetime64[ns]’, freq=None)

In [466]: didx.tz_convert(None)
Out[466]:
DatetimeIndex([‘2014-08-01 13:00:00’, ‘2014-08-01 14:00:00’,
‘2014-08-01 15:00:00’],
dtype=‘datetime64[ns]’, freq=‘H’)

# tz_convert(None) is identical to tz_convert(‘UTC’).tz_localize(None)

In [467]: didx.tz_convert(“UTC”).tz_localize(None)
Out[467]:
DatetimeIndex([‘2014-08-01 13:00:00’, ‘2014-08-01 14:00:00’,
‘2014-08-01 15:00:00’],
dtype=‘datetime64[ns]’, freq=None)
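
The two ways of dropping the time zone give different naive values, which a short sketch makes explicit. It assumes the lowercase alias “h” is accepted by the installed pandas version; the variable names are illustrative.

```python
import pandas as pd

didx = pd.date_range("2014-08-01 09:00", periods=3, freq="h", tz="US/Eastern")

# Keep the local wall-clock reading and drop the zone.
wall_clock = didx.tz_localize(None)

# Convert to UTC first, then drop the zone.
utc_naive = didx.tz_convert(None)
two_step = didx.tz_convert("UTC").tz_localize(None)

print(wall_clock[0], utc_naive[0])
```

Which one is appropriate depends on whether downstream code expects local readings or UTC; mixing the two silently shifts data by the zone offset.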
Fold 折叠
New in version 1.1.0.

新版本1.1.0。

For ambiguous times, pandas supports explicitly specifying the keyword-only fold argument. Due to daylight saving time, one wall clock time can occur twice when shifting from summer to winter time; fold describes whether the datetime-like corresponds to the first (0) or the second (1) time the wall clock hits the ambiguous time. Fold is supported only for constructing from a naive datetime.datetime (see the datetime documentation for details), from a Timestamp, or from components (see below).

Only dateutil timezones are supported (see the dateutil documentation for dateutil methods that deal with ambiguous datetimes), as pytz timezones do not support fold (see the pytz documentation for details on how pytz deals with ambiguous datetimes). To localize an ambiguous datetime with pytz, use Timestamp.tz_localize(). In general, we recommend relying on Timestamp.tz_localize() when localizing ambiguous datetimes if you need direct control over how they are handled.

In [468]: pd.Timestamp(
…: datetime.datetime(2019, 10, 27, 1, 30, 0, 0),
…: tz=“dateutil/Europe/London”,
…: fold=0,
…: )
…:
Out[468]: Timestamp(‘2019-10-27 01:30:00+0100’, tz=‘dateutil//usr/share/zoneinfo/Europe/London’)

In [469]: pd.Timestamp(
…: year=2019,
…: month=10,
…: day=27,
…: hour=1,
…: minute=30,
…: tz=“dateutil/Europe/London”,
…: fold=1,
…: )
…:
Out[469]: Timestamp(‘2019-10-27 01:30:00+0000’, tz=‘dateutil//usr/share/zoneinfo/Europe/London’)
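
To see what fold actually changes, the two timestamps above can be compared by UTC offset. This is a sketch that assumes dateutil can resolve Europe/London on the machine running it; the variable names are illustrative.

```python
import datetime

import pandas as pd

# First occurrence of 01:30 on the night the clocks go back (still summer time).
first = pd.Timestamp(
    datetime.datetime(2019, 10, 27, 1, 30),
    tz="dateutil/Europe/London",
    fold=0,
)

# Second occurrence of the same wall-clock time (winter time).
second = pd.Timestamp(
    year=2019, month=10, day=27, hour=1, minute=30,
    tz="dateutil/Europe/London",
    fold=1,
)

# Same wall clock, different offsets, one real hour apart.
print(first.utcoffset(), second.utcoffset(), second - first)
```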
Ambiguous times when localizing 本地化时的模糊时间
tz_localize may not be able to determine the UTC offset of a timestamp because daylight saving time (DST) in a local time zone causes some times to occur twice within one day (“clocks fall back”). The following options are available:

‘raise’: Raises a pytz.AmbiguousTimeError (the default behavior)

‘infer’: Attempts to determine the correct offset based on the monotonicity of the timestamps

‘NaT’: Replaces ambiguous times with NaT

bool: True represents a DST time, False represents a non-DST time. An array-like of bool values is supported for a sequence of times.

In [470]: rng_hourly = pd.DatetimeIndex(
…: [“11/06/2011 00:00”, “11/06/2011 01:00”, “11/06/2011 01:00”, “11/06/2011 02:00”]
…: )
…:
This will fail as there are ambiguous times (‘11/06/2011 01:00’)

这将失败,因为有模糊的时间(’11/06/201101:00’)

In [2]: rng_hourly.tz_localize(‘US/Eastern’)
AmbiguousTimeError: Cannot infer dst time from Timestamp(‘2011-11-06 01:00:00’), try using the ‘ambiguous’ argument
Handle these ambiguous times by specifying the following.

通过指定以下内容来处理这些不明确的时间。

In [471]: rng_hourly.tz_localize(“US/Eastern”, ambiguous=“infer”)
Out[471]:
DatetimeIndex([‘2011-11-06 00:00:00-04:00’, ‘2011-11-06 01:00:00-04:00’,
‘2011-11-06 01:00:00-05:00’, ‘2011-11-06 02:00:00-05:00’],
dtype=‘datetime64[ns, US/Eastern]’, freq=None)

In [472]: rng_hourly.tz_localize(“US/Eastern”, ambiguous=“NaT”)
Out[472]:
DatetimeIndex([‘2011-11-06 00:00:00-04:00’, ‘NaT’, ‘NaT’,
‘2011-11-06 02:00:00-05:00’],
dtype=‘datetime64[ns, US/Eastern]’, freq=None)

In [473]: rng_hourly.tz_localize(“US/Eastern”, ambiguous=[True, True, False, False])
Out[473]:
DatetimeIndex([‘2011-11-06 00:00:00-04:00’, ‘2011-11-06 01:00:00-04:00’,
‘2011-11-06 01:00:00-05:00’, ‘2011-11-06 02:00:00-05:00’],
dtype=‘datetime64[ns, US/Eastern]’, freq=None)
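
The three strategies can be compared side by side. In this sketch the boolean flags are passed as a NumPy array (the text above passes a plain list); for this particular index, ‘infer’ and the explicit flags should agree.

```python
import numpy as np
import pandas as pd

rng_hourly = pd.DatetimeIndex(
    ["2011-11-06 00:00", "2011-11-06 01:00", "2011-11-06 01:00", "2011-11-06 02:00"]
)

# Let pandas deduce DST status from the order of the repeated hour.
inferred = rng_hourly.tz_localize("US/Eastern", ambiguous="infer")

# State it explicitly: True means the time is in DST.
flagged = rng_hourly.tz_localize(
    "US/Eastern", ambiguous=np.array([True, True, False, False])
)

# Or give up on the ambiguous entries.
masked = rng_hourly.tz_localize("US/Eastern", ambiguous="NaT")

print(inferred)
print(masked.isna().sum())
```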
Nonexistent times when localizing 不存在本地化的时候
A DST transition may also shift the local time ahead by 1 hour, creating nonexistent local times (“clocks spring forward”). The behavior of localizing a time series with nonexistent times can be controlled by the nonexistent argument. The following options are available:

‘raise’: Raises a pytz.NonExistentTimeError (the default behavior)

‘NaT’: Replaces nonexistent times with NaT

‘shift_forward’: Shifts nonexistent times forward to the closest real time

‘shift_backward’: Shifts nonexistent times backward to the closest real time

timedelta object: Shifts nonexistent times by the timedelta duration

In [474]: dti = pd.date_range(start=“2015-03-29 02:30:00”, periods=3, freq=“H”)

# 2:30 is a nonexistent time

Localization of nonexistent times will raise an error by default.

默认情况下,不存在时间的本地化将引发错误。

In [2]: dti.tz_localize(‘Europe/Warsaw’)
NonExistentTimeError: 2015-03-29 02:30:00
Transform nonexistent times to NaT or shift the times.

将不存在的时间转换为 NaT 或者转换时间。

In [475]: dti
Out[475]:
DatetimeIndex([‘2015-03-29 02:30:00’, ‘2015-03-29 03:30:00’,
‘2015-03-29 04:30:00’],
dtype=‘datetime64[ns]’, freq=‘H’)

In [476]: dti.tz_localize(“Europe/Warsaw”, nonexistent=“shift_forward”)
Out[476]:
DatetimeIndex([‘2015-03-29 03:00:00+02:00’, ‘2015-03-29 03:30:00+02:00’,
‘2015-03-29 04:30:00+02:00’],
dtype=‘datetime64[ns, Europe/Warsaw]’, freq=None)

In [477]: dti.tz_localize(“Europe/Warsaw”, nonexistent=“shift_backward”)
Out[477]:
DatetimeIndex([‘2015-03-29 01:59:59.999999999+01:00’,
‘2015-03-29 03:30:00+02:00’,
‘2015-03-29 04:30:00+02:00’],
dtype=‘datetime64[ns, Europe/Warsaw]’, freq=None)

In [478]: dti.tz_localize(“Europe/Warsaw”, nonexistent=pd.Timedelta(1, unit=“H”))
Out[478]:
DatetimeIndex([‘2015-03-29 03:30:00+02:00’, ‘2015-03-29 03:30:00+02:00’,
‘2015-03-29 04:30:00+02:00’],
dtype=‘datetime64[ns, Europe/Warsaw]’, freq=None)

In [479]: dti.tz_localize(“Europe/Warsaw”, nonexistent=“NaT”)
Out[479]:
DatetimeIndex([‘NaT’, ‘2015-03-29 03:30:00+02:00’,
‘2015-03-29 04:30:00+02:00’],
dtype=‘datetime64[ns, Europe/Warsaw]’, freq=None)
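
The options for the nonexistent argument can be compared on the first element, which is the only nonexistent time in this index. The sketch assumes the lowercase alias “h” is accepted by the installed pandas version and builds the Timedelta with the hours keyword rather than a unit string.

```python
import pandas as pd

dti = pd.date_range("2015-03-29 02:30:00", periods=3, freq="h")

forward = dti.tz_localize("Europe/Warsaw", nonexistent="shift_forward")
backward = dti.tz_localize("Europe/Warsaw", nonexistent="shift_backward")
plus_hour = dti.tz_localize("Europe/Warsaw", nonexistent=pd.Timedelta(hours=1))
masked = dti.tz_localize("Europe/Warsaw", nonexistent="NaT")

# Only the 02:30 entry is affected; the later entries are identical.
print(forward[0], backward[0], plus_hour[0], masked[0])
```

Note that shifting by a fixed timedelta can produce duplicates: here 02:30 plus one hour lands on 03:30, which is already present in the index.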
Time zone series operations 时区序列操作
A Series with time zone naive values is represented with a dtype of datetime64[ns].

具有时区天真值的序列用 dtype 的 datetime64[ ns ]表示。

In [480]: s_naive = pd.Series(pd.date_range(“20130101”, periods=3))

In [481]: s_naive
Out[481]:
0 2013-01-01
1 2013-01-02
2 2013-01-03
dtype: datetime64[ns]

A Series with time zone aware values is represented with a dtype of datetime64[ns, tz], where tz is the time zone.

In [482]: s_aware = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

In [483]: s_aware
Out[483]:
0 2013-01-01 00:00:00-05:00
1 2013-01-02 00:00:00-05:00
2 2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]

The time zone information of both of these Series can be manipulated via the .dt accessor; see the dt accessor section.

For example, to localize a naive stamp and convert it to a time zone aware one:

In [484]: s_naive.dt.tz_localize("UTC").dt.tz_convert("US/Eastern")
Out[484]:
0 2012-12-31 19:00:00-05:00
1 2013-01-01 19:00:00-05:00
2 2013-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]

Time zone information can also be manipulated using the astype method. This method can localize and convert time zone naive timestamps or convert time zone aware timestamps.

# localize and convert a naive time zone
In [485]: s_naive.astype("datetime64[ns, US/Eastern]")
Out[485]:
0 2012-12-31 19:00:00-05:00
1 2013-01-01 19:00:00-05:00
2 2013-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]

# make an aware tz naive
In [486]: s_aware.astype("datetime64[ns]")
Out[486]:
0 2013-01-01 05:00:00
1 2013-01-02 05:00:00
2 2013-01-03 05:00:00
dtype: datetime64[ns]

# convert to a new time zone
In [487]: s_aware.astype("datetime64[ns, CET]")
Out[487]:
0 2013-01-01 06:00:00+01:00
1 2013-01-02 06:00:00+01:00
2 2013-01-03 06:00:00+01:00
dtype: datetime64[ns, CET]
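
The three astype conversions above can also be written with the .dt accessor, which states explicitly whether a value is being localized or converted. This is a sketch under one assumption that is not in the original text: newer pandas releases (2.0 onward, as far as is known here) are stricter about astype between naive and aware dtypes, so the accessor form may be the more portable one.

```python
import pandas as pd

s_naive = pd.Series(pd.date_range("20130101", periods=3))
s_aware = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# naive -> aware: treat the naive values as UTC, then view them in US/Eastern
localized = s_naive.dt.tz_localize("UTC").dt.tz_convert("US/Eastern")

# aware -> naive: convert to UTC, then drop the time zone
naive_utc = s_aware.dt.tz_convert("UTC").dt.tz_localize(None)

# aware -> aware: the same instants shown in another time zone
in_cet = s_aware.dt.tz_convert("CET")

print(localized)
print(naive_utc)
print(in_cet)
```

The printed values should match Out[485], Out[486] and Out[487] respectively.
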
Note

Series.to_numpy() returns a NumPy array of the data. NumPy does not currently support time zones (even though it is printing in the local time zone!), so an object array of Timestamps is returned for time zone aware data:

In [488]: s_naive.to_numpy()
Out[488]:
array(['2013-01-01T00:00:00.000000000', '2013-01-02T00:00:00.000000000',
'2013-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

In [489]: s_aware.to_numpy()
Out[489]:
array([Timestamp('2013-01-01 00:00:00-0500', tz='US/Eastern', freq='D'),
Timestamp('2013-01-02 00:00:00-0500', tz='US/Eastern', freq='D'),
Timestamp('2013-01-03 00:00:00-0500', tz='US/Eastern', freq='D')],
dtype=object)

Converting to an object array of Timestamps preserves the time zone information. For example, when converting back to a Series:

In [490]: pd.Series(s_aware.to_numpy())
Out[490]:
0 2013-01-01 00:00:00-05:00
1 2013-01-02 00:00:00-05:00
2 2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]

However, if you want an actual NumPy datetime64[ns] array (with the values converted to UTC) instead of an array of objects, you can specify the dtype argument:

In [491]: s_aware.to_numpy(dtype="datetime64[ns]")
Out[491]:
array(['2013-01-01T05:00:00.000000000', '2013-01-02T05:00:00.000000000',
'2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]')
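
Because the datetime64[ns] array holds UTC values with no time zone attached, the original zone has to be reapplied by hand when going back to pandas. A minimal sketch of that round trip, assuming the target zone ("US/Eastern" here) is known from elsewhere:

```python
import numpy as np
import pandas as pd

s_aware = pd.Series(pd.date_range("2013-01-01", periods=3, tz="US/Eastern"))

# Default: object array of Timestamps, time zone preserved on each element
obj_arr = s_aware.to_numpy()

# Explicit dtype: plain NumPy datetimes, values expressed in UTC
utc_arr = s_aware.to_numpy(dtype="datetime64[ns]")

# Round trip: declare the values as UTC, then convert to the known zone
restored = pd.Series(utc_arr).dt.tz_localize("UTC").dt.tz_convert("US/Eastern")

print(obj_arr.dtype, utc_arr.dtype)
print(restored)
```

The object array is convenient when the zone must travel with the data; the datetime64[ns] array is the better fit for vectorized NumPy arithmetic.
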
