03 - Generating and Converting Time Series in pandas


import pandas as pd
from datetime import datetime

data = {
    'ID': ['000{}'.format(str(i)) for i in range(1, 7)],
    'name': ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'],
    'gender': [True, True, False, True, False, True],
    'height': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
}

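df = pd.DataFrame(data, index=['2019-12', '2020-01', '2020-04', '2020-05', '2020-07', '2020-02'])
df
           ID  name  gender  height
2019-12  0001   aaa    True     1.1
2020-01  0002   bbb    True     1.2
2020-04  0003   ccc   False     1.3
2020-05  0004   ddd    True     1.4
2020-07  0005   eee   False     1.5
2020-02  0006   fff    True     1.6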

Creating a time series with the date_range function

help(pd.date_range)
Help on function date_range in module pandas.core.indexes.datetimes:

date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs) -> pandas.core.indexes.datetimes.DatetimeIndex
    Return a fixed frequency DatetimeIndex.
    
    Parameters
    ----------
    start : str or datetime-like, optional
        Left bound for generating dates.
    end : str or datetime-like, optional
        Right bound for generating dates.
    periods : int, optional
        Number of periods to generate.
    freq : str or DateOffset, default 'D'
        Frequency strings can have multiples, e.g. '5H'. See
        :ref:`here <timeseries.offset_aliases>` for a list of
        frequency aliases.
    tz : str or tzinfo, optional
        Time zone name for returning localized DatetimeIndex, for example
        'Asia/Hong_Kong'. By default, the resulting DatetimeIndex is
        timezone-naive.
    normalize : bool, default False
        Normalize start/end dates to midnight before generating date range.
    name : str, default None
        Name of the resulting DatetimeIndex.
    closed : {None, 'left', 'right'}, optional
        Make the interval closed with respect to the given frequency to
        the 'left', 'right', or both sides (None, the default).
    **kwargs
        For compatibility. Has no effect on the result.
    
    Returns
    -------
    rng : DatetimeIndex
    
    See Also
    --------
    DatetimeIndex : An immutable container for datetimes.
    timedelta_range : Return a fixed frequency TimedeltaIndex.
    period_range : Return a fixed frequency PeriodIndex.
    interval_range : Return a fixed frequency IntervalIndex.
    
    Notes
    -----
    Of the four parameters ``start``, ``end``, ``periods``, and ``freq``,
    exactly three must be specified. If ``freq`` is omitted, the resulting
    ``DatetimeIndex`` will have ``periods`` linearly spaced elements between
    ``start`` and ``end`` (closed on both sides).
    
    To learn more about the frequency strings, please see `this link
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
    
    Examples
    --------
    **Specifying the values**
    
    The next four examples generate the same `DatetimeIndex`, but vary
    the combination of `start`, `end` and `periods`.
    
    Specify `start` and `end`, with the default daily frequency.
    
    >>> pd.date_range(start='1/1/2018', end='1/08/2018')
    DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
                   '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
                  dtype='datetime64[ns]', freq='D')
    
    Specify `start` and `periods`, the number of periods (days).
    
    >>> pd.date_range(start='1/1/2018', periods=8)
    DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
                   '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
                  dtype='datetime64[ns]', freq='D')
    
    Specify `end` and `periods`, the number of periods (days).
    
    >>> pd.date_range(end='1/1/2018', periods=8)
    DatetimeIndex(['2017-12-25', '2017-12-26', '2017-12-27', '2017-12-28',
                   '2017-12-29', '2017-12-30', '2017-12-31', '2018-01-01'],
                  dtype='datetime64[ns]', freq='D')
    
    Specify `start`, `end`, and `periods`; the frequency is generated
    automatically (linearly spaced).
    
    >>> pd.date_range(start='2018-04-24', end='2018-04-27', periods=3)
    DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
                   '2018-04-27 00:00:00'],
                  dtype='datetime64[ns]', freq=None)
    
    **Other Parameters**
    
    Changed the `freq` (frequency) to ``'M'`` (month end frequency).
    
    >>> pd.date_range(start='1/1/2018', periods=5, freq='M')
    DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
                   '2018-05-31'],
                  dtype='datetime64[ns]', freq='M')
    
    Multiples are allowed
    
    >>> pd.date_range(start='1/1/2018', periods=5, freq='3M')
    DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
                   '2019-01-31'],
                  dtype='datetime64[ns]', freq='3M')
    
    `freq` can also be specified as an Offset object.
    
    >>> pd.date_range(start='1/1/2018', periods=5, freq=pd.offsets.MonthEnd(3))
    DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
                   '2019-01-31'],
                  dtype='datetime64[ns]', freq='3M')
    
    Specify `tz` to set the timezone.
    
    >>> pd.date_range(start='1/1/2018', periods=5, tz='Asia/Tokyo')
    DatetimeIndex(['2018-01-01 00:00:00+09:00', '2018-01-02 00:00:00+09:00',
                   '2018-01-03 00:00:00+09:00', '2018-01-04 00:00:00+09:00',
                   '2018-01-05 00:00:00+09:00'],
                  dtype='datetime64[ns, Asia/Tokyo]', freq='D')
    
    `closed` controls whether to include `start` and `end` that are on the
    boundary. The default includes boundary points on either end.
    
    >>> pd.date_range(start='2017-01-01', end='2017-01-04', closed=None)
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'],
                  dtype='datetime64[ns]', freq='D')
    
    Use ``closed='left'`` to exclude `end` if it falls on the boundary.
    
    >>> pd.date_range(start='2017-01-01', end='2017-01-04', closed='left')
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'],
                  dtype='datetime64[ns]', freq='D')
    
    Use ``closed='right'`` to exclude `start` if it falls on the boundary.
    
    >>> pd.date_range(start='2017-01-01', end='2017-01-04', closed='right')
    DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'],
                  dtype='datetime64[ns]', freq='D')


For a detailed explanation of the freq aliases, see this blog post: https://blog.csdn.net/wangqi_qiangku/article/details/79384731
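The Notes section of the help text says that exactly three of `start`, `end`, `periods` and `freq` must be determined, and that `freq` falls back to `'D'`. The following sketch checks that rule. The dates are arbitrary illustrative values.

```python
import pandas as pd

# start + periods, with freq falling back to the daily default: three parameters known
ok = pd.date_range(start='2020-01-01', periods=3)
print(ok)

# Over-specified (all four) and under-specified (start only) calls are rejected
bad_calls = [
    {'start': '2020-01-01', 'end': '2020-01-31', 'periods': 6, 'freq': 'D'},
    {'start': '2020-01-01'},
]
errors = []
for kwargs in bad_calls:
    try:
        pd.date_range(**kwargs)
    except ValueError as exc:
        errors.append(str(exc))
print(len(errors), 'calls raised ValueError')
```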

df = pd.DataFrame(data, pd.date_range(start='2020-1-1', periods=6, freq='12H30min'))  # one timestamp every 12 h 30 min (pandas >= 2.2 prefers '12h30min')
df
                       ID  name  gender  height
2020-01-01 00:00:00  0001   aaa    True     1.1
2020-01-01 12:30:00  0002   bbb    True     1.2
2020-01-02 01:00:00  0003   ccc   False     1.3
2020-01-02 13:30:00  0004   ddd    True     1.4
2020-01-03 02:00:00  0005   eee   False     1.5
2020-01-03 14:30:00  0006   fff    True     1.6
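The alias `'12H30min'` combines two units into one fixed step. This minimal sketch assumes that passing an equivalent `pd.Timedelta` as `freq` behaves like the combined alias, and checks the gap between neighbouring timestamps directly:

```python
import pandas as pd

# Same step as the '12H30min' alias, written as a Timedelta
step = pd.Timedelta(hours=12, minutes=30)
idx = pd.date_range(start='2020-01-01', periods=6, freq=step)

# Differences between neighbouring timestamps (the first diff is NaT, so drop it)
gaps = pd.Series(idx).diff().dropna()
print(gaps.unique())
print(idx[-1])
```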
df = pd.DataFrame(data, pd.date_range(end='2020-1-6', periods=6, freq='MS'))
df
              ID  name  gender  height
2019-08-01  0001   aaa    True     1.1
2019-09-01  0002   bbb    True     1.2
2019-10-01  0003   ccc   False     1.3
2019-11-01  0004   ddd    True     1.4
2019-12-01  0005   eee   False     1.5
2020-01-01  0006   fff    True     1.6
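`'MS'` anchors each timestamp to the first day of a month. Because 2020-01-06 is not a month start, the `end` bound is rolled back to 2020-01-01. The sketch below contrasts this with a month-end anchor. It uses the offset objects `pd.offsets.MonthBegin` and `pd.offsets.MonthEnd` instead of string aliases, because the month-end alias changed from `'M'` to `'ME'` in pandas 2.2.

```python
import pandas as pd

# Month-start anchor: the end bound 2020-01-06 is rolled back to 2020-01-01
starts = pd.date_range(end='2020-01-06', periods=3, freq=pd.offsets.MonthBegin())
# Month-end anchor: the same end bound is rolled back to 2019-12-31
ends = pd.date_range(end='2020-01-06', periods=3, freq=pd.offsets.MonthEnd())

print(list(starts.strftime('%Y-%m-%d')))
print(list(ends.strftime('%Y-%m-%d')))
```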
df = pd.DataFrame(data, pd.date_range(start='2020-1-1', end='2020-1-31', periods=6))
df
              ID  name  gender  height
2020-01-01  0001   aaa    True     1.1
2020-01-07  0002   bbb    True     1.2
2020-01-13  0003   ccc   False     1.3
2020-01-19  0004   ddd    True     1.4
2020-01-25  0005   eee   False     1.5
2020-01-31  0006   fff    True     1.6
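When `start`, `end` and `periods` are all given, the points are spaced linearly. The step is therefore (end - start) / (periods - 1), which here is 30 days / 5 = 6 days. A quick check of that arithmetic:

```python
import pandas as pd

start, end, periods = pd.Timestamp('2020-01-01'), pd.Timestamp('2020-01-31'), 6
# Linear spacing: the span is split into (periods - 1) equal steps
step = (end - start) / (periods - 1)
idx = pd.date_range(start=start, end=end, periods=periods)

print(step)
print(list(idx.strftime('%Y-%m-%d')))
```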
df = pd.DataFrame(data, pd.date_range('2020-1-1', '2020-1-7', closed='right'))  # left-open, right-closed (pandas >= 1.4 spells this inclusive='right')
df
              ID  name  gender  height
2020-01-02  0001   aaa    True     1.1
2020-01-03  0002   bbb    True     1.2
2020-01-04  0003   ccc   False     1.3
2020-01-05  0004   ddd    True     1.4
2020-01-06  0005   eee   False     1.5
2020-01-07  0006   fff    True     1.6
df = pd.DataFrame(data, pd.date_range('2020-1-1', '2020-1-7', closed='left'))  # left-closed, right-open (pandas >= 1.4 spells this inclusive='left')
df
              ID  name  gender  height
2020-01-01  0001   aaa    True     1.1
2020-01-02  0002   bbb    True     1.2
2020-01-03  0003   ccc   False     1.3
2020-01-04  0004   ddd    True     1.4
2020-01-05  0005   eee   False     1.5
2020-01-06  0006   fff    True     1.6
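You can also produce the same half-open ranges without the boundary parameter at all. This works whichever name (`closed` or `inclusive`) your pandas version uses. Build the fully closed range, then drop the unwanted boundary with a boolean mask:

```python
import pandas as pd

# Fully closed range: both boundary days are included (7 days)
full = pd.date_range('2020-01-01', '2020-01-07')
start, end = full[0], full[-1]

right_closed = full[full > start]  # (start, end]: same days as closed='right'
left_closed = full[full < end]     # [start, end): same days as closed='left'

print(list(right_closed.strftime('%Y-%m-%d')))
print(list(left_closed.strftime('%Y-%m-%d')))
```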

Creating a time series with the pandas.PeriodIndex class

values = ['2019-12', '2020-1', '2020-4', '2020-5', '2020-7', '2020-2']
df = pd.DataFrame(data, pd.PeriodIndex(values, freq='D',))
df
              ID  name  gender  height
2019-12-01  0001   aaa    True     1.1
2020-01-01  0002   bbb    True     1.2
2020-04-01  0003   ccc   False     1.3
2020-05-01  0004   ddd    True     1.4
2020-07-01  0005   eee   False     1.5
2020-02-01  0006   fff    True     1.6
df = pd.DataFrame(data, pd.PeriodIndex(values, freq='Q',))
df
          ID  name  gender  height
2019Q4  0001   aaa    True     1.1
2020Q1  0002   bbb    True     1.2
2020Q2  0003   ccc   False     1.3
2020Q2  0004   ddd    True     1.4
2020Q3  0005   eee   False     1.5
2020Q1  0006   fff    True     1.6
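With `freq='Q'`, both '2020-4' and '2020-5' fall into the same quarter, which is why 2020Q2 appears twice. A single `pd.Period` shows the time span that a quarter covers:

```python
import pandas as pd

q = pd.Period('2020-04', freq='Q')   # April lies in the second quarter
print(q)                             # quarterly label
print(q.start_time)                  # first instant of the quarter
print(q.end_time)                    # last nanosecond of the quarter

# May maps to the same quarter, so the two periods compare equal
same_quarter = pd.Period('2020-05', freq='Q') == q
```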
years = [2020] * 3 + [2019] * 3
months = [1,2,3,1,2,3]
df = pd.DataFrame(data, pd.PeriodIndex(year=years, month=months, freq='D',))
df
              ID  name  gender  height
2020-01-01  0001   aaa    True     1.1
2020-02-01  0002   bbb    True     1.2
2020-03-01  0003   ccc   False     1.3
2019-01-01  0004   ddd    True     1.4
2019-02-01  0005   eee   False     1.5
2019-03-01  0006   fff    True     1.6
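Recent pandas (2.2 and later) deprecates the `year=`/`month=` field keywords of the `PeriodIndex` constructor in favour of `PeriodIndex.from_fields`. One version-independent workaround is to format the fields as 'YYYY-MM' strings first and pass those to the constructor. This is a sketch under that assumption, not the original approach:

```python
import pandas as pd

years = [2020] * 3 + [2019] * 3
months = [1, 2, 3, 1, 2, 3]

# Turn each (year, month) pair into a 'YYYY-MM' string, e.g. (2020, 1) -> '2020-01'
labels = [f'{y}-{m:02d}' for y, m in zip(years, months)]
# With freq='D', each month string becomes the daily period of its first day
idx = pd.PeriodIndex(labels, freq='D')
print(idx)
```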

The class offers three methods for converting time values: asfreq, strftime and to_timestamp.

help(pd.PeriodIndex.asfreq)
Help on function asfreq in module pandas.core.accessor:

asfreq(self, *args, **kwargs)
    Convert the Period Array/Index to the specified frequency `freq`.
    
    Parameters
    ----------
    freq : str
        A frequency.
    how : str {'E', 'S'}
        Whether the elements should be aligned to the end
        or start within pa period.
    
        * 'E', 'END', or 'FINISH' for end,
        * 'S', 'START', or 'BEGIN' for start.
    
        January 31st ('END') vs. January 1st ('START') for example.
    
    Returns
    -------
    Period Array/Index
        Constructed with the new frequency.
    
    Examples
    --------
    >>> pidx = pd.period_range('2010-01-01', '2015-01-01', freq='A')
    >>> pidx
    PeriodIndex(['2010', '2011', '2012', '2013', '2014', '2015'],
    dtype='period[A-DEC]', freq='A-DEC')
    
    >>> pidx.asfreq('M')
    PeriodIndex(['2010-12', '2011-12', '2012-12', '2013-12', '2014-12',
    '2015-12'], dtype='period[M]', freq='M')
    
    >>> pidx.asfreq('M', how='S')
    PeriodIndex(['2010-01', '2011-01', '2012-01', '2013-01', '2014-01',
    '2015-01'], dtype='period[M]', freq='M')
df = pd.DataFrame(data, pd.PeriodIndex(values, freq='Q').asfreq('M', how='E'))
df
           ID  name  gender  height
2019-12  0001   aaa    True     1.1
2020-03  0002   bbb    True     1.2
2020-06  0003   ccc   False     1.3
2020-06  0004   ddd    True     1.4
2020-09  0005   eee   False     1.5
2020-03  0006   fff    True     1.6
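The `how` argument decides which month inside each quarter is kept. A short sketch with two quarters:

```python
import pandas as pd

quarters = pd.PeriodIndex(['2019Q4', '2020Q2'], freq='Q')
first_months = quarters.asfreq('M', how='S')  # first month of each quarter
last_months = quarters.asfreq('M', how='E')   # last month of each quarter

print(first_months)
print(last_months)
```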
help(pd.PeriodIndex.strftime)
Help on function strftime in module pandas.core.accessor:

strftime(self, *args, **kwargs)
    Convert to Index using specified date_format.
    
    Return an Index of formatted strings specified by date_format, which
    supports the same string format as the python standard library. Details
    of the string format can be found in `python string format
    doc <https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior>`__.
    
    Parameters
    ----------
    date_format : str
        Date format string (e.g. "%Y-%m-%d").
    
    Returns
    -------
    ndarray
        NumPy ndarray of formatted strings.
    
    See Also
    --------
    to_datetime : Convert the given argument to datetime.
    DatetimeIndex.normalize : Return DatetimeIndex with times to midnight.
    DatetimeIndex.round : Round the DatetimeIndex to the specified freq.
    DatetimeIndex.floor : Floor the DatetimeIndex to the specified freq.
    
    Examples
    --------
    >>> rng = pd.date_range(pd.Timestamp("2018-03-10 09:00"),
    ...                     periods=3, freq='s')
    >>> rng.strftime('%B %d, %Y, %r')
    Index(['March 10, 2018, 09:00:00 AM', 'March 10, 2018, 09:00:01 AM',
           'March 10, 2018, 09:00:02 AM'],
          dtype='object')
df.index.strftime('%Y-%m-%d')  # the result is no longer a date type but an Index of strings
Index(['2019-12-31', '2020-03-31', '2020-06-30', '2020-06-30', '2020-09-30',
       '2020-03-31'],
      dtype='object')
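Period objects also accept the `%q` directive (quarter number) in `strftime`, which is handy for quarterly labels. The sketch uses the same months as `values`, zero-padded:

```python
import pandas as pd

months = ['2019-12', '2020-01', '2020-04', '2020-05', '2020-07', '2020-02']
quarters = pd.PeriodIndex(months, freq='Q')

# %q is a Period-specific directive for the quarter number (1-4)
labels = quarters.strftime('%Y-Q%q')
print(list(labels))
```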
help(pd.PeriodIndex.to_timestamp)
Help on function to_timestamp in module pandas.core.accessor:

to_timestamp(self, *args, **kwargs)
    Cast to DatetimeArray/Index.
    
    Parameters
    ----------
    freq : str or DateOffset, optional
        Target frequency. The default is 'D' for week or longer,
        'S' otherwise.
    how : {'s', 'e', 'start', 'end'}
        Whether to use the start or end of the time period being converted.
    
    Returns
    -------
    DatetimeArray/Index
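# The output below belongs to the daily PeriodIndex DataFrame built from years/months,
# so that DataFrame is recreated here (the df from the asfreq step has a monthly index)
df = pd.DataFrame(data, pd.PeriodIndex(year=years, month=months, freq='D'))
df.to_timestamp(freq='M', how='e')
                                 ID  name  gender  height
2020-01-01 23:59:59.999999999  0001   aaa    True     1.1
2020-02-01 23:59:59.999999999  0002   bbb    True     1.2
2020-03-01 23:59:59.999999999  0003   ccc   False     1.3
2019-01-01 23:59:59.999999999  0004   ddd    True     1.4
2019-02-01 23:59:59.999999999  0005   eee   False     1.5
2019-03-01 23:59:59.999999999  0006   fff    True     1.6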
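The `.999999999` fraction comes from `how='e'`, which maps a period to its last nanosecond. This sketch compares the two ends of a single monthly `Period` and uses `normalize()` to strip the time part:

```python
import pandas as pd

p = pd.Period('2020-02', freq='M')
first = p.to_timestamp(how='start')  # first instant of February 2020
last = p.to_timestamp(how='end')     # last nanosecond of February 2020
last_day = last.normalize()          # time part reset to midnight

print(first, last, last_day, sep='\n')
```

On a whole DataFrame, the same clean-up after `to_timestamp` would be `df.index = df.index.normalize()`.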

DataFrame methods for converting a time index

help(df.to_period)
Help on method to_period in module pandas.core.frame:

to_period(freq=None, axis=0, copy=True) -> 'DataFrame' method of pandas.core.frame.DataFrame instance
    Convert DataFrame from DatetimeIndex to PeriodIndex.
    
    Convert DataFrame from DatetimeIndex to PeriodIndex with desired
    frequency (inferred from index if not passed).
    
    Parameters
    ----------
    freq : str, default
        Frequency of the PeriodIndex.
    axis : {0 or 'index', 1 or 'columns'}, default 0
        The axis to convert (the index by default).
    copy : bool, default True
        If False then underlying input data is not copied.
    
    Returns
    -------
    TimeSeries with PeriodIndex
dates = [datetime(2004, 10, 1, 10, 32, 45, 85), datetime(2000, 11, 27),
         datetime(2002, 1, 27), datetime(2002, 8, 15),
         datetime(2003, 1, 1), datetime(2002, 12, 31)]
df = pd.DataFrame(data, index=dates)
df.to_period(freq='Q')
          ID  name  gender  height
2004Q4  0001   aaa    True     1.1
2000Q4  0002   bbb    True     1.2
2002Q1  0003   ccc   False     1.3
2002Q3  0004   ddd    True     1.4
2003Q1  0005   eee   False     1.5
2002Q4  0006   fff    True     1.6
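`to_period` and `to_timestamp` are inverses only up to the precision of the period. Converting to quarters and back lands every row on the first day of its quarter, so the original date and time of day are lost. A sketch:

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2004, 10, 1, 10, 32, 45, 85), datetime(2000, 11, 27),
         datetime(2002, 1, 27), datetime(2002, 8, 15),
         datetime(2003, 1, 1), datetime(2002, 12, 31)]
df = pd.DataFrame({'height': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]}, index=dates)

quarterly = df.to_period(freq='Q')           # DatetimeIndex -> PeriodIndex
back = quarterly.to_timestamp(how='start')   # PeriodIndex -> DatetimeIndex of quarter starts

print(back)
```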
