03 - Generating and converting time series in pandas

Yph_Jerry

Article link: https://blog.csdn.net/qq_42965915/article/details/107590758

import pandas as pd
from datetime import datetime

data = {
    'ID': ['000{}'.format(str(i)) for i in range(1, 7)],
    'name': ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'],
    'gender': [True, True, False, True, False, True],
    'height': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
}

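# The original cell does not show how df was built; the index here is
# reconstructed from the printed output below (monthly periods).
df = pd.DataFrame(data, index=pd.PeriodIndex(
    ['2019-12', '2020-01', '2020-04', '2020-05', '2020-07', '2020-02'], freq='M'))
df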

	ID	name	gender	height
2019-12	0001	aaa	True	1.1
2020-01	0002	bbb	True	1.2
2020-04	0003	ccc	False	1.3
2020-05	0004	ddd	True	1.4
2020-07	0005	eee	False	1.5
2020-02	0006	fff	True	1.6

Creating a time series with the date_range function

help(pd.date_range)

Help on function date_range in module pandas.core.indexes.datetimes:

date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs) -> pandas.core.indexes.datetimes.DatetimeIndex
    Return a fixed frequency DatetimeIndex.
    
    Parameters
    ----------
    start : str or datetime-like, optional
        Left bound for generating dates.
    end : str or datetime-like, optional
        Right bound for generating dates.
    periods : int, optional
        Number of periods to generate.
    freq : str or DateOffset, default 'D'
        Frequency strings can have multiples, e.g. '5H'. See
        :ref:`here <timeseries.offset_aliases>` for a list of
        frequency aliases.
    tz : str or tzinfo, optional
        Time zone name for returning localized DatetimeIndex, for example
        'Asia/Hong_Kong'. By default, the resulting DatetimeIndex is
        timezone-naive.
    normalize : bool, default False
        Normalize start/end dates to midnight before generating date range.
    name : str, default None
        Name of the resulting DatetimeIndex.
    closed : {None, 'left', 'right'}, optional
        Make the interval closed with respect to the given frequency to
        the 'left', 'right', or both sides (None, the default).
    **kwargs
        For compatibility. Has no effect on the result.
    
    Returns
    -------
    rng : DatetimeIndex
    
    See Also
    --------
    DatetimeIndex : An immutable container for datetimes.
    timedelta_range : Return a fixed frequency TimedeltaIndex.
    period_range : Return a fixed frequency PeriodIndex.
    interval_range : Return a fixed frequency IntervalIndex.
    
    Notes
    -----
    Of the four parameters ``start``, ``end``, ``periods``, and ``freq``,
    exactly three must be specified. If ``freq`` is omitted, the resulting
    ``DatetimeIndex`` will have ``periods`` linearly spaced elements between
    ``start`` and ``end`` (closed on both sides).
    
    To learn more about the frequency strings, please see `this link
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
    
    Examples
    --------
    **Specifying the values**
    
    The next four examples generate the same `DatetimeIndex`, but vary
    the combination of `start`, `end` and `periods`.
    
    Specify `start` and `end`, with the default daily frequency.
    
    >>> pd.date_range(start='1/1/2018', end='1/08/2018')
    DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
                   '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
                  dtype='datetime64[ns]', freq='D')
    
    Specify `start` and `periods`, the number of periods (days).
    
    >>> pd.date_range(start='1/1/2018', periods=8)
    DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
                   '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
                  dtype='datetime64[ns]', freq='D')
    
    Specify `end` and `periods`, the number of periods (days).
    
    >>> pd.date_range(end='1/1/2018', periods=8)
    DatetimeIndex(['2017-12-25', '2017-12-26', '2017-12-27', '2017-12-28',
                   '2017-12-29', '2017-12-30', '2017-12-31', '2018-01-01'],
                  dtype='datetime64[ns]', freq='D')
    
    Specify `start`, `end`, and `periods`; the frequency is generated
    automatically (linearly spaced).
    
    >>> pd.date_range(start='2018-04-24', end='2018-04-27', periods=3)
    DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
                   '2018-04-27 00:00:00'],
                  dtype='datetime64[ns]', freq=None)
    
    **Other Parameters**
    
    Changed the `freq` (frequency) to ``'M'`` (month end frequency).
    
    >>> pd.date_range(start='1/1/2018', periods=5, freq='M')
    DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
                   '2018-05-31'],
                  dtype='datetime64[ns]', freq='M')
    
    Multiples are allowed
    
    >>> pd.date_range(start='1/1/2018', periods=5, freq='3M')
    DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
                   '2019-01-31'],
                  dtype='datetime64[ns]', freq='3M')
    
    `freq` can also be specified as an Offset object.
    
    >>> pd.date_range(start='1/1/2018', periods=5, freq=pd.offsets.MonthEnd(3))
    DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
                   '2019-01-31'],
                  dtype='datetime64[ns]', freq='3M')
    
    Specify `tz` to set the timezone.
    
    >>> pd.date_range(start='1/1/2018', periods=5, tz='Asia/Tokyo')
    DatetimeIndex(['2018-01-01 00:00:00+09:00', '2018-01-02 00:00:00+09:00',
                   '2018-01-03 00:00:00+09:00', '2018-01-04 00:00:00+09:00',
                   '2018-01-05 00:00:00+09:00'],
                  dtype='datetime64[ns, Asia/Tokyo]', freq='D')
    
    `closed` controls whether to include `start` and `end` that are on the
    boundary. The default includes boundary points on either end.
    
    >>> pd.date_range(start='2017-01-01', end='2017-01-04', closed=None)
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'],
                  dtype='datetime64[ns]', freq='D')
    
    Use ``closed='left'`` to exclude `end` if it falls on the boundary.
    
    >>> pd.date_range(start='2017-01-01', end='2017-01-04', closed='left')
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'],
                  dtype='datetime64[ns]', freq='D')
    
    Use ``closed='right'`` to exclude `start` if it falls on the boundary.
    
    >>> pd.date_range(start='2017-01-01', end='2017-01-04', closed='right')
    DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'],
                  dtype='datetime64[ns]', freq='D')

For a detailed description of the freq aliases, see this blog post: https://blog.csdn.net/wangqi_qiangku/article/details/79384731

df = pd.DataFrame(data, pd.date_range(start='2020-1-1', periods=6, freq='12H30min'))
df

	ID	name	gender	height
2020-01-01 00:00:00	0001	aaa	True	1.1
2020-01-01 12:30:00	0002	bbb	True	1.2
2020-01-02 01:00:00	0003	ccc	False	1.3
2020-01-02 13:30:00	0004	ddd	True	1.4
2020-01-03 02:00:00	0005	eee	False	1.5
2020-01-03 14:30:00	0006	fff	True	1.6
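
The alias string '12H30min' depends on the pandas version: recent releases (2.2 and later, as far as I know) prefer a lowercase 'h' and warn about 'H'. The following is a minimal sketch, assuming a `pd.Timedelta` is an acceptable `freq` value, that produces the same 12.5-hour spacing without relying on alias spelling. The variable names `step` and `idx` are illustrative only.

```python
import pandas as pd

# A Timedelta avoids alias spelling differences between pandas versions.
step = pd.Timedelta(hours=12, minutes=30)
idx = pd.date_range(start="2020-01-01", periods=6, freq=step)

print(idx[1])   # second timestamp, 12.5 hours after the start
print(idx[-1])  # last of the six timestamps
```

The resulting index should line up with the table above, from 2020-01-01 00:00:00 to 2020-01-03 14:30:00.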

df = pd.DataFrame(data, pd.date_range(end='2020-1-6', periods=6, freq='MS'))
df

	ID	name	gender	height
2019-08-01	0001	aaa	True	1.1
2019-09-01	0002	bbb	True	1.2
2019-10-01	0003	ccc	False	1.3
2019-11-01	0004	ddd	True	1.4
2019-12-01	0005	eee	False	1.5
2020-01-01	0006	fff	True	1.6
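
In the table above the end date 2020-1-6 never appears, because 'MS' only produces month starts. A small sketch of that behaviour, using a shorter range of three periods (the names `idx` and `labels` are illustrative):

```python
import pandas as pd

# "MS" anchors every timestamp to the first day of a month.
# The end date 2020-01-06 is not a month start, so the last value
# is the closest month start on or before it.
idx = pd.date_range(end="2020-01-06", periods=3, freq="MS")
labels = idx.strftime("%Y-%m-%d").tolist()
print(labels)
```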

df = pd.DataFrame(data, pd.date_range(start='2020-1-1', end='2020-1-31', periods=6))
df

	ID	name	gender	height
2020-01-01	0001	aaa	True	1.1
2020-01-07	0002	bbb	True	1.2
2020-01-13	0003	ccc	False	1.3
2020-01-19	0004	ddd	True	1.4
2020-01-25	0005	eee	False	1.5
2020-01-31	0006	fff	True	1.6
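
When start, end and periods are all given and freq is omitted, the points are spaced linearly. The 6-day spacing in the table above can be checked by hand: 6 points leave 5 gaps over 30 days. A short sketch of that arithmetic (variable names are illustrative):

```python
import pandas as pd

start = pd.Timestamp("2020-01-01")
end = pd.Timestamp("2020-01-31")
periods = 6

# periods points leave (periods - 1) gaps between start and end.
step = (end - start) / (periods - 1)

idx = pd.date_range(start=start, end=end, periods=periods)
gaps = idx[1:] - idx[:-1]  # differences between neighbouring timestamps
print(step, gaps.unique())
```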

df = pd.DataFrame(data, pd.date_range('2020-1-1', '2020-1-7', closed='right'))  # left-open, right-closed
df

	ID	name	gender	height
2020-01-02	0001	aaa	True	1.1
2020-01-03	0002	bbb	True	1.2
2020-01-04	0003	ccc	False	1.3
2020-01-05	0004	ddd	True	1.4
2020-01-06	0005	eee	False	1.5
2020-01-07	0006	fff	True	1.6

df = pd.DataFrame(data, pd.date_range('2020-1-1', '2020-1-7', closed='left'))  # left-closed, right-open
df

	ID	name	gender	height
2020-01-01	0001	aaa	True	1.1
2020-01-02	0002	bbb	True	1.2
2020-01-03	0003	ccc	False	1.3
2020-01-04	0004	ddd	True	1.4
2020-01-05	0005	eee	False	1.5
2020-01-06	0006	fff	True	1.6
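
The `closed` keyword used above belongs to the pandas version this post was written with. Newer releases replaced it with `inclusive` (deprecated in 1.4 and removed in 2.0, to my knowledge). The sketch below is one possible way to write code that runs on either; `half_open_range` is a hypothetical helper, not part of pandas.

```python
import inspect

import pandas as pd


def half_open_range(start, end, side):
    """Daily range closed only on `side` ('left' or 'right').

    Hypothetical helper: it picks whichever keyword the installed
    pandas version accepts ('inclusive' on newer, 'closed' on older).
    """
    params = inspect.signature(pd.date_range).parameters
    keyword = "inclusive" if "inclusive" in params else "closed"
    return pd.date_range(start, end, **{keyword: side})


left = half_open_range("2020-01-01", "2020-01-07", "left")    # drops the end
right = half_open_range("2020-01-01", "2020-01-07", "right")  # drops the start
print(left[0].date(), left[-1].date())
print(right[0].date(), right[-1].date())
```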

Creating a time series with the pandas.PeriodIndex class

values = ['2019-12', '2020-1', '2020-4', '2020-5', '2020-7', '2020-2']

df = pd.DataFrame(data, pd.PeriodIndex(values, freq='D'))
df

	ID	name	gender	height
2019-12-01	0001	aaa	True	1.1
2020-01-01	0002	bbb	True	1.2
2020-04-01	0003	ccc	False	1.3
2020-05-01	0004	ddd	True	1.4
2020-07-01	0005	eee	False	1.5
2020-02-01	0006	fff	True	1.6

df = pd.DataFrame(data, pd.PeriodIndex(values, freq='Q'))
df

	ID	name	gender	height
2019Q4	0001	aaa	True	1.1
2020Q1	0002	bbb	True	1.2
2020Q2	0003	ccc	False	1.3
2020Q2	0004	ddd	True	1.4
2020Q3	0005	eee	False	1.5
2020Q1	0006	fff	True	1.6
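
With freq='Q', each month string is mapped to the quarter that contains it, which is why 2020-4 and 2020-5 both show up as 2020Q2 above. A small sketch that inspects this mapping, using zero-padded copies of the same values (the name `quarters` is illustrative):

```python
import pandas as pd

# Zero-padded versions of the month strings used in this post.
values = ["2019-12", "2020-01", "2020-04", "2020-05", "2020-07", "2020-02"]
quarters = pd.PeriodIndex(values, freq="Q")

# Each period knows its calendar position.
print(quarters.year.tolist())
print(quarters.quarter.tolist())

# Distinct months can share a quarter, so the index may contain duplicates.
print(quarters.is_unique)
```

Duplicate labels are legal in an index, but they matter for label-based lookups, so it can be worth checking `is_unique` after a coarse conversion like this.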

years = [2020] * 3 + [2019] * 3
months = [1,2,3,1,2,3]
df = pd.DataFrame(data, pd.PeriodIndex(year=years, month=months, freq='D'))
df

	ID	name	gender	height
2020-01-01	0001	aaa	True	1.1
2020-02-01	0002	bbb	True	1.2
2020-03-01	0003	ccc	False	1.3
2019-01-01	0004	ddd	True	1.4
2019-02-01	0005	eee	False	1.5
2019-03-01	0006	fff	True	1.6
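
Passing `year=` and `month=` directly to the constructor works in the pandas version used here. Newer releases (2.2 and later, as far as I know) deprecate those keywords in favour of `PeriodIndex.from_fields`. The sketch below guards for both cases and also shows a version-independent route through strings. It uses freq='M' rather than 'D' purely to keep the labels short; that choice is mine, not the post's.

```python
import pandas as pd

years = [2020] * 3 + [2019] * 3
months = [1, 2, 3, 1, 2, 3]

if hasattr(pd.PeriodIndex, "from_fields"):
    # Newer pandas: dedicated constructor for field-wise input.
    idx = pd.PeriodIndex.from_fields(year=years, month=months, freq="M")
else:
    # Older pandas: keyword arguments on the class itself.
    idx = pd.PeriodIndex(year=years, month=months, freq="M")

# Version-independent alternative: build "YYYY-MM" strings first.
from_strings = pd.PeriodIndex(
    [f"{y}-{m:02d}" for y, m in zip(years, months)], freq="M"
)
print(idx.equals(from_strings))
```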

This class has three methods for converting time values: asfreq, strftime and to_timestamp.

help(pd.PeriodIndex.asfreq)

Help on function asfreq in module pandas.core.accessor:

asfreq(self, *args, **kwargs)
    Convert the Period Array/Index to the specified frequency `freq`.
    
    Parameters
    ----------
    freq : str
        A frequency.
    how : str {'E', 'S'}
        Whether the elements should be aligned to the end
        or start within pa period.
    
        * 'E', 'END', or 'FINISH' for end,
        * 'S', 'START', or 'BEGIN' for start.
    
        January 31st ('END') vs. January 1st ('START') for example.
    
    Returns
    -------
    Period Array/Index
        Constructed with the new frequency.
    
    Examples
    --------
    >>> pidx = pd.period_range('2010-01-01', '2015-01-01', freq='A')
    >>> pidx
    PeriodIndex(['2010', '2011', '2012', '2013', '2014', '2015'],
    dtype='period[A-DEC]', freq='A-DEC')
    
    >>> pidx.asfreq('M')
    PeriodIndex(['2010-12', '2011-12', '2012-12', '2013-12', '2014-12',
    '2015-12'], dtype='period[M]', freq='M')
    
    >>> pidx.asfreq('M', how='S')
    PeriodIndex(['2010-01', '2011-01', '2012-01', '2013-01', '2014-01',
    '2015-01'], dtype='period[M]', freq='M')

df = pd.DataFrame(data, pd.PeriodIndex(values, freq='Q').asfreq('M', how='E'))
df

	ID	name	gender	height
2019-12	0001	aaa	True	1.1
2020-03	0002	bbb	True	1.2
2020-06	0003	ccc	False	1.3
2020-06	0004	ddd	True	1.4
2020-09	0005	eee	False	1.5
2020-03	0006	fff	True	1.6
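
The `how` argument decides which end of the coarser period is kept when converting to a finer frequency. A minimal sketch comparing both settings on two quarters (variable names are illustrative):

```python
import pandas as pd

quarters = pd.PeriodIndex(["2020Q1", "2020Q2"], freq="Q")

first_months = quarters.asfreq("M", how="S")  # first month of each quarter
last_months = quarters.asfreq("M", how="E")   # last month of each quarter

print(first_months.astype(str).tolist())
print(last_months.astype(str).tolist())
```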

help(pd.PeriodIndex.strftime)

Help on function strftime in module pandas.core.accessor:

strftime(self, *args, **kwargs)
    Convert to Index using specified date_format.
    
    Return an Index of formatted strings specified by date_format, which
    supports the same string format as the python standard library. Details
    of the string format can be found in `python string format
    doc <https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior>`__.
    
    Parameters
    ----------
    date_format : str
        Date format string (e.g. "%Y-%m-%d").
    
    Returns
    -------
    ndarray
        NumPy ndarray of formatted strings.
    
    See Also
    --------
    to_datetime : Convert the given argument to datetime.
    DatetimeIndex.normalize : Return DatetimeIndex with times to midnight.
    DatetimeIndex.round : Round the DatetimeIndex to the specified freq.
    DatetimeIndex.floor : Floor the DatetimeIndex to the specified freq.
    
    Examples
    --------
    >>> rng = pd.date_range(pd.Timestamp("2018-03-10 09:00"),
    ...                     periods=3, freq='s')
    >>> rng.strftime('%B %d, %Y, %r')
    Index(['March 10, 2018, 09:00:00 AM', 'March 10, 2018, 09:00:01 AM',
           'March 10, 2018, 09:00:02 AM'],
          dtype='object')

df.index.strftime('%Y-%m-%d')  # the result is no longer a date type, but plain strings

Index(['2019-12-31', '2020-03-31', '2020-06-30', '2020-06-30', '2020-09-30',
       '2020-03-31'],
      dtype='object')
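
Because strftime returns text, the result is suitable for display but not for further date arithmetic. A small sketch, assuming a plain DatetimeIndex as input, that formats the labels and then parses them back with the same format string:

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=3, freq="D")

# strftime produces plain text labels; date arithmetic no longer works on them.
labels = idx.strftime("%Y/%m/%d")
print(labels.tolist())

# Parsing with the same format string restores real timestamps.
restored = pd.to_datetime(labels, format="%Y/%m/%d")
print((restored == idx).all())
```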

help(pd.PeriodIndex.to_timestamp)

Help on function to_timestamp in module pandas.core.accessor:

to_timestamp(self, *args, **kwargs)
    Cast to DatetimeArray/Index.
    
    Parameters
    ----------
    freq : str or DateOffset, optional
        Target frequency. The default is 'D' for week or longer,
        'S' otherwise.
    how : {'s', 'e', 'start', 'end'}
        Whether to use the start or end of the time period being converted.
    
    Returns
    -------
    DatetimeArray/Index

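# The output below corresponds to the daily PeriodIndex built from
# `years` and `months` earlier, so df is rebuilt here first.
df = pd.DataFrame(data, pd.PeriodIndex(year=years, month=months, freq='D'))
df.to_timestamp(freq='M', how='e')  # how='e' gives the last instant of each period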

	ID	name	gender	height
2020-01-01 23:59:59.999999999	0001	aaa	True	1.1
2020-02-01 23:59:59.999999999	0002	bbb	True	1.2
2020-03-01 23:59:59.999999999	0003	ccc	False	1.3
2019-01-01 23:59:59.999999999	0004	ddd	True	1.4
2019-02-01 23:59:59.999999999	0005	eee	False	1.5
2019-03-01 23:59:59.999999999	0006	fff	True	1.6
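
The trailing 23:59:59.999999999 comes from how='e', which picks the last instant of each period. A minimal sketch on monthly periods that contrasts it with how='start' (variable names are illustrative):

```python
import pandas as pd

months = pd.PeriodIndex(["2020-01", "2020-02"], freq="M")

starts = months.to_timestamp(how="start")  # first instant of each month
ends = months.to_timestamp(how="end")      # last instant of each month

print(starts.tolist())
# normalize() drops the time of day, leaving the last calendar day.
print(ends.normalize().tolist())
```

If only the calendar date matters, calling `normalize()` on the result, as above, is one way to get rid of the time component.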

Time-conversion methods available on DataFrame objects

help(df.to_period)

Help on method to_period in module pandas.core.frame:

to_period(freq=None, axis=0, copy=True) -> 'DataFrame' method of pandas.core.frame.DataFrame instance
    Convert DataFrame from DatetimeIndex to PeriodIndex.
    
    Convert DataFrame from DatetimeIndex to PeriodIndex with desired
    frequency (inferred from index if not passed).
    
    Parameters
    ----------
    freq : str, default
        Frequency of the PeriodIndex.
    axis : {0 or 'index', 1 or 'columns'}, default 0
        The axis to convert (the index by default).
    copy : bool, default True
        If False then underlying input data is not copied.
    
    Returns
    -------
    TimeSeries with PeriodIndex

dates = [datetime(2004, 10, 1, 10, 32, 45, 85), datetime(2000, 11, 27),
         datetime(2002, 1, 27), datetime(2002, 8, 15),
         datetime(2003, 1, 1), datetime(2002, 12, 31)]
df = pd.DataFrame(data, index=dates)
df.to_period(freq='Q')  # map each timestamp to the quarter that contains it

	ID	name	gender	height
2004Q4	0001	aaa	True	1.1
2000Q4	0002	bbb	True	1.2
2002Q1	0003	ccc	False	1.3
2002Q3	0004	ddd	True	1.4
2003Q1	0005	eee	False	1.5
2002Q4	0006	fff	True	1.6
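
One common reason to convert timestamps to periods is to aggregate rows that fall into the same period. The sketch below uses a made-up `sales` frame with an `amount` column (both hypothetical, not from this post) to show the idea:

```python
from datetime import datetime

import pandas as pd

# Hypothetical data: four amounts recorded on different days in 2002.
sales = pd.DataFrame(
    {"amount": [10, 20, 30, 40]},
    index=[datetime(2002, 1, 27), datetime(2002, 2, 3),
           datetime(2002, 8, 15), datetime(2002, 12, 31)],
)

# Replace each timestamp with the quarter that contains it.
by_quarter = sales.to_period(freq="Q")

# Rows that fall into the same quarter can then be aggregated by index label.
totals = by_quarter.groupby(level=0)["amount"].sum()
print(totals)
```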
