A Tour of Cool Python Libraries: The Third-Party Library Pandas (130)

Author: 神奇夜光杯

Published on 2024-09-29 12:40:04

Article link: https://blog.csdn.net/ygb_1024/article/details/142566408

I. Usage in Detail

581. The pandas.DataFrame.first_valid_index method

581-1. Syntax

# 581. The pandas.DataFrame.first_valid_index method
pandas.DataFrame.first_valid_index()
Return index for first non-NA value or None, if no non-NA value is found.

Returns:
type of index.

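581-2. Parameters

None.

581-3. Purpose

Finds the index label of the first valid (non-missing) entry, which is useful for checking where usable data begins before further processing. For a DataFrame, a row counts as valid if at least one of its columns holds a non-NA value.

581-4. Return value

Returns the label of the first valid index. If there is no valid entry (every value is missing, or the object is empty), it returns None.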
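The return-value rules can be checked with a small sketch. The frame below and its column names are illustrative assumptions, not data from the article; it compares the frame-level call with the equivalent Series method applied per column, and shows the None case.

```python
import numpy as np
import pandas as pd

# Illustrative frame (an assumption for this sketch): column "C" is entirely missing.
df = pd.DataFrame({
    "A": [np.nan, 2.0, np.nan],
    "B": [np.nan, np.nan, 3.0],
    "C": [np.nan, np.nan, np.nan],
})

# Frame-level call: label of the first row with at least one non-NA value.
frame_first = df.first_valid_index()

# Column-level view: call the Series method on each column separately.
per_column_first = {col: df[col].first_valid_index() for col in df.columns}

# A frame that holds no valid value at all yields None.
all_missing_first = df[["C"]].first_valid_index()

print(frame_first)        # 1
print(per_column_first)   # A -> 1, B -> 2, C -> None
print(all_missing_first)  # None
```

Because the result can be None, code that feeds it into label-based indexing should check for that case first.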

581-5. Notes

None.

581-6. Usage

581-6-1. Data preparation

None.

581-6-2. Code example

# 581. The pandas.DataFrame.first_valid_index method
import pandas as pd
import numpy as np
# Create a DataFrame whose first row is entirely NaN
data = {
    'A': [np.nan, 2, np.nan],
    'B': [np.nan, np.nan, 3]
}
df = pd.DataFrame(data)
# Get the label of the first row that has at least one non-NA value
result = df.first_valid_index()
print(result)

581-6-3. Output

# 581. The pandas.DataFrame.first_valid_index method
# 1

582. The pandas.DataFrame.last_valid_index method

582-1. Syntax

# 582. The pandas.DataFrame.last_valid_index method
pandas.DataFrame.last_valid_index()
Return index for last non-NA value or None, if no non-NA value is found.

Returns:
type of index.

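582-2. Parameters

None.

582-3. Purpose

Finds the index label of the last valid (non-missing) entry. This is useful when you need to locate the last row that contains data, for example during data cleaning.

582-4. Return value

Returns the label of the last valid index. If the DataFrame has no valid entry (every value is missing, or the object is empty), it returns None.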
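One common cleaning use is trimming all-NaN padding from both ends of a frame. The following is a minimal sketch under assumed data (the letter index and values are made up for illustration); it pairs last_valid_index with first_valid_index and slices by label.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): all-NaN rows pad both ends of the frame.
df = pd.DataFrame(
    {
        "A": [np.nan, 1.0, np.nan, 4.0, np.nan],
        "B": [np.nan, np.nan, 3.0, np.nan, np.nan],
    },
    index=list("abcde"),
)

first = df.first_valid_index()  # 'b': first row with any non-NA value
last = df.last_valid_index()    # 'd': last row with any non-NA value

# Label-based slicing with .loc includes both endpoints.
# Real code should check for None first (returned when nothing is valid).
trimmed = df.loc[first:last]
print(trimmed)
```

Interior rows are kept even where they still contain NaN; only the leading and trailing rows with no data are dropped.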

582-5. Notes

None.

582-6. Usage

582-6-1. Data preparation

None.

582-6-2. Code example

# 582. The pandas.DataFrame.last_valid_index method
import pandas as pd
import numpy as np
# Create a DataFrame
data = {
    'A': [1, 2, np.nan],
    'B': [np.nan, np.nan, 3]
}
df = pd.DataFrame(data)
# Get the label of the last row that has at least one non-NA value
result = df.last_valid_index()
print(result)

582-6-3. Output

# 582. The pandas.DataFrame.last_valid_index method
# 2

583. The pandas.DataFrame.resample method

583-1. Syntax

# 583. The pandas.DataFrame.resample method
pandas.DataFrame.resample(rule, axis=_NoDefault.no_default, closed=None, label=None, convention=_NoDefault.no_default, kind=_NoDefault.no_default, on=None, level=None, origin='start_day', offset=None, group_keys=False)
Resample time-series data.

Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or the caller must pass the label of a datetime-like series/index to the on/level keyword parameter.

Parameters:
rule : DateOffset, Timedelta or str
The offset string or object representing target conversion.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Which axis to use for up- or down-sampling. For Series this parameter is unused and defaults to 0. Must be DatetimeIndex, TimedeltaIndex or PeriodIndex.

Deprecated since version 2.0.0: Use frame.T.resample(…) instead.

closed : {‘right’, ‘left’}, default None
Which side of bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’, ‘BA’, ‘BQE’, and ‘W’ which all have a default of ‘right’.

label : {‘right’, ‘left’}, default None
Which bin edge label to label bucket with. The default is ‘left’ for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’, ‘BA’, ‘BQE’, and ‘W’ which all have a default of ‘right’.

convention : {‘start’, ‘end’, ‘s’, ‘e’}, default ‘start’
For PeriodIndex only, controls whether to use the start or end of rule.

Deprecated since version 2.2.0: Convert PeriodIndex to DatetimeIndex before resampling instead.

kind : {‘timestamp’, ‘period’}, optional, default None
Pass ‘timestamp’ to convert the resulting index to a DateTimeIndex or ‘period’ to convert it to a PeriodIndex. By default the input representation is retained.

Deprecated since version 2.2.0: Convert index to desired type explicitly instead.

on : str, optional
For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.

level : str or int, optional
For a MultiIndex, level (name or number) to use for resampling. level must be datetime-like.

origin : Timestamp or str, default ‘start_day’
The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If string, must be one of the following:

‘epoch’: origin is 1970-01-01

‘start’: origin is the first value of the timeseries

‘start_day’: origin is the first day at midnight of the timeseries

‘end’: origin is the last value of the timeseries

‘end_day’: origin is the ceiling midnight of the last day

New in version 1.3.0.

Note

Only takes effect for Tick-frequencies (i.e. fixed frequencies like days, hours, and minutes, rather than months or quarters).

offset : Timedelta or str, default is None
An offset timedelta added to the origin.

group_keys : bool, default False
Whether to include the group keys in the result index when using .apply() on the resampled object.

New in version 1.5.0: Not specifying group_keys will retain values-dependent behavior from pandas 1.4 and earlier (see pandas 1.5.0 Release notes for examples).

Changed in version 2.0.0: group_keys now defaults to False.

Returns:
pandas.api.typing.Resampler
Resampler object.

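583-2. Parameters

583-2-1. rule (required): DateOffset, Timedelta, or str. The target resampling frequency, typically given as an offset alias such as 'D' (day), 'ME' (month end), or 'h' (hour). The older aliases 'M' and 'H' are deprecated as of pandas 2.2.

583-2-2. axis (optional): {0 or 'index', 1 or 'columns'}, default 0. The axis to resample along. Deprecated; use frame.T.resample(...) instead.

583-2-3. closed (optional, default None): {'right', 'left'}. Which side of each bin interval is closed. When None, the default is 'left' for most frequencies and 'right' for 'ME', 'YE', 'QE', 'BME', 'BA', 'BQE', and 'W'.

583-2-4. label (optional, default None): {'right', 'left'}. Which bin edge is used to label each bin. The defaults follow the same rule as closed.

583-2-5. convention (optional): {'start', 'end', 's', 'e'}, default 'start'. For a PeriodIndex only, controls whether the start or the end of rule is used. Deprecated since pandas 2.2.0; convert the PeriodIndex to a DatetimeIndex before resampling instead.

583-2-6. kind (optional): {'timestamp', 'period'}. Pass 'timestamp' to convert the resulting index to a DatetimeIndex, or 'period' to convert it to a PeriodIndex; by default the input representation is kept. Deprecated since pandas 2.2.0; convert the index explicitly instead.

583-2-7. on (optional, default None): str. For a DataFrame, the datetime-like column to resample on instead of the index.

583-2-8. level (optional, default None): str or int. For a MultiIndex, the name or number of the datetime-like level to resample on.

583-2-9. origin (optional, default 'start_day'): a Timestamp, or one of 'epoch', 'start', 'start_day', 'end', 'end_day'. The reference point from which the bins are aligned. It only takes effect for fixed (tick) frequencies such as days, hours, and minutes.

583-2-10. offset (optional, default None): Timedelta or str. An offset timedelta added to origin.

583-2-11. group_keys (optional, default False): bool. Whether to include the group keys in the result index when calling .apply() on the resampled object.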
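The effect of closed and label is easiest to see on a tiny series. The sketch below uses assumed data (six one-minute observations with values 0 to 5) and compares the default binning for a fixed frequency with right-closed, right-labelled bins.

```python
import pandas as pd

# Illustrative data (an assumption): six one-minute observations with values 0..5.
idx = pd.date_range("2024-01-01 00:00", periods=6, freq="min")
df = pd.DataFrame({"value": range(6)}, index=idx)

# Default for a fixed frequency such as '3min': bins are closed and labelled on the left,
# i.e. [00:00, 00:03) and [00:03, 00:06).
left = df.resample("3min").sum()

# Right-closed, right-labelled bins: (23:57, 00:00], (00:00, 00:03], (00:03, 00:06].
right = df.resample("3min", closed="right", label="right").sum()

print(left)   # sums 3 and 12, labelled 00:00 and 00:03
print(right)  # sums 0, 6 and 9, labelled 00:00, 00:03 and 00:06
```

With right-closed bins the 00:00 observation falls on the closing edge of its own bin, which is why a third bin appears.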

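583-3. Purpose

Resamples time-series data: rows are grouped into time bins of the given frequency (for example 'D' for daily or 'ME' for month end) so that an aggregation can be applied to each bin. It is used for frequency conversion and for summarizing data that has a datetime-like index.

583-4. Return value

Returns a Resampler object (pandas.api.typing.Resampler), not a DataFrame. Calling an aggregation on it, such as sum() or mean(), produces a new DataFrame or Series whose index holds the resampled bin labels and whose values are the aggregated results.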
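The syntax section lists the return type as a Resampler, so no aggregation happens until one is requested. A minimal sketch with assumed data shows one resampler being reused for several aggregations.

```python
import pandas as pd

# Illustrative data (an assumption): six daily values.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=idx)

# resample() only defines the 2-day bins; nothing is aggregated yet.
resampler = df.resample("2D")
print(type(resampler).__name__)

# The same resampler can be reused for different aggregations.
totals = resampler.sum()
summary = resampler["value"].agg(["min", "max", "mean"])

print(totals)
print(summary)
```

Selecting a column on the resampler before aggregating, as in the last step, limits the work to that column.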

583-5. Notes

None.

583-6. Usage

583-6-1. Data preparation

None.

583-6-2. Code example

# 583. The pandas.DataFrame.resample method
import pandas as pd
import numpy as np
# Create a time-series DataFrame with ten daily rows
date_rng = pd.date_range(start='2024-01-01', end='2024-01-10', freq='D')
df = pd.DataFrame(date_rng, columns=['date'])
# Random integers in [0, 100); no seed is set, so the values differ on every run
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))
# Use the 'date' column as the index
df.set_index('date', inplace=True)
# Resample into 2-day bins and sum each bin
resampled_df = df.resample('2D').sum()
print(resampled_df)

583-6-3. Output

# 583. The pandas.DataFrame.resample method
# (sample output from one run; the numbers vary because the data is random)
#             data
# date            
# 2024-01-01    37
# 2024-01-03    97
# 2024-01-05   135
# 2024-01-07   142
# 2024-01-09    94
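Since the example draws random integers, its printed sums cannot be reproduced exactly. A deterministic variant, assuming a fixed set of values and using the on parameter so that no set_index call is needed, might look like this:

```python
import pandas as pd

# Illustrative data (an assumption): fixed values so the result is reproducible.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "value": [1, 2, 3, 4, 5, 6],
})

# Resample on the datetime-like 'date' column instead of the index.
out = df.resample("3D", on="date").sum()
print(out)  # 2024-01-01 -> 6, 2024-01-04 -> 15
```

The column named in on becomes the index of the result, so only the remaining columns are aggregated.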

584. The pandas.DataFrame.to_period method

584-1. Syntax

# 584. The pandas.DataFrame.to_period method
pandas.DataFrame.to_period(freq=None, axis=0, copy=None)
Convert DataFrame from DatetimeIndex to PeriodIndex.

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed).

Parameters:
freq : str, default
Frequency of the PeriodIndex.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The axis to convert (the index by default).

copy : bool, default True
If False then underlying input data is not copied.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:
DataFrame
The DataFrame has a PeriodIndex.

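584-2. Parameters

584-2-1. freq (optional, default None): str or DateOffset. The frequency of the resulting PeriodIndex, for example 'M' for monthly, 'Q' for quarterly, or 'Y' for yearly (the older alias 'A' is deprecated in recent pandas versions). If None, the frequency is inferred from the index.

584-2-2. axis (optional, default 0): {0 or 'index', 1 or 'columns'}. The axis to convert: 0 or 'index' converts the row index, and 1 or 'columns' converts the column labels.

584-2-3. copy (optional, default None): bool. If False, the underlying input data is not copied. Under Copy-on-Write, which becomes the default in pandas 3.0, the keyword is ignored and copies are made lazily; the keyword is slated for removal in a future version.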
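How freq behaves when it is omitted versus given explicitly can be sketched as follows; the three-day frame is an assumption made for illustration.

```python
import pandas as pd

# Illustrative data (an assumption): three consecutive days.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
df = pd.DataFrame({"value": [1, 2, 3]}, index=dates)

# freq omitted: the daily frequency is inferred from the DatetimeIndex.
daily = df.to_period()

# freq given: each timestamp maps to the quarter that contains it.
quarterly = df.to_period(freq="Q")

print(daily.index)      # daily periods 2024-01-01 .. 2024-01-03
print(quarterly.index)  # 2024Q1 three times
```

Inference relies on the index carrying (or allowing pandas to infer) a regular frequency; for irregular timestamps it is safer to pass freq explicitly.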

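584-3. Purpose

Converts time-series data from timestamp form to period form, which makes period-based analysis more convenient. With financial data, for example, you may prefer to view metrics per quarter or per year.

584-4. Return value

Returns a new DataFrame in which the original timestamps are converted to periods of the given frequency. The data and shape are unchanged; only the converted axis changes, becoming a PeriodIndex. Timestamps that fall within the same period map to the same period label, so labels may repeat.

584-5. Notes

None.

584-6. Usage

584-6-1. Data preparation

None.

584-6-2. Code example

# 584. The pandas.DataFrame.to_period method
import pandas as pd
# Create a DataFrame indexed by daily timestamps
dates = pd.date_range('2024-01-01', periods=5, freq='D')
data = pd.DataFrame({'value': [1, 2, 3, 4, 5]}, index=dates)
# Convert the timestamps to a monthly PeriodIndex
period_data = data.to_period(freq='M')
print(period_data)

584-6-3. Output

# 584. The pandas.DataFrame.to_period method
#          value
# 2024-01      1
# 2024-01      2
# 2024-01      3
# 2024-01      4
# 2024-01      5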
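The output keeps one row per original timestamp, so the label 2024-01 repeats. If one row per period is wanted, a possible follow-up (a sketch with assumed dates and values, not part of the original example) is to group on the period labels:

```python
import pandas as pd

# Illustrative data (an assumption): four dates spread over two months.
dates = pd.to_datetime(["2024-01-15", "2024-01-31", "2024-02-01", "2024-02-20"])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=dates)

# Convert to monthly periods; rows in the same month now share a label.
monthly = df.to_period("M")

# Collapse the repeated labels by grouping on the index level.
totals = monthly.groupby(level=0).sum()
print(totals)  # 2024-01 -> 3, 2024-02 -> 7
```

The grouped result still has a PeriodIndex, so it can be converted back with to_timestamp if concrete dates are needed.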

585. The pandas.DataFrame.to_timestamp method

585-1. Syntax

# 585. The pandas.DataFrame.to_timestamp method
pandas.DataFrame.to_timestamp(freq=None, how='start', axis=0, copy=None)
Cast to DatetimeIndex of timestamps, at beginning of period.

Parameters:
freq : str, default frequency of PeriodIndex
Desired frequency.

how : {‘s’, ‘e’, ‘start’, ‘end’}
Convention for converting period to timestamp; start of period vs. end.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The axis to convert (the index by default).

copy : bool, default True
If False then underlying input data is not copied.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:
DataFrame
The DataFrame has a DatetimeIndex.

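585-2. Parameters

585-2-1. freq (optional, default None): str or DateOffset. The desired frequency of the resulting timestamps. If None, the frequency of the PeriodIndex is used.

585-2-2. how (optional, default 'start'): {'s', 'e', 'start', 'end'}. The convention for converting a period to a timestamp: 'start' (or 's') returns the timestamp at the start of each period, and 'end' (or 'e') returns the timestamp at the end of each period.

585-2-3. axis (optional, default 0): {0 or 'index', 1 or 'columns'}. The axis to convert: 0 or 'index' converts the row index, and 1 or 'columns' converts the column labels.

585-2-4. copy (optional, default None): bool. If False, the underlying input data is not copied. Under Copy-on-Write, which becomes the default in pandas 3.0, the keyword is ignored and copies are made lazily; the keyword is slated for removal in a future version.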
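The difference between the two how conventions can be sketched on two monthly periods; the frame below is an assumption made for illustration.

```python
import pandas as pd

# Illustrative data (an assumption): two monthly periods.
periods = pd.period_range("2024-01", periods=2, freq="M")
df = pd.DataFrame({"value": [10, 20]}, index=periods)

# 'start': the first instant of each period (midnight on the 1st).
start = df.to_timestamp(how="start")

# 'end': the last instant of each period, down to the final nanosecond of the last day.
end = df.to_timestamp(how="end")

print(start.index)
print(end.index)
# Drop the time-of-day part if only the closing date is needed.
print(end.index.normalize())
```

Because how='end' yields a time just before midnight rather than a bare date, normalize() is a convenient way to get month-end dates for display or joins.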

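585-3. Purpose

Converts period data back to timestamps so that concrete dates or points in time are available, which is especially useful for further processing, plotting, or merging with other time-series data.

585-4. Return value

Returns a new DataFrame in which the original periods are converted to the corresponding timestamps. The converted axis becomes a DatetimeIndex; which timestamp each period maps to depends on freq and on the start/end choice made with how.

585-5. Notes

None.

585-6. Usage

585-6-1. Data preparation

None.

585-6-2. Code example

# 585. The pandas.DataFrame.to_timestamp method
import pandas as pd
# Create a DataFrame indexed by monthly periods
periods = pd.period_range('2024-01', periods=5, freq='M')
data = pd.DataFrame({'value': [10, 20, 30, 40, 50]}, index=periods)
# Convert the periods back to timestamps (by default, the start of each period)
timestamp_data = data.to_timestamp()
print(timestamp_data)

585-6-3. Output

# 585. The pandas.DataFrame.to_timestamp method
#             value
# 2024-01-01     10
# 2024-02-01     20
# 2024-03-01     30
# 2024-04-01     40
# 2024-05-01     50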
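to_period and to_timestamp are inverses only when the period frequency is as fine as the original timestamps. A short sketch, assuming a three-day frame, contrasts a lossless round trip with a lossy one:

```python
import pandas as pd

# Illustrative data (an assumption): three consecutive days.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
df = pd.DataFrame({"value": [1, 2, 3]}, index=dates)

# Daily periods keep the full date, so the round trip restores the original index.
round_trip = df.to_period().to_timestamp()
print(round_trip.index.equals(df.index))  # True

# Monthly periods discard the day, so every row comes back as the first of the month.
lossy = df.to_period("M").to_timestamp()
print(lossy.index.unique())
```

If the original timestamps are needed later, keep them in a column before converting to a coarser period.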