Python Cool Library Tour: Third-Party Library Pandas (027)

神奇夜光杯

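Column: Myelsa's Python Cool Library Tour. Tags: python, pandas, programming languages, standard and third-party libraries, fundamentals, learning and growth

Article link: https://blog.csdn.net/ygb_1024/article/details/140455305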

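I. Usage in Detail

68. pandas.infer_freq function

68-1. Syntax

68-2. Parameters

68-3. Function

68-4. Return value

68-5. Notes

68-6. Usage

68-6-1. Data preparation

68-6-2. Code examples

68-6-3. Output

69. pandas.interval_range function

69-1. Syntax

69-2. Parameters

69-3. Function

69-4. Return value

69-5. Notes

69-6. Usage

70. pandas.eval function

70-1. Syntax

70-2. Parameters

70-3. Function

70-4. Return value

70-5. Notes

70-6. Usage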
I. Usage in Detail

68. pandas.infer_freq function

68-1. Syntax

# 68. pandas.infer_freq function
pandas.infer_freq(index)
Infer the most likely frequency given the input index.

Parameters:
index
DatetimeIndex, TimedeltaIndex, Series or array-like
If passed a Series will use the values of the series (NOT THE INDEX).

Returns:
str or None
None if no discernible frequency.

Raises:
TypeError
If the index is not datetime-like.

ValueError
If there are fewer than three values.

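68-2. Parameters

68-2-1. index (required): the datetime-like values to inspect. It may be a DatetimeIndex, a TimedeltaIndex, a Series, or an array-like. A DatetimeIndex holds points in time, while a TimedeltaIndex holds durations. If a Series is passed, its values are used, not its index.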

68-3. Function

Infers the most likely frequency of the given datetime-like values (a DatetimeIndex, TimedeltaIndex, Series, or array-like).

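68-4. Return value

Returns a frequency string or None:

68-4-1. If the frequency can be inferred, infer_freq returns a string that represents it, such as 'D' for daily. The exact alias can depend on the pandas version: month-end, for example, is reported as 'M' in older releases and as 'ME' from pandas 2.2 onward.

68-4-2. If no frequency can be inferred (for example, because the intervals between values are inconsistent), it returns None.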

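68-5. Notes

68-5-1. infer_freq can only infer simple, regular frequencies. For complex or irregular time series it may fail to infer one.

68-5-2. Knowing the frequency of time series data matters, because it affects how you analyze and interpret the data. Keep in mind that infer_freq only looks at the values it is given: a gap or an irregular interval usually makes it return None, so a None result does not prove the underlying data has no regular frequency, and a returned string is still worth checking against what you know about the data.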
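The notes above can be illustrated with a minimal sketch. The variable names (regular, with_gap, as_series) are made up for this illustration; the sketch only shows how a single gap changes the result and that a Series is inspected through its values.

```python
import pandas as pd

# A regular daily index: every step is exactly one day.
regular = pd.date_range("2024-01-01", periods=6, freq="D")

# The same index with one day removed, leaving a single two-day gap.
with_gap = regular.delete(3)

freq_regular = pd.infer_freq(regular)
freq_gap = pd.infer_freq(with_gap)

# For a Series, the values are inspected, not the index.
as_series = pd.Series(regular)
freq_series = pd.infer_freq(as_series)

print(freq_regular)  # D
print(freq_gap)      # None
print(freq_series)   # D
```

The gap case returns None even though five of the six original timestamps are still present, which is why a None result says little about the process that produced the data.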

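68-6. Usage

68-6-1. Data preparation

None

68-6-2. Code examples

# 68. pandas.infer_freq function
# 68-1. Inferring the frequency of a DatetimeIndex
import pandas as pd
# Create a DatetimeIndex
dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
# Infer the frequency
freq = pd.infer_freq(dates)
print(f"Inferred frequency: {freq}", end='\n\n')

# 68-2. Irregular spacing, where no frequency can be inferred
import pandas as pd
# Create a DatetimeIndex whose day gaps are not constant (2 days, then 3 days)
dates = pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-06'])
# Try to infer the frequency
freq = pd.infer_freq(dates)
print(f"Inferred frequency: {freq}", end='\n\n')

# 68-3. Inferring the frequency of a TimedeltaIndex
import pandas as pd
# Create a TimedeltaIndex
timedeltas = pd.to_timedelta(['1 days', '2 days', '3 days'])
# Try to infer the frequency
freq = pd.infer_freq(timedeltas)
print(f"Inferred frequency: {freq}")

68-6-3. Output

# 68. pandas.infer_freq function
# 68-1. Inferring the frequency of a DatetimeIndex
# Inferred frequency: D

# 68-2. Irregular spacing, where no frequency can be inferred
# Inferred frequency: None

# 68-3. Inferring the frequency of a TimedeltaIndex
# Inferred frequency: D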

69. pandas.interval_range function

69-1. Syntax

# 69. pandas.interval_range function
pandas.interval_range(start=None, end=None, periods=None, freq=None, name=None, closed='right')
Return a fixed frequency IntervalIndex.

Parameters:
start
numeric or datetime-like, default None
Left bound for generating intervals.

end
numeric or datetime-like, default None
Right bound for generating intervals.

periods
int, default None
Number of periods to generate.

freq
numeric, str, Timedelta, datetime.timedelta, or DateOffset, default None
The length of each interval. Must be consistent with the type of start and end, e.g. 2 for numeric, or ‘5H’ for datetime-like. Default is 1 for numeric and ‘D’ for datetime-like.

name
str, default None
Name of the resulting IntervalIndex.

closed
{‘left’, ‘right’, ‘both’, ‘neither’}, default ‘right’
Whether the intervals are closed on the left-side, right-side, both or neither.

Returns:
IntervalIndex

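69-2. Parameters

69-2-1. start (optional, default None): the left bound for generating intervals.

69-2-2. end (optional, default None): the right bound for generating intervals.

69-2-3. periods (optional, default None): the number of intervals to generate. When it is given together with start and end, freq can be omitted and the interval length is computed from the other three.

69-2-4. freq (optional, default None): the length of each interval. It must be consistent with the type of start and end: a number for numeric bounds, or a frequency string, Timedelta, datetime.timedelta, or DateOffset for datetime-like bounds (for example 'D' for one day; the hourly alias is 'H' in older pandas releases and 'h' from pandas 2.2 onward). When it is not given, it defaults to 1 for numeric bounds and 'D' for datetime-like bounds.

69-2-5. name (optional, default None): the name of the resulting IntervalIndex.

69-2-6. closed (optional, default 'right'): which side of each interval is closed. 'right' gives left-open, right-closed intervals (a, b]; 'left' gives left-closed, right-open intervals [a, b); 'both' closes both ends [a, b]; 'neither' leaves both ends open (a, b).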
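The meaning of closed is easy to mix up, so here is a small sketch that tests endpoint membership directly. The variable names are illustrative only.

```python
import pandas as pd

# closed='right' (the default): intervals look like (a, b].
right = pd.interval_range(start=0, end=4, freq=2)
# closed='left': intervals look like [a, b).
left = pd.interval_range(start=0, end=4, freq=2, closed="left")

first_right = right[0]  # (0, 2]
first_left = left[0]    # [0, 2)

print(0 in first_right, 2 in first_right)  # False True
print(0 in first_left, 2 in first_left)    # True False
```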

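69-3. Function

Generates a fixed-frequency IntervalIndex. An IntervalIndex is the pandas index type for a sequence of intervals; each interval is defined by a left and a right endpoint and can be closed on the left, on the right, on both sides, or on neither. The function builds the sequence from a combination of the start, end, periods, and freq parameters.

69-4. Return value

Returns an IntervalIndex, an immutable, fixed-size index holding a sequence of intervals defined by their endpoints. Which side of each interval is closed is controlled by the closed parameter. The returned IntervalIndex can serve as the index of a DataFrame or Series, so that data can be selected, indexed, and manipulated by interval.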

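69-5. Notes

69-5-1. Of the four parameters start, end, periods, and freq, exactly three must be specified; the fourth is derived from the other three. (If only two of start, end, and periods are given, freq falls back to its default: 1 for numeric bounds and 'D' for datetime-like bounds.)

69-5-2. If freq is omitted while start, end, and periods are given, the resulting IntervalIndex has periods linearly spaced elements between start and end, inclusive.

69-5-3. For datetime-like intervals, freq must be a string or object that can be converted to a DateOffset.

69-5-4. The returned IntervalIndex is immutable, but you can query and select from it with pandas index methods such as .get_loc(), .get_indexer(), .contains(), and .overlaps().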
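As a rough sketch of these notes, the example below builds one datetime-like and one numeric IntervalIndex from three of the four parameters each, then looks up a value with get_loc(). The variable names are assumptions made for the illustration.

```python
import pandas as pd

# start, periods, and freq are given; end is derived.
daily = pd.interval_range(start=pd.Timestamp("2024-01-01"), periods=3, freq="D")

# start, end, and periods are given; the interval length (2) is derived.
idx = pd.interval_range(start=0, end=10, periods=5)

# Look up which interval contains a value. With the default
# closed='right', 2 belongs to (0, 2] and 3 belongs to (2, 4].
pos_of_2 = idx.get_loc(2)
pos_of_3 = idx.get_loc(3)

print(len(daily), daily[-1].right)
print(pos_of_2, pos_of_3)  # 0 1
```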

69-6. Usage

69-6-1. Data preparation

None

69-6-2. Code examples

# 69. pandas.interval_range function
# 69-1. Basic usage
import pandas as pd
intervals = pd.interval_range(start=0, end=10, periods=5)
print(intervals, end='\n\n')

# 69-2. Using the freq parameter
import pandas as pd
intervals = pd.interval_range(start=0, end=10, freq=2)
print(intervals, end='\n\n')

# 69-3. Specifying the closed parameter
import pandas as pd
intervals = pd.interval_range(start=0, end=10, periods=5, closed='left')
print(intervals, end='\n\n')

# 69-4. Using the name parameter
import pandas as pd
intervals = pd.interval_range(start=0, end=10, periods=5, name='sample_intervals')
print(intervals, end='\n\n')

69-6-3. Output

# 69. pandas.interval_range function
# 69-1. Basic usage
# IntervalIndex([(0, 2], (2, 4], (4, 6], (6, 8], (8, 10]], dtype='interval[int64, right]')

# 69-2. Using the freq parameter
# IntervalIndex([(0, 2], (2, 4], (4, 6], (6, 8], (8, 10]], dtype='interval[int64, right]')

# 69-3. Specifying the closed parameter
# IntervalIndex([[0, 2), [2, 4), [4, 6), [6, 8), [8, 10)], dtype='interval[int64, left]')

# 69-4. Using the name parameter
# IntervalIndex([(0, 2], (2, 4], (4, 6], (6, 8], (8, 10]], dtype='interval[int64, right]', name='sample_intervals')

70. pandas.eval function

70-1. Syntax

# 70. pandas.eval function
pandas.eval(expr, parser='pandas', engine=None, local_dict=None, global_dict=None, resolvers=(), level=0, target=None, inplace=False)
Evaluate a Python expression as a string using various backends.

The following arithmetic operations are supported: +, -, *, /, **, %, // (python engine only) along with the following boolean operations: | (or), & (and), and ~ (not). Additionally, the 'pandas' parser allows the use of and, or, and not with the same semantics as the corresponding bitwise operators. Series and DataFrame objects are supported and behave as they would with plain ol’ Python evaluation.

Parameters:
expr
str
The expression to evaluate. This string cannot contain any Python statements, only Python expressions.

parser
{‘pandas’, ‘python’}, default ‘pandas’
The parser to use to construct the syntax tree from the expression. The default of 'pandas' parses code slightly different than standard Python. Alternatively, you can parse an expression using the 'python' parser to retain strict Python semantics. See the enhancing performance documentation for more details.

engine
{‘python’, ‘numexpr’}, default ‘numexpr’
The engine used to evaluate the expression. Supported engines are

None : tries to use numexpr, falls back to python

'numexpr' : This default engine evaluates pandas objects using numexpr for large speed ups in complex expressions with large frames.

'python' : Performs operations as if you had eval’d in top level python. This engine is generally not that useful.

More backends may be available in the future.

local_dict
dict or None, optional
A dictionary of local variables, taken from locals() by default.

global_dict
dict or None, optional
A dictionary of global variables, taken from globals() by default.

resolvers
list of dict-like or None, optional
A list of objects implementing the __getitem__ special method that you can use to inject an additional collection of namespaces to use for variable lookup. For example, this is used in the query() method to inject the DataFrame.index and DataFrame.columns variables that refer to their respective DataFrame instance attributes.

level
int, optional
The number of prior stack frames to traverse and add to the current scope. Most users will not need to change this parameter.

target
object, optional, default None
This is the target object for assignment. It is used when there is variable assignment in the expression. If so, then target must support item assignment with string keys, and if a copy is being returned, it must also support .copy().

inplace
bool, default False
If target is provided, and the expression mutates target, whether to modify target inplace. Otherwise, return a copy of target with the mutation.

Returns:
ndarray, numeric scalar, DataFrame, Series, or None
The completion value of evaluating the given code or None if inplace=True.

Raises:
ValueError
There are many instances where such an error can be raised:

target=None, but the expression is multiline.

The expression is multiline, but not all them have item assignment. An example of such an arrangement is this:

a = b + 1
a + 2

Here, there are expressions on different lines, making it multiline, but the last line has no variable assigned to the output of a + 2.

inplace=True, but the expression is missing item assignment.

Item assignment is provided, but the target does not support string item assignment.

Item assignment is provided and inplace=False, but the target does not support the .copy() method

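70-2. Parameters

70-2-1. expr (required): a string containing the expression to evaluate. It may contain only Python expressions, not statements, and can reference pandas objects, for example arithmetic on DataFrame columns.

70-2-2. parser (optional, default 'pandas'): the parser used to build the syntax tree, either 'pandas' or 'python'. The 'pandas' parser differs slightly from standard Python (for example, and, or, and not have the semantics of the corresponding bitwise operators), while 'python' keeps strict Python semantics.

70-2-3. engine (optional, default None): the engine used to evaluate the expression. None tries numexpr and falls back to python; 'numexpr' evaluates pandas objects with numexpr, which can give large speedups for complex expressions on large frames; 'python' evaluates as if the expression had been eval'd in top-level Python.

70-2-4. local_dict (optional, default None): a dictionary of local variables that can be used in the expression; taken from locals() by default.

70-2-5. global_dict (optional, default None): a dictionary of global variables that can be used in the expression; taken from globals() by default.

70-2-6. resolvers (optional, default ()): a list of dict-like objects (implementing __getitem__) that inject additional namespaces for variable lookup. DataFrame.query() uses this, for example, to inject the DataFrame's index and columns.

70-2-7. level (optional, default 0): the number of prior stack frames to traverse and add to the current scope. Most users do not need to change it.

70-2-8. target (optional, default None): the target object for assignment (for example a DataFrame), used when the expression contains a variable assignment. It must support item assignment with string keys, and .copy() if a copy is to be returned.

70-2-9. inplace (optional, default False): if target is provided and the expression mutates it, True modifies target in place and returns None, while False returns a modified copy of target.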
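A minimal sketch of how local_dict, the default 'pandas' parser, and the engine parameter fit together. The names s and scope are made up for this illustration, and the sketch assumes nothing beyond the parameters described above.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
scope = {"s": s}  # names made available to the expression

# Default 'pandas' parser: `and` has the semantics of element-wise `&`.
mask_and = pd.eval("(s > 1) and (s < 4)", local_dict=scope)

# The same logic written with `&`, evaluated by the pure-Python engine.
mask_amp = pd.eval("(s > 1) & (s < 4)", local_dict=scope, engine="python")

print(mask_and.tolist())          # [False, True, True, False]
print(mask_and.equals(mask_amp))  # True
```

Passing local_dict explicitly keeps the expression independent of whatever variables happen to be in the calling scope.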

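70-3. Function

Evaluates a Python expression given as a string, with efficient handling of DataFrame and Series objects.

70-4. Return value

70-4-1. If target is None (the default), pandas.eval() returns the result of the expression. The type depends on the expression and is usually a scalar, an ndarray, a Series, or a DataFrame.

70-4-2. If target is given, the expression contains an assignment, and inplace=False (the default), pandas.eval() returns a copy of target with the assignment applied; the original target is left unchanged. If the expression contains no assignment, the result of the expression is returned instead.

70-4-3. If target is given and inplace=True, pandas.eval() returns None, because the assignment is applied directly to target. In this case the expression must contain an assignment, otherwise a ValueError is raised.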
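The three return-value cases can be sketched as follows. The variable names (total, with_c, work, ret) and the explicit local_dict are choices made for this illustration, not part of the original examples.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
scope = {"df": df}

# No assignment in the expression: the computed value is returned.
total = pd.eval("df.A + df.B", local_dict=scope)

# Assignment with a target and inplace=False: a modified copy is returned,
# and df itself keeps only columns A and B.
with_c = pd.eval("C = df.A + df.B", local_dict=scope, target=df)

# Assignment with inplace=True: the target is modified and None is returned.
work = df.copy()
ret = pd.eval("C = df.A + df.B", local_dict=scope, target=work, inplace=True)

print(total.tolist())        # [5, 7, 9]
print(list(with_c.columns))  # ['A', 'B', 'C']
print(list(df.columns))      # ['A', 'B']
print(ret)                   # None
```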

70-5. Notes

None

70-6. Usage

70-6-1. Data preparation

None

70-6-2. Code examples

# 70. pandas.eval function
# 70-1. Returning the result of an expression
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = pd.eval('A + B', local_dict={'A': df['A'], 'B': df['B']})
print(result, end='\n\n')

# 70-2. Using DataFrame.eval directly on a DataFrame
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.eval('C = A + B', inplace=True)
print(df)

70-6-3. Output

# 70. pandas.eval function
# 70-1. Returning the result of an expression
# 0    5
# 1    7
# 2    9
# dtype: int64

# 70-2. Using DataFrame.eval directly on a DataFrame
#    A  B  C
# 0  1  4  5
# 1  2  5  7
# 2  3  6  9