Python Cool Libraries Tour - Third-Party Library Pandas (041)

神奇夜光杯

Published 2024-07-24 07:45:00


Article link: https://blog.csdn.net/ygb_1024/article/details/140614973

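I. Usage in Detail

136. pandas.Series.ne method

136-1. Syntax

136-2. Parameters

136-3. Purpose

136-4. Return Value

136-5. Notes

136-6. Usage

136-6-1. Data Preparation

136-6-2. Code Example

136-6-3. Output

137. pandas.Series.eq method

137-1. Syntax

137-2. Parameters

137-3. Purpose

137-4. Return Value

137-5. Notes

137-6. Usage

137-6-1. Data Preparation

137-6-2. Code Example

137-6-3. Output

138. pandas.Series.product method

138-1. Syntax

138-2. Parameters

138-3. Purpose

138-4. Return Value

138-5. Notes

138-6. Usage

138-6-1. Data Preparation

138-6-2. Code Example

138-6-3. Output

139. pandas.Series.dot method

139-1. Syntax

139-2. Parameters

139-3. Purpose

139-4. Return Value

139-5. Notes

139-6. Usage

139-6-1. Data Preparation

139-6-2. Code Example

139-6-3. Output

140. pandas.Series.apply method

140-1. Syntax

140-2. Parameters

140-3. Purpose

140-4. Return Value

140-5. Notes

140-6. Usage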

I. Usage in Detail

136. pandas.Series.ne method

136-1. Syntax

# 136. pandas.Series.ne method
pandas.Series.ne(other, level=None, fill_value=None, axis=0)
Return Not equal to of series and other, element-wise (binary operator ne).

Equivalent to series != other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other
Series or scalar value
level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation.

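136-2. Parameters

136-2-1. other (required): The object to compare against each element of the Series; either a scalar value or another Series.

136-2-2. level (optional, default None): The MultiIndex level to broadcast across. It is only relevant when the Series has a multi-level index.

136-2-3. fill_value (optional, default None): The value substituted for missing (NaN) entries, and for any positions created by index alignment, before the comparison is carried out.

136-2-4. axis (optional, default 0): Unused for Series. It exists only for compatibility with DataFrame, so it normally does not need to be specified.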
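
As a small illustration of fill_value, the sketch below compares two Series that each contain a NaN. The data and the variable names (a, b, plain, filled) are made up for this example, and it assumes NumPy is installed alongside pandas.

```python
import numpy as np
import pandas as pd

# Two hypothetical Series that share an index; each has one missing value.
a = pd.Series([1.0, np.nan, 3.0], index=["x", "y", "z"])
b = pd.Series([1.0, 0.0, np.nan], index=["x", "y", "z"])

# Without fill_value, a NaN never equals anything, so "y" and "z" are True.
plain = a.ne(b)

# With fill_value=0.0, a NaN on one side is replaced by 0.0 before comparing,
# so "y" becomes 0.0 != 0.0 (False) and "z" becomes 3.0 != 0.0 (True).
filled = a.ne(b, fill_value=0.0)

print(plain.tolist())   # [False, True, True]
print(filled.tolist())  # [False, False, True]
```

The same fill_value behaviour applies to Series.eq, which shares the signature shown above.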

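136-3. Purpose

Compares each element of the Series with a scalar, or with the corresponding element of another Series, and reports element by element whether the two are not equal.

136-4. Return Value

Returns a Boolean Series in which each entry indicates whether the corresponding element is not equal to other.

136-5. Notes

Typical use cases:

136-5-1. Data filtering: the Boolean result of .ne() can be used as a mask to keep only the rows that differ from a given value.

136-5-2. Conditional checks: suitable wherever a data analysis needs an element-wise inequality test.

136-6. Usage

136-6-1. Data Preparation

None

136-6-2. Code Example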

# 136. pandas.Series.ne method
# 136-1. Compare with a scalar
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])
result = s.ne(3)
print(result, end='\n\n')

# 136-2. Compare with another Series
s2 = pd.Series([2, 3, 4, 5, 6])
result2 = s.ne(s2)
print(result2)

136-6-3. Output

# 136. pandas.Series.ne method
# 136-1. Compare with a scalar
# 0     True
# 1     True
# 2    False
# 3     True
# 4     True
# dtype: bool

# 136-2. Compare with another Series
# 0    True
# 1    True
# 2    True
# 3    True
# 4    True
# dtype: bool

137. pandas.Series.eq method

137-1. Syntax

# 137. pandas.Series.eq method
pandas.Series.eq(other, level=None, fill_value=None, axis=0)
Return Equal to of series and other, element-wise (binary operator eq).

Equivalent to series == other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other
Series or scalar value
level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation.

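137-2. Parameters

137-2-1. other (required): The value to compare against each element of the Series; either a scalar (a single value) or another Series.

137-2-2. level (optional, default None): The MultiIndex level to broadcast across. It is only relevant when the Series has a multi-level index.

137-2-3. fill_value (optional, default None): The value substituted for missing (NaN) entries, and for any positions created by index alignment, before the comparison is carried out.

137-2-4. axis (optional, default 0): Unused for Series. It exists only for compatibility with DataFrame, so it normally does not need to be set.

137-3. Purpose

Compares each element of the Series with a scalar, or with the corresponding element of another Series, and reports element by element whether the two are equal.

137-4. Return Value

Returns a Boolean Series in which each entry indicates whether the corresponding element is equal to other.

137-5. Notes

Typical use cases:

137-5-1. Data filtering: the Boolean result can be used as a mask to select the rows of a dataset that match a given value.

137-5-2. Conditional checks: commonly used in data analysis to test whether the values in a Series equal some reference value.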
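
The filtering use case can be sketched as follows. The DataFrame, its column names (city, sales) and the variable names are invented purely for illustration; the complementary .ne() call is included for contrast.

```python
import pandas as pd

# Hypothetical sales table used only for this illustration.
df = pd.DataFrame({
    "city": ["Paris", "Rome", "Paris"],
    "sales": [10, 20, 30],
})

# .eq() yields a Boolean mask; indexing with it keeps the matching rows.
paris_rows = df[df["city"].eq("Paris")]

# .ne() yields the complementary mask.
other_rows = df[df["city"].ne("Paris")]

print(paris_rows["sales"].tolist())  # [10, 30]
print(other_rows["sales"].tolist())  # [20]
```

Using the method form rather than == is mostly a matter of style here; it tends to read more cleanly inside longer method chains.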

137-6. Usage

137-6-1. Data Preparation

None

137-6-2. Code Example

# 137. pandas.Series.eq method
# 137-1. Compare with a scalar
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])
result = s.eq(3)
print(result, end='\n\n')

# 137-2. Compare with another Series
s2 = pd.Series([2, 3, 4, 4, 5])
result2 = s.eq(s2)
print(result2)

137-6-3. Output

# 137. pandas.Series.eq method
# 137-1. Compare with a scalar
# 0    False
# 1    False
# 2     True
# 3    False
# 4    False
# dtype: bool

# 137-2. Compare with another Series
# 0    False
# 1    False
# 2    False
# 3     True
# 4     True
# dtype: bool

138. pandas.Series.product method

138-1. Syntax

# 138. pandas.Series.product method
pandas.Series.product(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
Return the product of the values over the requested axis.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.prod with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

New in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar

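138-2. Parameters

138-2-1. axis (optional, default None): Unused for Series, which has only one dimension, so it normally does not need to be set.

138-2-2. skipna (optional, default True): If True, missing values (NaN) are ignored when computing the product. If False and the Series contains any missing value, the result is NaN.

138-2-3. numeric_only (optional, default False): Intended to restrict the computation to numeric data. According to the documentation quoted above it is not implemented for Series, so non-numeric values have to be filtered out before calling the method.

138-2-4. min_count (optional, default 0): The minimum number of non-missing values required to compute the product. If fewer non-missing values are present, the result is NaN. For example, with min_count=2 and only one non-missing value, the result is NaN.

138-2-5. **kwargs (optional): Additional keyword arguments. Rarely used and can usually be omitted.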
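
The effect of min_count is easiest to see on data that is mostly or entirely missing. The following is a minimal sketch with made-up values; the variable names are illustrative and NumPy is assumed to be available.

```python
import numpy as np
import pandas as pd

all_missing = pd.Series([np.nan, np.nan])
one_valid = pd.Series([2.0, np.nan])

# With the default min_count=0, skipping every NaN leaves an empty product,
# which is defined as 1.0.
empty_product = all_missing.product()

# Requiring at least one valid value turns that result into NaN.
guarded_product = all_missing.product(min_count=1)

# One valid value satisfies min_count=1 but not min_count=2.
enough = one_valid.product(min_count=1)
not_enough = one_valid.product(min_count=2)

print(empty_product, guarded_product, enough, not_enough)  # 1.0 nan 2.0 nan
```

Setting min_count=1 is a common way to avoid mistaking "no data at all" for a product of 1.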

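138-3. Purpose

Computes the product of all elements in the Series.

138-4. Return Value

Returns a scalar: the product of all elements in the Series.

138-5. Notes

Typical use cases:

138-5-1. Financial calculations: useful for quantities that compound multiplicatively, such as a cumulative investment return built from per-period growth factors.

138-5-2. Data analysis: aggregates a set of multiplicative factors into one overall figure, for example to gauge the combined magnitude of several scaling factors.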
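
The financial use case can be sketched as a cumulative return computed from per-period returns. The figures below are invented for illustration and do not come from the original article.

```python
import pandas as pd

# Hypothetical per-period returns: +10%, -5%, +2%.
period_returns = pd.Series([0.10, -0.05, 0.02])

# Convert each return to a growth factor, multiply them together,
# then subtract 1 to get the overall return.
growth_factors = 1 + period_returns
cumulative_return = growth_factors.product() - 1

print(round(cumulative_return, 4))  # 0.0659
```

Here 1.10 * 0.95 * 1.02 = 1.0659, so the three periods together yield roughly 6.59%.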

138-6. Usage

138-6-1. Data Preparation

None

138-6-2. Code Example

# 138. pandas.Series.product method
# 138-1. Compute the product
import pandas as pd
# Create a sample Series
s = pd.Series([1, 2, 3, 4])
product_result = s.product()
print(product_result, end='\n\n')

# 138-2. Compute the product while skipping NaN
# A Series containing a NaN value
s_with_nan = pd.Series([1, 2, None, 4])
product_with_nan = s_with_nan.product(skipna=True)
print(product_with_nan, end='\n\n')

# 138-3. Compute the product without skipping NaN
product_with_nan_inclusive = s_with_nan.product(skipna=False)
print(product_with_nan_inclusive, end='\n\n')

# 138-4. Filter out non-numeric values by hand (numeric_only is not implemented for Series)
s_mixed = pd.Series([1, 'a', 3, 4])
# Keep only the entries that can be converted to a number
numeric_series = s_mixed[pd.to_numeric(s_mixed, errors='coerce').notnull()]
numeric_product = numeric_series.prod()
print(numeric_product)

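138-6-3. Output

# 138. pandas.Series.product method
# 138-1. Compute the product
# 24

# 138-2. Compute the product while skipping NaN
# 8.0

# 138-3. Compute the product without skipping NaN
# nan

# 138-4. Filter out non-numeric values by hand
# 12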

139. pandas.Series.dot method

139-1. Syntax

# 139. pandas.Series.dot method
pandas.Series.dot(other)
Compute the dot product between the Series and the columns of other.

This method computes the dot product between the Series and another one, or the Series and each columns of a DataFrame, or the Series and each columns of an array.

It can also be called using self @ other.

Parameters:
other
Series, DataFrame or array-like
The other object to compute the dot product with its columns.

Returns:
scalar, Series or numpy.ndarray
Return the dot product of the Series and other if other is a Series, the Series of the dot product of Series and each rows of other if other is a DataFrame or a numpy.ndarray between the Series and each columns of the numpy array.

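139-2. Parameters

139-2-1. other (required): Another Series, a DataFrame, or an array-like object. A Series or DataFrame should carry the same index labels as the calling Series, and an array should have a matching length. A scalar is not among the documented input types.

139-3. Purpose

Computes the dot product (inner product) of the Series with another Series, or with each column of a DataFrame or array. It is useful for numerical work, especially vector and other mathematical operations.

139-4. Return Value

Returns a scalar when other is a Series, a Series when other is a DataFrame, and a numpy.ndarray when other is an array. For a Series or DataFrame the operands are aligned on their index labels first, so the order of the labels does not matter. If the two sets of labels differ, pandas raises a ValueError ("matrices are not aligned") rather than computing a partial result.

139-5. Notes

Typical use cases:

139-5-1. Mathematics and statistics: computing the dot product of vectors, which is common in feature computations for machine learning and data analysis.

139-5-2. Linear algebra: can also be used when working with matrices and systems of linear equations.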
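
The sketch below illustrates three behaviours of Series.dot: alignment on index labels, the column-wise result for a DataFrame, and the error raised when the labels do not match. All data and variable names are made up for this example.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Same labels in a different order: values are matched by label, not position.
reordered = pd.Series([6, 4, 5], index=["c", "a", "b"])
aligned_dot = s.dot(reordered)   # 1*4 + 2*5 + 3*6
operator_dot = s @ reordered     # the @ operator calls the same method

# With a DataFrame, one dot product is computed per column.
weights = pd.DataFrame({"u": [1, 0, 0], "v": [1, 1, 1]}, index=["a", "b", "c"])
per_column = s.dot(weights)

# Different label sets are rejected instead of being partially matched.
mismatched = pd.Series([4, 5, 6], index=["a", "b", "d"])
try:
    s.dot(mismatched)
    raised = False
except ValueError:
    raised = True

print(aligned_dot, operator_dot)  # 32 32
print(per_column.to_dict())       # {'u': 1, 'v': 6}
print(raised)                     # True
```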

139-6. Usage

139-6-1. Data Preparation

None

139-6-2. Code Example

# 139. pandas.Series.dot method
import pandas as pd
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
dot_product = s1.dot(s2)
print(dot_product)

139-6-3. Output

# 139. pandas.Series.dot method
# 32 (1*4 + 2*5 + 3*6 = 32)

140. pandas.Series.apply method

140-1. Syntax

# 140. pandas.Series.apply method
pandas.Series.apply(func, convert_dtype=_NoDefault.no_default, args=(), *, by_row='compat', **kwargs)
Invoke function on values of Series.

Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

Parameters:
func : function
Python function or NumPy ufunc to apply.

convert_dtype : bool, default True
Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.

Deprecated since version 2.1.0: convert_dtype has been deprecated. Do ser.astype(object).apply() instead if you want convert_dtype=False.

args : tuple
Positional arguments passed to func after the series value.

by_row : False or “compat”, default “compat”
If "compat" and func is a callable, func will be passed each element of the Series, like Series.map. If func is a list or dict of callables, will first try to translate each func into pandas methods. If that doesn’t work, will try call to apply again with by_row="compat" and if that fails, will call apply again with by_row=False (backward compatible). If False, the func will be passed the whole Series at once.

by_row has no effect when func is a string.

New in version 2.1.0.

**kwargs
Additional keyword arguments passed to func.

Returns:
Series or DataFrame
If func returns a Series object the result will be a DataFrame.

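140-2. Parameters

140-2-1. func (required): The function to apply; either a user-defined Python function or a NumPy ufunc.

140-2-2. convert_dtype (optional): Whether to try to find a better dtype for the element-wise results. It has been deprecated since pandas 2.1.0.

140-2-3. args (optional, default ()): A tuple of extra positional arguments passed to func after the Series value.

140-2-4. by_row (optional, default 'compat'): Controls what func receives. With "compat", func is called on each element in turn; with False, func is called once with the whole Series. Added in pandas 2.1.0; it usually does not need to be changed.

140-2-5. **kwargs (optional): Additional keyword arguments passed on to func.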
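
To show how args and **kwargs reach func together, here is a minimal sketch. The helper clip_and_scale and its parameters (lower, upper, factor) are hypothetical names introduced only for this example.

```python
import pandas as pd

def clip_and_scale(x, lower, upper, factor=1):
    """Clamp x into [lower, upper], then multiply by factor."""
    return min(max(x, lower), upper) * factor

s = pd.Series([1, 5, 10])

# args supplies lower=2 and upper=8 positionally after each element;
# factor=10 is forwarded to clip_and_scale as a keyword argument.
scaled = s.apply(clip_and_scale, args=(2, 8), factor=10)

print(scaled.tolist())  # [20, 50, 80]
```

Keyword arguments are often easier to read than a long args tuple when func takes several extra parameters.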

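140-3. Purpose

Applies the given function to each element of the Series. It can be used for data transformation, data cleaning, or custom calculations.

140-4. Return Value

Returns a Series containing the result of applying func to each element of the original Series. If func itself returns a Series, the result is a DataFrame.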
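
The syntax section notes that the result becomes a DataFrame when func returns a Series. A minimal sketch of that case follows; the function describe and the labels value and square are hypothetical names chosen for this example.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

def describe(x):
    # Returning a Series per element makes apply build one row per element,
    # with the Series labels becoming column names.
    return pd.Series({"value": x, "square": x ** 2})

table = s.apply(describe)

print(type(table).__name__)      # DataFrame
print(list(table.columns))       # ['value', 'square']
print(table["square"].tolist())  # [1, 4, 9]
```

This pattern is convenient for small data, though constructing a Series per element can be slow on large inputs.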

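140-5. Notes

Typical use cases:

140-5-1. Data cleaning: converting or tidying specific values in the data.

140-5-2. Custom calculations: applying computations whose conditions are too complex for built-in methods.

140-5-3. Data formatting: formatting values or converting them to another type.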
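
As an illustration of the data-cleaning use case, the sketch below normalizes a column of names while leaving missing entries untouched. The sample data and the helper tidy_name are invented for this example.

```python
import pandas as pd

# Hypothetical raw names with stray whitespace, odd casing and a missing entry.
raw_names = pd.Series(["  alice ", "BOB", None])

def tidy_name(value):
    # Only strings are cleaned; missing values pass through unchanged.
    if isinstance(value, str):
        return value.strip().title()
    return value

clean_names = raw_names.apply(tidy_name)

print(clean_names.iloc[:2].tolist())  # ['Alice', 'Bob']
print(pd.isna(clean_names.iloc[2]))   # True
```

For plain string operations like these, the vectorized .str accessor is usually an alternative worth considering; apply is most useful when the per-element logic does not map onto a built-in method.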

140-6. Usage

140-6-1. Data Preparation

None

140-6-2. Code Example

# 140. pandas.Series.apply method
# 140-1. Apply a named function
import pandas as pd

# Create a sample Series
s = pd.Series([1, 2, 3, 4, 5])

# Define a simple function
def square(x):
    return x ** 2

result = s.apply(square)
print(result, end='\n\n')

# 140-2. Apply a lambda function
result_lambda = s.apply(lambda x: x + 10)
print(result_lambda, end='\n\n')

# 140-3. Pass an extra positional argument through args
def multiply(x, factor):
    return x * factor

result_with_args = s.apply(multiply, args=(10,))
print(result_with_args)

140-6-3. Output

# 140. pandas.Series.apply method
# 140-1. Apply a named function
# 0     1
# 1     4
# 2     9
# 3    16
# 4    25
# dtype: int64

# 140-2. Apply a lambda function
# 0    11
# 1    12
# 2    13
# 3    14
# 4    15
# dtype: int64

# 140-3. Pass an extra positional argument through args
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64