The Python Library pandas, Part 1

IT_Beijing_BIT

Published 2024-09-30 06:24:42

Article link: https://blog.csdn.net/IT_Beijing_BIT/article/details/142486615

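Basic Data Structures

pandas provides two classes for working with data:

Series: a one-dimensional array that holds data of any type, such as integers, strings, or Python objects.
DataFrame: a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.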
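
As a minimal sketch of how the two classes relate, the snippet below builds one of each. The fruit labels and the names `prices` and `table` are made up for illustration and do not come from the original article.

```python
import pandas as pd

# A Series: one-dimensional, one value per index label.
prices = pd.Series([3.5, 4.0, 2.25], index=["apple", "pear", "plum"], name="price")

# A DataFrame: two-dimensional, with row labels and named columns.
table = pd.DataFrame(
    {"price": [3.5, 4.0, 2.25], "stock": [10, 0, 7]},
    index=["apple", "pear", "plum"],
)

# Selecting a single column of a DataFrame yields a Series.
column = table["price"]

print(prices.ndim, table.ndim)   # 1 2
print(table.shape)               # (3, 2)
print(type(column).__name__)     # Series
```

Each column of a DataFrame can be thought of as a Series that shares the same row index.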

Series

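A one-dimensional ndarray with axis labels. Labels need not be unique, but they must be of a hashable type. The object supports both integer-based and label-based indexing and provides many methods for operations that involve the index. The statistical methods from ndarray have been overridden to exclude missing data automatically.

Operations between Series (+, -, /, *, **) align values by their associated index labels, so the operands need not have the same length. The result index is the sorted union of the two indexes.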
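
The alignment rule can be sketched as follows. The two Series and their labels are illustrative assumptions; only the standard `+` operator and `mean()` are used.

```python
import pandas as pd

s_left = pd.Series([1, 2, 3], index=["a", "b", "c"])
s_right = pd.Series([10, 20], index=["b", "d"])

# Values are matched by label, not by position. Labels present in only
# one operand produce NaN in the result.
total = s_left + s_right

print(list(total.index))   # ['a', 'b', 'c', 'd']
print(total["b"])          # 12.0
print(total.isna().sum())  # 3

# Statistical methods skip missing data by default.
print(total.mean())        # 12.0
```

Because NaN is a floating-point value, the result is float64 even though both inputs hold integers.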

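Constructor

Syntax: pandas.Series(data=None, index=None, dtype=None, name=None, copy=None, fastpath=<no_default>)

Parameters

data: array-like, Iterable, dict, or scalar value. Contains the data stored in the Series. If data is a dict, the order of its keys is maintained.
index: array-like or Index (1d). Values must be hashable and have the same length as data. Non-unique index values are allowed.
If index is not provided, it defaults to RangeIndex (0, 1, 2, …, n - 1), where n is the length of data.
If data is dict-like and index is None, the keys of data are used as the index.
If index is not None, the resulting Series is reindexed with the index values.
dtype: str, numpy.dtype, or ExtensionDtype, optional. The data type of the output Series. If not specified, it is inferred from data.
name: Hashable, default None. The name given to the Series.
copy: bool, default False. Copy the input data. Only affects Series or 1d ndarray input.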
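
The interaction between a dict `data` argument and an explicit `index` may be easier to see in a small sketch. The label `"z"` and the name `"demo"` are arbitrary values chosen for illustration.

```python
import pandas as pd

d = {"a": 1, "b": 2, "c": 3}

# With a dict and an explicit index, the result is reindexed: keys that are
# not listed are dropped, and labels missing from the dict become NaN.
s = pd.Series(d, index=["b", "c", "z"], dtype="float64", name="demo")

print(list(s.index))     # ['b', 'c', 'z']
print(s["b"])            # 2.0
print(pd.isna(s["z"]))   # True
print(s.name)            # demo

# Without an index, list data gets a default RangeIndex starting at 0.
default = pd.Series([10, 20, 30])
print(list(default.index))   # [0, 1, 2]
```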

A Series whose data argument is a dict

>>> import pandas as pd
>>> d = {"a":1,"b":2,"c":3}
>>> print(type(d))
<class 'dict'>
>>> s1 = pd.Series(d)
>>> print(type(s1))
<class 'pandas.core.series.Series'>
>>> print(s1)
a    1
b    2
c    3
dtype: int64

A Series whose data argument is a tuple

>>> l = (0.1,0.2,0.3,0.5)
>>> print(type(l))
<class 'tuple'>
>>> s2 = pd.Series(l)
>>> print(type(s2))
<class 'pandas.core.series.Series'>
>>> print(s2)
0    0.1
1    0.2
2    0.3
3    0.5
dtype: float64
>>> print(s2[1])
0.2

A Series whose data argument is a list

>>> a = ["a", "b", "c", "d"]
>>> print(type(a))
<class 'list'>
>>> s3 = pd.Series(a)
>>> print(type(s3))
<class 'pandas.core.series.Series'>
>>> print(s3)
0    a
1    b
2    c
3    d
dtype: object
>>> print(s3[1])
b
>>> print(type(s3[1]))
<class 'str'>

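Attributes

Attribute	Description
T	Return the transpose, which is by definition self.
array	The ExtensionArray of the data backing this Series or Index.
at	Access a single value for a row/column label pair.
attrs	Dictionary of global attributes of this dataset.
axes	Return a list of the row axis labels.
dtype	Return the dtype object of the underlying data.
empty	Indicator whether the Series/DataFrame is empty.
flags	Get the properties associated with this pandas object.
hasnans	Return True if there are any NaNs.
iat	Access a single value for a row/column pair by integer position.
iloc	Purely integer-location based indexing for selection by position.
index	The index (axis labels) of the Series.
is_monotonic_decreasing	Return a boolean indicating whether the values in the object are monotonically decreasing.
is_monotonic_increasing	Return a boolean indicating whether the values in the object are monotonically increasing.
is_unique	Return a boolean indicating whether the values in the object are unique.
loc	Access a group of rows and columns by label(s) or a boolean array.
name	Return the name of the Series.
nbytes	Return the number of bytes in the underlying data.
ndim	Number of dimensions of the underlying data, by definition 1.
shape	Return a tuple of the shape of the underlying data.
size	Return the number of elements in the underlying data.
values	Return the Series as an ndarray or ndarray-like, depending on the dtype.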

Using the Attributes

The array attribute of a Series

>>> a = ["a", "b", "c", "d"]
>>> s3 = pd.Series(a)
>>> s3.array
<NumpyExtensionArray>
['a', 'b', 'c', 'd']
Length: 4, dtype: object

The axes attribute of a Series

>>> l = ["a", "b", "c", "d","e"]
>>> s = pd.Series(l)
>>> s.axes
[RangeIndex(start=0, stop=5, step=1)]

The nbytes, shape, and values attributes of a Series

>>> l=['1','2','3','4','5','6']
>>> s = pd.Series(l)
>>> s.nbytes
48
>>> s.shape
(6,)
>>> s.values
array(['1', '2', '3', '4', '5', '6'], dtype=object)
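
The indexer attributes `at`, `iat`, `loc`, and `iloc` from the table are not covered by the examples above, so here is a brief sketch of how they differ. The Series contents and labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 3.0, np.nan, 7.0], index=["w", "x", "y", "z"], name="demo")

print(s.hasnans)                  # True, because of the NaN at label "y"
print(s.size, s.shape, s.ndim)    # 4 (4,) 1

# at / loc select by label; iat / iloc select by integer position.
print(s.at["x"], s.iat[1])        # 3.0 3.0

# A label slice with loc includes both endpoints.
print(s.loc["w":"x"].tolist())    # [1.0, 3.0]

# A positional slice with iloc excludes the stop position.
print(s.iloc[0:1].tolist())       # [1.0]
```

`at` and `iat` return a single scalar, while `loc` and `iloc` can also return a sub-Series.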

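Methods

Method	Description
abs()	Return a Series/DataFrame with the absolute numeric value of each element.
add(other[, level, fill_value, axis])	Return addition of series and other, element-wise (binary operator add).
add_prefix(prefix[, axis])	Prefix labels with a string prefix.
add_suffix(suffix[, axis])	Suffix labels with a string suffix.
agg([func, axis])	Aggregate using one or more operations over the specified axis.
aggregate([func, axis])	Aggregate using one or more operations over the specified axis.
align(other[, join, axis, level, copy, …])	Align two objects on their axes with the specified join method.
all([axis, bool_only, skipna])	Return whether all elements are True, potentially over an axis.
any(*[, axis, bool_only, skipna])	Return whether any element is True, potentially over an axis.
apply(func[, convert_dtype, args, by_row])	Invoke a function on the values of the Series.
argmax([axis, skipna])	Return the int position of the largest value in the Series.
argmin([axis, skipna])	Return the int position of the smallest value in the Series.
argsort([axis, kind, order, stable])	Return the integer indices that would sort the Series values.
asfreq(freq[, method, how, normalize, …])	Convert a time series to the specified frequency.
asof(where[, subset])	Return the last row(s) without any NaNs before where.
astype(dtype[, copy, errors])	Cast a pandas object to a specified dtype.
at_time(time[, asof, axis])	Select values at a particular time of day (e.g., 9:30 AM).
autocorr([lag])	Compute the lag-N autocorrelation.
backfill(*[, axis, inplace, limit, downcast])	(Deprecated) Fill NA/NaN values by using the next valid observation to fill the gap.
between(left, right[, inclusive])	Return a boolean Series equivalent to left <= series <= right.
between_time(start_time, end_time[, …])	Select values between particular times of the day (e.g., 9:00-9:30 AM).
bfill(*[, axis, inplace, limit, limit_area, …])	Fill NA/NaN values by using the next valid observation to fill the gap.
bool()	(Deprecated) Return the bool of a single-element Series or DataFrame.
case_when(caselist)	Replace values where the conditions are True.
clip([lower, upper, axis, inplace])	Trim values at the input threshold(s).
combine(other, func[, fill_value])	Combine the Series with a Series or scalar according to func.
combine_first(other)	Update null elements with the value in the same location in 'other'.
compare(other[, align_axis, keep_shape, …])	Compare to another Series and show the differences.
convert_dtypes([infer_objects, …])	Convert columns to the best possible dtypes using dtypes that support pd.NA.
copy([deep])	Make a copy of this object's indices and data.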
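corr(other[, method, min_periods])	Compute correlation with another Series, excluding missing values.
count()	Return the number of non-NA/null observations in the Series.
cov(other[, min_periods, ddof])	Compute covariance with another Series, excluding missing values.
cummax([axis, skipna])	Return the cumulative maximum over a DataFrame or Series axis.
cummin([axis, skipna])	Return the cumulative minimum over a DataFrame or Series axis.
cumprod([axis, skipna])	Return the cumulative product over a DataFrame or Series axis.
cumsum([axis, skipna])	Return the cumulative sum over a DataFrame or Series axis.
describe([percentiles, include, exclude])	Generate descriptive statistics.
diff([periods])	First discrete difference of the elements.
div(other[, level, fill_value, axis])	Return floating division of series and other, element-wise (binary operator truediv).
divide(other[, level, fill_value, axis])	Return floating division of series and other, element-wise (binary operator truediv).
divmod(other[, level, fill_value, axis])	Return integer division and modulo of series and other, element-wise (binary operator divmod).
dot(other)	Compute the dot product between the Series and the columns of other.
drop([labels, axis, index, columns, level, …])	Return a Series with the specified index labels removed.
drop_duplicates(*[, keep, inplace, ignore_index])	Return a Series with duplicate values removed.
droplevel(level[, axis])	Return a Series/DataFrame with the requested index/column level(s) removed.
dropna(*[, axis, inplace, how, ignore_index])	Return a new Series with missing values removed.
duplicated([keep])	Indicate duplicate Series values.
eq(other[, level, fill_value, axis])	Return equal to of series and other, element-wise (binary operator eq).
equals(other)	Test whether two objects contain the same elements.
ewm([com, span, halflife, alpha, …])	Provide exponentially weighted (EW) calculations.
expanding([min_periods, axis, method])	Provide expanding window calculations.
explode([ignore_index])	Transform each element of a list-like to a row.
factorize([sort, use_na_sentinel])	Encode the object as an enumerated type or categorical variable.
ffill(*[, axis, inplace, limit, limit_area, …])	Fill NA/NaN values by propagating the last valid observation to the next valid one.
fillna([value, method, axis, inplace, …])	Fill NA/NaN values using the specified method.
filter([items, like, regex, axis])	Subset the DataFrame rows or columns according to the specified index labels.
first(offset)	(Deprecated) Select initial periods of time series data based on a date offset.
first_valid_index()	Return the index of the first non-NA value, or None if no non-NA value is found.
floordiv(other[, level, fill_value, axis])	Return integer division of series and other, element-wise (binary operator floordiv).
ge(other[, level, fill_value, axis])	Return greater than or equal to of series and other, element-wise (binary operator ge).
get(key[, default])	Get the item from the object for the given key (e.g., a DataFrame column).
groupby([by, axis, level, as_index, sort, …])	Group the Series using a mapper or by a Series of columns.
gt(other[, level, fill_value, axis])	Return greater than of series and other, element-wise (binary operator gt).
head([n])	Return the first n rows.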
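hist([by, ax, grid, xlabelsize, xrot, …])	Draw a histogram of the input Series using matplotlib.
idxmax([axis, skipna])	Return the row label of the maximum value.
idxmin([axis, skipna])	Return the row label of the minimum value.
infer_objects([copy])	Attempt to infer better dtypes for object columns.
info([verbose, buf, max_cols, memory_usage, …])	Print a concise summary of the Series.
interpolate([method, axis, limit, inplace, …])	Fill NaN values using an interpolation method.
isin(values)	Whether elements in the Series are contained in values.
isna()	Detect missing values.
isnull()	Series.isnull is an alias for Series.isna.
item()	Return the first element of the underlying data as a Python scalar.
items()	Lazily iterate over (index, value) tuples.
keys()	Return an alias for the index.
kurt([axis, skipna, numeric_only])	Return unbiased kurtosis over the requested axis.
kurtosis([axis, skipna, numeric_only])	Return unbiased kurtosis over the requested axis.
last(offset)	(Deprecated) Select final periods of time series data based on a date offset.
last_valid_index()	Return the index of the last non-NA value, or None if no non-NA value is found.
le(other[, level, fill_value, axis])	Return less than or equal to of series and other, element-wise (binary operator le).
lt(other[, level, fill_value, axis])	Return less than of series and other, element-wise (binary operator lt).
map(arg[, na_action])	Map the values of the Series according to an input mapping or function.
mask(cond[, other, inplace, axis, level])	Replace values where the condition is True.
max([axis, skipna, numeric_only])	Return the maximum of the values over the requested axis.
mean([axis, skipna, numeric_only])	Return the mean of the values over the requested axis.
median([axis, skipna, numeric_only])	Return the median of the values over the requested axis.
memory_usage([index, deep])	Return the memory usage of the Series.
min([axis, skipna, numeric_only])	Return the minimum of the values over the requested axis.
mod(other[, level, fill_value, axis])	Return modulo of series and other, element-wise (binary operator mod).
mode([dropna])	Return the mode(s) of the Series.
mul(other[, level, fill_value, axis])	Return multiplication of series and other, element-wise (binary operator mul).
multiply(other[, level, fill_value, axis])	Return multiplication of series and other, element-wise (binary operator mul).
ne(other[, level, fill_value, axis])	Return not equal to of series and other, element-wise (binary operator ne).
nlargest([n, keep])	Return the largest n elements.
notna()	Detect existing (non-missing) values.
notnull()	Series.notnull is an alias for Series.notna.
nsmallest([n, keep])	Return the smallest n elements.
nunique([dropna])	Return the number of unique elements in the object.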
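pad(*[, axis, inplace, limit, downcast])	(Deprecated) Fill NA/NaN values by propagating the last valid observation to the next valid one.
pct_change([periods, fill_method, limit, freq])	Fractional change between the current and a prior element.
pipe(func, *args, **kwargs)	Apply chainable functions that expect a Series or DataFrame.
pop(item)	Return the item and drop it from the Series.
pow(other[, level, fill_value, axis])	Return exponential power of series and other, element-wise (binary operator pow).
prod([axis, skipna, numeric_only, min_count])	Return the product of the values over the requested axis.
product([axis, skipna, numeric_only, min_count])	Return the product of the values over the requested axis.
quantile([q, interpolation])	Return the value at the given quantile.
radd(other[, level, fill_value, axis])	Return addition of series and other, element-wise (binary operator radd).
rank([axis, method, numeric_only, …])	Compute numerical data ranks (1 through n) along the axis.
ravel([order])	(Deprecated) Return the flattened underlying data as an ndarray or ExtensionArray.
rdiv(other[, level, fill_value, axis])	Return floating division of series and other, element-wise (binary operator rtruediv).
rdivmod(other[, level, fill_value, axis])	Return integer division and modulo of series and other, element-wise (binary operator rdivmod).
reindex([index, axis, method, copy, level, …])	Conform the Series to a new index with optional filling logic.
reindex_like(other[, method, copy, limit, …])	Return an object with indices matching those of the other object.
rename([index, axis, copy, inplace, level, …])	Alter the Series index labels or name.
rename_axis([mapper, index, axis, copy, inplace])	Set the name of the axis for the index or columns.
reorder_levels(order)	Rearrange index levels using the input order.
repeat(repeats[, axis])	Repeat the elements of a Series.
replace([to_replace, value, inplace, limit, …])	Replace the values given in to_replace with value.
resample(rule[, axis, closed, label, …])	Resample time-series data.
reset_index([level, drop, name, inplace, …])	Generate a new DataFrame or Series with the index reset.
rfloordiv(other[, level, fill_value, axis])	Return integer division of series and other, element-wise (binary operator rfloordiv).
rmod(other[, level, fill_value, axis])	Return modulo of series and other, element-wise (binary operator rmod).
rmul(other[, level, fill_value, axis])	Return multiplication of series and other, element-wise (binary operator rmul).
rolling(window[, min_periods, center, …])	Provide rolling window calculations.
round([decimals])	Round each value in the Series to the given number of decimals.
rpow(other[, level, fill_value, axis])	Return exponential power of series and other, element-wise (binary operator rpow).
rsub(other[, level, fill_value, axis])	Return subtraction of series and other, element-wise (binary operator rsub).
rtruediv(other[, level, fill_value, axis])	Return floating division of series and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, weights, …])	Return a random sample of items from an axis of the object.
searchsorted(value[, side, sorter])	Find the indices where elements should be inserted to maintain order.
sem([axis, skipna, ddof, numeric_only])	Return the unbiased standard error of the mean over the requested axis.
set_axis(labels, *[, axis, copy])	Assign the desired index to the given axis.
set_flags(*[, copy, allows_duplicate_labels])	Return a new object with updated flags.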
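shift([periods, freq, axis, fill_value, suffix])	Shift the index by the desired number of periods with an optional time frequency.
skew([axis, skipna, numeric_only])	Return unbiased skew over the requested axis.
sort_index(*[, axis, level, ascending, …])	Sort the Series by index labels.
sort_values(*[, axis, ascending, inplace, …])	Sort by the values.
squeeze([axis])	Squeeze 1-dimensional axis objects into scalars.
std([axis, skipna, ddof, numeric_only])	Return the sample standard deviation over the requested axis.
sub(other[, level, fill_value, axis])	Return subtraction of series and other, element-wise (binary operator sub).
subtract(other[, level, fill_value, axis])	Return subtraction of series and other, element-wise (binary operator sub).
sum([axis, skipna, numeric_only, min_count])	Return the sum of the values over the requested axis.
swapaxes(axis1, axis2[, copy])	(Deprecated) Interchange axes and swap values axes appropriately.
swaplevel([i, j, copy])	Swap levels i and j in a MultiIndex.
tail([n])	Return the last n rows.
take(indices[, axis])	Return the elements at the given positional indices along an axis.
to_clipboard(*[, excel, sep])	Copy the object to the system clipboard.
to_csv([path_or_buf, sep, na_rep, …])	Write the object to a comma-separated values (csv) file.
to_dict(*[, into])	Convert the Series to a {label -> value} dict or dict-like object.
to_excel(excel_writer, *[, sheet_name, …])	Write the object to an Excel sheet.
to_frame([name])	Convert the Series to a DataFrame.
to_hdf(path_or_buf, *, key[, mode, …])	Write the contained data to an HDF5 file using HDFStore.
to_json([path_or_buf, orient, date_format, …])	Convert the object to a JSON string.
to_latex([buf, columns, header, index, …])	Render the object to a LaTeX tabular, longtable, or nested table.
to_list()	Return a list of the values.
to_markdown([buf, mode, index, storage_options])	Print the Series in Markdown-friendly format.
to_numpy([dtype, copy, na_value])	A NumPy ndarray representing the values in this Series or Index.
to_period([freq, copy])	Convert the Series from a DatetimeIndex to a PeriodIndex.
to_pickle(path, *[, compression, protocol, …])	Pickle (serialize) the object to a file.
to_sql(name, con, *[, schema, if_exists, …])	Write records stored in a DataFrame to a SQL database.
to_string([buf, na_rep, float_format, …])	Render a string representation of the Series.
to_timestamp([freq, how, copy])	Cast to a DatetimeIndex of Timestamps, at the beginning of the period.
to_xarray()	Return an xarray object from the pandas object.
tolist()	Return a list of the values.
transform(func[, axis])	Call func on self, producing a Series with the same axis shape as self.
transpose(*args, **kwargs)	Return the transpose, which is by definition self.
truediv(other[, level, fill_value, axis])	Return floating division of series and other, element-wise (binary operator truediv).
truncate([before, after, axis, copy])	Truncate a Series or DataFrame before and after some index value.
tz_convert(tz[, axis, level, copy])	Convert a tz-aware axis to the target time zone.
tz_localize(tz[, axis, level, copy, …])	Localize the tz-naive index of a Series or DataFrame to the target time zone.
unique()	Return the unique values of the Series object.
unstack([level, fill_value, sort])	Unstack, also known as pivot, a Series with a MultiIndex to produce a DataFrame.
update(other)	Modify the Series in place using values from the passed Series.
value_counts([normalize, sort, ascending, …])	Return a Series containing counts of unique values.
var([axis, skipna, ddof, numeric_only])	Return unbiased variance over the requested axis.
view([dtype])	(Deprecated) Create a new view of the Series.
where(cond[, other, inplace, axis, level])	Replace values where the condition is False.
xs(key[, axis, level, drop_level])	Return a cross-section from the Series/DataFrame.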

Using the Methods

Example of the add method

>>> l=['1','2','3','5']
>>> s = pd.Series(l)
>>> s.add('abc')
0    1abc
1    2abc
2    3abc
3    5abc
dtype: object
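
The example above adds a scalar string to every element. When the other operand is a Series, `add` aligns by label, and its `fill_value` parameter controls what happens to labels found in only one operand. The sketch below uses made-up Series `a` and `b` to show this.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "w"])

# The plain operator leaves NaN wherever a label is missing on one side.
plain = a + b

# With fill_value=0, the missing side is treated as 0 before adding.
filled = a.add(b, fill_value=0)

print(list(filled.index))   # ['w', 'x', 'y', 'z']
print(filled.tolist())      # [20.0, 1.0, 12.0, 3.0]
print(plain.isna().sum())   # 3
```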

Casting a pandas object to a specified dtype with astype.

>>> l=[1,2,3,5]
>>> s = pd.Series(l)
>>> s1 = s.astype('O')
>>> s1.dtype
dtype('O')
>>> s.dtype
dtype('int64')

Converting a pandas object to a list with to_list.

>>> l1=[1,2,3,5]
>>> s = pd.Series(l1)
>>> l2 = s.to_list()
>>> print(type(s))
<class 'pandas.core.series.Series'>
>>> print(type(l2))
<class 'list'>
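
A few more methods from the table, shown as a hedged sketch on a small Series with made-up data. It contrasts `where` and `mask`, which are easy to confuse, and shows how `count` and `value_counts` treat missing values.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, np.nan, 9, 4], index=["a", "b", "c", "d"])

# count() ignores NaN.
print(s.count())                       # 3

# fillna() replaces the missing value.
filled = s.fillna(0)
print(filled.tolist())                 # [4.0, 0.0, 9.0, 4.0]

# where() keeps values where the condition is True; the rest become NaN.
kept = filled.where(filled > 4)
print(kept.count())                    # 1

# mask() is the inverse: values where the condition is True become NaN.
hidden = filled.mask(filled > 4)
print(hidden.count())                  # 3

# value_counts() counts each distinct non-missing value.
counts = s.value_counts()
print(counts[4.0], counts[9.0])        # 2 1

# map() applies a function to each value.
doubled = filled.map(lambda v: v * 2)
print(doubled.tolist())                # [8.0, 0.0, 18.0, 8.0]
```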
