Pandas Study Diary (5)


Mystic Musings

Category: Pandas Study Diary. Tags: pandas, learning

Article link: https://blog.csdn.net/qq_61249949/article/details/134491864

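This article describes how to sort Series and DataFrame objects in pandas, including single-column and multi-column sorting. It also covers string processing and index operations such as randomly ordered indexes, sorting by index, and data alignment, and it mentions shuffling data with the sklearn library and tips for improving run-time efficiency.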


Sorting

Sorting Series and DataFrame

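import pandas as pd
df = pd.read_csv("file.csv")  # placeholder: replace with the path to your file

# Sort a Series (a single column)
df["column"].sort_values(ascending=True)  # True = ascending, False = descending

# Sort a DataFrame by several fields; ascending takes one flag per field,
# so both lists must have the same length
df.sort_values(by=["field1", "field2", "field3"], ascending=[True, False, True])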
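The two calls above can be tried on a small in-memory table. This is a minimal sketch: the column names `city` and `temp` and the sample values are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "city": ["B", "A", "B", "A"],
    "temp": [20, 25, 30, 15],
})

# Series sort: returns a new Series, the original column is left untouched.
temps_desc = df["temp"].sort_values(ascending=False)

# DataFrame sort: city ascending, then temp descending within each city.
by_city = df.sort_values(by=["city", "temp"], ascending=[True, False])

print(temps_desc.tolist())       # [30, 25, 20, 15]
print(by_city["temp"].tolist())  # [25, 15, 30, 20]
```

Note that `sort_values` keeps the original row labels, so `by_city.index` is no longer in ascending order; call `reset_index(drop=True)` if a fresh 0..n-1 index is wanted.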

String processing

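# String accessor (exposes vectorized string methods)
df["field"].str
# Check whether each string consists only of numeric characters
df["field"].str.isnumeric()
# Length of each string
df["field"].str.len()
# String slicing (x and y are the start and stop positions)
df["field"].str[x:y]
df["field"].str.slice(x, y)
# String replacement (old: text to be replaced, new: replacement text)
df["field"].str.replace(old, new)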
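As a quick check of what each accessor method returns, here is a small sketch on a made-up Series (the sample strings are assumptions, not data from the post). `regex=False` is passed explicitly so that the pattern is treated as literal text regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical sample values, for illustration only.
s = pd.Series(["2023-01", "12345", "abc"])

is_num = s.str.isnumeric()   # True only when every character is numeric
lengths = s.str.len()        # number of characters in each string
first4 = s.str[0:4]          # slice with indexing syntax
same = s.str.slice(0, 4)     # equivalent slice with the method form
replaced = s.str.replace("-", "/", regex=False)  # literal replacement

print(is_num.tolist())    # [False, True, False]
print(lengths.tolist())   # [7, 5, 3]
print(first4.tolist())    # ['2023', '1234', 'abc']
print(replaced.tolist())  # ['2023/01', '12345', 'abc']
```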

Index properties

Using the index appropriately can speed up lookups.

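from sklearn.utils import shuffle  # utility for shuffling rows

#####   Index in random order
# Set the index (drop=False keeps the field as a regular column too)
df.set_index("field_name", inplace=True, drop=False)
# Shuffle the original data
df_shuffle = shuffle(df)
# Check whether the index is monotonically increasing
df_shuffle.index.is_monotonic_increasing
# Time the lookup with the %timeit magic (IPython/Jupyter only);
# index_value is a placeholder for a label that exists in the index
%timeit df_shuffle.loc[index_value]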


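##### Sorting by index
# Sort
df_sorted = df_shuffle.sort_index()
# Check whether the index is now monotonically increasing
df_sorted.index.is_monotonic_increasing
# Check whether the index is unique (a unique index is looked up via a hash table)
df_sorted.index.is_unique
# Time the same lookup on the sorted frame with the %timeit magic
%timeit df_sorted.loc[index_value]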
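The index checks above can be reproduced without a CSV file or sklearn. In this sketch the `user_id` and `score` columns are invented for illustration, and a fixed non-sorted row order stands in for `shuffle` so the result is reproducible. Timing is omitted because `%timeit` only runs inside IPython/Jupyter.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({"user_id": [1, 2, 3, 4, 5],
                   "score": [10, 20, 30, 40, 50]})
df.set_index("user_id", inplace=True, drop=False)

# A fixed, deliberately unsorted order replaces a random shuffle here.
df_shuffle = df.loc[[3, 1, 5, 2, 4]]
print(df_shuffle.index.is_monotonic_increasing)  # False

# Sorting by index restores a monotonically increasing index.
df_sorted = df_shuffle.sort_index()
print(df_sorted.index.is_monotonic_increasing)   # True
print(df_sorted.index.is_unique)                 # True

# The lookup returns the same row either way; only the speed may differ.
print(df_shuffle.loc[3, "score"], df_sorted.loc[3, "score"])  # 30 30
```

On a frame this small no speed difference will be measurable. Any gain from sorting is most likely to show up on large frames whose index contains duplicate labels.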



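##### Data alignment on the index
s1 = pd.Series([2,3,4],index=list("acd"))
s2 = pd.Series([2,3,4],index=list("bcd"))
# Addition aligns the two Series on their index labels; a label present in
# only one of them yields NaN, similar to an outer join in a database
s1+s2
# Treat missing labels as 0 instead
s1.add(s2, fill_value=0)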
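Running the alignment example end to end makes the difference between the two forms visible. The variable names `total` and `filled` are introduced here only for illustration.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4], index=list("acd"))
s2 = pd.Series([2, 3, 4], index=list("bcd"))

# Plain addition: labels "a" and "b" exist in only one Series, so they become NaN.
total = s1 + s2
print(total.index.tolist())    # ['a', 'b', 'c', 'd']
print(total.isna().tolist())   # [True, True, False, False]

# fill_value=0 substitutes 0 for the missing side before adding.
filled = s1.add(s2, fill_value=0)
print(filled.tolist())         # [2.0, 2.0, 6.0, 8.0]
```

The result is a float Series in both cases, because NaN cannot be stored in an integer column during the alignment step.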