Learning Pandas

Latest recommended article published on 2024-01-06 09:53:14

qq_31655309

Article link: https://blog.csdn.net/qq_31655309/article/details/105504031

DataFrame

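Nature: a DataFrame behaves both like a matrix and like a dictionary
Creation: build it from a matrix or from a dictionary
Basic operations: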

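Basics
dtypes # data type of each column
index # row labels
columns # column labels
Statistics
describe() # by default summarizes numeric columns only
T # matrix transpose
sort_index # sort by row or column labels
sort_values # sort by the values in one or more columns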
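
The creation routes and basic attributes listed above can be sketched as follows. This is a minimal illustration, not the author's own code: the frames `df_matrix` and `df_dict` and their contents are made up for the example.

```python
import numpy as np
import pandas as pd

# Build from a matrix (2-D NumPy array) with explicit row and column labels.
dates = pd.date_range("20130101", periods=4)
df_matrix = pd.DataFrame(np.arange(12).reshape(4, 3), index=dates, columns=list("ABC"))

# Build from a dictionary: each key becomes a column.
df_dict = pd.DataFrame({"name": ["amy", "bob", "cat"], "age": [31, 25, 28]})

print(df_matrix.dtypes)    # data type of each column
print(df_matrix.index)     # row labels
print(df_matrix.columns)   # column labels
print(df_dict.describe())  # only the numeric column "age" is summarized
print(df_matrix.T)         # transpose: rows become columns

by_index = df_matrix.sort_index(axis=0, ascending=False)  # latest date first
by_age = df_dict.sort_values(by="age")                    # youngest first
```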

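Querying

Dictionary-style lookup, slicing, positioning, and filtering
df['A'],df.A # select a column
df[0:3],df['2017':"2019"] # slice rows by position or by index label
loc # SQL-like: select by label
iloc # NumPy-like: select by integer position
df[df.A<8] # SQL-like: keep the rows that satisfy a condition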
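
To make the differences between these access styles concrete, here is a small sketch on a made-up 6x4 frame with a date index. The data and variable names are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

col_a = df["A"]                        # same column as df.A
first_three = df[0:3]                  # positional row slice, end excluded
by_label = df["20130102":"20130104"]   # label slice on the index, end included
cell = df.loc[dates[1], "B"]           # label-based lookup of one cell
block = df.iloc[3:5, 1:3]              # position-based slice of rows and columns
filtered = df[df.A < 8]                # keep rows where column A is below 8
```

Note that the positional slice excludes its end point while the label slice includes it, which is an easy source of off-by-one mistakes.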

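Modifying

df.loc['marry','age'] = 20 # set one cell by row and column label
df.iloc[2,2] = 2222 # set one cell by integer position
df.loc[df.A>4, 'B'] = 0 # for every row where A is greater than 4, set B to 0 (avoids chained assignment)
df['F'] = np.nan # add a new column filled with NaN
df['E'] = pd.Series([1,2,3], index=pd.date_range('20130101', periods=3)) # add a column aligned on the index; unmatched rows become NaN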
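
The assignments above can be tried end to end on a small frame. This is a sketch with made-up data; it uses a date label in place of the 'marry' row label so that the example is self-contained.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

df.loc[dates[0], "A"] = 100   # set one cell by label
df.iloc[2, 2] = 2222          # set one cell by position
df.loc[df.A > 4, "B"] = 0     # conditional update in a single indexing step
df["F"] = np.nan              # new column, all missing
# New column from a Series: values are aligned on the index,
# so the three dates not covered by the Series become NaN.
df["E"] = pd.Series([1, 2, 3], index=pd.date_range("20130101", periods=3))
```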

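Handling NaN data

df.dropna(axis=0, how='any') # drop every row that contains at least one NaN
df.fillna(value=0) # replace NaN with 0
df.isnull() # boolean mask marking the missing values
df.isnull().values.any() # True if any value in the table is missing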
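
A short sketch of these calls on a made-up frame with two missing values. It also shows how='all', which is not in the notes above but is the other accepted value of the how parameter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), columns=list("ABCD"))
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

dropped_any = df.dropna(axis=0, how="any")  # rows 0 and 1 contain NaN, only row 2 remains
dropped_all = df.dropna(axis=0, how="all")  # no row is entirely NaN, nothing is dropped
filled = df.fillna(value=0)                 # NaN replaced by 0
mask = df.isnull()                          # True where a value is missing
has_missing = df.isnull().values.any()      # single flag for the whole table
```

None of these calls modify df in place; each returns a new object.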

Import and export

# Supported formats include csv, excel, pickle, html
# read_csv reads a file
# to_csv writes a file
data = pd.read_csv("student.csv", sep=",")
data.to_pickle("student.pickle")
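
The round trip can be checked without an existing student.csv. The sketch below writes a made-up table to a temporary directory first; the table contents and paths are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

students = pd.DataFrame({"name": ["amy", "bob"], "age": [31, 25]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "student.csv")
    pickle_path = os.path.join(tmp, "student.pickle")

    students.to_csv(csv_path, index=False)  # index=False avoids writing the row labels
    data = pd.read_csv(csv_path, sep=",")   # read the CSV back
    data.to_pickle(pickle_path)             # store in pickle format
    restored = pd.read_pickle(pickle_path)  # load the pickle again
```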

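Merging

concat

# df1, df2, df3
# Stack vertically (row-wise)
df = pd.concat([df1,df2,df3], axis=0, ignore_index=True)
# join: {'inner', 'outer'}
# inner keeps only the columns shared by all tables (intersection); outer keeps every column (union)
df = pd.concat([df1,df2,df3], join='inner', ignore_index=True)
# Combine side by side (column-wise)
# join_axes was removed in pandas 1.0; reindex to keep only the rows of df1's index
res = pd.concat([df1, df2], axis=1).reindex(df1.index)
res = pd.concat([df1, df2], axis=1)

# DataFrame.append was removed in pandas 2.0; use concat to add a single row or a whole table
s1 = pd.Series([1,2,3,4], index=["a","b","c","d"])
res = pd.concat([df1, s1.to_frame().T], ignore_index=True)
res = pd.concat([df1, df2], ignore_index=True)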
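
The effect of join is easiest to see on two tables whose columns only partly overlap. The frames below are made up for the example, a minimal sketch assuming a pandas version where concat defaults to sort=False.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=list("abcd"), index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)), columns=list("bcde"), index=[2, 3, 4])

outer = pd.concat([df1, df2], axis=0, join="outer", ignore_index=True)  # columns a..e, gaps are NaN
inner = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)  # only shared columns b, c, d
side = pd.concat([df1, df2], axis=1)                                    # aligned on index 1..4
side_left = pd.concat([df1, df2], axis=1).reindex(df1.index)            # keep only df1's rows

# Adding one row: turn the Series into a one-row frame, then concat.
row = pd.Series([9, 9, 9, 9], index=list("abcd"))
with_row = pd.concat([df1, row.to_frame().T], ignore_index=True)
```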

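merge

# Merge using a column as the key
res = pd.merge(left, right, on="key")
# Sometimes two columns are needed to identify a row uniquely. Optional parameters:
# how={inner,outer,left,right}: which keys to keep in the result
# indicator={True,False}: add a _merge column showing whether each row came from the left table, the right table, or both
# left_index,right_index: use the index rather than a column as the key
res = pd.merge(left, right, on=["key1","key2"], how='inner')
# To tell apart columns that have the same name in both tables, add a suffix to each
res = pd.merge(left, right, on="key", suffixes=['_boys', '_girls'], how='inner')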
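
A small sketch of how, suffixes, and indicator on two made-up tables that share the keys K1 and K2. The column names and values are assumptions for the example.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "age": [10, 11, 12]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"], "age": [21, 22, 23]})

# inner: only keys present in both tables; the clashing "age" columns get suffixes.
inner = pd.merge(left, right, on="key", how="inner", suffixes=("_boys", "_girls"))

# outer: every key from either table; _merge records where each row came from.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
origin = dict(zip(outer["key"], outer["_merge"].astype(str)))
```

Here origin maps each key to "left_only", "right_only", or "both", which is a convenient way to audit a merge.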

join: serves the same purpose as merge, but matches rows on the index by default
Details omitted.
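
Since join is skipped in the notes, here is a brief sketch of how it relates to merge. The two frames are made up for the example.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3]}, index=["K0", "K1", "K2"])
right = pd.DataFrame({"B": [4, 5, 6]}, index=["K0", "K2", "K3"])

# join matches rows on the index and keeps the left table's rows by default.
joined = left.join(right, how="left")

# The same result written with merge.
merged = pd.merge(left, right, left_index=True, right_index=True, how="left")
```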

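Visualization with pandas

plot methods: bar, hist, box, kde, area, scatter, hexbin, pie
Plotting Series data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.Series(np.random.randn(1000),index=np.arange(1000))
data = data.cumsum() # cumulative sum
data.plot() # color, line width, and so on can be set here
plt.show()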

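Plotting DataFrame data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Two-dimensional data with named columns needs a DataFrame, not a Series
data = pd.DataFrame(np.random.randn(1000,4),index=np.arange(1000),columns=list("ABCD"))
data = data.cumsum() # cumulative sum of each column
data.plot() # one line per column
# Draw two scatter plots on the same axes by passing ax to the second call
ax = data.plot.scatter(x="A",y="B",color='DarkBlue',label="class 1")
data.plot.scatter(x="A",y="C",color='DarkGreen',label="class 2", ax=ax)
plt.show()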
