pandas in Ten Minutes

weixin_43942919

Article link: https://blog.csdn.net/weixin_43942919/article/details/85252265

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
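"""
I. Creating data
1. Create a Series by passing a list of values; pandas adds a default integer index.
Without np.nan the list would be stored as int64; including np.nan makes the dtype float64, because NaN is a floating-point value.
"""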
s = pd.Series([1,3,5,np.nan,6,8])
# print(s)
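
To make the dtype note above concrete, here is a minimal sketch that builds the same Series with and without the missing value. The names `s_int` and `s_nan` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Without a missing value the integers keep an integer dtype.
s_int = pd.Series([1, 3, 5, 6, 8])

# A single np.nan turns the whole Series into float64, because NaN is a float.
s_nan = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s_int.dtype)
print(s_nan.dtype)
print(list(s_nan.index))  # the default integer index starts at 0
```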

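"""
2. Create a DataFrame by passing a NumPy array together with a datetime index and labeled columns.
index= sets the row labels, columns= sets the column labels, and np.random.randn(6,4) produces an array with 6 rows and 4 columns.
Key point: pd.date_range(start, periods=n) generates a sequence of n dates (daily by default).
The number of index labels and column labels must match the shape of the array; otherwise pandas raises a ValueError.
"""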
dates = pd.date_range("20130101",periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list("ABCD"))
# print(df)
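
The shape requirement can be checked directly. The sketch below builds one DataFrame with matching labels and then deliberately passes too few column labels; the variable `mismatch_error` is a name chosen for this illustration only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)  # six consecutive days
values = np.random.randn(6, 4)

# Matching shapes: 6 index labels for 6 rows, 4 column labels for 4 columns.
df = pd.DataFrame(values, index=dates, columns=list("ABCD"))

# Mismatched labels: 3 column labels for a 4-column array.
try:
    pd.DataFrame(values, index=dates, columns=list("ABC"))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(df.shape)
print(type(mismatch_error).__name__)
```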

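"""
3. Create a DataFrame by passing a dict of objects that can be converted to Series-like columns.
Each dict key becomes a column label, and each dict value supplies the data for that column. A scalar value is repeated for every row; the row labels come from the index of the Series passed in (here [1,2,3,4]).
In short, building a DataFrame from a dict means fixing the rows and columns and passing values that fit them.
"""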
df2 = pd.DataFrame({"A":1.,
				    "B":pd.Timestamp("20130102"),
				    "C":pd.Series(1,dtype="float32",index=[1,2,3,4]),
				    "D":np.array([3]*4,dtype="int32"),
				    "E":pd.Categorical(["test","train","test","train"]),
				    "F":"foo",
				   })
# print(df2)
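
As a rough check of how the dict is interpreted, the following sketch rebuilds a reduced version of the frame (the Timestamp column is left out for brevity) and inspects the resulting index and column dtypes.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,                                                 # scalar, repeated for every row
    "C": pd.Series(1, index=[1, 2, 3, 4], dtype="float32"),   # supplies the row labels
    "D": np.array([3] * 4, dtype="int32"),                    # length must match the 4 rows
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",                                               # scalar string, also repeated
})

print(df2.dtypes)
print(list(df2.index))
```

Each column keeps its own dtype, which is the main difference from the NumPy-array constructor, where every column shares the array's dtype.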

"""
II. Viewing data
1. View the first and last rows, the index, the column labels, the underlying values, and quick summary statistics.
"""
# print(df2.head(1))
# print(df2.tail(1))
# print(df.index)
# print(df.columns)
# print(df.values)
# print(df.describe())
"""
2. Transpose the data, sort by axis labels, and sort by values.
"""
# print(df.T)
# print(df.sort_index(axis=1,ascending=False))
# print(df.sort_values(axis=0,by="B"))
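
Because the frame above holds random numbers, the effect of sorting is hard to see. Here is a small sketch on fixed data; the frame and the result names are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [3, 1, 2], "B": [0.5, -1.0, 2.0]},
    index=["x", "y", "z"],
)

transposed = df.T                                       # rows become columns
by_col_label = df.sort_index(axis=1, ascending=False)   # columns ordered B, A
by_value = df.sort_values(by="B")                       # rows ordered by the values in B

print(transposed)
print(by_col_label)
print(by_value)
```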

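"""
III. Selecting data
1. Selecting columns, slicing rows, and selecting by label
"""
# Select a single column
# print(df["A"])
# Select rows with a positional slice
# print(df[0:3])
# Slice rows by index label; both endpoints are included
# print(df["20130103":"20130104"])
# Select by label; df.loc[dates[:]] returns all rows
# print(df.loc[dates[0:3]])
# print(df.loc[dates,["A","B"]])
# print(df.loc["20130102":"20130104",["A","B","C"]])
# print(df.loc["20130103",["A","B"]])
# Get a single scalar value by label
# print(df.at[dates[0],"A"])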
"""
2. Select by position with iloc (iat returns a single value by position).
"""
# print(df.iloc[0,0])
# print(df.iloc[0:1,:])
# print(df.iat[0,0])
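
One detail worth checking by hand is that label slices with loc include both endpoints, while position slices with iloc exclude the end. The sketch below uses np.arange instead of random values so the results are predictable; that substitution is an assumption made for this illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

by_label = df.loc["20130102":"20130104", ["A", "B"]]   # both endpoints included: 3 rows
by_position = df.iloc[1:3, 0:2]                        # end position excluded: 2 rows

scalar_label = df.at[dates[0], "A"]   # scalar access by label
scalar_pos = df.iat[0, 0]             # scalar access by position

print(by_label)
print(by_position)
```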
"""
3. Boolean indexing; isin(list) tests membership.
Filtering always takes the form df[condition]; df.copy() returns an independent copy.
"""
# print(df[df.A > 0])
# print(df[df > 0])
# df2 = df.copy()
# df2["E"] = ["one","one","two","three","four",np.nan]
# print(df2[df2["E"].isin(["two","four"])])
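
The two forms of df[condition] behave differently: a boolean Series keeps or drops whole rows, while a boolean DataFrame keeps the shape and blanks out cells. A minimal sketch on fixed data (the values are made up for this illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-1.0, 5.0, -6.0]})

rows = df[df["A"] > 0]   # boolean Series: keeps rows 0 and 2
masked = df[df > 0]      # boolean DataFrame: same shape, other cells become NaN

df2 = df.copy()          # independent copy; adding a column does not touch df
df2["E"] = ["one", "two", "three"]
picked = df2[df2["E"].isin(["two", "four"])]   # rows whose E is in the list

print(rows)
print(masked)
print(picked)
```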
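"""
4. Modifying values in a DataFrame
"""
# Assigning a Series as a new column aligns it to the DataFrame's index automatically
# s1 = pd.Series(range(1,7),index=pd.date_range("20130102",periods=6))
# print(s1)
# df["F"] = s1
# # Set single values by label (at) or by position (iat)
# df.at[dates[0],"A"] = 0
# df.iat[0,1] = 0
# df.at[dates[0],"F"] = 5
# # Assign a NumPy array to a whole column
# df.loc[:,"D"] = np.array([5] * len(df))
# print(df)
# # Assign through a boolean condition: every positive value is negated
# df2 = df.copy()
# df2[df2 > 0] = -df2
# print(df2)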
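
The index alignment in the first step is easy to miss: s1 starts one day after the DataFrame, so the first row gets no value and the last value of s1 is dropped. A sketch with a constant-valued frame (an assumption made so the output is predictable):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.ones((6, 2)), index=dates, columns=["A", "B"])

# s1 covers 2013-01-02 through 2013-01-07, one day later than df.
s1 = pd.Series(range(1, 7), index=pd.date_range("20130102", periods=6))
df["F"] = s1                 # aligned on the index: NaN for 2013-01-01

df.at[dates[0], "A"] = 0.0   # set one value by label
df.iat[0, 1] = 0.0           # set one value by position

neg = df.copy()
neg[neg > 0] = -neg          # negate every positive cell; NaN is left alone

print(df)
print(neg)
```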

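"""
IV. Handling missing data
1. reindex rebuilds the index; it can change, drop, or add labels on an axis, and it returns a copy.
"""
# df1 = df.reindex(index=dates[0:4],columns=list(df.columns) + ["E"])
# df1.loc[dates[0]:dates[1],"E"] = 1
# # Drop every row that contains a missing value (returns a new DataFrame)
# print(df1.dropna(how="any"))
# # Replace missing values (returns a new DataFrame)
# print(df1.fillna(value=5))
# # Test which values are missing
# print(pd.isna(df1))
# print(df1)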
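
Note that dropna and fillna return new objects and leave df1 unchanged, so their results must be printed or assigned. The sketch below is self-contained and uses a constant-valued frame (an assumption for this illustration); the names `dropped`, `filled`, and `mask` are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.ones((6, 2)), index=dates, columns=["A", "B"])

# reindex returns a copy; the new column "E" starts out as all NaN.
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1.loc[dates[0]:dates[1], "E"] = 1   # label slice, both endpoints included

dropped = df1.dropna(how="any")   # new frame without the rows that contain NaN
filled = df1.fillna(value=5)      # new frame with NaN replaced by 5
mask = pd.isna(df1)               # boolean frame, True where a value is missing

print(dropped)
print(filled)
print(mask)
```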
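"""
V. Operations
1. Statistics, arithmetic with index alignment, apply, value counts, and string methods.
"""
# Mean of each row
# print(df.mean(1))
# shift(2) moves the values down two positions and fills the top with NaN
# s = pd.Series([1,3,5,np.nan,6,8],index=dates).shift(2)
# print(s)
# Subtract s from every column, aligned on the index
# print(df.sub(s,axis="index"))
# Apply a function to each column
# print(df)
# print(df.apply(lambda x:x.max()-x.min()))
# Value counts (histogramming): s holds 10 random integers from 0 to 6; s.value_counts() counts each distinct value
# s = pd.Series(np.random.randint(0,7,size=10))
# print(s.value_counts())
# String methods: Series.str offers vectorized versions of many Python string methods and skips NaN
# s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
# print(s.str.title())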
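
The operations in this section are easier to follow with small fixed inputs. The sketch below repeats them on made-up data; all values and variable names are assumptions for this illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=4)
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=dates,
)

row_means = df.mean(axis=1)                                  # one mean per row
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates).shift(2)    # NaN, NaN, 1.0, 2.0
diff = df.sub(s, axis="index")                               # subtract s from every column, row by row
spread = df.apply(lambda col: col.max() - col.min())         # max minus min of each column

counts = pd.Series([1, 1, 2, 3, 3, 3]).value_counts()        # occurrences of each distinct value
titled = pd.Series(["aaba", np.nan, "CABA"]).str.title()     # NaN stays NaN

print(diff)
print(spread)
print(counts)
print(titled)
```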
"""
VI. Merging, appending, and related operations
Join pandas objects together with concat():
1. concat: in the example below, concatenating pieces reproduces df.
"""
# df = pd.DataFrame(np.random.randn(10,4))
# print(df)
# pieces = [df[:3],df[3:7],df[7:]]
# print(pd.concat(pieces))
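
The claim that the concatenated pieces reproduce df can be verified with DataFrame.equals. A minimal sketch; the name `rebuilt` is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4))
pieces = [df[:3], df[3:7], df[7:]]   # three row slices that cover df exactly once
rebuilt = pd.concat(pieces)          # stacked along the rows, original index kept

print(rebuilt.equals(df))
```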

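# 2. merge: SQL-style join on the "key" column.
# Both frames also have an "ival" column, so the result labels them
# ival_x (from left) and ival_y (from right).
left = pd.DataFrame({"key":["foo","bar"],
                     "ival":[1,2],
                    })
right = pd.DataFrame({"key":["foo","bar"],
                      "ival":[3,4]
                     })
new = pd.merge(left,right,on="key")
print(left,"\n")
print(right,"\n")
print(new,"\n")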
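
When both inputs share a non-key column name, as "ival" does here, merge keeps both columns and adds suffixes. The sketch below shows the default suffixes and the suffixes parameter of pd.merge; the suffix strings "_left" and "_right" are chosen for this illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "bar"], "ival": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "ival": [3, 4]})

# Default: overlapping column names get the suffixes _x and _y.
default = pd.merge(left, right, on="key")

# Custom suffixes make the origin of each column explicit.
named = pd.merge(left, right, on="key", suffixes=("_left", "_right"))

print(default)
print(named)
```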
