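1. Pandas overview
Pandas is a data analysis package for Python, created to solve data analysis tasks.
Pandas incorporates a large number of libraries and standard data models, and provides the tools needed to work with datasets efficiently.
Pandas provides many functions and methods that make data processing fast and convenient.
Pandas is built on NumPy, and its data structures behave much like dictionaries, which makes NumPy-centric applications simpler to write.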
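To make the NumPy relationship concrete, here is a minimal sketch (the data and variable names are illustrative assumptions, not part of the original text). It wraps a NumPy array in a Series, applies a NumPy function to it, reads one value by label the way a dictionary would, and converts the result back to a plain array.

```python
import numpy as np
import pandas as pd

# Illustrative data; the names below are assumptions made for this sketch.
arr = np.array([1.0, 4.0, 9.0])
s = pd.Series(arr, index=["a", "b", "c"])

# NumPy universal functions work element-wise on a Series and keep its labels.
roots = np.sqrt(s)
print(roots)

# Dictionary-style access by label.
print(s["b"])

# Recover a plain NumPy array from the Series.
print(roots.to_numpy())
```

The labels survive the NumPy call, which is the main convenience pandas adds on top of a bare array.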
2. Installing Pandas
3. Importing Pandas
import pandas as pd
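A quick way to confirm that the import works is to print the installed version. This is only a sanity check; the version string shown will depend on the local environment.

```python
import pandas as pd

# __version__ is a plain string such as "1.x.y" or "2.x.y", depending on the install.
print(pd.__version__)
```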
4. Pandas data structures
4.1 Series
import numpy as np
import pandas as pd
s=pd.Series([1,2,3,np.nan,5,6])
print(s)
----------Running the program above produces----------
0 1.0
1 2.0
2 3.0
3 NaN
4 5.0
5 6.0
dtype: float64
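The NaN in the output above is how pandas marks a missing value, and it is also why the integers were promoted to float64. As a small sketch of how such gaps might be handled: isna, dropna and fillna are standard Series methods, and filling with 0 is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5, 6])

# Boolean mask marking the missing entries.
missing = s.isna()
print(missing.sum())

# Drop the missing entries; the remaining rows keep their original labels.
print(s.dropna())

# Replace missing entries with a placeholder value (0 is arbitrary here).
print(s.fillna(0))

# Aggregations such as sum() skip NaN by default.
print(s.sum())
```

Whether to drop or fill depends on the analysis; neither call modifies s in place, each returns a new Series.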
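4.2 DataFrame
A DataFrame is a tabular data structure made up of an ordered set of columns, and each column can hold a different value type. A DataFrame has both a row index and a column index, and can be thought of as a dictionary of Series that share the same row index.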
import numpy as np
import pandas as pd
# Six consecutive daily dates to use as the row index
dates=pd.date_range("2019-08-01",periods=6)
# Name the result df so that the pandas module alias pd is not overwritten
df=pd.DataFrame(np.random.randn(6,4),index=dates,columns=["A","B","C","D"])
print("A table with 6 rows and 4 columns:")
print(df)
print(" ")
print("Column B (the second column):")
print(df["B"])
print(" ")
----------Running the program above produces----------
A table with 6 rows and 4 columns:
                   A         B         C         D
2019-08-01  0.796050 -0.383286 -1.465294 -0.272321
2019-08-02 -1.431981 -0.875381  1.371449  0.321703
2019-08-03 -1.497636  1.258925 -1.374210 -0.765626
2019-08-04  2.518305  0.125094  2.647512 -0.024748
2019-08-05 -0.319238  0.395384 -0.582052 -0.396132
2019-08-06 -0.519434  1.873216  1.685524 -1.493000
Column B (the second column):
2019-08-01 -0.383286
2019-08-02 -0.875381
2019-08-03 1.258925
2019-08-04 0.125094
2019-08-05 0.395384
2019-08-06 1.873216
Freq: D, Name: B, dtype: float64
-------------------------------------------
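The description of a DataFrame as a dictionary of Series can be checked directly. The sketch below uses made-up column names and values (they are assumptions for illustration, not taken from the example above). It builds a DataFrame from a dict of Series, shows that selecting a column returns a Series that shares the row index, and shows that a label missing from one Series becomes NaN once the columns are aligned.

```python
import pandas as pd

# Hypothetical data for illustration only.
price = pd.Series([1.5, 2.0, 3.25], index=["x", "y", "z"])
qty = pd.Series([10, 20], index=["x", "y"])  # no entry for "z"

df = pd.DataFrame({"price": price, "qty": qty})
print(df)

# Selecting one column gives back a Series.
col = df["price"]
print(type(col).__name__)

# The column shares the DataFrame's row index.
print(col.index.equals(df.index))

# "z" had no quantity, so alignment filled that cell with NaN.
print(df["qty"].isna().tolist())
```

Because rows are matched by label rather than by position, the two input Series do not need to have the same length.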
import numpy as np
import pandas as pd
from datetime import datetime as dt
print("Creating a DataFrame from a dictionary:")
# Scalars are broadcast to every row; the Series and the array fix the length at 4
df_1=pd.DataFrame({"A":1.0,
                   "B":pd.Timestamp(2019,8,19),
                   "C":pd.Series(1,index=list(range(4)),dtype="float32"),
                   "D":np.array([3]*4,dtype="int32"),
                   "E":pd.Categorical(["test","train","test","train"]),
                   "F":"foo"})
print(df_1)
print(" ")
print("Data type of each column:")
print(df_1.dtypes)
print(" ")
print("Row index:")
print(df_1.index)
print(" ")
print("Column labels:")
print(df_1.columns)
print(" ")
print("All values as an array:")
print(df_1.values)
print(" ")
print("Summary statistics:")
print(df_1.describe())
print(" ")
print("Transposed data:")
print(df_1.T)
print(" ")
print("Columns sorted by label in descending order:")
# axis=1 sorts by column label (A, B, C, ...); ascending=False reverses the order
print(df_1.sort_index(axis=1,ascending=False))
print(" ")
print("Rows sorted by the values in column E:")
print(df_1.sort_values("E"))
print(" ")
----------Running the program above produces----------
Creating a DataFrame from a dictionary:
     A          B    C  D      E    F
0  1.0 2019-08-19  1.0  3   test  foo
1  1.0 2019-08-19  1.0  3  train  foo
2  1.0 2019-08-19  1.0  3   test  foo
3  1.0 2019-08-19  1.0  3  train  foo
Data type of each column:
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
Row index:
Int64Index([0, 1, 2, 3], dtype="int64")
Column labels:
Index(["A", "B", "C", "D", "E", "F"], dtype="object")
All values as an array:
[[1.0 Timestamp("2019-08-19 00:00:00") 1.0 3 "test" "foo"]
 [1.0 Timestamp("2019-08-19 00:00:00") 1.0 3 "train" "foo"]
 [1.0 Timestamp("2019-08-19 00:00:00") 1.0 3 "test" "foo"]
 [1.0 Timestamp("2019-08-19 00:00:00") 1.0 3 "train" "foo"]]
Summary statistics:
         A    C    D
count  4.0  4.0  4.0
mean   1.0  1.0  3.0
std    0.0  0.0  0.0
min    1.0  1.0  3.0
25%    1.0  1.0  3.0
50%    1.0  1.0  3.0
75%    1.0  1.0  3.0
max    1.0  1.0  3.0
Transposed data:
0 1 2 3
A 1 1 1 1
B 2019-08-19 00:00:00 2019-08-19 00:00:00 2019-08-19 00:00:00 2019-08-19 00:00:00
C 1 1 1 1