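Python Data Analysis 4. The pandas Data Science Library
1. The Series data type
A Series is essentially a one-dimensional array with labels: every value is paired with a key (its index label).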
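import pandas as pd
# Build a Series from a list with explicit labels
t1 = pd.Series([1, 2, 3, 4, 5], index=list("abcde"))
# Build a Series from a dict: the keys become the index
temp_dict = {"name": "Lucy", "age": 20, "tel": 10086}
t2 = pd.Series(temp_dict)
print(t1.iloc[[1]])           # select by position; t1[[1]] relies on an integer fallback that newer pandas deprecates
print(t2[["name", "age"]])    # select by a list of labels
print(t1 > 2)                 # element-wise comparison returns a boolean Series
list(t1.index)[:2]            # ['a', 'b']
type(t1.values)               # numpy.ndarray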
b    2
dtype: int64
name    Lucy
age       20
dtype: object
a    False
b    False
c     True
d     True
e     True
dtype: bool
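Since a Series carries both labels and positions, it can help to see the two access styles side by side. The following is a minimal sketch, assuming a recent pandas version; the variable names are illustrative and not part of the original notes.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=list("abcde"))

by_label = s.loc["b"]            # label-based lookup
by_position = s.iloc[1]          # position-based lookup (0-based)
label_slice = s.loc["b":"d"]     # label slices include both endpoints
position_slice = s.iloc[1:3]     # position slices exclude the stop position
filtered = s[s > 2]              # a boolean mask keeps the rows where it is True

print(by_label, by_position)
print(list(label_slice.index))
print(list(position_slice.index))
print(filtered.to_dict())
```

Using .loc and .iloc explicitly avoids any ambiguity about whether an integer means a label or a position.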
2. Reading external data with pandas
import pandas as pd
# Read a CSV file with pandas
df = pd.read_csv("D:/数据分析资料/day04/code/dogNames2.csv")
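The call above depends on a file at a local path. To try read_csv without that file, the sketch below feeds it an in-memory string instead. The rows are made up for illustration, and the column name Row_Labels is an assumption; only Count_AnimalName appears in these notes.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the CSV file (hypothetical data).
csv_text = """Row_Labels,Count_AnimalName
BELLA,3
MAX,5
ROCKY,2
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.dtypes)
```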
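3. Creating a DataFrame
A DataFrame has both a row index and a column index.
Row index: labels each horizontal row; it is called index and corresponds to axis 0 (axis=0).
Column index: labels each vertical column; it is called columns and corresponds to axis 1 (axis=1).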
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(12).reshape(3,4),index=list("abc"),columns=list("WXYZ"))
print(df)
   W  X   Y   Z
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
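To make the two axes concrete, here is a small sketch that sums the same DataFrame along each of them. The names col_sums and row_sums are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=list("abc"),
    columns=list("WXYZ"),
)

col_sums = df.sum(axis=0)   # collapses the rows: one result per column
row_sums = df.sum(axis=1)   # collapses the columns: one result per row

print(col_sums)
print(row_sums)
```

The axis argument names the axis that is reduced away, so axis=0 yields a result labeled by columns and axis=1 a result labeled by the row index.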
(A DataFrame can be viewed as a container of Series.)
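One way to check the "container of Series" view is to pull a single column and a single row out of a DataFrame and inspect their types. This is a minimal sketch with illustrative variable names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=list("abc"),
    columns=list("WXYZ"),
)

col = df["W"]        # a column is a Series indexed by the row labels
row = df.loc["a"]    # a row is a Series indexed by the column labels

print(type(col).__name__, list(col.index))
print(type(row).__name__, list(row.index))
```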
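import pandas as pd
import numpy as np
# From a dict of lists: each key becomes a column
d1 = {"name": ["Lucy", "Lily", "Cindy"], "age": [20, 42, 37], "tel": ["110", "120", "119"]}
df1 = pd.DataFrame(d1)
# From a list of dicts: each dict becomes a row; a missing key becomes NaN
d2 = [
    {"name": "Lucy", "age": 20, "tel": 110},
    {"name": "Lily", "age": 42, "tel": 120},
    {"name": "Cindy", "age": 37},
]
df2 = pd.DataFrame(d2)
print(df1)
print(df2)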
    name  age  tel
0   Lucy   20  110
1   Lily   42  120
2  Cindy   37  119
    name  age    tel
0   Lucy   20  110.0
1   Lily   42  120.0
2  Cindy   37    NaN
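The second output shows tel as 110.0 rather than 110. A likely explanation is that the missing value is stored as NaN, which is a float, so the whole column is promoted to a float type. The sketch below checks this; the variable names are illustrative.

```python
import pandas as pd

records = [
    {"name": "Lucy", "age": 20, "tel": 110},
    {"name": "Lily", "age": 42, "tel": 120},
    {"name": "Cindy", "age": 37},   # no "tel" key, so this cell becomes NaN
]
df = pd.DataFrame(records)

tel_is_float = pd.api.types.is_float_dtype(df["tel"])     # promoted because of NaN
age_is_int = pd.api.types.is_integer_dtype(df["age"])     # no missing values, stays integer
missing = df["tel"].isna()                                # True where the value is missing

print(df.dtypes)
print(missing.tolist())
```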
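df.index       # row index
df.columns     # column index
df.values      # underlying values as a NumPy array
df.shape       # shape as (rows, columns)
df.dtypes      # data type of each column
df.ndim        # number of dimensions
df.head()      # first 5 rows by default
df.tail()      # last 5 rows by default
df.info()      # summary: index, columns, non-null counts, dtypes, memory usage
df.describe()  # summary statistics (count, mean, std, min, quartiles, max) for numeric columns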
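The attributes and methods listed above can be tried on a small DataFrame. This is a rough sketch on the 3x4 example from earlier in this section; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=list("abc"),
    columns=list("WXYZ"),
)

shape = df.shape          # (number of rows, number of columns)
ndim = df.ndim            # a DataFrame is always two-dimensional
first_two = df.head(2)    # head and tail accept a row count
last_one = df.tail(1)
stats = df.describe()     # one column of statistics per numeric column

print(shape, ndim)
print(first_two)
print(last_one)
print(stats)
```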
4. Sorting and indexing a DataFrame
Sorting:
import pandas as pd
# Read a CSV file with pandas
df = pd.read_csv("D:/daily/大二下/量化/拜师/数据分析资料/day04/code/dogNames2.csv")
# print(df.info())
# print(df.head())
# Sort the DataFrame
df = df.sort_values(by="Count_AnimalName",ascending