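The pandas module has two main data structures: 1. Series; 2. DataFrame
A Series is a one-dimensional labeled array built on NumPy's ndarray.
By default pandas uses 0 to n-1 as the index of a Series, but you can also specify the index yourself (think of the index as the keys of a dict).
Creating a Series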
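To make the default index and the dict-like role of a custom index concrete, here is a minimal sketch. The names default_s and labeled_s and the sample values are made up for illustration.

```python
import pandas as pd

# Without an explicit index, pandas labels the values 0..n-1.
default_s = pd.Series([10, 20, 30])

# With an explicit index, each label works much like a dict key.
labeled_s = pd.Series([10, 20, 30], index=["x", "y", "z"])

print(list(default_s.index))  # [0, 1, 2]
print(labeled_s["y"])         # 20
print("y" in labeled_s)       # True: membership tests the index, like a dict
```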
1. pd.Series([list], index=[list])
import pandas as pd
pd.Series([list], index=[list])
The data argument is a list. index is optional: if omitted, the index defaults to 0, 1, ..., n-1; if given, its length must equal the number of values.
>>> import pandas as pd
>>> s=pd.Series([1,2,3,4,5])
>>> print(s)
0 1
1 2
2 3
3 4
4 5
dtype: int64
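The length rule for index can be checked directly. The sketch below uses made-up values and labels; it only assumes that pandas raises a ValueError when the two lengths differ.

```python
import pandas as pd

values = [1, 2, 3]

# Matching lengths: three values, three labels.
ok = pd.Series(values, index=["a", "b", "c"])

# Mismatched lengths: three values but only two labels.
try:
    pd.Series(values, index=["a", "b"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(ok["c"])                         # 3
print(type(mismatch_error).__name__)   # ValueError
```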
2. pd.Series({dict})
>>> import pandas as pd
>>> s=pd.Series({'a':1,'b':2,'c':3,'f':4,'e':5})
>>> print(s)
a 1
b 2
c 3
f 4
e 5
dtype: int64
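A dict can also be combined with the index argument. In that case the index picks and orders the labels rather than having to match the dict's length. A small sketch, with scores and picked as illustrative names:

```python
import pandas as pd

scores = {"a": 1, "b": 2, "c": 3}

# Passing index together with a dict selects and orders by those labels;
# a label missing from the dict ("z" here) becomes NaN.
picked = pd.Series(scores, index=["c", "a", "z"])

print(picked.index.tolist())   # ['c', 'a', 'z']
print(picked.isna().tolist())  # [False, False, True]
```

Because NaN is a float, the resulting Series has dtype float64 even though the dict held integers.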
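Accessing Series values
s[index]
or s[[list of index labels]]
Access works much like indexing an array; to pick several non-contiguous values, pass a list of index labels.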
>>> import pandas as pd
>>> import numpy as np
>>> v = np.random.random_sample(10)
>>> s = pd.Series(v)
>>> v
array([0.43181969, 0.63676192, 0.40774042, 0.18825677, 0.40360617,
0.42005226, 0.75754098, 0.81468285, 0.24450568, 0.3806161 ])
>>> s
0 0.431820
1 0.636762
2 0.407740
3 0.188257
4 0.403606
5 0.420052
6 0.757541
7 0.814683
8 0.244506
9 0.380616
dtype: float64
>>> s1 = s[[3, 5, 6, 9]]
>>> s1
3 0.188257
5 0.420052
6 0.757541
9 0.380616
dtype: float64
>>> s2 = s[3:6]
>>> s2
3 0.188257
4 0.403606
5 0.420052
dtype: float64
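With a non-integer index it helps to state explicitly whether a selection is by label or by position. The sketch below uses .loc and .iloc for that; the labels and values are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s.loc["b":"c"]    # label slice: both endpoints are included
by_position = s.iloc[1:3]    # positional slice: the end position is excluded
picked = s.loc[["a", "d"]]   # a list selects non-contiguous labels

print(by_label.tolist())     # [20, 30]
print(by_position.tolist())  # [20, 30]
print(picked.tolist())       # [10, 40]
```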
Getting the head and tail of a Series
.head(n); .tail(n)
Return the first n or the last n rows. n is optional and defaults to 5.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> v = np.random.random_sample(50)
>>> s = pd.Series(v)
>>> print("s.head()", s.head())
s.head() 0 0.567240
1 0.079623
2 0.961206
3 0.092650
4 0.327317
dtype: float64
>>> print("s.head(3)", s.head(3))
s.head(3) 0 0.567240
1 0.079623
2 0.961206
dtype: float64
>>> print("s.tail()", s.tail())
s.tail() 45 0.295008
46 0.746931
47 0.524577
48 0.063680
49 0.243979
dtype: float64
>>> print("s.tail(3)", s.tail(3))
s.tail(3) 47 0.524577
48 0.063680
49 0.243979
dtype: float64
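head and tail also accept a negative n, which means "everything except" that many rows at the other end. A short sketch on a small deterministic Series (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series(range(10))

first_two = s.head(2)           # first 2 rows
last_two = s.tail(2)            # last 2 rows
all_but_last_two = s.head(-2)   # negative n: everything except the last 2 rows
all_but_first_two = s.tail(-2)  # negative n: everything except the first 2 rows

print(first_two.tolist())          # [0, 1]
print(last_two.tolist())           # [8, 9]
print(all_but_last_two.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7]
print(all_but_first_two.tolist())  # [2, 3, 4, 5, 6, 7, 8, 9]
```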
Common Series operations
>>> v = [10, 3, 2, 2, np.nan]
>>> v = pd.Series(v)
>>> v
0 10.0
1 3.0
2 2.0
3 2.0
4 NaN
dtype: float64
>>> print("len():", len(v))  # length of the Series, NaN included
len(): 5
>>> print("shape():", np.shape(v))  # shape of the array, here (5,)
shape(): (5,)
>>> print("count():", v.count())  # number of values, NaN excluded
count(): 4
>>> print("unique():", v.unique())  # distinct values (NaN included)
unique(): [10. 3. 2. nan]
>>> print("value_counts():\n", v.value_counts())  # occurrences of each value (NaN excluded by default)
value_counts():
2.0 2
10.0 1
3.0 1
dtype: int64
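Several of these methods take a dropna flag that controls whether NaN is counted. The sketch below reuses the same five values; the result variable names are only illustrative.

```python
import numpy as np
import pandas as pd

v = pd.Series([10, 3, 2, 2, np.nan])

distinct_without_nan = v.nunique()              # NaN ignored by default
distinct_with_nan = v.nunique(dropna=False)     # NaN counted as its own value
counts_with_nan = v.value_counts(dropna=False)  # adds a row for NaN
missing = int(v.isna().sum())                   # number of NaN entries

print(distinct_without_nan)  # 3
print(distinct_with_nan)     # 4
print(missing)               # 1
```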
Searching a Series
1. Range lookup
>>> import pandas as pd
>>> import numpy as np
>>>
>>> s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
>>> sa = pd.Series(s, name="age")
>>> print(sa[sa>19])
ton 20.0
jim 22.0
lj 24.0
Name: age, dtype: float64
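A range with both a lower and an upper bound can be written by combining two boolean conditions. The sketch below reuses the same ages, under the variable name ages, and also shows between() as one possible alternative.

```python
import pandas as pd

ages = pd.Series(
    {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
    name="age",
)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
in_range = ages[(ages >= 19) & (ages <= 22)]

# between() expresses the same closed interval; NaN never matches.
same_range = ages[ages.between(19, 22)]

print(in_range.index.tolist())  # ['ton', 'jack', 'jim']
```

Python's plain and/or do not work element-wise on a Series, which is why the bitwise operators are used here.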
2. Median
>>> import pandas as pd
>>> import numpy as np
>>>
>>> s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
>>> sa = pd.Series(s, name="age")
>>> print("sa.median()", sa.median())
sa.median() 20.0
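The median above is 20.0 because pandas skips NaN by default, so "car" does not take part. A minimal sketch of how the skipna flag changes that, using the same data under the name ages:

```python
import pandas as pd

ages = pd.Series(
    {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
    name="age",
)

median_default = ages.median()             # NaN skipped: median of 18, 19, 20, 22, 24
median_strict = ages.median(skipna=False)  # NaN propagates to the result
mean_age = ages.mean()                     # NaN skipped as well: 103 / 5

print(median_default)          # 20.0
print(pd.isna(median_strict))  # True
```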
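Reference: "pd.Series() function explained (the clearest explanation)", 我是管小亮's blog, CSDN
These are my own notes; please read the original author's post, which is clearer and easier to follow!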