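1. Viewing Data
- .head() returns the first rows
- .tail() returns the last rows
- Both return 5 rows by default; pass n to change that, e.g. .head(10)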
import numpy as np
import pandas as pd
s = pd.Series(np.random.rand(50))
print("s.head() = \n", s.head())
print("-" * 100)
print("s.head(10) = \n", s.head(10))
print("-" * 100)
print("s.tail() = \n", s.tail())
Output:
s.head() =
0 0.891778
1 0.575982
2 0.138742
3 0.101361
4 0.247216
dtype: float64
----------------------------------------------------------------------------------------------------
s.head(10) =
0 0.891778
1 0.575982
2 0.138742
3 0.101361
4 0.247216
5 0.376180
6 0.117379
7 0.001082
8 0.769211
9 0.204997
dtype: float64
----------------------------------------------------------------------------------------------------
s.tail() =
45 0.020636
46 0.062189
47 0.110146
48 0.958667
49 0.788788
dtype: float64
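The random output above makes it hard to see exactly which rows come back. The sketch below uses the values 0..49 instead of random numbers, so the labels returned by .head() and .tail() are predictable. It also shows that asking for more rows than exist simply returns the whole Series.

```python
import pandas as pd

# Deterministic values 0..49 (instead of np.random.rand) so results are easy to check
s = pd.Series(range(50))

first5 = s.head()      # default n=5 -> rows labelled 0..4
first10 = s.head(10)   # explicit n -> first 10 rows
last3 = s.tail(3)      # last 3 rows -> labels 47..49
everything = s.head(100)  # n larger than the Series -> all 50 rows, no error

print(first5.index.tolist())   # [0, 1, 2, 3, 4]
print(len(first10))            # 10
print(last3.tolist())          # [47, 48, 49]
print(len(everything))         # 50
```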
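2. Sorting
- Sort by value with series.sort_values(ascending=True)
A Series has only one column, so unlike DataFrame.sort_values() there is no `by` argument to name a column.
# data is assumed to be a date-indexed DataFrame of daily stock data with a p_change column, loaded earlier (not shown in this section)
data['p_change'].sort_values(ascending=True).head()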
2015-09-01 -10.03
2015-09-14 -10.02
2016-01-11 -10.02
2015-07-15 -10.02
2015-08-26 -10.01
Name: p_change, dtype: float64
- Sort by index with series.sort_index()
This works the same way as DataFrame.sort_index().
# Sort by the index (here, the dates)
data['p_change'].sort_index().head()
2015-03-02 2.62
2015-03-03 1.44
2015-03-04 1.57
2015-03-05 2.02
2015-03-06 8.51
Name: p_change, dtype: float64
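`data` is not defined in this section. The sketch below therefore builds a small date-indexed Series as a stand-in for `data['p_change']`. The four date/value pairs are copied from the outputs above, and the name `p_change` is only illustrative. It contrasts sorting by value in both directions with sorting by index.

```python
import pandas as pd

# Hypothetical stand-in for data['p_change'], using rows that appear in the outputs above
p_change = pd.Series(
    [2.62, -10.03, 1.44, -10.02],
    index=pd.to_datetime(["2015-03-02", "2015-09-01", "2015-03-03", "2015-09-14"]),
    name="p_change",
)

by_value = p_change.sort_values(ascending=True)        # smallest change first
by_value_desc = p_change.sort_values(ascending=False)  # largest change first
by_date = p_change.sort_index()                        # chronological order

print(by_value.tolist())       # [-10.03, -10.02, 1.44, 2.62]
print(by_value_desc.tolist())  # [2.62, 1.44, -10.02, -10.03]
print(by_date.tolist())        # [2.62, 1.44, -10.03, -10.02]
```

None of these calls modify `p_change` itself. Each returns a new, reordered Series.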
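3. Reindexing
.reindex() conforms the Series to the given index. Existing labels are reordered to match, and labels that are not in the current index are filled with missing values (NaN).
- Pass the new labels to .reindex() as a list
- Here the label 'd' does not exist, so its value is NaN
- fill_value: the value used in place of missing entries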
import numpy as np
import pandas as pd
# Reindexing with reindex
# .reindex() reorders values to match the new index; labels not in the current index become NaN
s = pd.Series(np.random.rand(3), index=['a', 'b', 'c'])
print("s = \n", s)
print("-" * 100)
# Pass the new labels to .reindex() as a list
# The label 'd' does not exist, so its value is NaN
s1 = s.reindex(['c', 'b', 'a', 'd'])
print("s1 = \n", s1)
print("-" * 100)
# fill_value: the value used in place of missing entries
s2 = s.reindex(['c', 'b', 'a', 'd'], fill_value=0)
print("s2 = \n", s2)
Output:
s =
a 0.496666
b 0.828771
c 0.363888
dtype: float64
----------------------------------------------------------------------------------------------------
s1 =
c 0.363888
b 0.828771
a 0.496666
d NaN
dtype: float64
----------------------------------------------------------------------------------------------------
s2 =
c 0.363888
b 0.828771
a 0.496666
d 0.000000
dtype: float64
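A deterministic sketch of the reindexing rules above, using fixed values in place of random ones. It also shows a consequence the random example does not cover: labels omitted from the new list are dropped from the result. In every case the original Series is left untouched.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

s1 = s.reindex(["c", "b", "a", "d"])                # 'd' is new -> NaN
s2 = s.reindex(["c", "b", "a", "d"], fill_value=0)  # 'd' is new -> 0
s3 = s.reindex(["a", "x"])                          # 'b' and 'c' are dropped, 'x' -> NaN

print(s1.isna().tolist())   # [False, False, False, True]
print(s2.tolist())          # [3.0, 2.0, 1.0, 0.0]
print(s3.index.tolist())    # ['a', 'x']
print(s.index.tolist())     # ['a', 'b', 'c'] (original unchanged)
```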
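4. Series Alignment (operations align automatically on labels)
The main difference between a Series and an ndarray is that operations on a Series automatically align on index labels.
- Index order does not matter; values are matched by label, not by position
- Any arithmetic involving NaN yields NaN, so labels present in only one of the two Series end up as NaN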
import numpy as np
import pandas as pd
# Series alignment
s1 = pd.Series(np.random.rand(3), index=['Jack', 'Marry', 'Tom'])
s2 = pd.Series(np.random.rand(3), index=['Wang', 'Jack', 'Marry'])
print("s1 = \n", s1)
print("s2 = \n", s2)
print("-" * 100)
print("s1+s2 = \n", s1+s2)
Output:
s1 =
Jack 0.965087
Marry 0.088279
Tom 0.369567
dtype: float64
s2 =
Wang 0.398997
Jack 0.082579
Marry 0.856640
dtype: float64
----------------------------------------------------------------------------------------------------
s1+s2 =
Jack 1.047665
Marry 0.944919
Tom NaN
Wang NaN
dtype: float64
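The sketch below uses fixed values to make the label-versus-position difference concrete. Adding the underlying NumPy arrays combines elements by position, while `+` on the Series combines them by label. `Series.add(..., fill_value=0)` is one way to treat a label missing from one side as 0 instead of getting NaN.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["Jack", "Marry", "Tom"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["Wang", "Jack", "Marry"])

positional = s1.to_numpy() + s2.to_numpy()  # ndarray: element-by-element by position
aligned = s1 + s2                           # Series: matched by label, union of labels
filled = s1.add(s2, fill_value=0)           # a label missing on one side counts as 0

print(positional.tolist())                  # [11.0, 22.0, 33.0]
print(aligned["Jack"], aligned["Marry"])    # 21.0 32.0
print(aligned[["Tom", "Wang"]].isna().all())  # True
print(filled["Tom"], filled["Wang"])        # 3.0 10.0
```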
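5. Adding Elements / Arrays
Add a value directly by assigning to a new positional index or label.
- Append another Series with pd.concat([s1, s2]) (Series.append() was deprecated in pandas 1.4 and removed in 2.0)
- The result is a new Series; the original Series are left unchanged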
import numpy as np
import pandas as pd
# Adding elements
s1 = pd.Series(np.random.rand(5))
s2 = pd.Series(np.random.rand(5), index=list('ngjur'))
print("s1 = \n", s1)
print("s2 = \n", s2)
print("-" * 100)
# Add a value by assigning to a new index position / label
s1[5] = 100
s2['a'] = 100
print("s1 = \n", s1)
print("s2 = \n", s2)
print("-" * 100)
# Series.append() no longer exists in pandas 2.x; pd.concat() returns a new Series
s3 = pd.concat([s1, s2])
print("s1 = \n", s1)
print("s3 = \n", s3)
Output:
s1 =
0 0.418343
1 0.611628
2 0.793579
3 0.643884
4 0.062399
dtype: float64
s2 =
n 0.178642
g 0.360007
j 0.287545
u 0.016724
r 0.126153
dtype: float64
----------------------------------------------------------------------------------------------------
s1 =
0 0.418343
1 0.611628
2 0.793579
3 0.643884
4 0.062399
5 100.000000
dtype: float64
s2 =
n 0.178642
g 0.360007
j 0.287545
u 0.016724
r 0.126153
a 100.000000
dtype: float64
----------------------------------------------------------------------------------------------------
s1 =
0 0.418343
1 0.611628
2 0.793579
3 0.643884
4 0.062399
5 100.000000
dtype: float64
s3 =
0 0.418343
1 0.611628
2 0.793579
3 0.643884
4 0.062399
5 100.000000
n 0.178642
g 0.360007
j 0.287545
u 0.016724
r 0.126153
a 100.000000
dtype: float64
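A deterministic sketch of both ways of adding data, using small fixed Series in place of the random ones. It assumes pandas 2.x, where `pd.concat` replaces `Series.append`. `pd.concat` keeps the original labels, so it can mix integer and string labels or repeat them. Passing `ignore_index=True` renumbers the result 0..n-1 instead.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0])
s2 = pd.Series([3.0], index=["a"])

s1[2] = 100                                  # assigning to a new label enlarges s1
s3 = pd.concat([s1, s2])                     # new Series, original labels kept
s4 = pd.concat([s1, s2], ignore_index=True)  # new Series, labels renumbered

print(s3.index.tolist())  # [0, 1, 2, 'a']
print(s3.tolist())        # [1.0, 2.0, 100.0, 3.0]
print(s4.index.tolist())  # [0, 1, 2, 3]
print(len(s1), len(s2))   # 3 1 (concat did not modify the inputs)
```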
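6. Modifying Elements
Modify values directly by index, much like a Python list; pass a list of labels to set several values at once.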
import numpy as np
import pandas as pd
# Modifying elements
s = pd.Series(np.random.rand(3), index=['a', 'b', 'c'])
print("s = \n", s)
s['a'] = 100
s[['b', 'c']] = 200
print("-" * 100)
print("s = \n", s)
Output:
s =
a 0.383475
b 0.123369
c 0.911300
dtype: float64
----------------------------------------------------------------------------------------------------
s =
a 100.0
b 200.0
c 200.0
dtype: float64
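Besides plain `s[...]`, values can be set through the explicit label indexer `.loc`, the positional indexer `.iloc`, or a boolean mask. The sketch below applies each in turn to fixed values, so the final result can be traced step by step.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

s["a"] = 100            # single label              -> [100, 2, 3]
s[["b", "c"]] = 200     # list of labels, one value -> [100, 200, 200]
s.loc["b"] = 50         # explicit label indexer    -> [100, 50, 200]
s.iloc[0] = -1          # explicit position         -> [-1, 50, 200]
s[s > 150] = 0          # boolean mask              -> [-1, 50, 0]

print(s.tolist())       # [-1.0, 50.0, 0.0]
```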
7. Deleting Values
.drop() removes the given label(s) and returns a new object; the original Series is left unchanged.
import numpy as np
import pandas as pd
# Deleting: .drop
s = pd.Series(np.random.rand(5), index=list('ngjur'))
print("s = \n", s)
print("-" * 100)
s1 = s.drop('n')
s2 = s.drop(['g', 'j'])
print("s1 = \n", s1)
print("-" * 50)
print("s2 = \n", s2)
print("-" * 50)
print("s = \n", s)
Output:
s =
n 0.744795
g 0.345820
j 0.001573
u 0.275530
r 0.046669
dtype: float64
----------------------------------------------------------------------------------------------------
s1 =
g 0.345820
j 0.001573
u 0.275530
r 0.046669
dtype: float64
--------------------------------------------------
s2 =
n 0.744795
u 0.275530
r 0.046669
dtype: float64
--------------------------------------------------
s =
n 0.744795
g 0.345820
j 0.001573
u 0.275530
r 0.046669
dtype: float64
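A sketch of two `Series.drop` options not used above, with fixed values in place of random ones. `errors="ignore"` skips labels that do not exist instead of raising `KeyError`. `inplace=True` modifies the Series itself and returns `None` rather than a new object.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("ngjur"))

s1 = s.drop("n")                    # new Series without 'n'
s2 = s.drop(["g", "j"])             # drop several labels at once
s3 = s.drop("z", errors="ignore")   # 'z' is missing: no KeyError, nothing dropped

ret = s.drop("r", inplace=True)     # modifies s itself and returns None

print(s1.index.tolist())  # ['g', 'j', 'u', 'r']
print(s2.index.tolist())  # ['n', 'u', 'r']
print(len(s3))            # 5
print(s.index.tolist())   # ['n', 'g', 'j', 'u']
print(ret)                # None
```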