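Pandas Series compared with NumPy ndarray
1. More functions
describe()...
2. Similarities
Subscript access: s[10], s[3:10]...
for loops: for item in s
Vectorized computation: +, -, *, /, ...
Scientific computation: mean, sum, max...
Faster than a Python list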
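A minimal sketch of the similarities listed above, using made-up values; the variable names are illustrative and not part of the original notes.

```python
import numpy as np
import pandas as pd

values = [3, 1, 4, 1, 5, 9, 2, 6]   # illustrative data
arr = np.array(values)
s = pd.Series(values)

# Subscript and slice access look the same on both.
# With the default integer index, s[2] is a label lookup that happens
# to coincide with position 2.
first = (arr[2], s[2])
sliced = (arr[1:4].tolist(), s[1:4].tolist())

# Iterating over a Series yields its values, as with an ndarray.
total = 0
for item in s:
    total += item

# Vectorized arithmetic and reductions.
doubled = (s * 2).tolist()
print(arr.mean(), s.mean())

# describe() is one of the extra methods a Series offers.
summary = s.describe()
print(summary)
```

describe() returns another Series, so individual statistics can be read by label, for example summary['max'].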
import pandas as pd

def dir_judge(var1, var2):
    # Count the elements where both variables lie on the same side of their own mean
    mean1 = var1.mean()
    mean2 = var2.mean()
    same_dir = ((var1 > mean1) & (var2 > mean2)) | ((var1 < mean1) & (var2 < mean2))
    print(same_dir)
    # return (len(same_dir[same_dir == True]), len(same_dir[same_dir == False]))
    # Summing a boolean Series counts its True values
    return (same_dir.sum(), len(var1) - same_dir.sum())

house_area = pd.Series([67.5, 32, 135, 84, 200, 62, 101, 25])
house_price = pd.Series([550, 268, 850, 652, 1300, 906, 1100, 400])
print(dir_judge(house_area, house_price))
Series indexing
Access Series elements by position
iloc
Access Series elements by index label
loc
Get the index label of the largest element in a Series
idxmax
import pandas as pd
arr = pd.Series([56, 78, 89], index=['ls', 'ww', 'zs'])
print(arr)
print(arr.describe())
# Print the value stored under a given index label
print(arr.loc['ls'])
# Get the index label of the largest value
print(arr.idxmax())
# Takes a Pandas Series (student names as the index, scores as the values)
# and returns the name and score of the lowest-scoring student:
def get_min(scores):
    lowest = scores.idxmin()
    return (lowest, scores.loc[lowest])
scores = pd.Series([81, 90, 57, 100, 76, 98],
                   index=['Li Mu', 'Zhang Qiang', 'Wang Ning',
                          'Ma Ying', 'Zhang Dong', 'Liu Fang'])
print(get_min(scores))
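The notes list iloc but the example only exercises loc and idxmax. The following is a small sketch contrasting the two access styles; the second Series, `shifted`, is a made-up example chosen so that labels and positions disagree.

```python
import pandas as pd

# Illustrative scores keyed by short name labels.
scores = pd.Series([56, 78, 89], index=['ls', 'ww', 'zs'])

by_position = scores.iloc[0]     # first element, whatever its label is
by_label = scores.loc['ls']      # element whose index label is 'ls'
top_label = scores.idxmax()      # label of the largest value
top_position = scores.argmax()   # integer position of the largest value
print(by_position, by_label, top_label, top_position)

# With an integer index that is not 0, 1, 2, ... in order,
# loc and iloc can return different elements.
shifted = pd.Series([10, 20, 30], index=[2, 0, 1])
print(shifted.loc[0])    # label 0 -> 20
print(shifted.iloc[0])   # position 0 -> 10
```

Using loc or iloc explicitly avoids any ambiguity about whether an integer is meant as a label or as a position.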
Vectorized computation with Series
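import numpy as np
import pandas as pd
# NumPy arrays are added element by element, by position
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
print(a+b)
# Same index labels in the same order
s1 = pd.Series([1,2,3,4],index=['a','b','c','d'])
s2 = pd.Series([10,20,30,40],index=['a','b','c','d'])
print(s1+s2)
# Same labels in a different order: values are matched by label, not by position
s1 = pd.Series([1,2,3,4],index=['a','b','c','d'])
s2 = pd.Series([10,20,30,40],index=['d','b','a','c'])
print(s1+s2)
# Partly overlapping labels: a label present in only one Series gives NaN
s1 = pd.Series([1,2,3,4],index=['a','b','c','d'])
s2 = pd.Series([10,20,30,40],index=['c','d','e','f'])
print(s1+s2)
# No labels in common: every result is NaN
s1 = pd.Series([1,2,3,4],index=['a','b','c','d'])
s2 = pd.Series([10,20,30,40],index=['e','f','g','i'])
print(s1+s2)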
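The four additions above differ only in how the index labels of the two operands line up. As a rough sketch of what the partly overlapping case produces (the names `total` and `matched` are introduced here for illustration):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['c', 'd', 'e', 'f'])

# The result covers every label found in either Series.
total = s1 + s2
print(total)

# Only 'c' and 'd' exist in both Series, so only they get a numeric sum;
# the remaining four labels hold NaN.
matched = total.dropna()
print(matched.to_dict())
print(int(total.isna().sum()))
```

Because NaN is a floating-point value, the result is stored as floats even though both inputs hold integers.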
Using Series apply
apply calls the function passed to it on each element of the Series
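import pandas as pd

def multi(aa):
    return aa * 2

arr = pd.Series([1, 2, 3])
print(arr.apply(multi))

def multi_big(number):
    # There is no return for number <= 2, so those elements
    # come back as missing values in the result
    if number > 2:
        return number * 2

print(arr.apply(multi_big))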
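# A Pandas Series stores the ratings given to a number of movies.
# Fill in the code so that ratings of 8 or above are labeled S, ratings of at least 6 but below 8 are labeled A, and ratings below 6 are labeled B.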
scores = pd.Series([8, 2, 9, 8, 2, 5, 2, 9, 5, 4, 7, 7, 9, 6, 4, 7, 4, 8, 4, 8, 6, 2,
2, 5, 6, 9, 5, 3, 3, 7, 3, 2, 8, 3, 5, 2, 4, 9, 3, 9, 7, 7, 8, 9,
8, 9, 5, 8, 5, 2, 8, 3, 2, 2, 9, 9, 9, 8, 2, 7, 2, 3, 8, 5, 5, 9,
3, 4, 6, 2, 7, 3, 8, 7, 8, 8, 2, 2, 7, 7, 5, 7, 2, 3, 4, 3, 2, 9,
2, 5, 9, 7, 5, 7, 3, 4, 3, 8, 5, 3])
def movie_level(score):
    if score >= 8:
        return 'S'
    elif score >= 6 and score < 8:
        return 'A'
    else:
        return 'B'

print(scores.apply(movie_level))
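A short sketch of two points the apply examples above touch on: a lambda can stand in for a named function, and a function that returns nothing for some inputs leaves missing values behind. The helper `double_if_big` and the six sample ratings are made up for illustration.

```python
import pandas as pd

arr = pd.Series([1, 2, 3])

# A lambda works the same way as a named function.
doubled = arr.apply(lambda x: x * 2)
print(doubled.tolist())

# A function with no return on some branch yields None there,
# which pandas treats as a missing value.
def double_if_big(number):   # hypothetical helper
    if number > 2:
        return number * 2

partial = arr.apply(double_if_big)
print(partial.isna().tolist())

# After labeling, value_counts() tallies how many elements got each label.
ratings = pd.Series([8, 2, 9, 6, 7, 5])   # illustrative ratings
levels = ratings.apply(lambda r: 'S' if r >= 8 else ('A' if r >= 6 else 'B'))
counts = levels.value_counts()
print(counts.to_dict())
```

Returning a value on every branch, as movie_level does with its final else, keeps missing values out of the result.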
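Filling missing values
dropna: removes NaN elements
add with fill_value
Substitutes a custom fill value for a label that is missing from one operand, so the final result is filled in.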
import pandas as pd
arr = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
brr = pd.Series([4,5,6,7,8],index=['d','e','g','h','i'])
crr = arr + brr
# Remove the NaN elements
print(crr.dropna())
# Set the NaN elements to 0
print(crr.fillna(0))
# Fill with a custom value while computing the result
print(arr.add(brr,fill_value=0))
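The last two print calls look alike but give different numbers. A minimal sketch of the difference, reusing the same two Series (the names `after` and `during` are introduced here for illustration):

```python
import pandas as pd

arr = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
brr = pd.Series([4, 5, 6, 7, 8], index=['d', 'e', 'g', 'h', 'i'])

# fillna runs after the addition: every unmatched label becomes 0.
after = (arr + brr).fillna(0)

# fill_value is used during the addition: a label missing from one
# Series is treated as 0 there, so the other Series' value survives.
during = arr.add(brr, fill_value=0)

print(after['a'], during['a'])   # 0.0 vs 1.0
print(after['g'], during['g'])   # 0.0 vs 6.0
print(after['d'], during['d'])   # 8.0 in both
```

Which one is appropriate depends on whether an unmatched label should contribute its own value or be zeroed out.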