pandas Data Operations
1. Arithmetic Operations
Arithmetic on Series
from pandas import Series,DataFrame
import pandas as pd
data1=Series([1,2,3,4,5],index=['a','b','c','d','e'])
print(data1)
a 1
b 2
c 3
d 4
e 5
dtype: int64
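Arithmetic between a Series and a single number is applied element-wise, so no index alignment is involved. Here is a minimal sketch that reuses data1. The names scaled and shifted are illustrative only:

```python
import pandas as pd

data1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

scaled = data1 * 10   # every element multiplied by 10
shifted = data1 - 1   # every element reduced by 1

print(scaled)
print(shifted)
```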
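Next, create a second Series, data2, whose index only partly overlaps with data1's, and add the two. pandas aligns the values by index label. Labels that exist in only one of the two Series ('c' and 'f') produce NaN, and because NaN is a float, the result's dtype becomes float64.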
from pandas import Series,DataFrame
import pandas as pd
data1=Series([1,2,3,4,5],index=['a','b','c','d','e'])
data2=Series([1,2,3,4,5],index=['a','b','d','e','f'])
print(data1+data2)
a 2.0
b 4.0
c NaN
d 7.0
e 9.0
f NaN
dtype: float64
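If a missing label should count as 0 instead of producing NaN, you can use the method form Series.add, which accepts a fill_value argument. The methods sub, mul and div accept it too. This is a minimal sketch that assumes 0 is a sensible stand-in for a missing value in your data:

```python
import pandas as pd

data1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
data2 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "d", "e", "f"])

# a label present on only one side is treated as 0 on the other side:
# c -> 3 + 0, f -> 0 + 5; a label missing on both sides would still be NaN
result = data1.add(data2, fill_value=0)
print(result)
```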
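2. Function Application and Mapping
Data analysis often calls for computations more complex than the built-in operators, and for those you define your own functions. pandas offers three ways to apply a defined function to its data: map applies the function to each element of a Series, apply applies it to each row or column of a DataFrame, and applymap applies it to each element of a DataFrame. Since pandas 2.1, applymap is deprecated in favor of the equivalent DataFrame.map.
Example: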
from pandas import Series,DataFrame
from IPython.display import display
import pandas as pd
data={
'水果':['苹果','香蕉','菠萝','火龙果'],
'价格':['25元','15元','20元','25元']
}
df=DataFrame(data)
display(df)
To strip the "元" (yuan) suffix from the price column, use map:
from pandas import Series,DataFrame
from IPython.display import display
import pandas as pd
data={
'水果':['苹果','香蕉','菠萝','火龙果'],
'价格':['25元','15元','20元','25元']
}
df=DataFrame(data)
def f(x):
    # keep only the part before "元", e.g. '25元' -> '25'
    return x.split('元')[0]
df['价格']=df['价格'].map(f)
display(df)
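Note that map leaves the prices as strings such as '25'. If you need numeric prices, the following short sketch chains astype(int) after map. It also shows that map accepts a dict and looks up each value as a key, with unmatched values becoming NaN. The English fruit names and the fruit column are illustrative additions, not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({
    "水果": ["苹果", "香蕉", "菠萝", "火龙果"],
    "价格": ["25元", "15元", "20元", "25元"],
})

# strip "元" with a function, then convert the digit strings to integers
df["价格"] = df["价格"].map(lambda x: x.split("元")[0]).astype(int)

# map with a dict: each value is looked up as a key (unmatched -> NaN)
english = {"苹果": "apple", "香蕉": "banana", "菠萝": "pineapple", "火龙果": "dragon fruit"}
df["fruit"] = df["水果"].map(english)

print(df)
print(df["价格"].mean())  # numeric operations now work
```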
Using apply:
from pandas import Series,DataFrame
from IPython.display import display
import numpy as np
import pandas as pd
data=DataFrame(np.random.randn(3,3),columns=['a','b','c'],index=['d','e','f'])
display(data)
# range (max - min); apply calls f once per column by default (axis=0)
f=lambda x:x.max()-x.min()
display(data.apply(f))
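Because apply works column-wise by default, the result above holds one range per column. The sketch below uses a fixed matrix instead of random numbers so that the results are reproducible. It applies the same function row-wise with axis=1 and then formats every element, which is the applymap case described earlier. DataFrame.map only exists from pandas 2.1 onward, and applymap is deprecated from that version, so the code uses whichever one is available:

```python
import pandas as pd

data = pd.DataFrame([[1.0, 5.0, 2.0],
                     [4.0, 0.0, 9.0],
                     [7.0, 3.0, 6.0]],
                    columns=["a", "b", "c"], index=["d", "e", "f"])

f = lambda x: x.max() - x.min()

col_range = data.apply(f)          # one value per column (axis=0, the default)
row_range = data.apply(f, axis=1)  # one value per row

# element-wise: DataFrame.map in pandas >= 2.1, applymap in older versions
elementwise = data.map if hasattr(pd.DataFrame, "map") else data.applymap
formatted = elementwise(lambda v: "{:.1f}".format(v))

print(col_range)
print(row_range)
print(formatted)
```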
3. Sorting
For a Series, sort_index sorts the entries by their index labels. The default order is ascending.
from pandas import Series,DataFrame
from IPython.display import display
import pandas as pd
data=Series([1,0,-13,3],index=['a','x','b','d'])
display(data)
a 1
x 0
b -13
d 3
dtype: int64
Ascending:
from pandas import Series,DataFrame
from IPython.display import display
import pandas as pd
data=Series([1,0,-13,3],index=['a','x','b','d'])
display(data.sort_index())
a 1
b -13
d 3
x 0
dtype: int64
Descending (pass ascending=False):
from pandas import Series,DataFrame
from IPython.display import display
import pandas as pd
data=Series([1,0,-13,3],index=['a','x','b','d'])
display(data.sort_index(ascending=False))
x 0
d 3
b -13
a 1
dtype: int64
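sort_index also works on a DataFrame. There, axis=1 sorts the column labels instead of the row labels. Here is a small sketch with a hypothetical 2x2 frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["y", "x"], columns=["b", "a"])

by_rows = df.sort_index()        # rows reordered: x, y
by_cols = df.sort_index(axis=1)  # columns reordered: a, b

print(by_rows)
print(by_cols)
```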
To sort by value instead of by index, use sort_values:
from pandas import Series,DataFrame
from IPython.display import display
import pandas as pd
data=Series([1,0,-13,3],index=['a','x','b','d'])
display(data.sort_values())
b -13
x 0
a 1
d 3
dtype: int64
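sort_values takes the same ascending flag. Missing values go to the end unless you pass na_position='first'. A minimal sketch, in which with_nan is a hypothetical example Series:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 0, -13, 3], index=["a", "x", "b", "d"])
desc = data.sort_values(ascending=False)  # 3, 1, 0, -13

with_nan = pd.Series([2.0, np.nan, -1.0], index=["p", "q", "r"])
nan_last = with_nan.sort_values()                      # NaN placed last by default
nan_first = with_nan.sort_values(na_position="first")  # NaN placed first

print(desc)
print(nan_last)
print(nan_first)
```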
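4. Counting Occurrences
value_counts returns how many times each distinct value appears, sorted from most to least frequent: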
from pandas import Series,DataFrame
from IPython.display import display
import pandas as pd
data=Series(['a','a','b','a','d','c'])
display(data.value_counts())
a 3
c 1
d 1
b 1
dtype: int64
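value_counts can also report relative frequencies with normalize=True, and nunique returns the number of distinct values. A short sketch on the same data:

```python
import pandas as pd

data = pd.Series(["a", "a", "b", "a", "d", "c"])

counts = data.value_counts()                # a: 3, every other value: 1
shares = data.value_counts(normalize=True)  # fraction of all entries
n_distinct = data.nunique()                 # number of distinct values

print(counts)
print(shares)
print(n_distinct)
```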