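I have recently been writing a blog series on learning Pandas by following Stack Overflow.
Column address: http://blog.csdn.net/column/details/16726.html
Search Stack Overflow with pandas as the keyword, then sort the results by number of votes: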
https://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15
Difference between map, applymap and apply methods in Pandas: how map, apply, and applymap differ in use
https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas
Data preparation
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
apply
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
apply works on both DataFrame and Series data.
func : function
Function to apply to each column/row
# This argument must be a function; its input is one row or one column of the DataFrame
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
0 or ‘index’: apply function to each column
1 or ‘columns’: apply function to each row
# Selects whether the function is applied to each column (axis=0) or to each row (axis=1)
broadcast : boolean, default False
For aggregation functions, return object of same size with values propagated
raw : boolean, default False
If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
reduce : boolean or None, default None
Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
# Note: broadcast and reduce come from older pandas documentation; pandas 0.23 deprecated them in favor of result_type, and pandas 1.0 removed them
f = lambda x: x.max() - x.min()
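df.apply(f) # By default (axis=0) f is applied to each column, giving that column's max minus min
# Note: the sample output below shows labels col1, col2, col3; with the df defined above the labels would be b, d, e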
# col1 6.253621
# col2 5.970929
# col3 6.128654
# dtype: float64
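To make the axis argument concrete, here is a minimal sketch on a small hand-made frame. The frame `demo`, its values, and the helper `spread` are made up for illustration and are not from the original question; only `axis` and `raw` are the documented parameters quoted above.

```python
import pandas as pd

# Hypothetical fixed data, so the results can be checked by hand.
demo = pd.DataFrame(
    [[1.0, 5.0, 9.0], [4.0, 2.0, 3.0]],
    columns=list("bde"),
    index=["Utah", "Ohio"],
)

def spread(values):
    """Return max minus min of a row or a column."""
    return values.max() - values.min()

# axis=0 (default): spread receives each column, result is indexed by column label.
per_column = demo.apply(spread)

# axis=1: spread receives each row, result is indexed by row label.
per_row = demo.apply(spread, axis=1)

# raw=True: spread receives NumPy arrays instead of Series.
per_column_raw = demo.apply(spread, raw=True)

print(per_column)
print(per_row)
```

The result of `demo.apply(spread)` has one entry per column (b, d, e), while `axis=1` gives one entry per row (Utah, Ohio).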
applymap
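applymap does not distinguish between rows and columns; it applies the function to every element.
It is a DataFrame method; for a Series, use map instead. (Since pandas 2.1, applymap is deprecated in favor of DataFrame.map.)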
fmt = lambda x: '%.2f' % x  # format one number as a string with two decimals
print(df.applymap(fmt))
# b d e
# Utah -0.66 0.59 0.38
# Ohio 1.65 -0.06 -1.24
# Texas 0.62 0.03 -0.20
# Oregon -1.24 0.12 -1.10
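As a rough sketch of what "element-wise" means here, the snippet below formats every cell of a small frame and then rebuilds the same result column by column with Series.map. The frame `demo` and the helper `two_decimals` are hypothetical. The `hasattr` check is one possible way to pick DataFrame.map on pandas 2.1 or later and fall back to applymap on older versions.

```python
import pandas as pd

# Hypothetical fixed data for illustration.
demo = pd.DataFrame(
    {"b": [1.234, -0.5], "d": [2.0, 3.14159]},
    index=["Utah", "Ohio"],
)

def two_decimals(value):
    """Format a single number with two decimal places."""
    return "%.2f" % value

# pandas >= 2.1 renamed DataFrame.applymap to DataFrame.map;
# use whichever one this pandas version provides.
elementwise = demo.map if hasattr(demo, "map") else demo.applymap
formatted = elementwise(two_decimals)

# The same result, built by mapping each column (a Series) separately.
via_series_map = demo.apply(lambda col: col.map(two_decimals))

print(formatted)
```

Both routes call `two_decimals` once per cell, so they produce the same strings in the same positions.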
map
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html
In the pandas versions this post was written for, map exists only on Series.
fmt = lambda x: '%.2f' % x  # format one number as a string with two decimals
print(df['e'].map(fmt))
# Utah 0.38
# Ohio -1.24
# Texas -0.20
# Oregon -1.10
# Name: e, dtype: object
According to the official documentation, map also has several special uses.
Value replacement
x = pd.Series([1,2,3], index=['one', 'two', 'three'])
print(x)
# one 1
# two 2
# three 3
# dtype: int64
y = pd.Series(['foo', 'bar', 'baz'], index=[1,2,3])
print(y)
# 1 foo
# 2 bar
# 3 baz
# dtype: object
x.map(y)
# one foo
# two bar
# three baz
# dtype: object
z = {1: 'A', 2: 'B', 3: 'C'}
x.map(z)
# one A
# two B
# three C
# dtype: object
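One detail worth seeing in code: when a dict is passed to map, values with no matching key become NaN. The sketch below uses a hypothetical incomplete mapping `partial` and a made-up fallback label "unknown" to show this, together with one possible way to supply a default.

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=["one", "two", "three"])

# Hypothetical mapping that deliberately has no entry for 3.
partial = {1: "A", 2: "B"}

# Values missing from the dict are converted to NaN.
mapped = x.map(partial)

# Passing a function instead lets us choose a fallback value.
with_default = x.map(lambda value: partial.get(value, "unknown"))

print(mapped)
print(with_default)
```

Here `mapped` holds A, B, and NaN, while `with_default` holds A, B, and unknown.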
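Formatting values into strings
s = pd.Series([1, 2, 3, np.nan])  # input Series, inferred from the output shown below
s2 = s.map('this is a string {}'.format, na_action=None)
print(s2)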
# 0 this is a string 1.0
# 1 this is a string 2.0
# 2 this is a string 3.0
# 3 this is a string nan
# dtype: object
# Ignore NaN: with na_action='ignore', NaN values are passed through without calling the function
s3 = s.map('this is a string {}'.format, na_action='ignore')
print(s3)
# 0 this is a string 1.0
# 1 this is a string 2.0
# 2 this is a string 3.0
# 3 NaN
# dtype: object