This post collects assorted tips for working with pandas DataFrames and is updated over time.
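1: Creating a new column from existing columns
Common use case: a new column holds the difference of two columns, or the result of an operation across several columns.
Technique: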
import pandas as pd
# make a simple dataframe
df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
df
#    a  b
# 0  1  3
# 1  2  4
# this just creates an unattached column:
df.apply(lambda row: row.a + row.b, axis=1)
# 0    4
# 1    6
# dtype: int64
# do the same, but attach it to the dataframe
df['c'] = df.apply(lambda row: row.a + row.b, axis=1)
df
#    a  b  c
# 0  1  3  4
# 1  2  4  6
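Row-wise apply with axis=1 calls a Python function once per row, which gets slow on large frames. For simple arithmetic, a vectorized expression on whole columns usually produces the same column much faster. The sketch below shows direct column arithmetic and DataFrame.assign; the extra column d is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Vectorized column arithmetic: works on whole columns at once, no per-row Python call
df['c'] = df['a'] + df['b']

# assign() returns a NEW DataFrame with the extra column; the lambda receives that frame
df2 = df.assign(d=lambda x: x['b'] - x['a'])

print(df2)
```

Note that assign leaves df itself unchanged, which makes it convenient for method chaining.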
2: How to iterate over each row of a pandas DataFrame
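In [15]: import pandas as pd
In [16]: df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
In [17]: df
Out[17]:
   c1   c2
0  10  100
1  11  110
2  12  120
In [18]: for index, row in df.iterrows():
    ....:     print(row['c1'], row['c2'])
    ....:
10 100
11 110
12 120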
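iterrows returns each row as a Series, so a row mixing int and float columns is upcast to a common dtype, and building one Series per row is slow. According to the pandas documentation, itertuples preserves dtypes and is generally faster. A minimal sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# itertuples() yields one namedtuple per row; index=False drops the index field
pairs = []
for row in df.itertuples(index=False):
    pairs.append((row.c1, row.c2))
    print(row.c1, row.c2)
```

Fields are accessed as attributes (row.c1); column names that are not valid Python identifiers are renamed to positional names.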
3: Zooming a matplotlib plot of a pandas DataFrame with the mouse wheel
The main zoom function:
import matplotlib.pyplot as plt

def zoom_factory(ax, base_scale=2.):
    def zoom_fun(event):
        # ignore scroll events outside this axes (xdata/ydata would be None)
        if event.inaxes is not ax:
            return
        # get the current x and y limits
        cur_xlim = ax.get_xlim()
        cur_ylim = ax.get_ylim()
        cur_xrange = (cur_xlim[1] - cur_xlim[0]) * .5
        cur_yrange = (cur_ylim[1] - cur_ylim[0]) * .5
        xdata = event.xdata  # get event x location
        ydata = event.ydata  # get event y location
        if event.button == 'up':
            # deal with zoom in
            scale_factor = 1 / base_scale
        elif event.button == 'down':
            # deal with zoom out
            scale_factor = base_scale
        else:
            # deal with something that should never happen
            scale_factor = 1
            print(event.button)
        # set new limits, centered on the cursor position
        ax.set_xlim([xdata - cur_xrange * scale_factor,
                     xdata + cur_xrange * scale_factor])
        ax.set_ylim([ydata - cur_yrange * scale_factor,
                     ydata + cur_yrange * scale_factor])
        ax.figure.canvas.draw_idle()  # request a re-draw of this figure
    fig = ax.get_figure()  # get the figure of interest
    # attach the callback
    fig.canvas.mpl_connect('scroll_event', zoom_fun)
    # return the function
    return zoom_fun
Usage (operate directly on a matplotlib Axes object ax; for a DataFrame, df.plot() returns one):
fig, ax = plt.subplots()
ax.plot(range(10))
scale = 1.5
f = zoom_factory(ax, base_scale=scale)
plt.show()
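The optional base_scale parameter sets the zoom factor to whatever value you want.
Be sure to keep the returned function in a variable such as f: if you don't keep a reference to it, the zoom callback may be garbage-collected.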
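The limit update inside zoom_fun is plain arithmetic: take the half-width of the current range, multiply it by scale_factor, and center the result on the cursor. The sketch below isolates that step as a pure function so it can be checked without opening a window; zoomed_limits is a hypothetical helper, not part of matplotlib.

```python
def zoomed_limits(lim, center, scale_factor):
    """Return new (low, high) limits: half-width scaled by scale_factor, centered on center."""
    half_range = (lim[1] - lim[0]) * 0.5
    return (center - half_range * scale_factor, center + half_range * scale_factor)

# Zoom in (scroll 'up', scale_factor = 1/base_scale) with the cursor at 5.0
print(zoomed_limits((0.0, 10.0), 5.0, 1 / 2.0))
# Zoom out (scroll 'down', scale_factor = base_scale)
print(zoomed_limits((0.0, 10.0), 5.0, 2.0))
# Cursor off-center at 2.0: the new view is centered on the cursor
print(zoomed_limits((0.0, 10.0), 2.0, 0.5))
```

The last call shows a property of the zoom function above: the point under the cursor becomes the center of the new view, rather than staying at the same screen position.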
4: Changing the index in pandas: using a column as the index
In [1]: import pandas as pd
In [2]: df = pd.read_csv('hello.csv')
In [3]: df
Out[3]:
    name  gender
0  Lucas    Male
1   Lucy  Female
2   Lily  Female
3    Jim    Male
In [4]: df.set_index('name')
Out[4]:
       gender
name
Lucas    Male
Lucy   Female
Lily   Female
Jim      Male
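set_index returns a new DataFrame and leaves df unchanged, so assign the result (or pass inplace=True) to keep it. A self-contained sketch, building the same data inline instead of reading hello.csv, that also shows label lookup on the new index and reset_index to undo the change:

```python
import pandas as pd

# Same data as hello.csv above, built inline
df = pd.DataFrame({'name': ['Lucas', 'Lucy', 'Lily', 'Jim'],
                   'gender': ['Male', 'Female', 'Female', 'Male']})

# set_index returns a new DataFrame; df itself keeps its default RangeIndex
indexed = df.set_index('name')
print(indexed.loc['Lucy', 'gender'])  # label-based lookup on the new index

# reset_index moves the index back into an ordinary column
restored = indexed.reset_index()
print(restored.columns.tolist())
```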
5: Selecting regions of an ndarray with slicing
In [1]: import numpy as np
In [2]: a = np.arange(36).reshape((6, 6))
In [3]: a
Out[3]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
In [4]: a[1,2]
Out[4]: 8
In [5]: a[1,:]
Out[5]: array([ 6, 7, 8, 9, 10, 11])
In [6]: a[:,2]
Out[6]: array([ 2, 8, 14, 20, 26, 32])
In [7]: a[0:2, 0:2]
Out[7]:
array([[0, 1],
[6, 7]])
Syntax: inside the square brackets, the first selector picks rows and the second picks columns, separated by a comma.
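The same row, column syntax accepts steps, negative indices, boolean masks, and index lists. A short sketch on the same 6x6 array; it also shows that basic slices are views, so writing through a slice changes the original array.

```python
import numpy as np

a = np.arange(36).reshape((6, 6))

# Step slicing: every second row (0, 2, 4), columns 1 to 3
every_other = a[::2, 1:4]

# Negative indices count from the end: the last row
last_row = a[-1, :]

# Boolean mask: all elements greater than 30, returned as a 1-D array
big = a[a > 30]

# np.ix_ selects an arbitrary sub-grid: rows 0 and 5, columns 0 and 5 (the corners)
corners = a[np.ix_([0, 5], [0, 5])]

# Basic slices are views: writing through one modifies the underlying array
b = a.copy()
view = b[0:2, 0:2]
view[0, 0] = 100
```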
6: Adding a new column that is the cumulative sum of another column
In [1]: import pandas as pd
In [2]: num_list = [1, 2, 3, 4]
In [3]: df = pd.DataFrame(data=num_list)
In [4]: df[1] = df[0].cumsum()
In [5]: df
Out[5]:
   0   1
0  1   1
1  2   3
2  3   6
3  4  10
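A running total often needs to restart per category. Combining groupby with cumsum does that; the sketch below uses hypothetical group and value columns purely for illustration and computes both a global and a per-group running total.

```python
import pandas as pd

# Hypothetical data: values belonging to two groups
df = pd.DataFrame({'group': ['x', 'x', 'y', 'y', 'x'],
                   'value': [1, 2, 3, 4, 5]})

# Running total over the whole column
df['total'] = df['value'].cumsum()

# Running total restarted within each group (row order is preserved)
df['group_total'] = df.groupby('group')['value'].cumsum()

print(df)
```

cummax, cummin, and cumprod follow the same pattern.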