"Python for Data Analysis" by Wes McKinney (O’Reilly), worked through in the Anaconda3 distribution
1.1 MovieLens data-processing example: print the first five users. The code is:
import pandas as pd
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ch02/movielens/users.dat', sep="::", header=None, names=unames)
print(users[:5])
Warning message (the data still loads, but pandas emits a warning):
D:\Program Files\Anaconda\lib\site-packages\pandas\io\parsers.py:648: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators;you can avoid this warning by specifying engine='python'. ParserWarning)
Fix: add engine='python' to the end of the read_table call:
users = pd.read_table('ch02/movielens/users.dat', sep="::", header=None, names=unames, engine='python')
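To try the fix without the MovieLens files, here is a minimal sketch that feeds a few rows in the users.dat layout (fields separated by `::`) through `io.StringIO`. The sample rows are invented for illustration and are not real MovieLens records.

```python
import io

import pandas as pd

# Made-up rows in the users.dat layout: user_id::gender::age::occupation::zip
# (illustrative values only, not taken from the dataset)
raw = io.StringIO(
    "1::F::25::10::10001\n"
    "2::M::35::16::20002\n"
    "3::M::18::4::30003\n"
    "4::F::45::7::40004\n"
    "5::M::25::20::50005\n"
    "6::F::50::1::60006\n"
)

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

# A separator longer than one character is interpreted as a regular expression,
# which only the python engine supports; asking for it explicitly avoids the warning.
users = pd.read_table(raw, sep='::', header=None, names=unames, engine='python')

# First five users, as in the book's example
print(users[:5])
```

Slicing with `users[:5]` and calling `users.head()` return the same five rows here.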
1.2 MovieLens 1M example: using pivot_table() to compute each movie's mean rating by gender
mean_ratings = data.pivot_table('rating', rows = 'title', cols = 'gender', aggfunc = 'mean')
Error message:
Traceback (most recent call last):
File "<ipython-input-28-669a36c33797>", line 1, in <module>
mean_ratings = data.pivot_table('rating', rows = 'title', cols = 'gender', aggfunc = 'mean')
TypeError: pivot_table() got an unexpected keyword argument 'rows'
Fix: pandas renamed these keywords, so replace rows with index and cols with columns:
mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
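A small self-contained check of the corrected call follows. The tiny `data` frame below is a made-up stand-in for the merged MovieLens table (only the `title`, `gender` and `rating` columns are assumed), and the `groupby` version is added only as a cross-check of what `pivot_table` computes.

```python
import pandas as pd

# Hypothetical toy stand-in for the merged MovieLens `data` frame
data = pd.DataFrame({
    'title':  ['Movie A', 'Movie A', 'Movie A', 'Movie B', 'Movie B'],
    'gender': ['F', 'M', 'M', 'F', 'M'],
    'rating': [5, 3, 4, 2, 4],
})

# index (formerly rows) gives the row labels, columns (formerly cols) the column labels
mean_ratings = data.pivot_table('rating', index='title', columns='gender',
                                aggfunc='mean')
print(mean_ratings)

# Same numbers via groupby + unstack: group by (title, gender), average, pivot gender out
by_group = data.groupby(['title', 'gender'])['rating'].mean().unstack()
```

For "Movie A" the two male ratings 3 and 4 average to 3.5, which is the value that ends up in the `M` column.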
1.3 Computing cumulative sums with numpy.cumsum
# Create a 5x4 array (assumes numpy has been imported as np)
In [1]: arr = np.arange(1, 21).reshape(5, 4)
# arr now holds:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])
# Without an argument, cumsum() flattens the array and returns the running total of all 20 elements
In [2]: arr.cumsum()
Out[2]:
array([ 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210])
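The no-argument case can be checked directly. This sketch rebuilds `arr` and compares `cumsum()` against the same running total written out as a plain loop over the flattened (row-major) array.

```python
import numpy as np

arr = np.arange(1, 21).reshape(5, 4)

# With no axis, cumsum() walks the array in row-major order as if it were flattened
flat = arr.cumsum()
assert np.array_equal(flat, np.cumsum(arr.ravel()))

# The same running total by hand: out[i] = out[i - 1] + x[i]
manual = []
total = 0
for x in arr.ravel():
    total += x
    manual.append(total)

print(flat[-1])  # 210, the sum of every element
```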
# cumsum(0), i.e. axis=0: accumulate down the rows, giving a running total within each column
In [3]: arr.cumsum(0)
Out[3]:
array([[ 1, 2, 3, 4],
[ 6, 8, 10, 12],
[15, 18, 21, 24],
[28, 32, 36, 40],
[45, 50, 55, 60]])
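One way to confirm the axis=0 behaviour: row i of the result is the sum of rows 0..i, so the last row must equal the column sums. The sketch also compares against `np.add.accumulate`, the ufunc method that performs the same accumulation.

```python
import numpy as np

arr = np.arange(1, 21).reshape(5, 4)

down = arr.cumsum(axis=0)  # same as arr.cumsum(0)

# The last row holds the full column sums, i.e. arr.sum(axis=0)
print(down[-1])  # [45 50 55 60]

# np.add.accumulate along axis 0 produces the same array
assert np.array_equal(down, np.add.accumulate(arr, axis=0))
```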
# cumsum(1), i.e. axis=1: accumulate across the columns, giving a running total within each row
In [4]: arr.cumsum(1)
Out[4]:
array([[ 1, 3, 6, 10],
[ 5, 11, 18, 26],
[ 9, 19, 30, 42],
[13, 27, 42, 58],
[17, 35, 54, 74]])
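For axis=1 the accumulation runs left to right inside each row. This sketch rebuilds the result one column at a time to make the recurrence explicit, then checks that the last column equals the row sums.

```python
import numpy as np

arr = np.arange(1, 21).reshape(5, 4)

across = arr.cumsum(axis=1)  # same as arr.cumsum(1)

# Column by column: column j = running total in column j - 1 plus column j of arr
manual = arr.copy()
for j in range(1, arr.shape[1]):
    manual[:, j] = manual[:, j - 1] + arr[:, j]

# The last column holds each row's total, i.e. arr.sum(axis=1)
print(across[:, -1])  # [10 26 42 58 74]
```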