Assorted Problems Encountered While Learning Python

Latest recommended article published 2022-10-20 10:52:05

夏日麦香


Article link: https://blog.csdn.net/u010652755/article/details/49824227


These notes follow Wes McKinney's "Python for Data Analysis" (O'Reilly), using the Anaconda3 distribution.

1.1 MovieLens data example: display the first five user records. The code is:

import pandas as pd
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ch02/movielens/users.dat', sep='::', header=None, names=unames)
users[:5]  # first five users

Warning message:

D:\Program Files\Anaconda\lib\site-packages\pandas\io\parsers.py:648: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators;you can avoid this warning by specifying engine='python'.  ParserWarning)

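Fix: add engine='python' as the last argument of the read_table call that builds users. The '::' separator has more than one character, so pandas treats it as a regular expression, which only the Python parsing engine supports.

users = pd.read_table('ch02/movielens/users.dat', sep='::', header=None, names=unames, engine='python')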
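The fix can be tried without the dataset on disk. The following is a minimal sketch, assuming a few made-up rows in the same '::'-separated layout (the `sample` string is hypothetical, not real MovieLens data) read from an in-memory buffer instead of 'ch02/movielens/users.dat'.

```python
import io

import pandas as pd

# Hypothetical rows in the users.dat layout; not taken from the real dataset.
sample = "1::F::25::3::10001\n2::M::40::7::20002\n3::M::18::4::30003\n"
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

# A multi-character separator is handled by the Python engine; naming the
# engine explicitly avoids the ParserWarning about falling back from 'c'.
users = pd.read_table(io.StringIO(sample), sep='::', header=None,
                      names=unames, engine='python')

print(users[:5])
```

With engine='python' given explicitly, pandas no longer needs to fall back from the C engine, so the warning should not appear.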

1.2 MovieLens 1M dataset example: use pivot_table() to compute each movie's mean rating by gender

mean_ratings = data.pivot_table('rating', rows = 'title', cols = 'gender', aggfunc = 'mean')

Error message:

Traceback (most recent call last):

 File "<ipython-input-28-669a36c33797>", line 1, in <module>
        mean_ratings = data.pivot_table('rating', rows = 'title', cols = 'gender', aggfunc = 'mean')

TypeError: pivot_table() got an unexpected keyword argument 'rows'

Fix: newer pandas versions no longer accept the rows and cols keywords. Replace rows with index and cols with columns:

mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
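To see what the corrected call produces, here is a small sketch on a made-up frame. The titles and ratings below are hypothetical stand-ins for the merged MovieLens `data` frame, and `values=` is spelled out as a keyword for clarity.

```python
import pandas as pd

# Hypothetical ratings; the real `data` frame comes from merging the
# MovieLens ratings, users and movies tables.
data = pd.DataFrame({
    'title': ['Movie A', 'Movie A', 'Movie A', 'Movie B', 'Movie B'],
    'gender': ['F', 'M', 'M', 'F', 'M'],
    'rating': [4, 3, 5, 2, 4],
})

# One row per title, one column per gender, each cell the mean rating.
mean_ratings = data.pivot_table(values='rating', index='title',
                                columns='gender', aggfunc='mean')

print(mean_ratings)
```

The result is indexed by title with one column per gender, so a single cell can be looked up with, for example, mean_ratings.loc['Movie A', 'M'].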

1.3 Computing cumulative sums with numpy.cumsum

# Create a 5x4 array (assumes numpy has been imported as np)
In [1]: arr = np.arange(1, 21).reshape(5, 4); arr
Out[1]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

# arr.cumsum() with no axis: cumulative sum over the flattened array
In [2]: arr.cumsum()
Out[2]:
array([  1,   3,   6,  10,  15,  21,  28,  36,  45,  55,  66,  78,  91, 105, 120, 136, 153, 171, 190, 210])

# arr.cumsum(0): with axis=0 the sum accumulates down the rows, i.e. within each column
In [3]: arr.cumsum(0)
Out[3]:
array([[ 1,  2,  3,  4],
       [ 6,  8, 10, 12],
       [15, 18, 21, 24],
       [28, 32, 36, 40],
       [45, 50, 55, 60]])

# arr.cumsum(1): with axis=1 the sum accumulates across the columns, i.e. within each row
In [4]: arr.cumsum(1)
Out[4]:
array([[ 1,  3,  6, 10],
       [ 5, 11, 18, 26],
       [ 9, 19, 30, 42],
       [13, 27, 42, 58],
       [17, 35, 54, 74]])
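
The three calls above can be cross-checked in a script. This is a minimal sketch using the same 5x4 array, with the axis passed by keyword for readability. The variable names `flat`, `down` and `across` are introduced here only for illustration.

```python
import numpy as np

arr = np.arange(1, 21).reshape(5, 4)

flat = arr.cumsum()          # no axis: running total over the flattened array
down = arr.cumsum(axis=0)    # running total down each column
across = arr.cumsum(axis=1)  # running total along each row

# The last element, row, or column of a cumulative sum equals the plain sum.
print(flat[-1] == arr.sum())                          # True
print(np.array_equal(down[-1], arr.sum(axis=0)))      # True
print(np.array_equal(across[:, -1], arr.sum(axis=1))) # True
```

Writing axis=0 or axis=1 instead of a bare 0 or 1 makes it easier to remember which direction the sum runs.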