Python pandas get row number_Python Pandas: Increase the maximum number of rows

I am processing a large text file (500k lines), formatted as below:

S1_A16

0.141,0.009340221649748676

0.141,4.192618196894668E-5

0.11,0.014122135626540204

S1_A17

0.188,2.3292323316081486E-6

0.469,0.007928706856794138

0.172,3.726771730573038E-5

I'm using the code below to return the correlation coefficient of each series, e.g. S1_A16:

import numpy as np

import pandas as pd

import csv

pd.options.display.max_rows = None

fileName = 'wordUnigramPauseTEST.data'

df = pd.read_csv(fileName, names=['pause', 'probability'])

mask = df['pause'].str.match('^S\d+_A\d+')

df['S/A'] = (df['pause']

.where(mask, np.nan)

.fillna(method='ffill'))

df = df.loc[~mask]

result = df.groupby(['S/A']).apply(lambda grp: grp['pause'].corr(grp['probability']))

print(result)

However, on some large files, this returns the error:

Traceback (most recent call last):

File "/Users/adamg/PycharmProjects/Subj_AnswerCorrCoef/GetCorrCoef.py", line 15, in <module>

print(result)

File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 35, in __str__

return self.__bytes__()

File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 47, in __bytes__

return self.__unicode__().encode(encoding, 'replace')

File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 857, in __unicode__

result = self._tidy_repr(min(30, max_rows - 4))

TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

I understand that this is related to the print statement, but how do I fix it?

EDIT:

This is related to the maximum number of rows. Does anyone know how to accommodate a greater number of rows?

Solution

The error message:

TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

says that subtracting an int from None is a TypeError. If you look at the next-to-last line of the traceback, you see that the only subtraction going on there is

max_rows - 4

So max_rows must be None. If you dive into /Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py, near line 857, and ask yourself how max_rows could end up being equal to None, you'll see that

get_option("display.max_rows")

must be returning None.
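The reasoning above can be checked in isolation. The following is only a minimal sketch: it reads the option back with pd.get_option and then repeats the arithmetic from the traceback by hand. It does not call pandas' internal _tidy_repr, and the variable names are illustrative.

```python
import pandas as pd

# Temporarily set the option to None, as the question's script does,
# and read it back the way the Series repr code does.
with pd.option_context("display.max_rows", None):
    max_rows = pd.get_option("display.max_rows")

# Repeat the expression from the traceback by hand.
try:
    min(30, max_rows - 4)
    message = None
except TypeError as exc:
    message = str(exc)

print(max_rows)
print(message)
```

The captured message is the same TypeError text that appears at the bottom of the traceback.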

This part of the code is calling _tidy_repr which is used to summarize the Series. None is the correct value to set when you want pandas to display all lines of the Series.

So this part of the code should not have been reached when max_rows is None.

I've made a pull request to correct this.
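Until a fixed pandas version is installed, one possible workaround is to avoid None altogether. The sketch below is an assumption on my part rather than part of the fix mentioned above: it either sets display.max_rows to an integer larger than the Series, or bypasses the display options with Series.to_string(). The Series here is a made-up stand-in for result.

```python
import pandas as pd

# Hypothetical stand-in for the `result` Series from the question.
result = pd.Series(range(200), name="corr")

# Option 1: use an explicit integer limit above the Series length instead of None.
with pd.option_context("display.max_rows", len(result) + 10):
    shown_with_option = repr(result)

# Option 2: to_string() renders every row regardless of display.max_rows.
shown_with_to_string = result.to_string()

print(shown_with_to_string)
```

Option 2 is the more direct choice when the goal is simply to print every row once, since it leaves the global display settings untouched.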
