max_row in Python: openpyxl max_row and max_column wrongly report a larger number

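When parsing an Excel file with the openpyxl library, max_row returned a value larger than the actual number of rows. Deleting the empty rows and modifying the code as suggested in other discussions did not fix it. The cause is that max_row does not skip empty rows as expected. To work around this, write custom code that walks backwards from max_row until it reaches the first row that contains a value, and treat that row as the real last row.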


My question concerns a function in a parsing script I'm developing. I am trying to write a Python function that finds the column number of a matched value in an Excel sheet. The sheet is created on the fly with openpyxl, and its first row holds headers (starting from the 3rd column) that each span 4 columns merged into one. A later function parses content and adds it to the columns under the matching headers. (Additional info: the content I'm parsing is BLAST+ output. I'm trying to create a summary spreadsheet with the hit names as columns, each with subcolumns for hits, gaps, span and identity. The first two columns are the query contig and its length.)

I had initially written a similar function for xlrd and it worked. But when I rewrite it for openpyxl, the max_row and max_column properties report more rows and columns than are actually present. For instance, this pilot input has 20 rows, but max_row reports 82.

Note that I manually selected the empty rows & columns and right clicked and deleted them, as advised elsewhere in this forum. This didn't change the error.

def find_column_number(x):
    col = 0
    print "maxrow = ", hrsh.max_row
    print "maxcol = ", hrsh.max_column
    for rowz in range(hrsh.max_row):
        print "now the row is ", rowz
        if(rowz > 0):
            pass
        for colz in range(hrsh.max_column):
            print "now the column is ", colz
            name = (hrsh.cell(row=rowz,column=colz).value)
            if(name == x):
                col = colz
    return col

The issue with max_row and max_column has been discussed here: https://bitbucket.org/openpyxl/openpyxl/issues/514/cell-max_row-reports-higher-than-actual. I applied the suggestion from that thread, but max_row is still wrong.

for row in reversed(hrsh.rows):
    values = [cell.value for cell in row]
    if any(values):
        print("last row with data is {0}".format(row[0].row))
        maxrow = row[0].row

I then tried the suggestion at https://www.reddit.com/r/learnpython/comments/3prmun/openpyxl_loop_through_and_find_value_of_the/ and printed the cell values. Once again, the script includes the empty columns and reports more columns than are actually present.

for currentRow in hrsh.rows:
    for currentCell in currentRow:
        print(currentCell.value)

Can you please help me resolve this error, or suggest another method to achieve my aim?

Solution

As noted in the bug report you linked to, a sheet's reported dimensions can include empty rows or columns. If max_row and max_column are not reporting what you want to see, you will need to write your own code to find the first completely empty row. The most efficient way, of course, would be to start from max_row and work backwards, but the following is probably sufficient:

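for max_row, row in enumerate(ws, 1):
    # stop at the first row whose cells are all empty;
    # after the break, max_row holds the index of that empty row
    if all(c.value is None for c in row):
        break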
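
For the backwards scan mentioned above, a minimal sketch might look like the following. The helper names last_row_with_data and last_column_with_data are hypothetical and not part of openpyxl. The sketch assumes a worksheet loaded in the normal (not read-only) mode, an openpyxl version whose iter_rows and iter_cols accept values_only, and that any value other than None counts as data.

```python
def last_row_with_data(ws):
    """Return the 1-based index of the last row holding a non-None value, or 0.

    Hypothetical helper: scans upwards from ws.max_row, so trailing rows
    that only carry formatting (value None) are skipped.
    """
    for row_idx in range(ws.max_row, 0, -1):
        # min_row == max_row restricts iter_rows to this single row
        for values in ws.iter_rows(min_row=row_idx, max_row=row_idx,
                                   values_only=True):
            if any(v is not None for v in values):
                return row_idx
    return 0  # no row contains data


def last_column_with_data(ws):
    """Return the 1-based index of the last column holding a non-None value, or 0."""
    for col_idx in range(ws.max_column, 0, -1):
        # min_col == max_col restricts iter_cols to this single column
        for values in ws.iter_cols(min_col=col_idx, max_col=col_idx,
                                   values_only=True):
            if any(v is not None for v in values):
                return col_idx
    return 0  # no column contains data
```

With the sheet from the question, this would be called as last_row_with_data(hrsh) and last_column_with_data(hrsh), and the results used in place of hrsh.max_row and hrsh.max_column. Unlike the forward loop, the scan does not stop at an empty row in the middle of the data, because it only looks for the last row that has a value.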
