Ouch! A problem I hit while annotating table structure in Excel for PaddleOCR table recognition: IndexError: list index out of range

[2023/08/02 18:25:35] ppocr ERROR: When parsing line {"filename": "3_new68_table.png", "html": {"structure": {"tokens": ["<tbody>", "<tr>", "<td", " colspan=\"11\", \">", "</td>",  "</tr>", "<tr>", "<td", " colspan=\"2\"", ">", "</td>", "<td", " colspan=\"4\"", ">", "</td>", "<td", " colspan=\"5\"", ">", "</td>",  "</tr>", "<tr>", "<td", " colspan=\"2\"", ">", "</td>", "<td", " colspan=\"9\"", ">", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td", " colspan=\"2\"", ">", "</td>", "<td", " colspan=\"3\"", ">", "</td>", "<td", " colspan=\"2\"", ">", "</td>", "<td", " colspan=\"3\"", ">", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " colspan=\"3\"", ">", "</td>", "<td>", "</td>
  File "/home/aistudio/PaddleOCR/ppocr/data/pubtab_dataset.py", line 118, in __getitem__
    outs = transform(data, self.ops)
  File "/home/aistudio/PaddleOCR/ppocr/data/imaug/__init__.py", line 53, in transform
    data = op(data)
  File "/home/aistudio/PaddleOCR/ppocr/data/imaug/label_ops.py", line 669, in __call__
    if 'bbox' in cells[bbox_idx] and len(cells[bbox_idx][
IndexError: list index out of range
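To see why the error surfaces at line 669, here is a simplified re-creation of the loop pattern visible in the traceback: every opening <td token takes the next entry from cells through bbox_idx, so a label with more <td> tokens than cells runs off the end of the list. This is a hypothetical sketch (encode_boxes is my own name), not PaddleOCR's actual implementation.

```python
# Simplified re-creation (NOT PaddleOCR's actual code) of the pattern at
# label_ops.py line 669: every opening <td token consumes one entry of `cells`.
def encode_boxes(tokens, cells):
    bbox_idx = 0
    boxes = []
    for token in tokens:
        if token in ("<td>", "<td"):
            # Raises IndexError once there are more <td> tokens than cells
            cell = cells[bbox_idx]
            if "bbox" in cell and len(cell["tokens"]) > 0:
                boxes.append(cell["bbox"])
            bbox_idx += 1
    return boxes


# The structure declares two cells, but the label's cell list has only one entry
tokens = ["<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>"]
cells = [{"tokens": ["A"], "bbox": [0, 0, 10, 10]}]

try:
    encode_boxes(tokens, cells)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```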

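The log shows IndexError: list index out of range. This is usually caused by a problem with the cells in the Excel sheet. The failing line in label_ops.py indexes cells[bbox_idx], which means the structure tokens of that label contain more <td> cells than the cells list has entries. In my case, a cell had been widened, which added an extra <td> at the end of every row. It took me a long time to track this down.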
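One way to find the offending images before training is to scan the label file. The sketch below assumes each line is one JSON object in the PubTabNet-style layout seen in the log (html.structure.tokens plus an html.cells list); the script name check_table_labels.py and the helpers parse_rows, row_widths and check_line are hypothetical. It reports labels whose <td> count differs from the number of cells, and rows whose grid width (counting colspan and rowspan) differs from the table's most common width, which can point at rows that picked up an extra <td>.

```python
import json
import re
import sys
from collections import Counter

# Assumption: each line of the label file is one JSON object shaped like
# {"filename": ..., "html": {"structure": {"tokens": [...]}, "cells": [...]}}
SPAN_RE = re.compile(r'(colspan|rowspan)="(\d+)"')


def parse_rows(tokens):
    """Group structure tokens into rows of [colspan, rowspan] pairs."""
    rows = []
    for tok in tokens:
        if tok == "<tr>":
            rows.append([])
        elif tok.startswith("<td") and rows:
            rows[-1].append([1, 1])
        elif rows and rows[-1]:
            # Attribute tokens such as ' colspan="2"' follow an opening '<td'
            for name, value in SPAN_RE.findall(tok):
                rows[-1][-1][0 if name == "colspan" else 1] = int(value)
    return rows


def row_widths(rows):
    """Number of grid columns each row occupies, honouring rowspan."""
    carry = {}  # column -> rows still covered by a rowspan from an earlier row
    widths = []
    for row in rows:
        occupied = set(carry)
        next_carry = {c: left - 1 for c, left in carry.items() if left > 1}
        col = 0
        for colspan, rowspan in row:
            while col in occupied:  # skip columns filled from rows above
                col += 1
            for c in range(col, col + colspan):
                occupied.add(c)
                if rowspan > 1:
                    next_carry[c] = rowspan - 1
            col += colspan
        widths.append(max(occupied) + 1 if occupied else 0)
        carry = next_carry
    return widths


def check_line(line):
    """Return a list of problems found in one label line."""
    info = json.loads(line)
    tokens = info["html"]["structure"]["tokens"]
    cells = info["html"]["cells"]
    problems = []
    n_td = sum(tok.startswith("<td") for tok in tokens)
    if n_td != len(cells):
        problems.append(f"{n_td} <td> tokens but {len(cells)} cells")
    widths = row_widths(parse_rows(tokens))
    if widths:
        expected = Counter(widths).most_common(1)[0][0]
        for i, w in enumerate(widths):
            if w != expected:
                problems.append(f"row {i} spans {w} columns, most rows span {expected}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_table_labels.py train_label.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for line in f:
            if line.strip():
                name = json.loads(line).get("filename", "?")
                for problem in check_line(line):
                    print(f"{name}: {problem}")
```

The <td>-versus-cells comparison is the check that maps directly to the IndexError; the width check is only a heuristic and will not flag a table in which every row gained the same extra cell.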
