python PyPDF2.utils.PdfReadError: Illegal character in Name Object

Environment

PyCharm 2020.2.3 x64
Python 3.7
PyPDF2 module

Error scenario

While using the PyPDF2 module to split one PDF file into several PDF files, the following error was raised:

python PyPDF2.utils.PdfReadError: Illegal character in Name Object
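
The post does not include the splitting script itself (pdf_segmentation.py in the traceback), so the following is only a minimal sketch of what such a script might look like, assuming the PyPDF2 1.x class names PdfFileReader and PdfFileWriter. The function names split_pdf and page_output_path and the output naming scheme are hypothetical.

```python
from pathlib import Path


def page_output_path(src, page_index, out_dir):
    """Build the output path for one page, numbered from 1."""
    src = Path(src)
    return Path(out_dir) / f"{src.stem}_page_{page_index + 1}.pdf"


def split_pdf(src, out_dir):
    """Write every page of `src` to its own PDF file inside `out_dir`."""
    # Imported here so the path helper above works without PyPDF2 installed.
    # Assumption: PyPDF2 1.x, where these class and method names exist.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Keep the input file open: the traceback shows that write() still
    # reads objects from the source stream.
    with open(src, "rb") as in_file:
        reader = PdfFileReader(in_file)
        for i in range(reader.getNumPages()):
            writer = PdfFileWriter()
            writer.addPage(reader.getPage(i))
            target = page_output_path(src, i, out_dir)
            with open(target, "wb") as out:
                writer.write(out)  # the call that raises PdfReadError
            written.append(target)
    return written
```

Calling split_pdf("input.pdf", "pages") on an affected file would be expected to fail inside writer.write(out), which is where the traceback in this post originates.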

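Cause

The error occurs because the PDF being read mixes encodings. PyPDF2 decodes every name object as UTF-8, and at least one name in this file contains bytes that are not valid UTF-8 (byte 0xcb in the traceback).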
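
The decoding failure can be reproduced without any PDF at all. The sketch below uses a hypothetical name, a font-subset style prefix followed by a Chinese font name encoded as GBK. The real bytes in the author's file are not known, so this name is only an assumption chosen to illustrate the mechanism.

```python
# Hypothetical name object bytes: ASCII prefix plus a GBK-encoded font name.
# The actual name stored in the author's PDF is not known.
raw_name = b"/ABCDEF+" + "宋体".encode("gbk")


def try_utf8(data):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


text, error = try_utf8(raw_name)
print(error)                   # message from the UTF-8 decoder
print(raw_name.decode("gbk"))  # the same bytes decode cleanly as GBK
```

The same bytes are valid in one encoding and invalid in the other, which is the situation the patch in this post works around.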

The full error output is as follows:
Traceback (most recent call last):
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
    return NameObject(name.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/Program_Practice/Python/pachong/test/Day_12/pdf_segmentation.py", line 19, in <module>
    pdf_writer.write(out)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\generic.py", line 580, in readFromStream
    value = readObject(stream, pdf)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\generic.py", line 60, in readObject
    return NameObject.readFromStream(stream, pdf)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\generic.py", line 493, in readFromStream
    raise utils.PdfReadError("Illegal character in Name Object")
PyPDF2.utils.PdfReadError: Illegal character in Name Object

Solution

Modify the PyPDF2 source code so that it can handle more than one encoding.

Step 1

Click the generic.py entry at the top of the error log to open the file. The relevant part of the error is:

Traceback (most recent call last):
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
    return NameObject(name.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte


Step 2

Paste the following code at line 486 of the generic.py file opened in step 1, then save the file:

            try:
                ret = name.decode('utf-8')
            except UnicodeDecodeError:
                # Not valid UTF-8: fall back to GBK
                ret = name.decode('gbk')
            return NameObject(ret)

Mind the indentation: paste the code with its leading spaces exactly as shown above.
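
The fallback used in this step can be tried outside PyPDF2. The helper below is a standalone sketch, not PyPDF2 code: the name decode_name and the encodings parameter are hypothetical, and the sample bytes are the same kind of assumed GBK-encoded name as before.

```python
def decode_name(name, encodings=("utf-8", "gbk")):
    """Decode raw name bytes, trying each encoding in order."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            return name.decode(encoding)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next one
    # Every candidate encoding failed: surface the last decoding error.
    raise last_error


print(decode_name(b"/Type"))                            # plain ASCII name
print(decode_name(b"/ABCDEF+" + "宋体".encode("gbk")))  # falls back to GBK
```

One caveat of this approach is that GBK accepts many byte sequences, so a name written in some other legacy encoding may decode without error into the wrong characters.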

Step 3

Running our own script now produces a different error:

Traceback (most recent call last):
  File "E:/Program_Practice/Python/pachong/test/Day_12/pdf_segmentation.py", line 19, in <module>
    pdf_writer.write(out)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\pdf.py", line 501, in write
    obj.writeToStream(stream, key)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\generic.py", line 554, in writeToStream
    value.writeToStream(stream, encryption_key)
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\generic.py", line 472, in writeToStream
    stream.write(b_(self))
  File "E:\Program_Practice\Python\pachong\venv\lib\site-packages\PyPDF2\utils.py", line 238, in b_
    r = s.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 8-9: ordinal not in range(256)
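
This second error is the write-side counterpart of the first: the name was decoded into Chinese characters, and Latin-1 can only represent code points below 256. A small sketch, again using a hypothetical decoded name rather than the one from the author's file:

```python
name = "/ABCDEF+宋体"  # hypothetical decoded name, not taken from the real PDF


def can_encode(text, encoding):
    """Report whether `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Indexes of the characters that Latin-1 cannot represent.
out_of_range = [i for i, ch in enumerate(name) if ord(ch) > 255]

print(can_encode("/Type", "latin-1"))
print(can_encode(name, "latin-1"))
print(can_encode(name, "utf-8"))
print(out_of_range)
```

For this sample name the characters outside Latin-1 sit at indexes 8 and 9, the same positions the traceback reports.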


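Step 4

Click the utils.py entry in the error log and comment out lines 238-241, beginning with the r = s.encode('latin-1') statement shown in the traceback.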

[Screenshots: utils.py before and after commenting out the lines]

Step 5

Paste the following code at line 238, mind the indentation, and save the file:

            try:
                r = s.encode('latin-1')
            except UnicodeEncodeError:
                # Characters outside Latin-1: fall back to UTF-8
                r = s.encode('utf-8')
            if len(s) < 2:
                bc[s] = r
        return r

After saving the file, run our own script again. This time it completes successfully.
