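1. Batch-removing passwords from PDF files with the pypdf library. PyPDF4 is used here; the same approach can be adapted to PyPDF2, PyPDF3 and similar forks. The code is as follows: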
import os
from PyPDF4 import PdfFileReader
from PyPDF4 import PdfFileWriter

# Decrypted copies are written to this directory.
res_path = "./resdir/"


def decrypt_pdf(srcfname, resfname, password):
    """Decrypt srcfname with password and save the result as resfname."""
    try:
        file = open(srcfname, 'rb')
    except Exception as err:
        print('file open failed! ' + str(err))
        return None
    # Keep the source open until the output has been written.
    with file:
        pdf_reader = PdfFileReader(file, strict=False)
        if not pdf_reader.isEncrypted:
            print('file is not encrypted, do nothing. file: %s' % srcfname)
            return None
        # decrypt() returns 0 when the password does not match.
        ret = pdf_reader.decrypt(password)
        if ret == 0:
            print("%s: password (%s) is wrong" % (srcfname, password))
            return None
        pdf_writer = PdfFileWriter()
        pdf_writer.appendPagesFromReader(pdf_reader)
        with open(resfname, 'wb') as res_file:
            pdf_writer.write(res_file)
    return None


def main():
    src_path = input(r"input pdf path (example: D:\pdf\): ")
    password = input(r"input passwd (example: 123456): ")
    if src_path == "" or password == "":
        print('please input right path and password !!!')
        return
    # exist_ok avoids a crash when the output directory already exists.
    os.makedirs(res_path, exist_ok=True)
    for filename in os.listdir(src_path):
        # os.path.join works whether or not src_path ends with a separator.
        sfname = os.path.join(src_path, filename)
        rfname = os.path.join(res_path, filename)
        print("----- start decrypting file-----------")
        decrypt_pdf(sfname, rfname, password)
        print("----- end decrypting file-------------")


if __name__ == '__main__':
    main()
Environment: Python 3. Place this script in the same directory as the folder that holds the PDFs to decrypt, then run it.
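The directory walk in main() can also be pulled out into a small helper, which makes it easy to check which files would be processed before anything is decrypted. This is a minimal sketch using only the standard library; the name `list_pdf_jobs` and its parameters are hypothetical and not part of the original script.

```python
import os


def list_pdf_jobs(src_dir, res_dir):
    """Return sorted (source, destination) path pairs for the PDFs in src_dir.

    Hypothetical helper: it only builds paths and does not open any file.
    """
    jobs = []
    for filename in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, filename)
        # Skip sub-directories and anything that is not a .pdf file.
        if not os.path.isfile(src) or not filename.lower().endswith(".pdf"):
            continue
        jobs.append((src, os.path.join(res_dir, filename)))
    return jobs
```

Each pair could then be passed on as decrypt_pdf(src, dst, password).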
2. Problems encountered during decryption:
File "/xxx/lib/python3.10/site-packages/PyPDF4/utils.py", line 237, in b_
r = s.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u02c6' in position 0: ordinal not in range(256)
This error shows up when the pypdf library parses PDF documents that contain Chinese text. The fix is to edit the library's utils.py file, as follows:
Original code:
...
r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r
...
After the change:
...
try:
    r = s.encode('latin-1')
except UnicodeEncodeError:
    # Fall back to UTF-8 for characters outside Latin-1.
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
...
After making this change, rerun the script above and the problem is resolved.
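The effect of the patched lines can be checked without PyPDF4 installed. The sketch below is a standalone copy of the encoding logic only: the function name `encode_with_fallback` is hypothetical, and the cache from the library code is left out. It shows that a character outside Latin-1, such as the '\u02c6' from the traceback, comes back as UTF-8 bytes instead of raising.

```python
def encode_with_fallback(s):
    """Encode s as Latin-1, falling back to UTF-8 for other characters."""
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        # Latin-1 only covers code points 0-255; U+02C6 is outside that range.
        return s.encode("utf-8")


print(encode_with_fallback("abc"))     # b'abc'
print(encode_with_fallback("\u02c6"))  # b'\xcb\x86'
```

The two encodings give different bytes for characters above 127 ('é' is b'\xe9' in Latin-1 but b'\xc3\xa9' in UTF-8), so the fallback is best treated as a workaround rather than a lossless conversion.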