Python: extracting the table of contents (outline) from a PDF file with PyPDF2

belldeep

Last modified: 2024-05-16 22:04:47

First published: 2024-02-13 22:13:33

Article link: https://blog.csdn.net/belldeep/article/details/136110009

I noticed that pypdf and PyPDF2 share the same original author: Mathieu Fenniak.

Install it with pip:

pip install pypdf2

(pip downloads pypdf2-3.0.1-py3-none-any.whl, 232 kB.)
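Some context on the data the script works with: in PyPDF2 3.x, `PdfReader.outline` returns a list of bookmark objects (dict subclasses carrying `/Title`, `/Page` and `/Type`). A nested list placed right after a bookmark holds that bookmark's children. The sketch below is not the author's script. It is a minimal recursive walk over that structure, and the helper names `walk_outline` and `format_outline` and the ` ... page` output layout are hypothetical. The optional `page_of` callback is meant to be `reader.get_destination_page_number`, which turns a bookmark's `/Page` reference into a 0-based page index.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical helpers (not part of PyPDF2): walk a PyPDF2-style outline list.
PageOf = Optional[Callable[[dict], Optional[int]]]


def walk_outline(items: list, depth: int = 0,
                 page_of: PageOf = None) -> Iterator[Tuple[int, str, Optional[int]]]:
    """Yield (depth, title, page) for every bookmark in a nested outline list."""
    for item in items:
        if isinstance(item, list):
            # A nested list holds the children of the bookmark before it.
            yield from walk_outline(item, depth + 1, page_of)
        elif isinstance(item, dict):
            # PyPDF2 Destination objects are dict subclasses, so .get works.
            title = str(item.get("/Title", "")).strip()
            page = page_of(item) if page_of is not None else None
            yield depth, title, page


def format_outline(items: list, page_of: PageOf = None, indent: str = "    ") -> str:
    """Render the outline as indented text, appending 1-based page numbers."""
    lines: List[str] = []
    for depth, title, page in walk_outline(items, 0, page_of):
        # Treat None or a negative index as "page unknown" and omit it.
        suffix = f" ... {page + 1}" if page is not None and page >= 0 else ""
        lines.append(f"{indent * depth}{title}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Plain dicts standing in for reader.outline, so the demo runs without a PDF.
    sample = [
        {"/Title": "Chapter 1 Introduction"},
        [{"/Title": "1.1 Background"}, {"/Title": "1.2 Scope"}],
        {"/Title": "Chapter 2 Methods"},
    ]
    print(format_outline(sample))
    # With a real file (assumes PyPDF2 3.x is installed; "book.pdf" is a placeholder):
    #   from PyPDF2 import PdfReader
    #   reader = PdfReader("book.pdf")
    #   print(format_outline(reader.outline, reader.get_destination_page_number))
```

The demo prints the two chapter titles flush left with the two sections indented under Chapter 1. Recursing on the nested lists lets the indentation follow the hierarchy actually stored in the PDF instead of inferring it from the title text.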

Write pdf_read_dir.py as follows:

# -*- coding: utf-8 -*-
""" pypdf2==3.0.1 从PDF中提取目录 """
import os
import sys
from PyPDF2 import PdfReader

# Each bookmark entry is a dict-like object of the form:
# {'/Title': 'bookmark title', '/Page': reference to the target page, '/Type': destination type}

# Count how many times a given character occurs in a string
def find_char(str1, char):
    cs = 0
    for c in str1:
        if c == char:
            cs += 1
    return cs
    
directory_str = ''
def bookmark_listhandler(list1):
    global directory_str
    for message in list1:
        if isinstance(message, dict):
            title = message['/Title'].strip()
            if title.startswith("Chapter"): 
                directory_str += '\n' + title + '\n'
            elif title[0:2] in ("序章