A Script for Getting the Title of a PDF File

tomeasure

Article link: https://blog.csdn.net/qq_29695701/article/details/117526856

Background

When you have a large collection of PDF files, a script that automatically reads each file's title is very useful.

Approach

Python version

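from PyPDF2 import PdfFileReader  # PyPDF2 < 3.0 API; 3.0+ renames this to PdfReader

# Open in binary mode; the with-block closes the file even if reading fails.
with open("test.pdf", "rb") as fin:
    pdf_title = PdfFileReader(fin).getDocumentInfo().title  # title from the document info

print(pdf_title)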
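Since the motivation is a large batch of PDFs, the single-file snippet above can be wrapped in a loop over a directory. The following is only a minimal sketch under stated assumptions: `read_title` and `collect_titles` are hypothetical helper names that do not appear in the original post, and the `reader` parameter is added purely so that the title lookup can be swapped out. It uses the same pre-3.0 PyPDF2 calls as the snippet above.

```python
from pathlib import Path


def read_title(pdf_path):
    """Return the title from the PDF's document info, or None if absent.

    Hypothetical helper; uses the pre-3.0 PyPDF2 API from the article.
    """
    # Imported lazily so this module still loads where PyPDF2 is missing.
    from PyPDF2 import PdfFileReader

    with open(pdf_path, "rb") as fin:
        info = PdfFileReader(fin).getDocumentInfo()
        # Read the title while the file is still open.
        return info.title if info is not None else None


def collect_titles(directory, reader=read_title):
    """Map each *.pdf file name directly under `directory` to its title.

    Files that cannot be read are recorded with a title of None.
    """
    titles = {}
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        try:
            titles[pdf_path.name] = reader(pdf_path)
        except Exception:  # unreadable or malformed file
            titles[pdf_path.name] = None
    return titles
```

Calling `collect_titles("WACV2021")` would then return a dictionary keyed by file name. Many PDFs carry no title in their metadata, so a value of None should be expected for some entries.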

Bash version

alias get_pdf_title='python -c "from PyPDF2 import PdfFileReader; import sys; fin = open(sys.argv[1], \"rb\"); print(PdfFileReader(fin).getDocumentInfo().title.replace(\" \", \"_\")); fin.close()"'

Spaces are replaced with underscores (_) here, which makes the output easier to handle in later scripts.
Usage:

~/workspace >>$ get_pdf_title WACV2021/Akiva_H2O-Net_Self-Supervised_Flood_Segmentation_via_Adversarial_Domain_WACV_2021_paper.pdf
H2O-Net:_Self-Supervised_Flood_Segmentation_via_Adversarial_Domain_Adaptation_and_Label_Refinement
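The alias assumes the PDF actually has a title; if the metadata has none, `.title` is None and the `.replace` call fails. A slightly more defensive version of the space-to-underscore step might look like the sketch below. `title_to_token` and its `fallback` parameter are hypothetical names, not part of the original post. Unlike the alias, this sketch collapses any run of whitespace into a single underscore rather than replacing each space individually.

```python
import re


def title_to_token(title, fallback="untitled"):
    """Turn a PDF title into a single whitespace-free token.

    Returns `fallback` when the title is None, empty, or only whitespace.
    """
    if not title or not title.strip():
        return fallback
    # Trim the ends, then replace each run of whitespace with one underscore.
    return re.sub(r"\s+", "_", title.strip())
```

For a title with single spaces between words, the result matches what the alias prints.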
