使用
pyPdf寻找URI,
import pyPdf
f = open('TMR-Issue6.pdf','rb')
pdf = pyPdf.PdfFileReader(f)
pgs = pdf.getNumPages()
key = '/Annots'
uri = '/URI'
ank = '/A'
for pg in range(pgs):
p = pdf.getPage(pg)
o = p.getObject()
if o.has_key(key):
ann = o[key]
for a in ann:
u = a.getObject()
if u[ank].has_key(uri):
print u[ank][uri]
给,
http://www.augustsson.net/Darcs/Djinn/
http://plato.stanford.edu/entries/logic-intuitionistic/
http://citeseer.ist.psu.edu/ishihara98note.html
etc...
我找不到一个链接到另一个pdf的文件,但我怀疑URI字段应该包含表单file:/// myfiles的URI