Exporting PDFs from Zotero

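Today, while revising my paper, my advisor gave me a task: package the PDFs of every cited paper, named in a consistent format, and send them to her. But I had mostly been saving items to Zotero online, and some of them have no PDF at all. What now? And even where a PDF exists, the file names follow all sorts of formats. Am I supposed to rename them one by one?

This paper cites 96 references, so renaming them by hand is out of the question; that would be silly. A programmer should not do the same thing by hand more than three times, so this calls for a script.

Fortunately, Zotero can export a library as CSV, and the export records where each PDF is stored.

Open the exported CSV and you will see that the "File Attachments" column (left figure) holds the path of each saved PDF. All we need to do is follow these paths to collect the PDFs, and then build the reference list from each item's publication year, authors, and title.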
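To make the idea concrete, here is a minimal sketch of reading such an export and turning each row into a reference line. The column names come from the export described above; the two sample rows, the helper names `reference_line` and `summarize`, and the "Author (Year). Title." style are my own assumptions for illustration.

```python
import csv
import io

# Hypothetical two-row sample standing in for a Zotero CSV export;
# only the column names are taken from the article.
SAMPLE = (
    '"Author","Publication Year","Title","File Attachments"\r\n'
    '"Doe, Jane; Roe, Richard","2008","A sample title",'
    '"C:\\Zotero\\storage\\ABCD1234\\sample.pdf"\r\n'
    '"Poe, Ann","1992","Another title",""\r\n'
)

def reference_line(row):
    """Format one CSV row as 'Author (Year). Title.' (an assumed style)."""
    return "{} ({}). {}.".format(row["Author"], row["Publication Year"], row["Title"])

def summarize(csv_text):
    """Return (reference lines, titles of rows that have no attachment)."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    refs, missing = [], []
    for row in reader:
        refs.append(reference_line(row))
        if not row["File Attachments"].strip():
            missing.append(row["Title"])
    return refs, missing

refs, missing = summarize(SAMPLE)
for line in refs:
    print(line)
print("No attachment:", missing)
```

Listing the rows with an empty "File Attachments" value up front shows how many PDFs still have to be found by hand.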

That led to the first version of the code.

The idea is simply to take each path in File Attachments and copy the file into a folder.

import csv
import shutil

copySuccess = 0
copyFail = 0
mycsvfile = 'doc.csv'
mypdfdic = r'.\pdf'
with open(mycsvfile,newline='',encoding='utf-8-sig')as csvfile:
    cr = csv.DictReader(csvfile)
    for row in cr:
        print('Copying...{}'.format(row["File Attachments"]))
        try:
            shutil.copy(row["File Attachments"],mypdfdic)
            copySuccess = copySuccess + 1
        except:
            copyFail = copyFail + 1

print('Done.{}Succeed,{}Failed.'.format(copySuccess,copyFail))

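However, most of the files failed to copy. Debugging showed that some File Attachments values list several files: besides the PDF there is also a saved web page. For example: C:\Users\xxr00\Zotero\storage\S9GT3DMY\Agostino 等。 - 2008 - Voluntary, spontaneous, and reflex blinking in Par.pdf; C:\Users\xxr00\Zotero\storage\8WUZSCZA\mds.html

So we need to exclude the web pages. That is easy: split the value on semicolons and keep only the parts whose last four characters are ".pdf". I also found that a value with a trailing semicolon fails to copy, since the copy call needs an exact path.

But I dropped this idea, because I suspected it would run into other pitfalls. A simpler approach is to take the storage key (such as S9GT3DMY) from the path in File Attachments and then read every PDF in that folder. String processing can get arbitrarily nasty, so the less of it the better.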
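For reference, the split-and-filter idea can be sketched in a few lines. The function name `pdf_paths` is my own, and `paper.pdf` is a shortened placeholder for the real file name. Note that this would still break on a file name that itself contains a semicolon.

```python
def pdf_paths(attachments):
    """Split a 'File Attachments' value on ';' and keep only the PDF paths."""
    # strip() removes the space after each ';'; empty parts (trailing ';') are dropped
    parts = (p.strip() for p in attachments.split(";"))
    return [p for p in parts if p.lower().endswith(".pdf")]

# Shortened placeholder file name; the folder layout follows the example above.
field = (r"C:\Users\xxr00\Zotero\storage\S9GT3DMY\paper.pdf; "
         r"C:\Users\xxr00\Zotero\storage\8WUZSCZA\mds.html;")
print(pdf_paths(field))
```

Lower-casing before the comparison also catches files saved as `.PDF`.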
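One way to get the storage key without counting path components is `pathlib.PureWindowsPath`, which parses Windows paths on any operating system. This is only a sketch: the helper name `storage_key` and the placeholder file name are my assumptions, and it relies on the key being the folder that directly contains the file.

```python
from pathlib import PureWindowsPath

def storage_key(pdf_path):
    """Return the name of the folder that directly contains the file."""
    return PureWindowsPath(pdf_path.strip()).parent.name

# Placeholder file name; the folder layout follows the example above.
print(storage_key(r"C:\Users\xxr00\Zotero\storage\S9GT3DMY\paper.pdf"))
```

Unlike indexing a split path at a fixed position, this does not depend on how deep the Zotero storage folder sits on disk.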

import csv
import shutil
import os

copySuccess = 0
copyFail = 0
mycsvfile = 'doc.csv'
mypdfdic = r'.\pdf'
zoterodic = r'C:\Users\xxr00\Zotero\storage'
with open(mycsvfile,newline='',encoding='utf-8-sig')as csvfile:
    cr = csv.DictReader(csvfile)
    for row in cr:
        print('Copying...{}'.format(row["File Attachments"]))
        docname = (row["File Attachments"])
        try:
            tmp = docname.split("\\")
            keyword = tmp[5]
            rowpdf_dic = zoterodic + '\\' + keyword
            files = os.listdir(rowpdf_dic)
            pdfdir = '';
            for tmpstr in files:
                strlen = len(tmpstr)
                pdf_yorn = tmpstr[strlen-3:strlen]
                if pdf_yorn == 'pdf':
                    pdfdir = rowpdf_dic + '\\' + tmpstr

            shutil.copy(pdfdir,mypdfdic)
            copySuccess = copySuccess + 1
        except:
            copyFail = copyFail + 1

print('Done.{}Succeed,{}Failed.'.format(copySuccess,copyFail))

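There are quite a few throwaway variables in the middle, but all the code does is list the files in the item's folder, check whether each one is a PDF, and copy it if so. It's that simple. Of course, this assumes the folder holds exactly one PDF; if there are several, only the last one listed gets copied. This version also keeps the PDF's original name, which is not what we want, so the next step is to rename the copied file.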

import csv
import shutil
import os

copySuccess = 0
copyFail = 0
mycsvfile = 'doc.csv'
mypdfdic = r'.\pdf'
zoterodic = r'C:\Users\xxr00\Zotero\storage'
with open(mycsvfile,newline='',encoding='utf-8-sig')as csvfile:
    cr = csv.DictReader(csvfile)

    os.chdir(mypdfdic)  # work inside the output folder so os.rename can use bare file names
    mypdfdic = os.getcwd()
    for row in cr:
        print('Copying...{}'.format(row["File Attachments"]))
        docname = (row["File Attachments"])
        try:
            tmp = docname.split("\\")
            keyword = tmp[5]
            rowpdf_dic = zoterodic + '\\' + keyword
            files = os.listdir(rowpdf_dic)
            pdfdir = '';
            for tmpstr in files:
                strlen = len(tmpstr)
                pdf_yorn = tmpstr[strlen-3:strlen]
                if pdf_yorn == 'pdf':
                    pdfdir = rowpdf_dic + '\\' + tmpstr
                    oldname = tmpstr

            shutil.copy(pdfdir,mypdfdic)

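            # Rename the copied file.
            # Authors are the tricky part: a single author is used as is, two authors are joined with "and", three or more become "et.al".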
            authorname = row["Author"]
            tmp = authorname.split(";")
            for i in range(len(tmp)):
                au = tmp[i]
                tmp[i] = au[0:au.rfind(',', 1)]
            #if len(tmp)>2:

            if len(tmp) == 1:
                author_final = tmp[0]
            elif len(tmp) == 2:
                author_final = tmp[0] + ' and ' + tmp[1]
            else:
                author_final = tmp[0] + ' et.al,'

            year_fianl = row["Publication Year"]
            title_final = row['Title']
            publication_final = row['Publication Title']


            newname = '(' + author_final + year_fianl + ') ' + title_final + ', ' + publication_final +  '.pdf'
            os.rename(oldname, newname)
            copySuccess = copySuccess + 1
        except:
            copyFail = copyFail + 1

print('Done.{}Succeed,{}Failed.'.format(copySuccess,copyFail))

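This version can finally export the PDFs under their new names.

However, some files were not renamed. The cause appears to be that the title plus the publication title is too long. So we drop the publication title, and when the name exceeds 200 characters we cut it at 200 and append "...".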
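The author rule can also be pulled out into a small function, which makes it easy to test separately. This sketch assumes the Author column looks like "Surname, Given; Surname, Given"; the name `author_label`, the "Unknown" fallback, and the "et al." spelling are my own choices rather than what the script above does.

```python
def author_label(author_field):
    """Build a short author label from 'Surname, Given; Surname, Given'."""
    # Keep the part before the first comma of each entry and trim stray spaces.
    surnames = [a.split(",")[0].strip() for a in author_field.split(";") if a.strip()]
    if not surnames:
        return "Unknown"  # assumed fallback for rows without authors
    if len(surnames) == 1:
        return surnames[0]
    if len(surnames) == 2:
        return surnames[0] + " and " + surnames[1]
    return surnames[0] + " et al."

print(author_label("Doe, Jane; Roe, Richard; Poe, Ann"))
```

Trimming each entry avoids the extra space that otherwise appears after "and" when the entries are separated by "; ".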

import csv
import shutil
import os

copySuccess = 0
copyFail = 0
mycsvfile = 'doc.csv'
mypdfdic = r'.\pdf'
zoterodic = r'C:\Users\xxr00\Zotero\storage'
with open(mycsvfile,newline='',encoding='utf-8-sig')as csvfile:
    cr = csv.DictReader(csvfile)

    os.chdir(mypdfdic)  # work inside the output folder so os.rename can use bare file names
    mypdfdic = os.getcwd()
    for row in cr:
        print('Copying...{}'.format(row["File Attachments"]))
        docname = (row["File Attachments"])
        try:
            tmp = docname.split("\\")
            keyword = tmp[5]
            rowpdf_dic = zoterodic + '\\' + keyword
            files = os.listdir(rowpdf_dic)
            pdfdir = '';
            for tmpstr in files:
                strlen = len(tmpstr)
                pdf_yorn = tmpstr[strlen-3:strlen]
                if pdf_yorn == 'pdf':
                    pdfdir = rowpdf_dic + '\\' + tmpstr
                    oldname = tmpstr

            shutil.copy(pdfdir,mypdfdic)

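            # Rename the copied file.
            # Authors are the tricky part: a single author is used as is, two authors are joined with "and", three or more become "et.al".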
            authorname = row["Author"]
            tmp = authorname.split(";")
            for i in range(len(tmp)):
                au = tmp[i]
                tmp[i] = au[0:au.rfind(',', 1)]
            #if len(tmp)>2:

            if len(tmp) == 1:
                author_final = tmp[0]
            elif len(tmp) == 2:
                author_final = tmp[0] + ' and ' + tmp[1]
            else:
                author_final = tmp[0] + ' et.al,'

            year_fianl = row["Publication Year"]
            title_final = row['Title']

            finalname = '(' + author_final + year_fianl + ') ' + title_final + ', '
            if len(finalname)>200:
                finalname = finalname[0:200] + '...'
            newname =    finalname + '.pdf'
            os.rename(oldname, newname)
            copySuccess = copySuccess + 1
        except:
            copyFail = copyFail + 1

print('Done.{}Succeed,{}Failed.'.format(copySuccess,copyFail))
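The truncation step on its own looks roughly like this. The 200-character limit and the "..." marker come from the script above; the function name `truncate_stem` and the trailing-space trimming are my additions. As far as I know the underlying constraint is that Windows limits a full path to 260 characters by default, so the safe limit really depends on how long the output folder's path is.

```python
def truncate_stem(stem, limit=200, marker="..."):
    """Cut an over-long file name stem at `limit` characters and append `marker`."""
    if len(stem) <= limit:
        return stem
    # rstrip() avoids ending the kept part with a dangling space before the marker
    return stem[:limit].rstrip() + marker

print(truncate_stem("(Doe, 2008) " + "A very long title " * 20) + ".pdf")
```

Appending the extension after truncating guarantees that `.pdf` is never cut off.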

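Some files still were not renamed. String processing really is a job nobody wants. Back to debugging.

The cause turned out to be colons in the new file name. A colon is reserved for the drive letter, so Windows does not allow it in a file name. To keep the trouble to a minimum, I replace the ASCII colon with the full-width colon (:).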

import csv
import shutil
import os

copySuccess = 0
copyFail = 0
mycsvfile = 'doc.csv'
mypdfdic = r'.\pdf'
zoterodic = r'C:\Users\xxr00\Zotero\storage'
with open(mycsvfile,newline='',encoding='utf-8-sig')as csvfile:
    cr = csv.DictReader(csvfile)

    os.chdir(mypdfdic)  # work inside the output folder so os.rename can use bare file names
    mypdfdic = os.getcwd()
    for row in cr:
        print('Copying...{}'.format(row["File Attachments"]))
        docname = (row["File Attachments"])
        try:
            tmp = docname.split("\\")
            keyword = tmp[5]
            rowpdf_dic = zoterodic + '\\' + keyword
            files = os.listdir(rowpdf_dic)
            pdfdir = '';
            for tmpstr in files:
                strlen = len(tmpstr)
                pdf_yorn = tmpstr[strlen-3:strlen]
                if pdf_yorn == 'pdf':
                    pdfdir = rowpdf_dic + '\\' + tmpstr
                    oldname = tmpstr

            shutil.copy(pdfdir,mypdfdic)

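            # Rename the copied file.
            # Authors are the tricky part: a single author is used as is, two authors are joined with "and", three or more become "et.al".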
            authorname = row["Author"]
            tmp = authorname.split(";")
            for i in range(len(tmp)):
                au = tmp[i]
                tmp[i] = au[0:au.rfind(',', 1)]
            #if len(tmp)>2:

            if len(tmp) == 1:
                author_final = tmp[0]
            elif len(tmp) == 2:
                author_final = tmp[0] + ' and ' + tmp[1]
            else:
                author_final = tmp[0] + ' et.al,'

            year_fianl = row["Publication Year"]
            title_final = row['Title']

            finalname = '(' + author_final + year_fianl + ') ' + title_final + ', '
            if len(finalname)>200:
                finalname = finalname[0:200] + '...'
            newname =    finalname + '.pdf'
            newname = newname.replace(':',':')
            os.rename(oldname, newname)
            copySuccess = copySuccess + 1
        except:
            copyFail = copyFail + 1

print('Done.{}Succeed,{}Failed.'.format(copySuccess,copyFail))
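The colon is not the only character Windows rejects in file names; the full set is `< > : " / \ | ? *`. A more general clean-up might look like the sketch below. The full-width colon follows the replacement used above, while the function name `safe_filename` and the underscore fallback for the other characters are my assumptions.

```python
import re

# Characters Windows rejects in file names: < > : " / \ | ? *
ILLEGAL = re.compile(r'[<>:"/\\|?*]')
FULLWIDTH_COLON = "\uff1a"  # the replacement the article uses for ':'

def safe_filename(name, fallback="_"):
    """Replace ':' with a full-width colon and other illegal characters with `fallback`."""
    def swap(match):
        return FULLWIDTH_COLON if match.group() == ":" else fallback
    return ILLEGAL.sub(swap, name)

# Made-up title, used only to exercise the function.
print(safe_filename('Part 1: "draft" v2?'))
```

Handling the whole set in one pass saves another round of debugging when a title contains a question mark or a slash.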

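Of course, if Zotero never saved a PDF for an item, the script cannot help, and that part really does have to be done by hand. So I started filling in the gaps. For any item, a right-click shows whether it has a PDF. You can also select all items, right-click, and choose "Find Available PDF" to see how many Zotero can fetch for you.

As it turned out, a hit is pure luck; most of them you have to track down yourself.

After filling in the missing PDFs, the script still had one problem. The likely cause is that Zotero stores an item's attachments in more than one folder, which produces values like this: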

C:\Users\xxr00\Zotero\storage\ZFTV27TR\Theiler 等。 - 1992 - Testing for nonlinearity in time series the metho.pdf; C:\Users\xxr00\Zotero\storage\UQ2GUHS2\016727899290102S.html

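That is, the PDF and the web page live in different folders, here ZFTV27TR and UQ2GUHS2.

By now I was thoroughly sick of string processing. Fine, just patch it: split on semicolons and take whichever part is a PDF.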

import csv
import shutil
import os

copySuccess = 0
copyFail = 0
mycsvfile = 'doc.csv'
mypdfdic = r'.\pdf'
zoterodic = r'C:\Users\xxr00\Zotero\storage'
with open(mycsvfile,newline='',encoding='utf-8-sig')as csvfile:
    cr = csv.DictReader(csvfile)

    os.chdir(mypdfdic)  # work inside the output folder so os.rename can use bare file names
    mypdfdic = os.getcwd()
    for row in cr:
        print('Copying...{}'.format(row["File Attachments"]))
        docname = (row["File Attachments"])
        try:
            tmp = docname.split(";")
            for i in range(len(tmp)):
                strlen = len(tmp[i])
                if tmp[i][strlen-3:strlen] == 'pdf':
                    tmp = tmp[i]
                    break

            tmp = tmp.split("\\")
            keyword = tmp[5]
            rowpdf_dic = zoterodic + '\\' + keyword
            files = os.listdir(rowpdf_dic)
            pdfdir = ''
            for tmpstr in files:
                strlen = len(tmpstr)
                pdf_yorn = tmpstr[strlen-3:strlen]
                if pdf_yorn == 'pdf':
                    pdfdir = rowpdf_dic + '\\' + tmpstr
                    oldname = tmpstr

            shutil.copy(pdfdir,mypdfdic)

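            # Rename the copied file.
            # Authors are the tricky part: a single author is used as is, two authors are joined with "and", three or more become "et.al".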
            authorname = row["Author"]
            tmp = authorname.split(";")
            for i in range(len(tmp)):
                au = tmp[i]
                tmp[i] = au[0:au.rfind(',', 1)]
            #if len(tmp)>2:

            if len(tmp) == 1:
                author_final = tmp[0]
            elif len(tmp) == 2:
                author_final = tmp[0] + ' and ' + tmp[1]
            else:
                author_final = tmp[0] + ' et.al'

            year_fianl = row["Publication Year"]
            title_final = row['Title']

            finalname = '(' + author_final + ',' + year_fianl + ') ' + title_final + ', '
            if len(finalname)>200:
                finalname = finalname[0:200] + '...'
            newname =    finalname + '.pdf'
            newname = newname.replace(':',':')
            newname = newname.replace('"', '“')
            os.rename(oldname, newname)
            copySuccess = copySuccess + 1
        except:
            copyFail = copyFail + 1

print('Done.{}Succeed,{}Failed.'.format(copySuccess,copyFail))

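All cases pass. Honestly, it felt just like the big simulation problems from undergraduate ACM contests. Still, it is finally done, and I can reuse it from now on.

Finally, a small polish based on the output to make the generated names look nicer, mostly adjusting spaces.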

import csv
import shutil
import os

copySuccess = 0
copyFail = 0

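# The only settings you need to change
mycsvfile = 'doc.csv' # path of the CSV exported from Zotero
mypdfdic = r'.\pdf' # folder the PDFs are copied into (must already exist)
zoterodic = r'C:\Users\xxr00\Zotero\storage' # Zotero's storage folder

with open(mycsvfile,newline='',encoding='utf-8-sig')as csvfile:
    cr = csv.DictReader(csvfile)

    os.chdir(mypdfdic)  # work inside the output folder so os.rename can use bare file names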
    mypdfdic = os.getcwd()
    for row in cr:
        print('Copying...{}'.format(row["File Attachments"]))
        docname = (row["File Attachments"])
        try:
            tmp = docname.split(";")
            for i in range(len(tmp)):
                strlen = len(tmp[i])
                if tmp[i][strlen-3:strlen] == 'pdf':
                    tmp = tmp[i]
                    break

            tmp = tmp.split("\\")
            keyword = tmp[5]
            rowpdf_dic = zoterodic + '\\' + keyword
            files = os.listdir(rowpdf_dic)
            pdfdir = ''
            for tmpstr in files:
                strlen = len(tmpstr)
                pdf_yorn = tmpstr[strlen-3:strlen]
                if pdf_yorn == 'pdf':
                    pdfdir = rowpdf_dic + '\\' + tmpstr
                    oldname = tmpstr

            shutil.copy(pdfdir,mypdfdic)

            # Rename the copied file.
            # Authors are the tricky part: a single author is used as is,
            # two authors are joined with "and", three or more become "et.al".
            authorname = row["Author"]
            tmp = authorname.split(";")
            for i in range(len(tmp)):
                au = tmp[i].strip()  # drop the space that follows each ';'
                tmp[i] = au[0:au.rfind(',', 1)]  # keep the surname before the comma

            if len(tmp) == 1:
                author_final = tmp[0]
            elif len(tmp) == 2:
                author_final = tmp[0] + ' and ' + tmp[1]
            else:
                author_final = tmp[0] + ' et.al'

            year_final = row["Publication Year"]
            title_final = row['Title']

            finalname = '(' + author_final + ', ' + year_final + ') ' + title_final
            if len(finalname)>200:
                finalname = finalname[0:200] + '...'
            newname =    finalname + '.pdf'
            # ':' and '"' are not allowed in Windows file names; swap in full-width look-alikes
            newname = newname.replace(':',':')
            newname = newname.replace('"', '“')
            os.rename(oldname, newname)
            copySuccess = copySuccess + 1
        except:
            copyFail = copyFail + 1

print('Done.{}Succeed,{}Failed.'.format(copySuccess,copyFail))
