Capitalizing all citation titles in a LaTeX bib file

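Some citation titles need to keep their capitalization, but when LaTeX typesets the bibliography, many BibTeX styles lowercase every letter after the first unless the capital letter is wrapped in {}.
On top of that, many journal entries carry a doi or url field, which clutters the rendered reference list.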
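
The brace rule described above can be illustrated with a minimal sketch. The helper name `protect_capitals` is hypothetical and only serves this illustration; it wraps each uppercase character in braces so that a bibliography style leaves its case alone.

```python
def protect_capitals(text: str) -> str:
    """Wrap every uppercase character in braces so its case is preserved."""
    return "".join("{" + ch + "}" if ch.isupper() else ch for ch in text)


# Lowercase letters, spaces and punctuation pass through unchanged.
print(protect_capitals("BERT for NLP"))  # {B}{E}{R}{T} for {N}{L}{P}
```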

To handle both problems, I wrote a short script that normalizes a bib file.

'''
capitalize the title of every entry in your .bib file
remove url, doi and every other field not listed in cus_remain
'''
import argparse

parser = argparse.ArgumentParser()

parser.add_argument('-I','--input',type=str,default="./anthology.bib")
parser.add_argument('-O','--output',type=str,default="./anthology_cap.bib")
parser.add_argument('-V','--verbose',action="store_true")
args = parser.parse_args()

no_cap = ["with","of","for","to","from","and","on","in","under","a","by","the"]  # function words (prepositions, articles, conjunctions) left in lowercase
# dele = ['url', 'doi', 'publisher', 'organization'] 
cus_remain = ['title','author', 'booktitle', 'journal', 'year', 'pages', 'volume', 'number'] # reserved attributes, can be customized
fix = ['@']
remain = cus_remain + fix
new_bib = ""

def upper_already_cap(token:str):
    new_token = ''
    for t in token:
        if t.isupper():
            new_token += '{' + t + '}'
        else:
            new_token += t

    return new_token

def upper_all_tokens(title:str):
    all_tokens = title.split(" ")
    new_tokens = []
    for i,tk in enumerate(all_tokens):
        if i == 0 or tk.lower() not in no_cap:  # or tk[0].isupper
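            ## must capitalize; drop existing braces first so they are not doubled
            tk = tk.replace("{","")
            tk = tk.replace("}","")
            if not tk:
                ## empty token (e.g. from consecutive spaces): nothing to capitalize
                new_tokens.append(tk)
                continue
            new_tk = '{' + tk[0].upper() + '}' + upper_already_cap(tk[1:])
            new_tokens.append(new_tk)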
        else:
            new_tokens.append(tk)
    
    return " ".join(new_tokens)

def in_line(strr,line):
    new_str = strr.lower()
    new_line = line.lower()
    return new_str in new_line

def find_index(line,substr):
    # position of the first occurrence of substr, or len(line) if it is absent
    if substr in line:
        return line.index(substr)
    else:
        return len(line)

with open(args.input,"r",encoding="utf-8") as f:
    ori_bib = f.readlines()
    
for line in ori_bib:
    new_line = None
    com_line = line[:find_index(line,'=')].strip()
    # print(com_line)
    if not any([in_line(t,com_line) for t in remain]) and line != '}\n' and line != '}':
        new_line = ""  ## remove
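    elif in_line('title',com_line) and not in_line('booktitle',com_line) and '{' in line and '}' in line:
        ## only brace-delimited titles are rewritten; other title lines are copied unchanged
        start = line.index('{')  ## index of the first '{'
        end = len(line) - line[::-1].index("}")  ## index just past the last '}'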
        title = line[start+1:end-1]
        left,right = line[:start+1],line[end-1:]
        new_title = upper_all_tokens(title)
        new_line = left + new_title + right
    else:
        new_line = line
    
    new_bib += '\n' + new_line if '@' in new_line else new_line

new_bib = '% Encoding: UTF-8\n' + new_bib.lstrip('\n') # prepend the encoding header; drop only leading newlines

if args.verbose:
    print(new_bib)

with open(args.output,"w",encoding="utf-8") as f:
    f.write(new_bib)

print("write capitalized bib file at",args.output)
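
The line filter used in the loop above can be tried in isolation. The following is a minimal sketch that roughly mirrors that rule on a made-up entry; the names `KEEP`, `keep_line` and the sample entry are illustrative assumptions, not part of the script.

```python
# Field names to keep, plus '@' so that entry headers survive (assumed list,
# copied from the script's defaults).
KEEP = ["title", "author", "booktitle", "journal", "year",
        "pages", "volume", "number", "@"]


def keep_line(line: str) -> bool:
    """Return True if a bib line should be kept under the substring rule."""
    if line.strip() == "}":
        return True  # closing brace of an entry
    key = line.split("=", 1)[0].strip().lower()  # text before the first '='
    return any(name in key for name in KEEP)


sample = [
    "@article{doe2020example,\n",
    "    title = {An example},\n",
    "    url = {https://example.org},\n",
    "    doi = {10.1000/xyz},\n",
    "}\n",
]
print("".join(line for line in sample if keep_line(line)))
```

Because the match is a substring test, a field such as `numpages` is also kept (it contains `pages`). Fields whose values continue on a second line without an `=` are judged by the text of that continuation line, so single-line fields are the safest input.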

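To use the script, run it from the command line:

## XXX is the name you gave the script; A is the input file name (path) and B is the output file name (path)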
python XXX.py --input A.bib --output B.bib
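
After a run, it can be worth confirming that no `url` or `doi` fields survived. The sketch below is one way to do that on the output text; `field_names` and the sample string are hypothetical, and the regular expression only recognizes purely alphabetic field names at the start of a line.

```python
import re

# Matches `name =` at the start of a line, allowing leading whitespace.
FIELD_RE = re.compile(r"^\s*([A-Za-z]+)\s*=", re.MULTILINE)


def field_names(bib_text: str) -> set:
    """Return the lowercase names of all `name = ...` fields in bib_text."""
    return {m.group(1).lower() for m in FIELD_RE.finditer(bib_text)}


# In practice bib_text would be read from the output file (B.bib above).
sample = """@inproceedings{doe2020example,
    title = "{A}n {E}xample",
    author = "Doe, Jane",
    year = "2020",
}
"""
leftover = field_names(sample) & {"url", "doi"}
print(leftover or "no url/doi fields left")
```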

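It is generally best to run the script once at the very end, after the paper is fully written (for example, while preparing the camera-ready version), so that every citation is normalized in one pass and later changes to the references do not undo the cleanup.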

The code is also mirrored in a GitHub repository:
Bib-normalizer
