Capitalizing the title of every reference in a LaTeX bib file

Reza.

Last modified 2022-07-23 14:32:12


First published 2021-12-27 10:31:56

Article link: https://blog.csdn.net/weixin_43301333/article/details/122166474


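Some titles in a reference list need to keep their capital letters. However, when LaTeX typesets a citation (with many bibliography styles), any capital letter other than the first letter of the title takes effect only if it is wrapped in {}.
On top of that, many journal entries carry a doi or url field, which makes the rendered references look cluttered.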
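
To make the first problem concrete, here is a rough Python sketch of what a lowercasing bibliography style does to a title. It is a simplified model, not BibTeX's actual algorithm: the function name `bibtex_title_case` and the sample title are made up for illustration, and real styles differ (some do not lowercase at all, and special cases such as text after a colon are ignored here).

```python
def bibtex_title_case(title: str) -> str:
    """Simplified model: keep the first character, lowercase everything
    else unless it sits inside at least one pair of braces."""
    out = []
    depth = 0  # current brace nesting level
    for i, ch in enumerate(title):
        if ch == '{':
            depth += 1
            continue  # the braces themselves are not printed
        if ch == '}':
            depth -= 1
            continue
        if i == 0 or depth > 0:
            out.append(ch)          # first character or brace-protected: unchanged
        else:
            out.append(ch.lower())  # unprotected: lowercased
    return "".join(out)


print(bibtex_title_case("A Survey of BERT Models"))
# A survey of bert models
print(bibtex_title_case("{A} {S}urvey of {B}{E}{R}{T} {M}odels"))
# A Survey of BERT Models
```

The second call shows why each capital letter needs its own braces: only the protected letters survive.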

To deal with both problems, I wrote a short script that normalizes a bib file.

'''
Capitalize the titles in your .bib file and
remove every field that is not explicitly kept (e.g. url and doi).
'''
import argparse

parser = argparse.ArgumentParser()

parser.add_argument('-I','--input',type=str,default="./anthology.bib")
parser.add_argument('-O','--output',type=str,default="./anthology_cap.bib")
parser.add_argument('-V','--verbose',action="store_true")
args = parser.parse_args()

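no_cap = ["with","of","for","to","from","and","on","in","under","a","by","the"]  # function words (prepositions, articles, conjunctions) that stay uncapitalized
# dele = ['url', 'doi', 'publisher', 'organization']  # unused: anything not in cus_remain is dropped anyway
cus_remain = ['title','author', 'booktitle', 'journal', 'year', 'pages', 'volume', 'number'] # fields to keep, can be customized
fix = ['@']  # entry header lines such as "@article{key," are always kept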
remain = cus_remain + fix
new_bib = ""

def upper_already_cap(token:str):
    new_token = ''
    for t in token:
        if t.isupper():
            new_token += '{' + t + '}'
        else:
            new_token += t

    return new_token

def upper_all_tokens(title:str):
    all_tokens = title.split(" ")
    new_tokens = []
    for i,tk in enumerate(all_tokens):
        if i == 0 or tk.lower() not in no_cap:  # or tk[0].isupper
            ## must capitalize
            tk = tk.replace("{","")
            tk = tk.replace("}","")
            if not tk:
                ## empty token (e.g. from a double space): tk[0] would raise IndexError
                new_tokens.append(tk)
                continue
            new_tk = '{' + tk[0].upper() + '}' + upper_already_cap(tk[1:])
            new_tokens.append(new_tk)
        else:
            new_tokens.append(tk)

    return " ".join(new_tokens)

def in_line(strr,line):
    new_str = strr.lower()
    new_line = line.lower()
    return new_str in new_line

def find_index(line,substr):
    ## index of the first occurrence of substr, or len(line) if it is absent
    if substr in line:
        return line.index(substr)
    else:
        return len(line)

with open(args.input,"r",encoding="utf-8") as f:
    ori_bib = f.readlines()
    
for line in ori_bib:
    new_line = None
    com_line = line[:find_index(line,'=')].strip()
    # print(com_line)
    ## keep only lines whose field name matches `remain`, plus each entry's closing brace
    if not any(in_line(t,com_line) for t in remain) and line.strip() != '}':
        new_line = ""  ## remove
    elif in_line('title',com_line) and not in_line('booktitle',com_line):
        start = line.index('{')  ## the first position of '{'
        end = line.rindex('}') + 1  ## one past the last '}'
        title = line[start+1:end-1]
        left,right = line[:start+1],line[end-1:]
        new_title = upper_all_tokens(title)
        new_line = left + new_title + right
    else:
        new_line = line
    
    new_bib += '\n' + new_line if '@' in new_line else new_line

new_bib = '% Encoding: UTF-8\n' + new_bib.lstrip('\n') # prepend the encoding header, drop leading blank lines

if args.verbose:  # print the result only when -V/--verbose is given
    print(new_bib)

with open(args.output,"w",encoding="utf-8") as f:
    f.write(new_bib)

print("wrote capitalized bib file to",args.output)
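
As a quick illustration of what the title step does to a single line, here is a compact, self-contained sketch. The names `capitalize_title` and `protect_caps` and the sample title are hypothetical; the logic is meant to mirror `upper_all_tokens` and `upper_already_cap` above, not to replace them.

```python
NO_CAP = {"with", "of", "for", "to", "from", "and", "on", "in", "under", "a", "by", "the"}


def protect_caps(text: str) -> str:
    """Wrap every uppercase character in braces."""
    return "".join("{" + c + "}" if c.isupper() else c for c in text)


def capitalize_title(title: str) -> str:
    """Capitalize and brace-protect each word, except function words
    that are not the first word of the title."""
    words = []
    for i, word in enumerate(title.split()):
        if i == 0 or word.lower() not in NO_CAP:
            word = word.replace("{", "").replace("}", "")  # drop old braces
            if word:
                word = "{" + word[0].upper() + "}" + protect_caps(word[1:])
        words.append(word)
    return " ".join(words)


# A made-up title line, in the one-field-per-line layout the script expects.
line = "    title = {a survey of BERT models for low-resource languages},\n"
start = line.index("{")   # first '{' opens the field value
end = line.rindex("}")    # last '}' closes it
new_line = line[:start + 1] + capitalize_title(line[start + 1:end]) + line[end:]
print(new_line, end="")
#     title = {{A} {S}urvey of {B}{E}{R}{T} {M}odels for {L}ow-resource {L}anguages},
```

Note that only the first letter of a hyphenated word is capitalized, so "low-resource" becomes "{L}ow-resource".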

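To use the code above, run it from the command line:

## XXX is the name you gave the script, A is the input file name (path), B is the output file name (path)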
python XXX.py --input A.bib --output B.bib

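In general, I recommend running this script once at the very end, after the paper is completely written (for example, while preparing the camera-ready version), so that all references are normalized in one pass and later changes to the bibliography do not undo the cleanup.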
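
Because the script drops lines, it may be worth checking that no entry went missing before swapping in the new file. The following is a minimal sketch under stated assumptions: `entry_keys` is a hypothetical helper, the two sample strings are made-up entries, and in practice you would read the input and output files instead.

```python
import re


def entry_keys(bib_text: str) -> list:
    """Return the citation keys of entries such as '@article{key,'
    in order of appearance."""
    return re.findall(r"^\s*@\w+\s*\{\s*([^,\s]+)\s*,", bib_text, flags=re.MULTILINE)


# Made-up sample data standing in for the original and the normalized file.
original = """@article{smith2020,
  title = {a study of things},
  doi = {10.0000/example},
}
@inproceedings{lee2021,
  title = {another study},
  url = {https://example.com},
}
"""
normalized = """% Encoding: UTF-8
@article{smith2020,
  title = {{A} {S}tudy of {T}hings},
}

@inproceedings{lee2021,
  title = {{A}nother {S}tudy},
}
"""

# Every key from the original should still be present, in the same order.
assert entry_keys(original) == entry_keys(normalized)
print(entry_keys(normalized))
# ['smith2020', 'lee2021']
```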

GitHub repository mirroring this code:
Bib-normalizer
