How to Get a Citation from DOI with Python



Photo by Iñaki del Olmo on Unsplash

In this lesson for beginners, we will learn how to access the Crossref database, the largest source of information about journal articles, books, and chapters, and generate a citation based on the academic style required.

We will use a convenient wrapper around the Crossref API called habanero.

Installation

First, we need to install the package. Please run the following command in your Terminal:

pip install habanero

…or if you work inside a Jupyter Notebook environment, copy and paste this command into one of the cells and press Ctrl+Enter:

!pip install habanero

The program

Second, we import the cn (content negotiation) module from habanero, which allows us to get citations in various formats from Crossref:

from habanero import cn

Now, we need to stop and think for a moment about the structure of our program. Because we want to look up not just one DOI but as many as we like, and even reuse that piece of code in other scripts or parts of the program, the best way to organize it is with a function:

from habanero import cn

def get_citation(doi, style):
    citation = cn.content_negotiation(ids=doi, format="text", style=style)

    return citation

We give this function a meaningful name get_citation and define it with 2 parameters: doi and style, which we will later pass to habanero.

We call the content_negotiation function of the cn module and pass it three arguments: ids, format, and style. According to the documentation, ids is the DOI to look up, given as a string (e.g. “10.1126/science.169.3946.635”). The second argument, format, is also a string and defaults to bibtex, a citation format that looks like this:

@article{knuth:1984,
  title={Literate Programming},
  author={Donald E. Knuth},
  journal={The Computer Journal},
  volume={27},
  number={2},
  pages={97--111},
  year={1984},
  publisher={Oxford University Press}
}

However, many people prefer a plain-text version that can be pasted directly into a Word document, so we override that parameter with text. As an exercise, you can experiment with different citation formats (please refer to the habanero documentation for a list of possible values). Finally, we pass in the style parameter, which is also a string (for example, apa); it only has an effect when format is text.
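One way to approach the exercise is to request the same DOI in a few formats and compare the results. The sketch below is only an illustration: the names FORMATS, crossref_fetch, and citation_in_formats are made up for this example, and the three format names are assumed from the habanero documentation, which should be checked for the full list. The fetch parameter is an addition that lets the loop be tried with a stand-in function when no network is available.

```python
from typing import Callable, Dict, Sequence

# Format names assumed from the habanero documentation; see the docs for the full list.
FORMATS = ("text", "bibtex", "ris")


def crossref_fetch(doi: str, fmt: str) -> str:
    """Ask Crossref for one DOI in one format (needs habanero and network access)."""
    from habanero import cn  # imported here so the file loads even without habanero

    return cn.content_negotiation(ids=doi, format=fmt)


def citation_in_formats(
    doi: str,
    formats: Sequence[str] = FORMATS,
    fetch: Callable[[str, str], str] = crossref_fetch,
) -> Dict[str, str]:
    """Return a dict mapping each format name to the citation in that format."""
    return {fmt: fetch(doi, fmt) for fmt in formats}
```

Calling citation_in_formats("10.1126/science.169.3946.635") and printing each value shows how much the output differs between formats.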

We then store the return value in an intermediary variable we call citation and return it as an output. We can also skip the intermediary variable and return the output directly.
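As a sketch of both ideas, the version below returns the result directly and adds a small loop for several DOIs. The names fetch_citation and get_citations and the fetch parameter are illustrative additions, not part of habanero; the fetch parameter only exists so the loop can be tried with a stand-in function instead of a real request.

```python
from typing import Callable, Dict, Iterable


def fetch_citation(doi: str, style: str) -> str:
    """Same call as in get_citation, returned without an intermediary variable."""
    from habanero import cn  # imported here so the file loads even without habanero

    return cn.content_negotiation(ids=doi, format="text", style=style)


def get_citations(
    dois: Iterable[str],
    style: str,
    fetch: Callable[[str, str], str] = fetch_citation,
) -> Dict[str, str]:
    """Return a dict mapping each DOI to its formatted citation."""
    return {doi: fetch(doi, style) for doi in dois}
```

Keeping the lookup in its own function means the loop does not care where the citation comes from, which is what makes it reusable in other scripts.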

A note on error handling

If a particular DOI is not found in the Crossref database, habanero will raise an HTTPError. If nothing catches that exception, the program stops with a traceback. It is a good idea to catch the error and tell the program exactly how to handle the situation. We can rewrite our function with a try/except block that gives us better control over the logic of our program:

from habanero import cn

# Additional import
from requests import HTTPError

def get_citation(doi, style):
    try:
        citation = cn.content_negotiation(ids=doi, format="text", style=style)
    except HTTPError:
        return None

    return citation

We need to import the definition of the error (HTTPError) from the requests module so that our program can identify and react to it accordingly (you may need to install the requests module first: pip install requests). We put a piece of code that we know can result in an exception inside the try block and catch that error with the except keyword. We then write our error-handling logic inside the except block. In our case, we simply return None.
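Returning None silently hides the reason for the failure. A possible variation, sketched below, logs the error before returning None. The names safe_citation and fetch are hypothetical, and the broad except Exception is a deliberate assumption: the exact exception class can depend on the HTTP library used by the installed habanero version, so it is worth checking and narrowing once you know it.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_citation(
    doi: str, style: str, fetch: Callable[[str, str], str]
) -> Optional[str]:
    """Call fetch(doi, style); log the error and return None if it raises."""
    try:
        return fetch(doi, style)
    except Exception as error:  # deliberately broad; narrow to the real class if known
        logger.warning("No citation for %s: %s", doi, error)
        return None
```

Here fetch can be any function that takes a DOI and a style, for example a small wrapper around cn.content_negotiation.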

For a more comprehensive overview of error handling, please refer to https://www.pythoncheatsheet.org/cheatsheet/exception-handling

If we run our program now, we will get the following citation in APA format:

citation = get_citation("10.1126/science.169.3946.635", "apa")

print(citation)

Frank, H. S. (1970). The Structure of Ordinary Water. Science, 169(3946), 635–641. https://doi.org/10.1126/science.169.3946.635

…or None in the case of the error described earlier.
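One situation that may lead to the None result is a DOI pasted as a full link (such as the https://doi.org/... form at the end of the citation above) instead of the bare identifier. The helper below is a minimal sketch for tidying the input before the lookup. The name clean_doi is made up for this example, and the pattern is an assumption that covers common modern DOIs, not a complete validator.

```python
import re
from typing import Optional

# Prefixes people often paste together with the DOI itself.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)
# Rough shape of a modern DOI: "10.", a 4-9 digit registrant code, "/", a suffix.
_DOI = re.compile(r"^10\.\d{4,9}/\S+$")


def clean_doi(raw: str) -> Optional[str]:
    """Strip whitespace and a URL or "doi:" prefix; return None if no DOI remains."""
    doi = _PREFIX.sub("", raw.strip())
    return doi if _DOI.match(doi) else None
```

A caller can then skip the request entirely when clean_doi returns None, and only pass well-formed identifiers to get_citation.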