In this lesson for beginners, we will learn how to query the Crossref database, one of the largest sources of metadata about journal articles, books, and book chapters, and generate a citation in whichever academic style is required.
We will use a convenient wrapper around the Crossref API called habanero.
Installation
First, we need to install the package. Please run the following command in your Terminal:
pip install habanero
…or if you work inside a Jupyter Notebook environment, copy and paste this command into one of the cells and press Ctrl+Enter:
!pip install habanero
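To confirm that the installation worked before moving on, one option is to ask Python whether it can locate the package. The following is a minimal sketch that uses only the standard library; the helper name is_installed is made up for this illustration and is not part of habanero.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip install habanero` has succeeded in this environment
print(is_installed("habanero"))
```

If this prints False, the package was probably installed into a different Python environment than the one running the script or notebook.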
The program
Second, we import the so-called cn module from habanero, which allows us to get citations in various formats from Crossref:
from habanero import cn
Now, let's stop and think for a moment about the structure of our program. We want to resolve not just one DOI but as many as we like, and to reuse this code in other scripts or other parts of the program, so the best way to organize it is as a function:
from habanero import cn

def get_citation(doi, style):
    # Ask Crossref for a plain-text citation of the given DOI in the given style
    citation = cn.content_negotiation(ids=doi, format="text", style=style)
    return citation
We give this function a meaningful name, get_citation, and define it with two parameters, doi and style, which we will later pass to habanero.
We call the content_negotiation function of the cn module and pass it three arguments: ids, format, and style. According to the documentation, the first parameter, ids, is the DOI, which needs to be a string (e.g. "10.1126/science.169.3946.635"). The second parameter, format, is also a string and defaults to bibtex. BibTeX is a citation format that looks like this:
@article{knuth:1984,
title={Literate Programming},
author={Donald E. Knuth},
journal={The Computer Journal},
volume={27},
number={2},
pages={97--111},
year={1984},
publisher={Oxford University Press}
}
However, many prefer a simpler plain-text version that can be pasted directly into a Word document, so I would like to override that parameter with text. As an exercise, you can experiment with different citation formats (please refer to the habanero documentation for a list of possible values). Finally, we pass in the style parameter, which is also a string (for example, "apa").
We then store the return value in an intermediary variable called citation and return it as the output. We could also skip the intermediary variable and return the result directly.
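Because the lookup lives in a function, processing several DOIs is a matter of calling it once per DOI. The sketch below illustrates that idea; cite_all and fake_lookup are hypothetical names introduced here, and fake_lookup merely stands in for get_citation so that the example runs without a network connection.

```python
def cite_all(dois, style, lookup):
    """Call lookup(doi, style) for each DOI and collect the results in order."""
    return [(doi, lookup(doi, style)) for doi in dois]


def fake_lookup(doi, style):
    # Stand-in for get_citation (an assumption for this sketch): no network access
    return f"[{style}] citation for {doi}"


if __name__ == "__main__":
    dois = ["10.1126/science.169.3946.635"]
    for doi, citation in cite_all(dois, "apa", fake_lookup):
        print(doi, "->", citation)
```

In the real program, we would pass get_citation in place of fake_lookup; nothing else needs to change.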
A note on error handling
If a particular DOI is not found in the Crossref database, habanero will raise an HTTPError, and because nothing in our program handles that exception, the program will exit. It is a good idea to catch that error and tell the program exactly how to handle this situation. We can rewrite our function to add a try/except block that gives us better control over the logic of our program:
from habanero import cn
# Additional import
from requests import HTTPError

def get_citation(doi, style):
    try:
        citation = cn.content_negotiation(ids=doi, format="text", style=style)
    except HTTPError:
        # The DOI could not be resolved, so signal "no citation" to the caller
        return None
    return citation
We need to import the definition of the error (HTTPError) from the requests module so that our program can recognize it and react accordingly (you may need to install the requests module first: pip install requests). We put the code that we know can raise an exception inside the try block and catch the error with the except keyword. We then write our error-handling logic inside the except block. In our case, we simply return None.
For a more comprehensive overview of error handling, please refer to https://www.pythoncheatsheet.org/cheatsheet/exception-handling
If we run our program now, we will get the following citation in APA format:
citation = get_citation("10.1126/science.169.3946.635", "apa")
print(citation)
Frank, H. S. (1970). The Structure of Ordinary Water. Science, 169(3946), 635–641. https://doi.org/10.1126/science.169.3946.635
…or None in the case of the error described earlier.
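Since get_citation returns None for a DOI that cannot be resolved, the calling code has to check for that value before using the result. One possible way to do this, assuming we want to keep the citations we found apart from the DOIs that failed, is sketched below. The helper split_results and the sample data (including the placeholder DOI) are hypothetical and only illustrate the check.

```python
def split_results(results):
    """Split (doi, citation) pairs into found citations and missing DOIs."""
    found, missing = [], []
    for doi, citation in results:
        if citation is None:
            missing.append(doi)  # the lookup failed for this DOI
        else:
            found.append(citation.strip())  # drop stray surrounding whitespace
    return found, missing


# Sample data shaped like get_citation results: text on success, None on failure.
# The second DOI is a made-up placeholder, not a real identifier.
sample = [
    ("10.1126/science.169.3946.635", "Frank, H. S. (1970). The Structure of Ordinary Water.\n"),
    ("10.0000/placeholder-doi", None),
]
found, missing = split_results(sample)
print(found)
print(missing)
```

Keeping the failed DOIs in their own list makes it easy to report them to the user or retry them later.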