arXivScraper Tutorial

甄如冰Lea

Published 2024-08-31 09:23:13

Article link: https://blog.csdn.net/gitblog_00714/article/details/141743941

arxivscraper: A Python module to scrape arxiv.org for a date range and category. Project address: https://gitcode.com/gh_mirrors/ar/arxivscraper

Project Introduction

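arXivScraper is a Python module for scraping academic papers from arXiv.org. It lets users retrieve paper records by subject category and date range. arXiv is a preprint server that hosts a large number of scholarly articles across disciplines, including physics, mathematics, and computer science. With arXivScraper, researchers can conveniently collect early-stage research material.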

Quick Start

Installation

You can install arXivScraper with pip:

pip install arxivscraper

Alternatively, download the source code and install it with setup.py:

python setup.py install

Usage Example

The following simple example shows how to scrape papers from the condensed matter physics category:

import arxivscraper
import pandas as pd

# Create a scraper instance for one category and date range
scraper = arxivscraper.Scraper(category='physics:cond-mat', date_from='2017-05-27', date_until='2017-06-07')

# Start scraping (this sends requests to arXiv, so it needs network access)
output = scraper.scrape()

# Convert the output to a pandas DataFrame, keeping the listed columns
cols = ('id', 'title', 'categories', 'abstract', 'doi', 'created')
df = pd.DataFrame(output, columns=cols)

print(df)
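
Once records have been scraped, a common next step is to narrow them down and save them. The sketch below is one possible way to do that with only the standard library. It assumes that `scraper.scrape()` returns a list of dictionaries keyed by the column names used above; the helper names `filter_records` and `records_to_csv` and the sample records are hypothetical and are not part of arXivScraper.

```python
import csv
import io

# Column names taken from the usage example above.
COLUMNS = ("id", "title", "categories", "abstract", "doi", "created")


def filter_records(records, keyword):
    """Return records whose title or abstract contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        rec for rec in records
        if needle in (rec.get("title") or "").lower()
        or needle in (rec.get("abstract") or "").lower()
    ]


def records_to_csv(records, columns=COLUMNS):
    """Serialize records to CSV text, keeping only the given columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for rec in records:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({col: rec.get(col, "") for col in columns})
    return buffer.getvalue()


# Hypothetical sample data standing in for the output of scraper.scrape().
sample = [
    {"id": "0000.00001", "title": "Spin waves in a toy lattice",
     "categories": "cond-mat.str-el", "abstract": "We study magnon dispersion.",
     "doi": "", "created": "2017-05-28"},
    {"id": "0000.00002", "title": "A note on glassy dynamics",
     "categories": "cond-mat.dis-nn", "abstract": "Relaxation times are measured.",
     "doi": "", "created": "2017-06-01"},
    {"id": "0000.00003", "title": "Transport in thin films",
     "categories": "cond-mat.mes-hall", "abstract": "Electron spin relaxation is discussed.",
     "doi": "", "created": "2017-06-05"},
]

matches = filter_records(sample, "spin")
print(records_to_csv(matches))
```

In real use, `sample` would be replaced by the `output` list from the example above, and the CSV text could be written to a file instead of printed.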