Scraping Douban Book Information, Saving It as an RIS File, and Importing It into EndNote

Latest recommended article published on 2023-11-16 14:08:13

stromlord

Tags: python

Article link: https://blog.csdn.net/stromlord/article/details/106963163

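While studying I downloaded a lot of e-books, but they ended up poorly organized. It occurred to me that EndNote could manage them, but typing in each book's details by hand is tedious, so I decided to scrape the book information from Douban Books (豆瓣读书) instead.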

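First, use Python to scrape a book's basic information from Douban, such as the title, author, and abstract. Then save it as an RIS file, and finally import the RIS file into EndNote.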
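
The "save as RIS" step can be sketched as follows. This is a minimal illustration and not the original tool's code: the function names `book_to_ris` and `save_ris`, the dictionary keys, and the sample values are all assumptions made for the example. It relies only on the standard RIS conventions: each line is a two-letter tag, two spaces, a hyphen, a space, and the value; a record opens with `TY` and closes with `ER`.

```python
def book_to_ris(book):
    """Render a dict of book fields as one RIS record (a single string).

    The dict keys used here (title, authors, publisher, ...) are
    hypothetical names chosen for this sketch.
    """
    lines = ["TY  - BOOK"]  # every RIS record starts with its reference type
    if book.get("title"):
        lines.append("TI  - " + book["title"])
    for author in book.get("authors", []):
        lines.append("AU  - " + author)  # one AU line per author
    # Optional single-valued fields, mapped to their RIS tags.
    for key, tag in (("publisher", "PB"), ("year", "PY"),
                     ("isbn", "SN"), ("abstract", "AB"), ("url", "UR")):
        value = book.get(key)
        if value:
            # Collapse whitespace and newlines so the field stays on one line.
            lines.append(f"{tag}  - {' '.join(str(value).split())}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"


def save_ris(book, path):
    """Write the record to disk; UTF-8 keeps Chinese titles intact."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(book_to_ris(book))


if __name__ == "__main__":
    # Placeholder values, not real book data.
    sample = {"title": "Example Book", "authors": ["Author One", "Author Two"],
              "year": 2020, "abstract": "First line.\nSecond line."}
    print(book_to_ris(sample))
```

The resulting `.ris` file is plain text, so it can be opened in any editor to check the fields before importing it into EndNote or another reference manager.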

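The tool has a GUI: enter the URL of a book's Douban page, and it scrapes the book information and saves it as an RIS file, which can be imported into a variety of reference managers. Finally, everything is packaged as an exe file, which can be downloaded here: https://download.csdn.net/download/stromlord/12552065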

Scraping the book information and saving it as an RIS file:

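import time

import requests


def html_request(url, encodeing='utf-8', timeout=5, headers=None):
    # Fetch the page once; return the decoded HTML, or None on a non-200 status.
    response = requests.get(url, headers=headers, timeout=timeout)
    if response.status_code == 200:
        # 'ignore' drops undecodable bytes instead of raising UnicodeDecodeError.
        html = response.content.decode(encodeing, 'ignore')
    else:
        html = None
    return html


def html_read(url, encodeing='utf-8', timeout=5, headers=None):
    # Retry html_request on read timeouts and connection errors.
    # Returns '' if more than 10 failures of either kind occur,
    # and None if the server answers with a non-200 status.
    read_count = 0
    connect_count = 0
    html = ''
    while True:
        try:
            html = html_request(url, encodeing, timeout, headers=headers)
            break
        except requests.exceptions.ReadTimeout:
            time.sleep(3)  # wait before the next attempt
            print("ReadTimeout", end='')
            read_count = read_count + 1
            if read_count > 10:
                break
        except requests.exceptions.ConnectionError:
            time.sleep(3)  # wait before the next attempt
            print("ConnectionError", end='')
            connect_count = connect_count + 1
            if connect_count > 10:
                break
    return html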
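
Once `html_read` returns the page source, the individual fields still have to be extracted from the HTML. The sketch below shows one possible way to pull out the title with the standard library only. It is not the original `book_info` implementation: the helper name `parse_title` is hypothetical, and the assumption that the title sits in a `<span property="v:itemreviewed">` element should be checked against the actual page source before relying on it.

```python
import html as html_lib
import re

# Assumed markup: <span property="v:itemreviewed">Title</span>.
# Verify this against the real page; it is not taken from the original code.
TITLE_RE = re.compile(
    r'<span[^>]*property="v:itemreviewed"[^>]*>(.*?)</span>', re.S)


def parse_title(page):
    """Return the book title found in the page source, or None.

    Accepts the None or '' values that html_read can return on failure.
    """
    if not page:
        return None
    match = TITLE_RE.search(page)
    if match is None:
        return None
    # Decode entities such as &amp; and trim surrounding whitespace.
    return html_lib.unescape(match.group(1)).strip()


if __name__ == "__main__":
    # Hypothetical snippet standing in for a downloaded page.
    sample = '<h1><span property="v:itemreviewed"> Example &amp; Title </span></h1>'
    print(parse_title(sample))
```

A regular expression is enough for a single well-known element; for the full set of fields (author, publisher, ISBN, and so on) a proper HTML parser is usually the more robust choice.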


def book_info(book_url, ris_dir='', series=None, ris_flag=True):
    book_info_dict