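During my studies I downloaded a lot of e-books, but they were poorly organized. It occurred to me that I could manage them with EndNote. Entering each book's details by hand is tedious, though, so I decided to scrape the book information from Douban Books (豆瓣读书) instead.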
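The workflow has three steps. First, use Python to scrape a book's basic information from Douban, such as its title, author, and abstract. Next, save that information as an RIS file. Finally, import the RIS file into EndNote.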
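An RIS file is plain text. Each record begins with a `TY  - ` line, and each field is a two-letter tag followed by two spaces, a hyphen, a space, and the value. The record ends with `ER  - `. The sketch below turns a Python dict into one RIS book record. The dict keys (`title`, `authors`, `publisher`, `year`, `isbn`, `abstract`, `url`) and the helper names `to_ris` and `save_ris` are hypothetical. They are not taken from the author's `book_info` function.

```python
from typing import Dict, List, Union

# Hypothetical dict keys mapped to standard RIS tags, in output order.
RIS_TAGS = [
    ("title", "TI"),      # title
    ("authors", "AU"),    # author (one line per author)
    ("publisher", "PB"),  # publisher
    ("year", "PY"),       # publication year
    ("isbn", "SN"),       # ISBN/ISSN
    ("abstract", "AB"),   # abstract
    ("url", "UR"),        # web page of the record
]


def to_ris(book: Dict[str, Union[str, List[str]]]) -> str:
    """Render one book as an RIS record: TY line first, ER line last."""
    lines = ["TY  - BOOK"]
    for key, tag in RIS_TAGS:
        value = book.get(key)
        if not value:
            continue  # skip missing or empty fields
        values = value if isinstance(value, list) else [value]
        for v in values:
            # A line break would end the field early, so collapse all whitespace.
            text = " ".join(str(v).split())
            if text:
                lines.append(f"{tag}  - {text}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"


def save_ris(book: Dict[str, Union[str, List[str]]], path: str) -> None:
    """Write one record to path (UTF-8 is an assumption; adjust for your importer)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_ris(book))
```

For example, `to_ris({"title": "T", "authors": ["A", "B"], "year": "2020"})` writes one `AU` line per author. Reference managers read repeated `AU` tags as separate authors, which is why the list is not joined into a single string.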
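I also built a GUI. You enter the URL of a book's Douban page, and the tool scrapes the book information and saves it as an RIS file, which can be imported into most reference managers. Finally, I packaged the tool as an .exe file, which can be downloaded here: https://download.csdn.net/download/stromlord/12552065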
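Because the GUI takes a pasted URL as input, it can help to validate that input before scraping. The sketch below assumes Douban book pages have the form `https://book.douban.com/subject/<numeric id>/`. It extracts the ID and rebuilds a canonical URL so that query strings such as tracking parameters are dropped. The function names `douban_subject_id` and `normalize_book_url` are hypothetical.

```python
import re
from typing import Optional

# Assumed URL shape of a Douban book page: https://book.douban.com/subject/<digits>/
_DOUBAN_BOOK = re.compile(r"^https?://book\.douban\.com/subject/(\d+)(?:[/?#]|$)")


def douban_subject_id(url: str) -> Optional[str]:
    """Return the numeric subject ID, or None if url is not a Douban book page."""
    match = _DOUBAN_BOOK.match(url.strip())
    return match.group(1) if match else None


def normalize_book_url(url: str) -> Optional[str]:
    """Rebuild a canonical book URL, dropping any query string or fragment."""
    subject_id = douban_subject_id(url)
    if subject_id is None:
        return None
    return f"https://book.douban.com/subject/{subject_id}/"
```

With this check, the GUI can reject something like a Douban movie link before it sends a request, instead of failing later while parsing.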
Scraping the book information and saving it as an RIS file:
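import time

import requests


def html_request(url, encodeing='utf-8', timeout=5, headers=None):
    # Fetch a page and decode it; returns None for any non-200 status.
    # Douban may reject requests that lack a browser-like User-Agent, so pass one in headers.
    response = requests.get(url, headers=headers, timeout=timeout)
    if response.status_code == 200:
        # Bytes that cannot be decoded are dropped instead of raising an error.
        html = response.content.decode(encodeing, 'ignore')
    else:
        html = None
    return html
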
def html_read(url, encodeing='utf-8', timeout=5, headers=None):
    # Call html_request, retrying on read timeouts and connection errors with a 3 s pause.
    read_count = 0
    connect_count = 0
    html = ''  # returned unchanged if every retry fails
    while True:
        try:
            html = html_request(url, encodeing, timeout, headers=headers)
            break
        except requests.exceptions.ReadTimeout:
            time.sleep(3)
            print("ReadTimeout", end='')
            read_count = read_count + 1
            if read_count > 10:  # give up after more than 10 read timeouts
                break
        except requests.exceptions.ConnectionError:
            time.sleep(3)
            print("ConnectionError", end='')
            connect_count = connect_count + 1
            if connect_count > 10:  # give up after more than 10 connection errors
                break
    return html

def book_info(book_url, ris_dir='', series=None, ris_flag=True):
    book_info_dict