Automating Web of Science Paper Data Collection with Selenium: IEEE SENSORS JOURNAL as an Example

RobotsRuning

Last modified: 2024-07-10 15:07:27

Tags: selenium, web crawler, web of science

First published: 2024-07-10 13:57:20

Article link: https://blog.csdn.net/u014374826/article/details/140322000

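In today's digital era, researchers face an overwhelming volume of scholarly information. Collecting, filtering, and analyzing the latest results in a field efficiently is essential to staying competitive. Searching for and organizing literature by hand, however, is slow and laborious, and it is easy to miss items. Automation tools can make literature retrieval both faster and more accurate.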

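This article shows how to use Python and Selenium WebDriver to automate the collection of paper data from Web of Science. Taking IEEE SENSORS JOURNAL as the example, it walks through a script that simulates user actions: logging in, navigating, searching, and extracting paper titles and publication dates in bulk. This approach speeds up literature collection considerably and lays the groundwork for later data analysis.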
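As a rough illustration of what such a script produces, each extracted paper can be held as a small record and written to CSV for later analysis. This is a minimal sketch and not part of the original script: the names `PaperRecord` and `write_records`, the CSV layout, and the sample row are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, TextIO


@dataclass(frozen=True)
class PaperRecord:
    """One scraped result row (hypothetical structure, not from the original script)."""
    title: str
    publication_date: str  # kept as the raw text shown on the page


def write_records(records: Iterable[PaperRecord], out: TextIO) -> int:
    """Write records as CSV with a header row; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(PaperRecord)])
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(asdict(record))
        count += 1
    return count


# Usage with a placeholder row (not real data) and an in-memory buffer.
buffer = io.StringIO()
write_records([PaperRecord("A Sample Title", "JUL 2024")], buffer)
print(buffer.getvalue())
```

Keeping the publication date as raw text avoids guessing at the date format the page displays; parsing can be left to the analysis step.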

Without further ado, here is the script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import (
    TimeoutException,
    NoSuchWindowException,
    NoSuchElementException,
    StaleElementReferenceException,
)
# import tkinter as tk
# from tkinter import simpledialog
import time
import datetime
# import re


# Set the username and password
# username_str = ''
# password_str = ''
# Get the current time
now = datetime.datetime.now()
year = now.strftime("%Y")    # extract the year
month = now.strftime("%m")   # extract the month
day = now.strftime("%d")     # extract the day
print(f"Year: {year}, Month: {month}, Day: {day}")



print("Trying to open wuyoutsg.com")

# Path to ChromeDriver
driver_path = r'd:\chromedriver-win64\chromedriver.exe'

# Initialize WebDriver. Selenium 4 takes the driver path through a Service
# object; the executable_path argument was removed in Selenium 4.10.
wd = webdriver.Chrome(service=Service(driver_path))

# Open the site
wd.get('http://www.wuyoutsg.com')
time.sleep(3)  # pause here for 3 seconds
wd.maximize_window()

print("wuyoutsg.com is now open in Chrome")


print("Trying to enter the username")
# The placeholder text "用户名" means "username"; adjust this XPath to match the actual page
username_input = wd.find_element(By.XPATH, '//input[@placeholder="用户名"]')
username_
