Python Web Scraping in Practice: Scraping a Complete Novel with Selenium

Latest recommended article published on 2024-05-08 21:32:09

smallql1314

Views: 1.1k

Likes: 20

Tags: python, web scraping, selenium

Article link: https://blog.csdn.net/smallql1314/article/details/136383414

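Scraping target

Open any pirated-novel site; their page layouts are much the same, so one scraping strategy works for most of them.

For example: https://www.00ksw.com/html/3/3804/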

The goal is to scrape this novel and save it to a local .txt file.
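The "save to a local .txt file" step can be sketched on its own, independent of how the chapters are fetched. The helper below is a minimal sketch and not the author's code: the name `save_chapters`, the `(title, body)` pair format, and the blank-line separator between chapters are all assumptions made for illustration.

```python
from pathlib import Path


def save_chapters(chapters, out_path):
    """Write (title, body) pairs to a single UTF-8 text file, in order.

    `chapters` is any iterable of (title, body) string pairs; this layout
    is an assumption for illustration, not something the site dictates.
    """
    path = Path(out_path)
    with path.open("w", encoding="utf-8") as f:
        for title, body in chapters:
            # Chapter title, blank line, chapter text, blank line
            f.write(title.strip() + "\n\n")
            f.write(body.strip() + "\n\n")
    return path
```

Writing with an explicit `encoding="utf-8"` avoids garbled Chinese text on systems whose default encoding is not UTF-8.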

Selenium code

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

url = "https://www.00ksw.com/html/3/3804/"

# Create the Chrome options and enable headless mode (no visible window)
chrome_options = Options()
chrome_options.add_argument("--headless")

# Start the Chrome browser with these options
driver = Chrome(options=chrome_options)

driver.get(url)
# print(driver.page_source)

# Wait up to 10 seconds until "ml_list" (presumably the chapter list
# container) shows up in the page source
wait = WebDriverWait(driver, 10)
wait.until(lambda d: "ml_list" in d.page_source)

# print(driver.page_source)
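Once "ml_list" is present in the page source, the next step is to pull the chapter links out of it. The following is a rough sketch using only the standard library, so it can be tried on a saved HTML string without a browser. It assumes that `ml_list` is the `id` or `class` of the element wrapping the chapter links; the code above only confirms that the string appears in the page. The names `ChapterLinkParser` and `extract_chapter_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute_url) for <a> tags inside the marked container."""

    def __init__(self, base_url, marker="ml_list"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker        # assumed id/class of the chapter list
        self.depth = 0              # > 0 while inside the container
        self.container_tag = None
        self.current_href = None
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            names = (attrs.get("id") or "").split() + (attrs.get("class") or "").split()
            if self.marker in names:
                self.container_tag = tag
                self.depth = 1
            return
        if tag == self.container_tag:
            self.depth += 1         # track nested tags of the same kind
        if tag == "a" and attrs.get("href"):
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            # Resolve relative hrefs against the index page URL
            self.links.append((title, urljoin(self.base_url, self.current_href)))
            self.current_href = None
        if tag == self.container_tag:
            self.depth -= 1


def extract_chapter_links(html, base_url):
    """Return a list of (title, absolute_url) found in the chapter list."""
    parser = ChapterLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

With the driver from the code above, this would be called as `extract_chapter_links(driver.page_source, url)`. If the real markup differs from the assumption, Selenium's own `driver.find_elements(By.CSS_SELECTOR, ...)` with a selector matched to the actual page is the more direct route.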