使用selenium爬取飞卢小说网排行榜的小说标题

本文链接：https://blog.csdn.net/andux/article/details/134005116

使用selenium集成化爬虫工具，它的xpath跟scrapy等有一定区别，scrapy等需要使用get()或者extract()等函数来获取dom节点，而selenium就不需要了。

飞卢小说网的排行榜中，每个小说块的class都是一样的，这样获取整页的div块的xpath就很好写了。

app.py

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

service = Service(executable_path="./driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
url = "https://b.faloo.com/y_7_0_0_0_0_1_1.html"
driver.get(url=url)
try:
    boxs = driver.find_elements(By.XPATH, '//div[@class="TwoBox02_02"]')
    for box in boxs:
        h1 = box.find_element(By.XPATH, './/h1[@class="fontSize17andHei"]')
        title = h1.get_attribute("title")
        print(title)
except Exception as e:
    print(e)

time.sleep(2)
driver.quit()

把爬取代码放到try里面，是为了出错时程序不会中断，继续执行下去。

需要注意获取飞卢小说网小说标题的xpath是：

h1 = box.find_element(By.XPATH, './/h1[@class="fontSize17andHei"]')

而不是：

h1 = box.find_element(By.XPATH, '//h1[@class="fontSize17andHei"]')

一个点（.）是指当前子目录中，没有点的话，就是直接从根目录开始了。

1. Example syntax using a single attribute (relative XPath type):

// a [@href=’http://www.google.com’]
//input[@id=’name’]
//input[@name=’username’]
//img[@alt=’sometext’]

2. Example syntax using multiple attributes (relative XPath type):

//tagname[@attribute1=’value1’] [attribute2=’value2’]
//a[@id=’id1’] [@name=’namevalue1’]
//img[@src=’’] [@href=’’]

3. Example syntax of a search using “contains” (here, to create an account):

//tagname[contains(@attribute,’value1’)]
//input[contains(@id,’’)]
//input[contains(@name,’’)]
//a[contains(@href,’’)]
//img[contains(@src,’’)]
//div[contains(@id,’’)]

4. Example syntax of a search using “starts-with:”:

//tagname[starts-with(@attribute-name,’’)]
//id[starts-with(@id,’’)]
//a[starts-with(@href=’’)]
//img[starts-with(@src=’’)]
//div[starts-with(@id=’’)]
//input[starts-with(@id=’’)]
//button[starts-with(@id,’’)]

5. Example syntax for using the following node:

Xpath/following::again-ur-regular-path
//input[@id=’’]/following::input[1]
//a[@href=’’]/following::a[1] 
//img[@src=’’]/following::img[1]

6. Example syntax for using the preceding node:

Xpath/preceding::again-ur-regular-path
 //input[@id=’’]/ preceding::input[1]
 //a[@href=’’]/ preceding::a[1]
 //img[@src=’’]/ preceding::img[1]

7. Example syntax for using the absolute XPath type:

1-/html/head/body/div/input

8. Example syntax for text search in XPath:

Syntax- tagname[text()=’text which is visible on page’]

9. Example syntax for text search with ‘contains’:

/*[contains(text(),’Employee Distribution by Subunit’)]