python – How to extract links from a site that includes pagination? (using Selenium)

I want to extract links from the site below, but it does include pagination:

I want to extract the link under the MoreInfo button:

I am using the following snippet:

import time
import requests
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import re

browser = webdriver.Chrome()
time.sleep(5)
browser.get('https://www.usta.com/en/home/play/facility-listing.html?searchTerm=&distance=5000000000&address=Palo%20Alto,%20%20CA')
wait = WebDriverWait(browser, 15)

def extract_data(browser):
    links = browser.find_elements_by_xpath("//div[@class='seeMoreBtn']/a")
    return [link.get_attribute('href') for link in links]

element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, "//a[@class='glyphicon glyphicon-chevron-right']")))
max_pages = int(re.search(r'\d+ de (\d+)', element.text).group(1), re.UNICODE)

# extract from the current (1st) page
print("Page 1")
print(extract_data(browser))

for page in range(2, max_pages + 1):
    print("Page %d" % page)
    next_page = browser.find_element_by_xpath("//a[@class='glyphicon glyphicon-chevron-right']").click()
    print(extract_data(browser))
    print("-----")

When I run the above script I get this error (I'm not very familiar with regular expressions and was just exploring the concept):

Traceback (most recent call last):
  File "E:/Python/CSV/testingtesting.py", line 29, in <module>
    max_pages = int(re.search(r'\d+ de (\d+)', element.text).group(1), re.UNICODE)
AttributeError: 'NoneType' object has no attribute 'group'
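
The immediate cause of the error is that re.search returned None: the pager text does not match the pattern r'\d+ de (\d+)' (which expects Spanish-style "1 de 245" text). There is a second bug hiding behind it: int() interprets its second argument as a number base, so re.UNICODE does not belong there; regex flags go to re.search. A minimal sketch of a safer version, assuming the pager text looks like "1 of 245" (the exact wording on the USTA page is an assumption):

import re

pager_text = element.text  # e.g. "1 of 245" -- exact format is an assumption
match = re.search(r'\d+\s+of\s+(\d+)', pager_text)
if match is None:
    raise ValueError("unexpected pager text: %r" % pager_text)
max_pages = int(match.group(1))  # int()'s second argument is a base, not a regex flag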

If possible, please suggest a solution. Somehow I did manage to extract the links by waiting and clicking through the pagination links, but that added almost 13 seconds of wait time per page; the working code is below:

import time
import requests
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import re

# ----------------------------------------------HANDLING-SELENIUM-STUFF-------------------------------------------------
linkList = []
driver = webdriver.Chrome()
time.sleep(5)
driver.get('https://www.usta.com/en/home/play/facility-listing.html?searchTerm=&distance=5000000000&address=Palo%20Alto,%20%20CA')
wait = WebDriverWait(driver, 8)
time.sleep(7)

for i in range(1, 2925):
    time.sleep(3)
    # wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='seeMoreBtn']/a")))
    links = driver.find_elements_by_xpath("//div[@class='seeMoreBtn']/a")
    time.sleep(3)
    # appending extracted links to the CSV file
    for link in links:
        value = link.get_attribute("href")
        # linkList.append(value)
        with open('test.csv', 'a', encoding='utf-8', newline='') as fp:
            writer = csv.writer(fp, delimiter=',')
            writer.writerow([value])
    time.sleep(1)
    driver.find_element_by_xpath("//a[@class='glyphicon glyphicon-chevron-right']").click()
    time.sleep(6)
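
Most of the 13 seconds per page comes from the fixed time.sleep calls. One way to cut that down is to replace them with explicit waits that return as soon as the elements are ready, and to stop when the next-page chevron can no longer be found. This is a minimal sketch under two assumptions: the same XPaths keep working, and the chevron link is absent (or never becomes clickable) on the last page:

import csv
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.usta.com/en/home/play/facility-listing.html?searchTerm=&distance=5000000000&address=Palo%20Alto,%20%20CA')
wait = WebDriverWait(driver, 10)

with open('test.csv', 'w', encoding='utf-8', newline='') as fp:
    writer = csv.writer(fp)
    while True:
        # wait only as long as it takes for the current page's links to appear
        links = wait.until(EC.presence_of_all_elements_located(
            (By.XPATH, "//div[@class='seeMoreBtn']/a")))
        for link in links:
            writer.writerow([link.get_attribute('href')])
        try:
            next_btn = wait.until(EC.element_to_be_clickable(
                (By.XPATH, "//a[@class='glyphicon glyphicon-chevron-right']")))
        except TimeoutException:
            break  # assumption: no clickable chevron means we are on the last page
        next_btn.click()
        # wait for the old results to go stale so we don't re-read the same page
        wait.until(EC.staleness_of(links[0]))

driver.quit()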
