Can Python be used to write navigation? Navigating with Selenium in Python

I'm scraping this website using Python and Selenium, but it currently only scrapes the first 10 pages for the month of July. The code converts the page number in the previous sibling of the next button to an int and clicks next number_of_pages - 1 times; however, it stops after it gets to page 10.

Can anyone help me to get it to scrape all the pages?

    def pagination( driver ):
        data = []
        last_element = driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]/preceding-sibling::a[1]')
        if last_element is None:
            number_of_pages = 1
        else:
            number_of_pages = int( last_element.text )
        # data = [ getData( driver ) ]
        data.extend(getData(driver))
        for i in range(number_of_pages - 1):
            driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
            data.extend( getData( driver ) )
            time.sleep(1)
        return data

Solution

Look, I understand you took the idea of calculating the total number of pages from my answer to a previous question of yours. In that case it worked because the last page number was directly available to us, but that's not the case here.

Solution:

Although the number of pages is not directly available, the total number of entries is:

Now, as you can see in the above screenshot, for July this number is 174. Assuming you leave the pagination length (the number of entries on a single page) at the default of 10, the number of pages should be 18 (17 pages of 10 entries each and one extra page for the remaining 4 entries).

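So, the logic for calculating the number of pages is simple. If you have the total number of entries in a total_entries variable, the number of pages should be (taken from this):

    number_of_pages = (total_entries // 10) + 1

In Python 2, / between two ints floors the result, so 174/10 returns 17 and adding 1 gives 18. In Python 3, / returns a float (17.4), so use // as above. So there you have it: 18 as the number of pages. One caveat: when the total is an exact multiple of 10 (say 170), this formula gives one page too many. The ceiling form (total_entries + 9) // 10 returns 18 for 174 and 17 for 170, so it is the safer choice.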
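To sanity-check the arithmetic for a few totals, here is a small sketch. The names page_count and page_size are hypothetical, and returning 1 for an empty listing is an assumption that mirrors the fallback used in the code. It uses ceiling division, so an exact multiple such as 170 gives 17 pages rather than 18.

```python
def page_count(total_entries, page_size=10):
    """Pages needed to show total_entries items, page_size per page."""
    if total_entries <= 0:
        # Assumption: an empty listing is still treated as a single page.
        return 1
    # Ceiling division using integers only (works in Python 2 and 3).
    return (total_entries + page_size - 1) // page_size


print(page_count(174))  # 18
print(page_count(170))  # 17
```

The number of clicks on the next button is then page_count(total) - 1.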

Now, to extract the total number of entries, use the locator below to find the element holding it.

    driver.find_element_by_xpath("//span[@class='showing']")

But this element contains text like "Showing 1-10 of 174". You need only the 174 part of the string. To get it, first extract the part after "of" and then convert it to an int.

Algorithm to extract the total number of entries as int from the text:

    import re

    showing_text = driver.find_element_by_xpath("//span[@class='showing']").text  # "Showing 1-10 of 174"
    number_of_entries_text = showing_text.split("of", 1)[1]  # " 174" as text
    number_of_entries = int(re.findall(r'\d+', number_of_entries_text)[0])  # 174 as int
    # Ceiling division with 10 entries per page; plain / would give a float in Python 3
    number_of_pages = (number_of_entries + 9) // 10  # 18
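The string handling can be tried without a browser. Below is a minimal sketch; parse_total_entries is a hypothetical helper name, and the handling of thousands separators (for example 1,740) is an assumption about how the site might format larger totals, not something shown in the question.

```python
import re


def parse_total_entries(showing_text):
    """Return the number after 'of' in text like 'Showing 1-10 of 174'."""
    match = re.search(r'of\s+([\d,]+)', showing_text)
    if match is None:
        raise ValueError("unexpected text: %r" % showing_text)
    # Assumption: commas, if present, are thousands separators.
    return int(match.group(1).replace(',', ''))


print(parse_total_entries("Showing 1-10 of 174"))  # 174
```

Raising ValueError on unexpected text makes a changed page layout fail loudly instead of silently producing a wrong page count.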

Final code:

    import re
    import time

    def pagination(driver):
        data = []
        # find_elements_* returns an empty list instead of raising when nothing matches
        showing = driver.find_elements_by_xpath("//span[@class='showing']")
        if not showing:
            number_of_pages = 1
        else:
            showing_text = showing[0].text  # "Showing 1-10 of 174"
            number_of_entries_text = showing_text.split("of", 1)[1]
            number_of_entries = int(re.findall(r'\d+', number_of_entries_text)[0])
            # Ceiling division, assuming 10 entries per page
            number_of_pages = (number_of_entries + 9) // 10
        data.extend(getData(driver))  # first page; getData is the function from the question
        for i in range(number_of_pages - 1):
            driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
            time.sleep(1)  # give the next page time to load
            data.extend(getData(driver))
        return data

Note:

I think my solution is better since you don't have to repeatedly check for any element to be available or to catch any exceptions. You just directly get the number of pages and you click the next button that many times.
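On newer Selenium releases the find_element_by_* helpers are, as far as I know, no longer available (removed in 4.3), and locators are passed as find_element(by, value) instead. The sketch below shows the same approach in that style. The names scrape_all_pages, get_data, page_size and pause are hypothetical, and the string "xpath" is used in place of By.XPATH (which has that value) only so the sketch has no import dependency; in real code you would normally import By.

```python
import re
import time

XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH
SHOWING = "//span[@class='showing']"
NEXT = '//a[contains(concat(" ", normalize-space(@class), " "), " next ")]'


def scrape_all_pages(driver, get_data, page_size=10, pause=1.0):
    """Collect get_data(driver) from every page by clicking 'next'."""
    found = driver.find_elements(XPATH, SHOWING)  # empty list if absent
    if found:
        # Text looks like "Showing 1-10 of 174"; keep the number after "of".
        total = int(re.findall(r'\d+', found[0].text.split("of", 1)[1])[0])
        pages = max(1, (total + page_size - 1) // page_size)
    else:
        pages = 1
    data = list(get_data(driver))  # first page
    for _ in range(pages - 1):
        driver.find_element(XPATH, NEXT).click()
        time.sleep(pause)  # fixed wait, as in the answer above
        data.extend(get_data(driver))
    return data
```

A call would look like scrape_all_pages(driver, getData), with getData being the scraping function from the question.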
