Python Web Scraping with Selenium and BeautifulSoup (Modal Window Contents)

While learning web scraping, the author ran into a problem: after clicking a button on Quora to open a new element, the page source of that new element could not be retrieved. After some experimenting, the cause turned out to be that the page had not been given time to finish loading. The fix was to add a suitable delay after the click, such as `time.sleep(sleep_time)`, after which the page source updated correctly and the popup contents could be retrieved. The author stresses the importance of patience and waiting when scraping web pages.

I am trying to learn web scraping (I am a total novice). I noticed that on some websites (e.g. Quora), when I click a button, a new element comes up on screen, but I cannot seem to get the page source of that new element. I want to be able to get the page source of the new popup and all of its elements. Note that you need a Quora account in order to reproduce my problem.

Here is part of my code, which uses BeautifulSoup, Selenium, and ChromeDriver:

from selenium import webdriver
from selenium.webdriver.common.by import By   # needed for the By.XPATH lookups below
from bs4 import BeautifulSoup
from unidecode import unidecode
import time

sleep = 10

USER_NAME = 'Insert Account name'       # Insert Account name here
PASS_WORD = 'Insert Account Password'   # Insert Account Password here
url = 'Insert url'
url2 = ['insert url']

# Logging in to your account
driver = webdriver.Chrome('INSERT PATH TO CHROME DRIVER')
driver.get(url)
page_source = driver.page_source

if 'Continue With Email' in page_source:
    try:
        username = driver.find_element(By.XPATH, '//input[@placeholder="Email"]')
        password = driver.find_element(By.XPATH, '//input[@placeholder="Password"]')
        login = driver.find_element(By.XPATH, '//input[@value="Login"]')
        username.send_keys(USER_NAME)
        password.send_keys(PASS_WORD)
        time.sleep(sleep)
        login.click()
        time.sleep(sleep)
    except:
        print('Did not work :( .. Try again')
else:
    print('Did not work :( .. Try different page')
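As an aside, the fixed time.sleep(sleep) calls around the login click can be swapped for an explicit wait, which returns as soon as its condition is met. This is a minimal sketch, assuming the login element from the try block above is in scope; the 20-second timeout is an arbitrary choice:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

login.click()
# Block until the old login button is detached from the DOM, i.e. the page
# has actually navigated away from the login form.
WebDriverWait(driver, 20).until(EC.staleness_of(login))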

The next part goes to the relevant webpage and (tries to) collect information about the followers of a particular question.

for url1 in url2:
    driver.get(url1)
    source = driver.page_source
    soup1 = BeautifulSoup(source, "lxml")

    Follower_button = soup1.find('a', {'class': 'FollowerListModalLink QuestionFollowerListModalLink'})
    Follower_button2 = unidecode(Follower_button.text)
    driver.find_element_by_link_text(Follower_button2).click()

    #### Does not give me the correct page source in the next line ####
    source2 = driver.page_source
    soup2 = BeautifulSoup(source2, "lxml")
    follower_list = soup2.findAll('div', {'class': 'FollowerListModal QuestionFollowerListModal Modal'})

    if len(follower_list) > 0:
        print('It worked :)')
    else:
        print('Did not work :(')

However, when I try to get the page source of the follower element, I end up getting the page source of the main page rather than the follower element. Can anyone help me get the page source of the follower element that pops up? What am I missing here?

NOTE:

Another way to reproduce or look at my problem is to log in to your Quora account (if you have one) and go to any question with followers. Clicking the followers button on the lower right side of the screen brings up a popup. My problem is essentially getting the elements of this popup.

Update -

Okay, so I have been reading a bit, and it seems the window is a modal window. Can anyone help me with getting the contents of a modal window?
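One standard way to handle a modal in Selenium is an explicit wait for the modal container to become visible, then reading its HTML directly from the element rather than from the whole page. This is a minimal sketch, assuming the modal container still carries the FollowerListModal QuestionFollowerListModal Modal classes used in the code above (Quora's class names may have changed since):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Wait (up to 20 seconds) for the modal container to become visible after the click.
modal = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located(
        (By.CSS_SELECTOR, 'div.FollowerListModal.QuestionFollowerListModal.Modal')))

# Parse only the modal's own markup instead of the whole page source.
modal_soup = BeautifulSoup(modal.get_attribute('innerHTML'), 'lxml')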

Solution

Problem resolved. All I had to do was add one line:

time.sleep(sleep_time)

after the click. The problem was that, because there was initially no wait time, the page source was not getting updated. With a sufficiently long time.sleep (the right value may vary from website to website), the page source finally got updated and I was able to get the required elements. :) Lesson learnt: patience is the key to web scraping. I spent the entire day trying to figure this out.
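For reference, with the sleep variable defined at the top of the script standing in for sleep_time, the fixed part of the loop looks roughly like this (the sleep value is a guess and may need tuning per site):

driver.find_element_by_link_text(Follower_button2).click()
time.sleep(sleep)  # give the modal time to render before re-reading the page source

source2 = driver.page_source
soup2 = BeautifulSoup(source2, "lxml")
follower_list = soup2.findAll('div', {'class': 'FollowerListModal QuestionFollowerListModal Modal'})

An explicit wait on the modal container (as sketched earlier) would achieve the same thing without a hard-coded delay.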
