Python Web Scraping with Selenium and BeautifulSoup (Modal Window Contents)

While learning web scraping, the author ran into a problem: after clicking a button on Quora to open a new element, the page source of that new element could not be retrieved. After some experimenting, the cause turned out to be that the page had not been given time to finish loading. The fix was to add a suitable delay after the click, such as `time.sleep(sleep_time)`, after which the page source updated correctly and the popup contents could be retrieved. The author stresses the importance of patience and waiting when scraping web pages.

I am trying to learn web scraping (I am a total novice). I noticed that on some websites (e.g. Quora), when I click a button, a new element comes up on screen, but I cannot seem to get the page source of that new element. I want to be able to get the page source of the new popup and all of its elements. Note that you need a Quora account in order to reproduce my problem.

Here is part of my code, which uses BeautifulSoup, Selenium, and ChromeDriver:

from selenium import webdriver
from selenium.webdriver.common.by import By   # needed for the By.XPATH lookups below
from bs4 import BeautifulSoup
from unidecode import unidecode
import time

sleep = 10

USER_NAME = 'Insert Account name'       # Insert Account name here
PASS_WORD = 'Insert Account Password'   # Insert Account Password here
url = 'Insert url'
url2 = ['insert url']

# Logging in to your account
driver = webdriver.Chrome('INSERT PATH TO CHROME DRIVER')
driver.get(url)
page_source = driver.page_source

if 'Continue With Email' in page_source:
    try:
        username = driver.find_element(By.XPATH, '//input[@placeholder="Email"]')
        password = driver.find_element(By.XPATH, '//input[@placeholder="Password"]')
        login = driver.find_element(By.XPATH, '//input[@value="Login"]')
        username.send_keys(USER_NAME)
        password.send_keys(PASS_WORD)
        time.sleep(sleep)
        login.click()
        time.sleep(sleep)
    except:
        print('Did not work :( .. Try again')
else:
    print('Did not work :( .. Try different page')
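As an aside, the fixed time.sleep(sleep) calls around the login click can be swapped for an explicit wait, which returns as soon as its condition is met. This is a minimal sketch, assuming the login element from the try block above is in scope; the 20-second timeout is an arbitrary choice:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

login.click()
# Block until the old login button is detached from the DOM, i.e. the page
# has actually navigated away from the login form.
WebDriverWait(driver, 20).until(EC.staleness_of(login))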

The next part goes to the relevant webpage and (tries to) collect information about the followers of a particular question.

for url1 in url2:
    driver.get(url1)
    source = driver.page_source
    soup1 = BeautifulSoup(source, "lxml")

    Follower_button = soup1.find('a', {'class': 'FollowerListModalLink QuestionFollowerListModalLink'})
    Follower_button2 = unidecode(Follower_button.text)
    driver.find_element_by_link_text(Follower_button2).click()

    #### Does not give me the correct page source in the next line ####
    source2 = driver.page_source
    soup2 = BeautifulSoup(source2, "lxml")
    follower_list = soup2.findAll('div', {'class': 'FollowerListModal QuestionFollowerListModal Modal'})

    if len(follower_list) > 0:
        print('It worked :)')
    else:
        print('Did not work :(')

However, when I try to get the page source of the follower element, I end up getting the page source of the main page rather than the follower element. Can anyone help me get the page source of the follower element that pops up? What am I missing here?

NOTE:

Another way to reproduce or look at my problem is to log in to your Quora account (if you have one) and go to any question with followers. Clicking the followers button on the lower right side of the screen brings up a popup. My problem is essentially getting the elements of this popup.

Update -

Okay, so I have been reading a bit, and it seems the window is a modal window. Can anyone help me with getting the contents of a modal window?
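One standard way to handle a modal in Selenium is an explicit wait for the modal container to become visible, then reading its HTML directly from the element rather than from the whole page. This is a minimal sketch, assuming the modal container still carries the FollowerListModal QuestionFollowerListModal Modal classes used in the code above (Quora's class names may have changed since):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Wait (up to 20 seconds) for the modal container to become visible after the click.
modal = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located(
        (By.CSS_SELECTOR, 'div.FollowerListModal.QuestionFollowerListModal.Modal')))

# Parse only the modal's own markup instead of the whole page source.
modal_soup = BeautifulSoup(modal.get_attribute('innerHTML'), 'lxml')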

Solution

Problem resolved. All I had to do was add one line:

time.sleep(sleep_time)

after the click. The problem was that, because there was initially no wait time, the page source was not getting updated. With a sufficiently long time.sleep (the right value may vary from website to website), the page source finally got updated and I was able to get the required elements. :) Lesson learnt: patience is the key to web scraping. I spent the entire day trying to figure this out.
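For reference, with the sleep variable defined at the top of the script standing in for sleep_time, the fixed part of the loop looks roughly like this (the sleep value is a guess and may need tuning per site):

driver.find_element_by_link_text(Follower_button2).click()
time.sleep(sleep)  # give the modal time to render before re-reading the page source

source2 = driver.page_source
soup2 = BeautifulSoup(source2, "lxml")
follower_list = soup2.findAll('div', {'class': 'FollowerListModal QuestionFollowerListModal Modal'})

An explicit wait on the modal container (as sketched earlier) would achieve the same thing without a hard-coded delay.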
