I am trying to extract the source link of an HTML5 video found in the video tag. Using the Firefox WebDriver, I am able to get the desired result, i.e.:
[]
but if I use PhantomJS -
I suspect this is because of PhantomJS's lack of HTML5 video support. Is there any way I can trick the webpage into thinking that HTML5 video is supported so that it generates the URL? Or can I do something else?
I tried this:
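```python
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `browser` is the PhantomJS WebDriver instance that has already opened the page
try:
    WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, "//video")))
finally:
    k = browser.page_source
    browser.quit()
soup = BeautifulSoup(k, 'html.parser')
print(soup.find_all('video'))
```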
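Once page_source actually contains the rendered video element, pulling out the link is a pure HTML-parsing step. The sketch below swaps BeautifulSoup for the standard library's html.parser only to stay dependency-free; it collects the src of each video tag and of any source tags nested inside it. VideoSrcParser and video_sources are hypothetical names, and the sketch assumes the URL is exposed in a src attribute, which may not hold for every site.

```python
from html.parser import HTMLParser


class VideoSrcParser(HTMLParser):
    """Collect src URLs from <video> tags and the <source> tags nested in them."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self._video_depth = 0  # > 0 while the parser is inside a <video> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "video":
            self._video_depth += 1
            if attrs.get("src"):
                self.sources.append(attrs["src"])
        elif tag == "source" and self._video_depth and attrs.get("src"):
            # Only count <source> tags that belong to a <video>
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "video" and self._video_depth:
            self._video_depth -= 1


def video_sources(html):
    """Return every video URL found in an HTML string, in document order."""
    parser = VideoSrcParser()
    parser.feed(html)
    parser.close()
    return parser.sources


if __name__ == "__main__":
    page = '<video src="https://example.com/a.mp4"></video>'
    print(video_sources(page))
```

With BeautifulSoup, the equivalent for the simple case is [v.get('src') for v in soup.find_all('video')]. Either way, the list stays empty if the page never rendered the video element in the first place.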
Solution
The Firefox and PhantomJS WebDrivers report page-load completion to Selenium quite differently.
With Firefox, the driver signals that the page has finished loading only after some of the JavaScript has run.
PhantomJS, by contrast, signals Selenium that the page has finished loading as soon as it can get the page source, so none of the JavaScript would have run yet.
What you need to do is wait for the element to be present before extracting it. In this case, that would be:
```python
video = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//video")))
```
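To make the idea behind an explicit wait concrete, here is a simplified, pure-Python sketch: call a condition repeatedly until it returns something truthy or a timeout elapses. This is an illustration of the polling pattern, not Selenium's actual WebDriverWait source; wait_until and its parameters are hypothetical names. The ignored parameter loosely mirrors how a presence check treats a "not found yet" error as "keep polling" rather than failing.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5, ignored=()):
    """Poll `condition` until it returns a truthy value, then return that value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Raises TimeoutError once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # e.g. the element is not in the DOM yet; keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

In a real script you would keep using WebDriverWait as shown; the point of the sketch is that the wait keeps re-checking the page instead of reading page_source once, which is what gives the page's JavaScript time to insert the video element.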
EDIT:
YouTube first checks whether the browser supports the video content before deciding whether to provide the source. There is a workaround, though, described here.