When writing crawlers you will sometimes hit pages that are slow to load or time out. Below is a comparison of the pros and cons of Selenium's three waiting strategies:
WebDriverWait(): Selenium's explicit wait for locating an element
WebDriverWait() polls the page at a regular interval (0.5 seconds by default) within the allotted time, checking whether the specified element exists; if the element still cannot be found when the time is up, it raises an exception.
Usage: WebDriverWait(driver, 20, 0.5).until(EC.presence_of_element_located((By.ID, "kw")))
until(method, message=''):
- method: during the wait, this callable is invoked every poll_frequency seconds, until it returns something other than False or the timeout is exceeded
- message: if the wait times out, a TimeoutException is raised with this message attached
Note: until_not is the exact opposite of until. until continues execution once an element appears or a condition holds; until_not continues once the element disappears or the condition stops holding. The parameters are the same.
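The polling behaviour of until can be sketched in plain Python, with no browser involved. SimpleWait and flaky_condition below are hypothetical names for illustration, not Selenium API, and the loop is a simplification of what WebDriverWait actually does:

```python
import time

class SimpleWait(object):
    """A simplified sketch of WebDriverWait's until() polling loop."""
    def __init__(self, timeout, poll_frequency=0.5):
        self.timeout = timeout
        self.poll_frequency = poll_frequency

    def until(self, method, message=''):
        end_time = time.time() + self.timeout
        while True:
            value = method()
            if value:                      # any truthy return value ends the wait
                return value
            if time.time() > end_time:     # out of time: raise, carrying message
                raise RuntimeError('timeout: ' + message)
            time.sleep(self.poll_frequency)

# Simulate an element that "appears" on the third poll.
calls = {'n': 0}
def flaky_condition():
    calls['n'] += 1
    return 'found' if calls['n'] >= 3 else False

result = SimpleWait(timeout=5, poll_frequency=0.1).until(flaky_condition, 'element never appeared')
print(result)  # found
```

until_not would be the same loop with the truth test inverted: keep polling while the condition is still truthy.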
WebDriverWait() signature:
WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None)
- driver: the WebDriver instance to drive
- timeout: the maximum time to wait, in seconds
- poll_frequency: the sleep interval (step) between calls to the method passed to until or until_not; defaults to 0.5 seconds
- ignored_exceptions: a tuple of exception types to ignore. If an exception raised while calling until or until_not is in this tuple, the code is not interrupted and the wait continues; if it is outside the tuple, the wait is interrupted and the exception propagates. By default the tuple contains only NoSuchElementException.
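The ignored_exceptions behaviour can also be sketched without a browser. wait_until below is a hypothetical helper, and LookupError stands in for selenium's NoSuchElementException:

```python
import time

def wait_until(method, timeout, poll_frequency=0.5,
               ignored_exceptions=(LookupError,)):
    """Sketch of a wait loop with an ignored_exceptions tuple."""
    end_time = time.time() + timeout
    last_exc = None
    while True:
        try:
            value = method()
            if value:
                return value
        except ignored_exceptions as exc:   # in the tuple: swallow, keep polling
            last_exc = exc
        if time.time() > end_time:
            raise RuntimeError('timed out; last error: %r' % last_exc)
        time.sleep(poll_frequency)

state = {'n': 0}
def lookup():
    state['n'] += 1
    if state['n'] < 3:
        raise LookupError('not there yet')  # ignored: the wait continues
    return 'element'

found = wait_until(lookup, timeout=5, poll_frequency=0.05)
print(found)  # element

def broken():
    raise ValueError('not in the ignored tuple')  # not ignored: propagates at once

try:
    wait_until(broken, timeout=5, poll_frequency=0.05)
    propagated = False
except ValueError:
    propagated = True
print(propagated)  # True
```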
When using presence_of_element_located() to check whether an element exists or has finished loading, note that the function takes a single tuple argument, not two separate arguments. Incorrect code:
In [12]: WebDriverWait(driver, 20, 0.5).until(EC.presence_of_element_located(By.ID, "kw"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-84622169566b> in <module>()
----> 1 WebDriverWait(driver, 20, 0.5).until(EC.presence_of_element_located(By.ID, "kw"))
TypeError: __init__() takes exactly 2 arguments (3 given)
It takes exactly one argument, and that argument must be a tuple. The correct form:
In [13]: WebDriverWait(driver, 20, 0.5).until(EC.presence_of_element_located((By.ID, "kw")))
Out[13]: <selenium.webdriver.remote.webelement.WebElement (session="b0f52d40-582d-11e7-9ffc-7d9ee5fd2752", element=":wdc:1498233922094")>
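The TypeError happens because the condition class's constructor accepts one locator argument besides self. A minimal stand-in reproduces it; PresenceSketch is a hypothetical class mimicking the shape of EC.presence_of_element_located, not the real Selenium implementation:

```python
class PresenceSketch(object):
    """Stand-in for EC.presence_of_element_located: one locator tuple."""
    def __init__(self, locator):
        self.locator = locator                  # e.g. (By.ID, "kw")

    def __call__(self, driver):
        # The tuple is only unpacked here, inside the condition itself.
        return driver.find_element(*self.locator)

# Correct: a single tuple argument
ok = PresenceSketch(("id", "kw"))
print(ok.locator)  # ('id', 'kw')

# Incorrect: two positional arguments, reproducing the TypeError above
try:
    PresenceSketch("id", "kw")
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```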
implicitly_wait(timeout): implicit wait
If an element is not found right away, an implicit wait tells WebDriver to keep polling the DOM for it for up to the given time before giving up. The default is 0 seconds. Once set, the implicit wait stays in effect for the lifetime of that WebDriver instance.
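An implicit wait can be thought of as a driver-wide retry budget applied to every element lookup. A rough simulation under that assumption; FakeDriver is a hypothetical toy class, not Selenium, and LookupError stands in for NoSuchElementException:

```python
import time

class FakeDriver(object):
    """Toy driver whose find_element honours an implicit wait."""
    def __init__(self):
        self._implicit = 0
        self.dom = set()                       # ids currently "on the page"

    def implicitly_wait(self, seconds):
        self._implicit = seconds               # applies to every later lookup

    def find_element(self, element_id):
        end_time = time.time() + self._implicit
        while True:
            if element_id in self.dom:
                return element_id              # found: return immediately
            if time.time() > end_time:
                raise LookupError('no such element: %s' % element_id)
            time.sleep(0.05)                   # keep retrying within the budget

driver = FakeDriver()
driver.implicitly_wait(0.5)
driver.dom.add('su')
print(driver.find_element('su'))               # present: no waiting at all

start = time.time()
try:
    driver.find_element('sudf')                # absent: retried for ~0.5 s
except LookupError as e:
    print(e)
elapsed = time.time() - start
print('waited about %.1f s' % elapsed)
```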
sleep: process sleep
Sometimes we simply put the process to sleep for a few seconds and hope the page has finished loading by then.
Comparing the three
The test cases are as follows (to measure the timeout path, this example deliberately looks up a tag that does not exist on the page):
WebDriverWait() test case
import datetime
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
dcap = dict(DesiredCapabilities.PHANTOMJS)
user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.8 (KHTML, like Gecko) Beamrise/17.2.0.9 Chrome/17.0.939.0 Safari/535.8"
dcap["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS(desired_capabilities=dcap)
# driver.implicitly_wait(10)
start_time = datetime.datetime.now()
print 'start_time: ', start_time
driver.get('https://www.baidu.com')
t = datetime.datetime.now()
try:
    element = WebDriverWait(driver, 10, 0.5).until(EC.presence_of_element_located((By.CLASS_NAME, "gettell")))
    element.click()
except Exception, e:
    print e
end_time = datetime.datetime.now()
print "Sds", (t - start_time).seconds
print "time", (end_time - start_time).seconds
driver.quit()
Test result:
E:\usr\Anaconda2\python.exe C:/Users/Administrator/Desktop/ershouche/wait.py
start_time: 2017-06-24 03:29:41.140000
Message:
Screenshot: available via screen
Sds 0
time 11
Process finished with exit code 0
The total run took 11 seconds: opening the page took under 1 second, the wait consumed the full 10 seconds, and the lookup itself took under 1 second. The next example looks up an element that does exist on the page:
import datetime
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
dcap = dict(DesiredCapabilities.PHANTOMJS)
user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.8 (KHTML, like Gecko) Beamrise/17.2.0.9 Chrome/17.0.939.0 Safari/535.8"
dcap["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS(desired_capabilities=dcap)
# driver.implicitly_wait(10)
start_time = datetime.datetime.now()
print 'start_time: ', start_time
driver.get('https://www.baidu.com')
t = datetime.datetime.now()
try:
    element = WebDriverWait(driver, 10, 0.5).until(EC.presence_of_element_located((By.ID, "su")))
    element.click()
except Exception, e:
    print e
end_time = datetime.datetime.now()
print "Sds", (t - start_time).seconds
print "time", (end_time - start_time).seconds
driver.quit()
Test result:
E:\usr\Anaconda2\python.exe C:/Users/Administrator/Desktop/ershouche/wait.py
start_time: 2017-06-24 03:42:04.557000
Sds 0
time 0
Process finished with exit code 0
This result shows the page opened in under 1 second, the element was found in under 1 second, and the whole script finished in under a second. So we conclude:
- WebDriverWait() continues down the program as soon as the element is found within the maximum time; if it is not found yet, it keeps looking at the given interval, and raises a timeout exception once the maximum time is exceeded
implicitly_wait(timeout) test case
import datetime
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
dcap = dict(DesiredCapabilities.PHANTOMJS)
user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.8 (KHTML, like Gecko) Beamrise/17.2.0.9 Chrome/17.0.939.0 Safari/535.8"
dcap["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS(desired_capabilities=dcap)
start_time = datetime.datetime.now()
print 'start_time: ', start_time
driver.implicitly_wait(10)
driver.get('https://www.baidu.com')
t = datetime.datetime.now()
driver.save_screenshot("ssd.png")
ts = datetime.datetime.now()
try:
    driver.find_element_by_id("su").click()
except Exception, e:
    print e
end_time = datetime.datetime.now()
print "Sds", (t - start_time).seconds
print "time", (end_time - start_time).seconds
print "s", (ts - start_time).seconds
driver.quit()
The test result is as follows:
start_time: 2017-06-24 04:14:36.779000
Sds 0
time 0
s 0
Opening the page took under 1 second, the total run was about 1 second, and the element lookup also took under 1 second, which shows that WebDriver does not wait at all when it can find the element. Now look up an element that does not exist on the page:
import datetime
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
dcap = dict(DesiredCapabilities.PHANTOMJS)
user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.8 (KHTML, like Gecko) Beamrise/17.2.0.9 Chrome/17.0.939.0 Safari/535.8"
dcap["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS(desired_capabilities=dcap)
start_time = datetime.datetime.now()
print 'start_time: ', start_time
driver.implicitly_wait(10)
driver.get('https://www.baidu.com')
t = datetime.datetime.now()
try:
    driver.find_element_by_id("su").click()
except Exception, e:
    print e
ts = datetime.datetime.now()
try:
    driver.find_element_by_id("sudf").click()
except Exception, e:
    print e
end_time = datetime.datetime.now()
print "Sds", (t - start_time).seconds
print "time", (end_time - start_time).seconds
print "s", (ts - start_time).seconds
driver.quit()
Test result:
E:\usr\Anaconda2\python.exe C:/Users/Administrator/Desktop/ershouche/wait.py
start_time: 2017-06-24 04:18:43.785000
Message: {"errorMessage":"Unable to find element with id 'sudf'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"85","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:57585","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"id\", \"sessionId\": \"27957130-5851-11e7-a4d7-9740be247d0a\", \"value\": \"sudf\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/27957130-5851-11e7-a4d7-9740be247d0a/element"}}
Screenshot: available via screen
Sds 0
time 10
s 0
Process finished with exit code 0
Here, opening the page took under 1 second and the first lookup took under 1 second; the second lookup failed, so WebDriver waited the full 10 seconds. From these four examples we can see: WebDriverWait() keeps checking at the set interval, continuing as soon as the element is found and raising a timeout exception otherwise. implicitly_wait(timeout) keeps retrying a lookup for up to timeout seconds, continuing as soon as the element is found and raising an exception if it never appears; every later lookup in the program follows the same rule, so the setting applies to the whole session and never needs to be set again. sleep, by contrast, is rigid: however long you tell it to sleep, that is exactly how long it sleeps.
Putting the examples together: implicitly_wait(timeout) is a good fit when the page as a whole may load slowly, WebDriverWait() suits waiting for a specific element when local JavaScript loads slowly, and I do not recommend using sleep to wait for pages or JS to load.
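The cost difference between a fixed sleep and a polling wait can be shown without a browser. poll_for is a hypothetical helper, and the "page becomes ready after 0.2 s" condition is a stand-in for a real element appearing:

```python
import time

def poll_for(condition, timeout, poll_frequency=0.05):
    """Return as soon as condition() is truthy; raise if timeout is exceeded."""
    end_time = time.time() + timeout
    while True:
        if condition():
            return True
        if time.time() > end_time:
            raise RuntimeError('timed out')
        time.sleep(poll_frequency)

# Strategy 1: a fixed sleep sized for the worst case always pays in full.
start = time.time()
time.sleep(1)                      # "the page might need up to 1 s"
sleep_cost = time.time() - start

# Strategy 2: a polling wait returns as soon as the condition holds.
ready_at = time.time() + 0.2       # pretend the page becomes ready after 0.2 s
start = time.time()
poll_for(lambda: time.time() >= ready_at, timeout=5)
poll_cost = time.time() - start

print('fixed sleep: %.2f s  polling wait: %.2f s' % (sleep_cost, poll_cost))
```

The fixed sleep always costs its full duration, while the polling wait costs only a little more than the time the condition actually needs.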
If you have any questions, join QQ group: 526855734