Using Selenium when a page takes forever to finish loading

djshichaoren

Category: Web crawlers

Article link: https://blog.csdn.net/djshichaoren/article/details/79062486

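Some pages take a very long time to load; Xinhuanet (新华网) simply never finishes loading. Often the information to be scraped has already loaded by then, so waiting any longer is a waste of time.

Wrap every call that may hang on a long page load in try/except.

If the content you need has already loaded, the driver is still usable after the except clause, even though the page as a whole has not finished loading.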
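The idea above can be sketched as a small helper. This is a minimal sketch, not the code used in this post: the name `get_with_timeout` and the 20-second default are assumptions made for illustration, and it relies on a page-load timeout being set with `set_page_load_timeout` so that `driver.get` raises `TimeoutException` instead of waiting. The `window.stop()` call is a commonly used way to halt the remaining load; on some driver versions that call may itself time out, so it is guarded as well.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback so the sketch can be read and exercised without Selenium installed.
    class TimeoutException(Exception):
        pass


def get_with_timeout(driver, url, seconds=20):
    """Load url, giving up on the page load after `seconds` (hypothetical helper).

    Returns True if the load finished and False if it timed out. In both cases
    the driver object is kept, so whatever has already rendered can be queried.
    """
    driver.set_page_load_timeout(seconds)
    try:
        driver.get(url)
        return True
    except TimeoutException:
        try:
            # Assumption: stopping the load leaves the partially loaded DOM usable.
            driver.execute_script("window.stop();")
        except TimeoutException:
            pass
        return False
```

A caller can then go on to `find_element` lookups whether the function returned True or False, and handle a missing element separately.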

from selenium.common.exceptions import TimeoutException

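Why can't TimeoutException be reached through `from selenium.common.exceptions.TimeoutException as TimeoutException`? That form is not valid Python: a `from` statement needs the `import` keyword, and TimeoutException is a class inside the `selenium.common.exceptions` module rather than a module itself, so it has to be imported with the statement above.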

                try:
                    try:
                        driver.get("https://c3.zgdhhjha.com/scholar/")
                    except TimeoutException as e:
                        # The page did not finish loading in time; the driver can still be used.
                        print(type(e))
                        print('time out in search page')
                    a = None  # reset so a result from an earlier iteration is never reused
                    try:
                        driver.find_element_by_id('gs_hdr_tsi').send_keys(line[i])
                        driver.find_element_by_xpath('//*[@id="gs_hdr_tsb"]/span/span[1]').click()
                        a = driver.find_element_by_xpath('//*[@id="gs_res_ccl_mid"]/div/div[2]/div[3]/a[3]')

                    except TimeoutException as e:
                        print('time out in crawl page')

                    # Only write a row when the citation link was actually found.
                    if a is not None:
                        with open('citaresult.txt', 'a') as wfile:
                            wfile.write(str(i+1)+','+a.text+'\n')

                except Exception as e:  # not needed here; could be used to check for a no-such-element exception
                    print('Exception: %s' % e)
                    print('num: %s title: %s' % (i, line[i]))
                    driver.quit()  # quit() closes the browser and also ends the chromedriver process
                    chromedriver = r'F:\chromedriver_win32\chromedriver.exe'  # path to chromedriver
                    os.environ["webdriver.chrome.driver"] = chromedriver
                    driver = webdriver.Chrome(chromedriver)
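The comment on the outer except mentions checking for a no-such-element exception. One possible way to do that, shown here only as a sketch, is to catch `NoSuchElementException` on its own so that a missing element does not trigger a browser restart. The helper name `text_or_none` is hypothetical and not part of the code above; the string `"xpath"` is the value of Selenium's `By.XPATH` locator strategy.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Fallback so the sketch can be read and exercised without Selenium installed.
    class NoSuchElementException(Exception):
        pass


def text_or_none(driver, xpath):
    """Return the text of the first element matching xpath, or None if absent."""
    try:
        # "xpath" is the string value of By.XPATH.
        return driver.find_element("xpath", xpath).text
    except NoSuchElementException:
        return None
```

With a helper like this, a search result that has no citation link yields None, which the caller can skip, while other exceptions still reach the outer handler.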