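In a Python crawler project, I used webdriver.Chrome() to open and play a video page and ran into the following problem:
The video duration was read incorrectly, sometimes even as 0, and Chrome printed this error at the same time: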
[28920:44660:1012/165605.013:ERROR:page_load_metrics_update_dispatcher.cc(165)] Invalid first_paint 0.581 s for first_image_paint 0.565 s
The code was as follows:
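    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    # logTxtLine() is the author's own logging helper; its definition is not shown.
    # Selenium 4 style: the chromedriver path is passed through a Service object.
    driver = webdriver.Chrome(service=Service(r'C:\MyWorkPlace\chromedriver_win32\chromedriver.exe'))
    av = r'https://www.xuexi.cn/lgpage/detail/index.html?id=3824517766003391372&item_id=3824517766003391372'  # any page will do; this is just an example
    time.sleep(5)
    driver.get(av)
    try:
        # Move the mouse over the page body.
        element = driver.find_element(By.CSS_SELECTOR, "body")
        actions = ActionChains(driver)
        actions.move_to_element(element)
        actions.perform()
    except Exception:
        logTxtLine("failed to read element in: %s" % av)
    # Earlier attempt with the old locator API:
    # try:
    #     driver.find_element_by_css_selector('.outter').click()  # click to play the video
    # except:
    #     logTxtLine("failed to click play in: %s" % av)
    try:
        driver.find_element(By.CSS_SELECTOR, ".outter").click()  # click to play the video
    except Exception:
        logTxtLine("failed to click play in: %s" % av)
    try:
        # Read the video duration.
        video_duration_str = driver.find_element(By.XPATH, "//span[@class='duration']").get_attribute('innerText')
    except Exception as err:
        logTxtLine("failed to read video info in %s: " % av + str(err))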
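Here is how I investigated it, in the hope that it helps others:
1. I suspected that the play button was not being located correctly and changed the code to driver.find_element_by_css_selector('.outter').click() to click play. The problem remained, so this was ruled out.
2. I suspected that the video had not finished loading and added a delay (30 s, to be on the safe side). This fixed the incorrect video duration, but the "Invalid first_paint 0.581 s for first_image_paint 0.565 s" error from page_load_metrics_update_dispatcher.cc(165) was still printed.
3. I suspected that something was going wrong in page_load_metrics_update_dispatcher.cc, but I could not find any material on that file and did not have the patience to analyze it. I simply switched to webdriver.Firefox(), and the problem went away.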
The code after switching to Firefox and adding the delays is as follows:
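    import time

    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    # logTxtLine() is the author's own logging helper; its definition is not shown.
    # Replaced: driver = webdriver.Chrome(r'C:\MyWorkPlace\chromedriver_win32\chromedriver.exe')
    driver = webdriver.Firefox()
    driver.maximize_window()
    av = r'https://www.xuexi.cn/lgpage/detail/index.html?id=3824517766003391372&item_id=3824517766003391372'  # any page will do; this is just an example
    time.sleep(5)
    driver.get(av)
    time.sleep(30)  # give the page and the video time to load
    try:
        # Move the mouse over the page body.
        element = driver.find_element(By.CSS_SELECTOR, "body")
        actions = ActionChains(driver)
        actions.move_to_element(element)
        actions.perform()
    except Exception:
        logTxtLine("failed to read element in: %s" % av)
    # Earlier attempt with the old locator API:
    # try:
    #     driver.find_element_by_css_selector('.outter').click()  # click to play the video
    # except:
    #     logTxtLine("failed to click play in: %s" % av)
    try:
        driver.find_element(By.CSS_SELECTOR, ".outter").click()  # click to play the video
    except Exception:
        logTxtLine("failed to click play in: %s" % av)
    try:
        # After clicking play, wait a while before reading the video duration.
        time.sleep(30)
        # Read the video duration.
        video_duration_str = driver.find_element(By.XPATH, "//span[@class='duration']").get_attribute('innerText')
    except Exception as err:
        logTxtLine("failed to read video info in %s: " % av + str(err))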
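The fixed 30-second sleeps work, but they always cost the full delay. One possible refinement is to poll the duration text until it becomes non-zero and stop as soon as it does. The sketch below is independent of the browser; `parse_duration` and `wait_for_duration` are hypothetical helper names, and it assumes the page shows the duration as colon-separated numbers such as "03:25", which should be checked against the real page.

```python
import time
from typing import Callable, Optional


def parse_duration(text: str) -> int:
    """Convert a clock string such as '03:25' or '1:02:03' to seconds.

    Assumption: the duration is rendered as colon-separated numbers.
    Returns 0 for empty or unparseable text.
    """
    parts = text.strip().split(":")
    if not all(part.isdigit() for part in parts):
        return 0
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


def wait_for_duration(read_text: Callable[[], Optional[str]],
                      timeout: float = 30.0,
                      poll: float = 0.5) -> Optional[int]:
    """Call read_text() repeatedly until it yields a non-zero duration.

    Returns the duration in seconds, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            seconds = parse_duration(read_text() or "")
        except Exception:
            seconds = 0  # e.g. the element is not in the page yet; retry
        if seconds > 0:
            return seconds
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

With the driver from the listing above, it might be called as `wait_for_duration(lambda: driver.find_element(By.XPATH, "//span[@class='duration']").get_attribute('innerText'))`. Selenium's own `WebDriverWait` offers a similar polling pattern and may be the more natural choice inside a larger Selenium script.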