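ChromeDriver version list (viewing this page requires a VPN or proxy from mainland China): https://sites.google.com/chromium.org/driver/downloads
Direct download link (no proxy needed); change the version number in the URL to match your Chrome version: https://chromedriver.storage.googleapis.com/96.0.4664.45/chromedriver_linux64.zip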
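Since only the version segment of the direct link changes, the URL can be built from a version string. The snippet below is a minimal sketch: the helper name chromedriver_url and its platform parameter are hypothetical, it simply mirrors the pattern of the link above, and it does not check that the requested version actually exists on the server.

```python
BASE_URL = "https://chromedriver.storage.googleapis.com"


def chromedriver_url(version: str, platform: str = "linux64") -> str:
    """Build a direct-download URL following the pattern of the link above.

    `platform` is an assumption for illustration; only "linux64" appears
    in these notes.
    """
    return f"{BASE_URL}/{version}/chromedriver_{platform}.zip"


print(chromedriver_url("96.0.4664.45"))
```

The version passed in should match the installed Chrome version, as listed on the downloads page.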
Common XPath syntax:
In Selenium, elements are located with:
browser.find_elements(By.XPATH,'//*//ul[@id="page1zw"]//div[@class="brief"]/a[@href]')
In Selenium, an attribute of an element is read with:
element.get_attribute('href')
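XPath path expressions:
/ selects from the root node
// selects matching nodes at any depth (children, children of children, and so on), regardless of nesting
. selects the current node
.. selects the parent of the current node
@ refers to an attribute and is used to filter elements by attribute (e.g. id, class)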
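The path expressions above can be tried without a browser. The following is a rough sketch using Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath (paths must be relative, hence the leading "."). The SAMPLE markup is made up for illustration and only imitates the structure that the Selenium XPath in these notes targets.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, shaped like the structure the XPath above expects
SAMPLE = """
<html><body>
  <ul id="page1zw">
    <li><div class="brief"><a href="a.shtml">A</a></div></li>
    <li><div class="brief"><a href="b.shtml">B</a></div></li>
    <li><div class="other"><a href="c.shtml">C</a></div></li>
    <li><div class="brief"><a>no link</a></div></li>
  </ul>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# "//" descends to any depth, [@id='...'] and [@class='...'] filter by
# attribute value, and [@href] keeps only <a> elements that have an href.
links = root.findall(".//ul[@id='page1zw']//div[@class='brief']/a[@href]")
hrefs = [a.get("href") for a in links]
print(hrefs)

# ".." steps from the matched <a> up to its parent <div>.
parents = root.findall(".//a[@href='c.shtml']/..")
parent_classes = [p.get("class") for p in parents]
print(parent_classes)
```

Here a.get("href") plays the same role as element.get_attribute('href') in Selenium.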
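from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# The page you want to scrape
url = "http://news.cctv.com/2021/12/13/ARTI4LnM7gQBDbGMQ1SEQM1v211213.shtml?spm=C94212.PnPr887gR6ub.EJaHnJ2d9CJb.5"
xpath = '//*//ul[@id="page1zw"]//div[@class="brief"]/a[@href]'
# Selenium 4 takes the driver path via Service; newer releases no longer accept executable_path
browser = webdriver.Chrome(service=Service('../data/chromedriver/chromedriver.exe'))
browser.get(url)
# Wait up to 5 seconds until at least one matching link is present
wait = WebDriverWait(browser, 5)
result1 = wait.until(lambda driver: driver.find_elements(By.XPATH, xpath))
# Assumption: current_content is meant to collect the extracted links
current_content = []
for each in result1:
    href = each.get_attribute('href')
    current_content.append(href)
    print(href)
browser.quit()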