Using selenium:
Opening a web page in a browser:
driver = webdriver.PhantomJS()
driver.get('http://t.people.com.cn/indexV3.action')
selenium can simulate almost any action a user performs in a browser.
It can simulate logging in to a page:
elem_user = driver.find_element_by_xpath('//*[@id="userName"]')
elem_user.send_keys('username')
elem_user = driver.find_element_by_xpath('//*[@id="password_text"]')
elem_user.send_keys('password')
elem_sub = driver.find_element_by_xpath('/html/body/div[3]/div[2]/div[2]/div[2]/form/div[4]/input')
elem_sub.click()
elem_sub = driver.find_element_by_xpath('/html/body/div[2]/div/div[3]/a[2]')
elem_sub.click()
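The login steps above can be wrapped in a small helper so that credentials are not hard-coded in the script. This is only a sketch: the names `login` and `credentials_from_env` and the environment variables `PEOPLE_USER` / `PEOPLE_PASS` are assumptions made for illustration, and the XPaths are copied from the snippet above. It covers the form submission only, not the second click. It calls `driver.find_element("xpath", ...)`, where `"xpath"` is the string value of `By.XPATH`.

```python
import os

# XPaths copied from the login snippet above.
USER_XPATH = '//*[@id="userName"]'
PASSWORD_XPATH = '//*[@id="password_text"]'
SUBMIT_XPATH = '/html/body/div[3]/div[2]/div[2]/div[2]/form/div[4]/input'


def login(driver, username, password):
    """Fill in the login form and submit it (hypothetical helper)."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    driver.find_element("xpath", USER_XPATH).send_keys(username)
    driver.find_element("xpath", PASSWORD_XPATH).send_keys(password)
    driver.find_element("xpath", SUBMIT_XPATH).click()


def credentials_from_env(user_var="PEOPLE_USER", pass_var="PEOPLE_PASS"):
    """Read credentials from environment variables (names are assumptions)."""
    return os.environ.get(user_var, ""), os.environ.get(pass_var, "")


# Usage, assuming `driver` has already opened the login page:
#   user, password = credentials_from_env()
#   login(driver, user, password)
```

Passing the driver in as an argument keeps the helper independent of which browser is used.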
It can capture specific blocks of the page:
a = driver.find_elements_by_class_name('list_item')
It can simulate page scrolling:
driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")  # scroll to the bottom of the page
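On pages that load more content as you scroll, a single scroll is often not enough. The sketch below repeats the scroll until the page height stops growing. The function name `scroll_to_bottom` and the parameters `pause` and `max_rounds` are assumptions chosen for illustration; it relies only on `driver.execute_script`, which also returns the value of a JavaScript `return` statement.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
    """Scroll down until the page height stops growing.

    Returns the number of scrolls performed.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        rounds += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded, so we are at the real bottom
        last_height = new_height
    return rounds
```

`max_rounds` is a safety cap so that an endless feed cannot keep the loop running forever.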
Complete code for scraping People's Daily Online Weibo (人民网微博):
# -*- coding: UTF-8 -*-
from selenium import webdriver
import time

driver = webdriver.PhantomJS()
driver.get('http://t.people.com.cn/indexV3.action')

# Log in: put your own username and password in the empty strings.
elem_user = driver.find_element_by_xpath('//*[@id="userName"]')
elem_user.send_keys('')
elem_pwd = driver.find_element_by_xpath('//*[@id="password_text"]')
elem_pwd.send_keys('')
elem_sub = driver.find_element_by_xpath('/html/body/div[3]/div[2]/div[2]/div[2]/form/div[4]/input')
elem_sub.click()
# Click a second element, located by absolute XPath.
elem_sub = driver.find_element_by_xpath('/html/body/div[2]/div/div[3]/a[2]')
elem_sub.click()

data = driver.page_source
print(data)

driver.maximize_window()
l = []  # elements collected so far; created before the loop so it is not reset
for k in range(1, 3):
    a = driver.find_elements_by_class_name('list_item')
    for i in a:
        if i not in l:
            l.append(i)
    time.sleep(2)
    # Scroll to the bottom so that more items can load.
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")

for m in l:
    print(m.text)
driver.close()
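The collection loop in the script can also be written as a reusable function that stores each item's text rather than the element object. Reading the text straight away may help avoid stale element references if the page re-renders after a scroll. This is a sketch under assumptions: the function name `collect_item_texts` and its parameters are hypothetical, and removing duplicates by text would merge two different posts whose text is identical.

```python
import time


def collect_item_texts(driver, rounds=2, pause=2.0, class_name="list_item"):
    """Collect the text of matching elements over several scrolls, without duplicates."""
    seen = set()
    texts = []
    for _ in range(rounds):
        # "class name" is the string value of selenium's By.CLASS_NAME constant.
        for element in driver.find_elements("class name", class_name):
            text = element.text
            if text not in seen:
                seen.add(text)
                texts.append(text)
        # Scroll to the bottom and wait so that more items can load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)
    return texts


# Usage, assuming `driver` is already logged in:
#   for text in collect_item_texts(driver):
#       print(text)
```

A set gives constant-time duplicate checks, while the separate list preserves the order in which items first appeared.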
Element selection
Selecting a single element
- find_element_by_id
- find_element_by_name
- find_element_by_xpath
- find_element_by_link_text
- find_element_by_partial_link_text
- find_element_by_tag_name
- find_element_by_class_name
- find_element_by_css_selector
Selecting multiple elements
- find_elements_by_name
- find_elements_by_xpath
- find_elements_by_link_text
- find_elements_by_partial_link_text
- find_elements_by_tag_name
- find_elements_by_class_name
- find_elements_by_css_selector
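You can also use the By class to specify the selection strategy. The find_element_by_* and find_elements_by_* helpers listed above were removed in Selenium 4.3, so on current versions this is the form to use: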
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, '//button[text()="Some text"]')
driver.find_elements(By.XPATH, '//button')
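To show how the method names in the lists above correspond to By strategies, here is a small compatibility sketch. The table `STRATEGIES` and the function `find` are hypothetical helpers, not part of selenium; the strings in the table are the values of the matching `By` constants (for example `By.CLASS_NAME == "class name"`).

```python
# String values of selenium's By constants, keyed by the suffix of the
# corresponding find_element_by_* method name.
STRATEGIES = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}


def find(driver, legacy_name, value):
    """Call driver.find_element(s) for a legacy method name (hypothetical shim)."""
    for prefix, method in (("find_elements_by_", "find_elements"),
                           ("find_element_by_", "find_element")):
        if legacy_name.startswith(prefix):
            strategy = STRATEGIES[legacy_name[len(prefix):]]
            return getattr(driver, method)(strategy, value)
    raise ValueError("not a find_element(s)_by_* name: %r" % legacy_name)


# Usage, assuming `driver` is an open selenium session:
#   items = find(driver, "find_elements_by_class_name", "list_item")
```

In new code, calling `driver.find_element(By.XPATH, ...)` directly is simpler. A shim like this is mainly useful when migrating an older script.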