Contents
I. Parsing a string as HTML with lxml.etree
1. If the page source was obtained through webdriver, parse the source string directly into HTML with etree.HTML()
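To see what etree.HTML() does without launching a browser, here is a minimal sketch that parses a hard-coded string. The markup, the "results" id, and the "name" class are hypothetical stand-ins for a real page source, not taken from the site used in this post.

```python
from lxml import etree

# Hypothetical markup standing in for browser.page_source
page_code = """
<html><body>
  <ul id="results">
    <li><a href="/a"><span class="name">Alpha</span></a></li>
    <li><a href="/b"><span class="name">Beta</span></a></li>
  </ul>
</body></html>
"""

# etree.HTML() parses the string and returns the root <html> element
html_code = etree.HTML(page_code)

# xpath() with text() returns a list of the matching text nodes
names = html_code.xpath('//ul[@id="results"]/li/a/span[@class="name"]/text()')
print(names)
```

The element returned by etree.HTML() can be queried with xpath() in the same way when the string comes from a live page.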
from selenium import webdriver
from lxml import etree
url = "https://appexchange.salesforce.com/appxStore"
browser = webdriver.Chrome()
browser.get(url)
page_code = browser.page_source
html_code = etree.HTML(page_code)  # parse the page_code string into an HTML element tree
app_names_xpath = '//*[@id="appx-table-results"]/li[*]/a/span[2]/span[2]/span[1]/s