HTML parsing libraries
(1) BeautifulSoup
from bs4 import BeautifulSoup
# html: the page source as a string; find_all returns every <h4> tag in document order
links = BeautifulSoup(html, 'lxml').find_all("h4")
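A minimal, self-contained sketch of the `find_all` call above. The sample HTML is made up for illustration, and it uses Python's built-in `"html.parser"` instead of `'lxml'` so that only `bs4` needs to be installed. With lxml installed, `'lxml'` should give the same result on this input.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in practice html is the downloaded page source.
html = """
<html><body>
  <h4>First title</h4>
  <div><h4>Second title</h4></div>
  <h3>Not matched</h3>
</body></html>
"""

# "html.parser" ships with Python; 'lxml' is a faster third-party parser.
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("h4")  # list of Tag objects, nested ones included
titles = [tag.get_text(strip=True) for tag in links]
print(titles)  # ['First title', 'Second title']
```

`find_all` returns Tag objects rather than strings, so you still need `get_text()` (or an attribute lookup such as `tag["href"]`) to get the data.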
(2) lxml
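from lxml import etree
# xpath() always returns a list; "//h4" yields element nodes for every <h4>
links1 = etree.HTML(html).xpath("//h4")
# text() yields the text nodes under <p> inside <li> of <ul class="livelist-mod">
links2 = etree.HTML(html).xpath("//ul[@class='livelist-mod']//li//p//text()")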
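The two XPath queries above can be tried against a small made-up page. This sketch parses the document once and reuses the root, which avoids re-parsing for every query. The sample markup is hypothetical and was only chosen to match the selectors.

```python
from lxml import etree

# Hypothetical page shaped to match the selectors in the notes.
html = """
<html><body>
<ul class="livelist-mod">
  <li><h4>Room A</h4><p>Streamer <b>one</b></p></li>
  <li><h4>Room B</h4><p>Streamer two</p></li>
</ul>
<ul class="other"><li><p>Ignored</p></li></ul>
</body></html>
"""

root = etree.HTML(html)  # parse once, query many times

links1 = root.xpath("//h4")  # Element objects
headings = [el.text for el in links1]

# //text() collects every descendant text node, so text split by <b> comes back in pieces.
links2 = root.xpath("//ul[@class='livelist-mod']//li//p//text()")

print(headings)  # ['Room A', 'Room B']
print(links2)    # ['Streamer ', 'one', 'Streamer two']
```

`@class='livelist-mod'` is an exact string comparison. An element with `class="livelist-mod active"` would not match it, and in that case `contains(@class, 'livelist-mod')` is the usual workaround.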
(3) Getting the login form's execution value
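import requests
from bs4 import BeautifulSoup

def get_execution():
    # verify=False skips TLS certificate verification; "xxx" stands for the login page URL
    res = requests.get(url="xxx", verify=False)
    # Read the value attribute of the element whose name attribute is "execution"
    execution = BeautifulSoup(res.text, "lxml").find(attrs={"name": "execution"})["value"]
    return execution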
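A possible refinement, sketched under assumptions: split the parsing step away from the HTTP request so it can be tested offline. It should also return `None` when the field is missing, because `find()` returns `None` in that case and indexing it with `["value"]` raises a `TypeError`. The function name `parse_execution` and the sample form are hypothetical. `"html.parser"` is used so only `bs4` is required.

```python
from bs4 import BeautifulSoup

def parse_execution(page_html):
    """Return the value of the element named "execution", or None if it is absent."""
    # Hypothetical helper: same lookup as get_execution(), minus the network call.
    tag = BeautifulSoup(page_html, "html.parser").find(attrs={"name": "execution"})
    return tag.get("value") if tag is not None else None

# Hypothetical login page fragment; the value is made up for illustration.
login_page = """
<form method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="execution" value="e1s1">
</form>
"""

print(parse_execution(login_page))  # e1s1
```

`get_execution()` could then be written as `return parse_execution(res.text)`.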
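(4) Use a regular expression to specify the left and right boundaries; whatever (.*) matches between them is extracted and assigned to a variable.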
extract:
  baidu: <title>(.*)</title>
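The `extract:` snippet looks like configuration for a tool that the notes do not name. The Python below only reproduces the matching behavior with the standard `re` module. The response body is a made-up stand-in for the Baidu page. The second half shows a common pitfall: greedy `(.*)` runs to the last right boundary on the line, while `(.*?)` stops at the first one.

```python
import re

# Hypothetical response body standing in for the real page.
html = "<html><head><title>Example page title</title></head><body></body></html>"

# (.*) captures everything between the left boundary <title> and the right boundary </title>.
match = re.search(r"<title>(.*)</title>", html)
baidu = match.group(1) if match else None
print(baidu)  # Example page title

# Greedy vs. non-greedy when the right boundary occurs more than once.
text = "<b>one</b> and <b>two</b>"
greedy = re.search(r"<b>(.*)</b>", text).group(1)  # 'one</b> and <b>two'
lazy = re.search(r"<b>(.*?)</b>", text).group(1)   # 'one'
print(greedy, lazy)
```

By default `.` does not match a newline. If the content between the boundaries can span lines, pass `re.S` (`re.DOTALL`).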