1. Fetch the home page
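import requests
from requests.exceptions import RequestException

def get_one_page(url):
    """Return the page HTML as text, or None if the request fails."""
    try:
        # Send a browser-like User-Agent; some sites reject requests without one.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'
        }
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        # Network errors, timeouts, and invalid URLs all end up here.
        return None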
2. Extract with a regular expression
So the regular expression here is: <div class="name">(.*?)</div>
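To see what this pattern captures, here is a minimal sketch run against a made-up HTML fragment. The fragment, the `sample_html` and `names` variables, and the use of the `re.S` flag are assumptions for illustration, not taken from the target page. The `(.*?)` group is non-greedy, so each match stops at the first closing `</div>`.

```python
import re

# Made-up HTML fragment, for illustration only.
sample_html = '''
<div class="name">First Title</div>
<div class="star">Cast A</div>
<div class="name">
  Second Title
</div>
'''

# re.S lets "." match newlines, so a value that spans lines is still captured.
pattern = re.compile(r'<div class="name">(.*?)</div>', re.S)

# findall returns only the captured group; strip() removes surrounding whitespace.
names = [name.strip() for name in pattern.findall(sample_html)]
print(names)  # ['First Title', 'Second Title']
```

Without `re.S`, the second block would be skipped here, because `.` would not cross the line breaks inside it.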
Using an iterator:
def parse_one_page(html):
    pattern = re