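Self-practice exercise 3.4.3, page 47 of 《Python网络爬虫从入门到实践》 (Python Web Scraping: From Beginner to Practice), edited by Tang Song (唐松)
All of the code below is original.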
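Before the full listing, here is a minimal offline sketch of the extraction technique it relies on: `re.findall` with a lazy capture group and the `re.S` flag, so that `.` also matches newlines. The HTML fragment, the `SAMPLE_HTML` name, and the `extract_fields` helper are assumptions made for illustration; the real page markup may differ.

```python
import re

# Hypothetical HTML fragment shaped to fit the patterns used in the listing.
# It is NOT copied from the real page, whose markup may differ.
SAMPLE_HTML = '''
<a href="https://example.com/subject/1/">
    <span class="title">Film A</span>
    <span class="title">&nbsp;/&nbsp;Film A English</span>
    <span class="other">&nbsp;/&nbsp;Film A HK</span>
</a>
<span class="inq">A short quote.</span>
'''


def extract_fields(html):
    """Illustrative helper (not part of the original code)."""
    # re.S lets '.' match newlines, so a pattern can span several lines.
    # '(.*?)' is lazy: it stops at the first '</span>' that follows.
    titles = re.findall(
        '<a href=".*?">.*?<span class="title">(.*?)</span>', html, re.S)
    # '.nbsp;' matches '&nbsp;' because '.' matches any single character.
    english = re.findall(
        '<span class="title">.nbsp;/.nbsp;(.*?)</span>', html, re.S)
    other = re.findall(
        '<span class="other">.nbsp;/.nbsp;(.*?)</span>', html, re.S)
    quotes = re.findall('<span class="inq">(.*?)</span>', html, re.S)
    return {'titles': titles, 'english': english,
            'other': other, 'quotes': quotes}


if __name__ == '__main__':
    print(extract_fields(SAMPLE_HTML))
```

Because the sample is a fixed string, the patterns can be checked without sending any request.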
The complete code is as follows:
import requests
import re
# Browser-like User-Agent sent with every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'
}
info_lists = []
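def get_info(url):
    # Fetch the page; re.S in the patterns below lets '.' match newlines
    res = requests.get(url, headers=headers)
    # Main title: the first <span class="title"> after each link
    filmnames = re.findall('<a href=".*?">.*?<span class="title">(.*?)</span>', res.text, re.S)
    # Alternate title; '.nbsp;' matches '&nbsp;' because '.' matches any character
    Englishnames = re.findall('<span class="title">.nbsp;/.nbsp;(.*?)</span>', res.text, re.S)
    # Other titles from <span class="other">
    Hongkongnames = re.findall('<span class="other">.nbsp;/.nbsp;(.*?)</span>', res.text, re.S)
    # Text between '导演:' (director) and '主演:' (starring); note that it is stored as levels
    levels = re.findall('导演:(.*?).nbsp;.nbsp;.nbsp;主演:.*?', res.text, re.S)
    # One-line quote from <span class="inq">
    comments = re.findall('<span class="inq">(.*?)</span>', res.text, re.S)
    scores &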