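Purpose of the crawler:
This crawler is about as simple as it gets: it scrapes the contents of the Douban Top 250 movie pages and then imports that content into a database. Here is a screenshot of the result first:
Crawler code (partial):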
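The post does not say which database the results were imported into. As a minimal sketch of that step, assuming SQLite (via Python's built-in sqlite3 module) and a hypothetical two-column table, the storage side could look roughly like this. The function name save_movies, the table layout, and the sample rows are all illustrative placeholders, not the author's actual schema or data:

```python
import sqlite3


def save_movies(conn, movies):
    """Insert (ranking, title) tuples into a hypothetical `movies` table."""
    # Assumed schema: the original post does not show its table layout.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies ("
        "ranking INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    # Parameterized query; OR REPLACE keeps one row per ranking,
    # so re-running the crawler does not create duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO movies (ranking, title) VALUES (?, ?)",
        movies,
    )
    conn.commit()


# Placeholder rows, not real Douban data.
conn = sqlite3.connect(":memory:")
save_movies(conn, [(1, "Example Movie A"), (2, "Example Movie B")])
rows = conn.execute(
    "SELECT ranking, title FROM movies ORDER BY ranking"
).fetchall()
print(rows)
```

In a real run the list passed to save_movies would be built from the values the crawler extracts per list item, and the connection would point at a file on disk rather than an in-memory database.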
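import time

import requests
from bs4 import BeautifulSoup


def getlist(listurl, result):
    # Pause between requests so the site is not hit too quickly.
    time.sleep(2)
    # Send a browser-like User-Agent header with the request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36'
    }
    res = requests.get(listurl, headers=headers)
    soup = BeautifulSoup(res.text, 'html.parser')
    # Each <li> under .grid_view is one movie entry.
    movielist = soup.select('.grid_view li')
    for m in movielist:
        rank = m.select('em')[0].text
        if len(m.select('.title')) > 1:
            english_name =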