This post again uses Hupu as the example and saves the scraped data to a MySQL database.
First, import the required libraries:
import requests
from bs4 import BeautifulSoup
import time
import random
import MySQLdb
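The `time` and `random` imports are not used in the lines shown here. A common reason to import them in a crawler is to wait a random interval between page requests, so the sketch below assumes that purpose. The names `page_url` and `polite_pause` and the delay bounds are hypothetical and chosen only for illustration; only the URL prefix comes from this post.

```python
import random
import time

# URL prefix taken from this post; the page number is appended to it
BASE_URL = "https://bbs.hupu.com/bxj-postdate-"


def page_url(page=0):
    """Build the listing URL for a zero-based page index (hypothetical helper)."""
    return BASE_URL + str(page + 1)


def polite_pause(low=1.0, high=3.0, sleep=time.sleep):
    """Sleep for a random number of seconds between low and high, then return it.

    The bounds are assumed values, not taken from the original post.
    """
    delay = random.uniform(low, high)
    sleep(delay)
    return delay
```

When looping over several pages, one option is to call `polite_pause()` after each request so that requests are not sent back to back.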
Define a function to scrape the data:
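def get_information(page=0):
    # page=0 maps to .../bxj-postdate-1, page=1 to .../bxj-postdate-2, and so on
    url = 'https://bbs.hupu.com/bxj-postdate-' + str(page+1)
    # Send browser-like headers with the request
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36",
        "Referer": "https://bbs.hupu.com/bxj"
    }
    r = requests.get(url, headers=headers)
    # Decode the response body as UTF-8 and parse it with the built-in HTML parser
    soup = BeautifulSoup(r.content.decode("utf-8"), "html.parser")
    # Look up the <ul> element whose class is "for-list"
    out = soup.find("ul", attrs={
        "class": "for-list"}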