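Web interface demo (download link at the end of the code)
Crawling the data
Logging in to Douban
After too many requests, Douban starts restricting access, so the crawler has to log in and obtain a cookie.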
# Log in and obtain the cookie
import requests

def login():
    data1 = {
        'ck': '',
        'name': 'your Douban account',
        'password': 'your password',
        'remember': 'false',
        'ticket': ''
    }
    header = {
        "Host": "accounts.douban.com",
        "Origin": "https://accounts.douban.com",
        "Referer": "https://accounts.douban.com/passport/login?redir=https%3A%2F%2Fwww.douban.com%2Fgroup%2Ftopic%2F110169492%2F",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0",
        "X-Requested-With": "XMLHttpRequest"
    }
    url_basic = 'https://accounts.douban.com/j/mobile/login/basic'
    # The session keeps the cookies set by the login response for later requests
    s = requests.session()
    s.post(url_basic, headers=header, data=data1)
    return s

s = login()
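Because Douban restricts access once requests come in too quickly, it can also help to space the requests out in addition to logging in. The block below is a minimal sketch of that idea, not part of the original crawler: `ThrottledSession` and its `min_interval` parameter are hypothetical names, and the 2-second default is an arbitrary assumption rather than a documented Douban limit. It wraps any object with a `get` method, such as the session returned by `login()`.

```python
import time


class ThrottledSession:
    """Hypothetical wrapper: enforces a minimum delay between GET calls."""

    def __init__(self, session, min_interval=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._session = session            # any object exposing get(url, **kwargs)
        self._min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last = None                  # time of the previous request, if any

    def get(self, url, **kwargs):
        now = self._clock()
        if self._last is not None:
            # Sleep only for the part of the interval that has not elapsed yet
            wait = self._min_interval - (now - self._last)
            if wait > 0:
                self._sleep(wait)
        self._last = self._clock()
        return self._session.get(url, **kwargs)
```

With this in place, `slow = ThrottledSession(login())` could be used wherever the crawler would otherwise call `s.get(...)` directly, for example `slow.get(url, headers=header1)`.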
Creating the database connection
# Create the database connection
import pymysql

def getDb():
    return pymysql.connect(host="localhost", port=3306, user="root", password="123456", db="movie", charset="utf8")
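Every function that calls `getDb()` has to remember to commit and close the connection itself. As a rough sketch of one way to centralize that, the helper below wraps any connection factory in a context manager. `db_session` is a hypothetical name that does not appear in the original code, and the sketch only assumes that the connection object offers `commit()`, `rollback()` and `close()`, which PyMySQL connections do.

```python
from contextlib import contextmanager


@contextmanager
def db_session(connect):
    """Hypothetical helper: open a connection, commit on success, always close."""
    conn = connect()  # e.g. the getDb function defined above
    try:
        yield conn
        conn.commit()      # reached only if the with-block raised nothing
    except Exception:
        conn.rollback()    # undo partial writes before re-raising
        raise
    finally:
        conn.close()
```

Usage would look like `with db_session(getDb) as db:` followed by the cursor work, so a failed insert no longer leaves the connection open.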
Crawling the movie types
Crawl the movie types and store them in the database.
# Crawl the movie types
def getTags():
    db = getDb()
    url = "https://movie.douban.com/j/search_tags?type=movie&source="
    header1 = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36"
    }
    resp = s.get(url, headers=header1)
    data = resp.json()
    # Named tags rather than list, to avoid shadowing the built-in
    tags = data["tags"]
    # Create a cursor object with cursor()
    cursor = db.cursor()
    # Store the types in the database
    for info in tags:
        # Run the SQL statement with execute(); the parameter is passed as a 1-tuple
        cursor.execute("insert into movie_type(name) values(%s)", (info,))
    db.commit()
    # Close the cursor and the database connection
    cursor.close()
    db.close()
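The loop above issues one `execute()` call per type. As an illustrative variation, the rows can be prepared first and handed to the cursor in one `executemany()` call. The sketch below covers only the preparation step: `tag_rows` is a hypothetical helper, and dropping blank or duplicate names is an added assumption, not something the original code does. It relies only on the `{"tags": [...]}` shape that `getTags()` already reads.

```python
def tag_rows(payload):
    """Hypothetical helper: turn a {"tags": [...]} payload into 1-tuples for executemany()."""
    seen = set()
    rows = []
    for tag in payload.get("tags", []):
        tag = tag.strip()
        # Skip empty names and names already collected (assumed cleanup step)
        if tag and tag not in seen:
            seen.add(tag)
            rows.append((tag,))
    return rows
```

Inside `getTags()`, the loop could then be replaced with `cursor.executemany("insert into movie_type(name) values(%s)", tag_rows(data))`, followed by the same `db.commit()`.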
Crawling the movie information
This function takes the movie type id and the type name.
def getmoviesInfo(type_id,name):
print(