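One-Click Movie Search and Download with Python
Overview
This tutorial uses Python to search Douban Movie (豆瓣电影) and scrape a film's details, including its rating, cast, director, genre, release date, and synopsis. It then searches 电影天堂 (Movie Heaven) and scrapes the download links for that film.
Preparation
Install Python 3.6
(Installation steps omitted.)
Install the requests library (used to request static pages)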
pip install requests -i https://mirrors.ustc.edu.cn/pypi/web/simple
Install the lxml library (used to parse HTML)
pip install lxml -i https://mirrors.ustc.edu.cn/pypi/web/simple
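The movie information in this tutorial comes from Douban Movie, and the download links come from 电影天堂 (Movie Heaven). The two search endpoints are:
https://movie.douban.com/j/subject_suggest?q=<movie title>
http://s.ygdy8.com/plus/so.php?keytype=0&pagesize=10&searchtype=title&keyword=<movie title>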
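Page Analysis
Douban Movie Search
The Douban movie search URL is:
https://movie.douban.com/j/subject_suggest?q=<movie title>
It takes a single parameter, q, whose value is the movie title encoded as UTF-8 and then percent-encoded for use in a URL. For example, to search for 星际穿越 (Interstellar) we request the URL below, where %e6%98%9f%e9%99%85%e7%a9%bf%e8%b6%8a is the percent-encoded UTF-8 form of 星际穿越: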
https://movie.douban.com/j/subject_suggest?q=%e6%98%9f%e9%99%85%e7%a9%bf%e8%b6%8a
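The encoding step described above can be sketched with the standard library. This is a minimal illustration, not necessarily how the original script builds the URL; the names `SUGGEST_URL` and `build_suggest_url` are hypothetical. Note that `urllib.parse.quote` emits uppercase hex digits, which are equivalent to the lowercase form shown above.

```python
from urllib.parse import quote

# Base of the Douban suggest endpoint, taken from the URL shown above.
SUGGEST_URL = "https://movie.douban.com/j/subject_suggest?q="


def build_suggest_url(title: str) -> str:
    """Return the suggest URL for a movie title (hypothetical helper)."""
    # quote() encodes the text as UTF-8 by default, then percent-encodes
    # every byte that is not an unreserved ASCII character.
    return SUGGEST_URL + quote(title, safe="")


print(build_suggest_url("星际穿越"))
```

Passing `safe=""` also escapes `/`, which `quote` would otherwise leave untouched; that matters for titles containing a slash.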
The server returns the search results as JSON (saved here as subject_suggest.json):
[
{
"episode" : "",
"id" : "1889243",
"img" : "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2206088801.jpg",
"sub_title" : "Interstellar",
"title" : "星际穿越",
"type" : "movie",
"url" : "https://movie.douban.com/subject/1889243/?suggest=%E6%98%9F%E9%99%85%E7%A9%BF%E8%B6%8A",
"year" : "2014"
},
{
"episode" : "",
"id" : "26263467",
"img" : "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2445481443.jpg",
"sub_title" : "The Science of Interstellar",
"title" : "《星际穿越》中的科学",
"type" : "movie",
"url" : "https://movie.douban.com/subject/26263467/?suggest=%E6%98%9F%E9%99%85%E7%A9%BF%E8%B6%8A",
"year" : "2014"
},
{
"episode" : "",
"id" : "26255844",
"img" : "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2519643575.jpg",
"sub_title" : "Interstellar: Nolan's Odyssey",
"title" : "星际穿越:诺兰的奥德赛",
"type" : "movie",
"url" : "https://movie.douban.com/subject/26255844/?suggest=%E6%98%9F%E9%99%85%E7%A9%BF%E8%B6%8A",
"year" : "2014"
}
]
The search returned three results related to 星际穿越. The keys we care about are:
Key | Meaning |
---|---|
title | Title |
sub_title | Subtitle (English title) |
url | Detail page URL |
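As a rough sketch of how the three keys in the table could be pulled out of the response, the snippet below parses the JSON text and keeps only `title`, `sub_title`, and `url`. The `Suggestion` type and `parse_suggestions` function are illustrative names, not part of the original script, and the sample input is a shortened copy of the first entry of the response shown earlier.

```python
import json
from typing import List, NamedTuple


class Suggestion(NamedTuple):
    """One search result, reduced to the fields we care about."""
    title: str
    sub_title: str
    url: str


def parse_suggestions(json_text: str) -> List[Suggestion]:
    """Parse the suggest response and keep title, sub_title and url."""
    items = json.loads(json_text)
    # .get() with a default tolerates entries that lack one of the keys.
    return [
        Suggestion(
            item.get("title", ""),
            item.get("sub_title", ""),
            item.get("url", ""),
        )
        for item in items
    ]


# Shortened sample input: the first entry of the response, fewer keys.
sample = """[
  {"id": "1889243", "sub_title": "Interstellar", "title": "星际穿越",
   "url": "https://movie.douban.com/subject/1889243/?suggest=%E6%98%9F%E9%99%85%E7%A9%BF%E8%B6%8A"}
]"""

for s in parse_suggestions(sample):
    print(s.title, s.sub_title, s.url)
```

A named tuple keeps the example compatible with Python 3.6 while still giving each field a readable name.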
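Next, we open the detail page URL of the matching search result to get the movie's rating, director, cast, genre, release date, synopsis, reviews, and so on.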
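One way to choose which result to open, and then download it, is sketched below. The selection rule (exact title match, otherwise the first result) and the helper names `pick_detail_url` and `fetch_html` are assumptions made for illustration. The browser-like `User-Agent` header is also an assumption, added because some sites reject the default one sent by requests.

```python
from typing import Dict, List, Optional

# Assumption: a browser-like User-Agent; adjust or drop it as needed.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def pick_detail_url(results: List[Dict[str, str]],
                    wanted_title: str) -> Optional[str]:
    """Return the url of the exact title match, else the first result's url,
    else None when there are no results."""
    for item in results:
        if item.get("title") == wanted_title:
            return item.get("url")
    return results[0].get("url") if results else None


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its text (performs a real network request)."""
    # Imported lazily so pick_detail_url works without requests installed.
    import requests

    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text
```

With the parsed search results in `results`, a call such as `fetch_html(pick_detail_url(results, "星际穿越"))` would return the HTML of the detail page, provided a match was found.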
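For example, opening the first search result gives the following page:
Let's look at its source (press F12 to open the developer tools):
In the head there is a tag, ***/html/head/script[@type="application/ld+json"]***, that holds a JSON document. This JSON contains all the movie information we need. Extracted, it looks like this: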
{
"@context": "http://schema.org",
"name": "星际穿越 Interstellar",
"url": "/subject/1889243/",
"image": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2206088801.jpg",
"director":
[
{
"@type": "Person",
"url": "/celebrity/1054524/",
"name": "克里斯托弗·诺兰 Christopher Nolan"
}
]
,
"author":
[
{
"@type": "Person",
"url": "/celebrity/1275104/",
"name": "乔纳森·诺兰 Jonathan Nolan"
}
,
{
"@type": "Person",
"url": "/celebrity/1054524/",
"name": "克里斯托弗·诺兰 Christopher Nolan"
}
,
{
"@type": "Person",
"url": "/celebrity/1018568/",
"name": "基普·索恩 Kip Thorne"
}
]
,
"actor":
[
{
"@type": "Person",
"url": "/celebrity/1040511/",
"name": "马修·麦康纳 Matthew McConaughey"
}
,
{
"@type": "Person",
"url": "/celebrity/1048027/",
"name": "安妮·海瑟薇 Anne Hathaway"
}
,
{
"@type": "Person",
"url": "/celebrity/1000225/",
"name": "杰西卡·查斯坦 Jessica Chastain"
}
,
{
"@type": "Person",
"url": "/celebrity/1022593/",
"name": "卡西·阿弗莱克 Casey Affleck"
}
,
{
"@type": "Person",
"url": "/celebrity/1054509/",
"name": "迈克尔·凯恩 Michael Caine"
}
,