使用Python3获取360影视首页上电影的名称，年份，评价，播放链接并保存为txt文本_1、获取网页源代码对应的代码 2、提取电影片名和图片网址对应代码 3、输出电影-CSDN博客

本文链接：https://blog.csdn.net/qq_37400312/article/details/60336668

首先，我们需要下载python3.0以上的版本以及requests和BeautifulSoup这两个第三方包

第三方包下载连接：

requests-2.5.0.tar.gz 链接：http://download.csdn.net/download/ls1160/8242547

beautifulsoup4-4.5.3.tar.gz 链接：链接：http://download.csdn.net/download/qq_37400312/9770777

第三方包安装方法：

将requests-2.13.0.tar.gz压缩包解压后，打开cmd，跳转至文件路径，输入python setup.py install按回车进行安装

（beautifulsoup4-4.5.3.tar.gz方法相同）

然后，我们转入正题：

1.获取网页源代码

import requests
html = requests.get('http://www.360kan.com/dianying/index.html')
print(html.text)

我们可以编译一下，若获取成功则会输出http://www.360kan.com/dianying/index.html网页上的代码

2.使用BeautifulSoup解析网页

from bs4 import BeautifulSoup
soup = BeautifulSoup(html.text,'html.parser')

3.获取全部电影名称

for news in soup.select('.w-newfigure'):
    if len(news.select('.s1')) > 0
        title = news.select('.s1')[0].text
        print (title)

4.获取一个年份

year = soup.select('.w-newfigure span')[0].text
print (year)

5.获取一个电影名称

name = soup.select('.w-newfigure span')[1].text
print (name)

6.获取一个评分

name = soup.select('.w-newfigure span')[2].text
print (name)

7.获取全部评分

for news in soup.select('.w-newfigure'):
    if len(news.select('.s2')) > 0:#无评分的直接跳过
        comment = news.select('span')[2].text
        print (comment)

8.获取全部电影年份、名称和评分

for news in soup.select('.w-newfigure'):
    if len(news.select('.s2')) > 0:#
        year = news.select('span')[0].text#无年份的直接去掉
        name = news.select('span')[1].text#无名称的直接去掉
        comment = news.select('span')[2].text#无评分的直接去掉
        print (year,name,comment)

9.获取全部影片介绍页面

for news in soup.select('.w-newfigure'):
    if awoidUrl(news.select('a ')[0]['href']):
        url1 = news.select('a ')[0]['href']
        print(url1)

10.构造获取立即播放链接函数

def getUrl(url):
    html2 = requests.get('%s'%url)
    soup2 = BeautifulSoup(html2.text,"html.parser")
    if len(soup2.select(".top-list-btns a")[0]['href'])>0:
        url2 = soup2.select(".top-list-btns a")[0]['href']
        return url2

因为360影视首页的电影只给了影片介绍页面的链接，影片的播放链接在介绍页面内，所以我们要构造一个函数，再获取一次链接

11.构造避免http://v.360kan.com/网址和空网址函数

import re
def awoidUrl(url):
    if len(url) > 0:
        m = re.search('http://(.*).com',url)
        newurl = m.group(1)#1获取括号内的内容
        #print (newurl)
        if newurl == "www.360kan":
            return 1
        else:
            return 0
    else:
        return 0

因为有些网址获取不到或网址为会员才能观看，所以我们直接跳过
12.获取全部影片序号，电影名称，上映年份，评分，立即播放网址

i=0
for news in soup.select('.w-newfigure'):
    if awoidUrl(news.select('a ')[0]['href']):
        url1 = news.select('a ')[0]['href']
        try:
            url2 = getUrl(url1)
            year = news.select('span')[0].text#无年份的直接去掉
            name = news.select('span')[1].text#无名称的直接去掉
            comment = news.select('span')[2].text#无评分的直接去掉
            i=i+1
            print ("序号:%s\t电影名称:%s\t\t\t上映年份:%s\t评分:%s\n网址:%s"%(i,name,year,comment,url2))
        except:
            continue

13.程序完整代码，抓取360影视首页资料并保存至同一目录下的“保存内容.txt”中

#-*- coding: utf-8 -*-
import requests
html = requests.get('http://www.360kan.com/dianying/index.html')


from bs4 import BeautifulSoup
soup = BeautifulSoup(html.text,'html.parser')


def getUrl(url):
    html2 = requests.get('%s'%url)
    soup2 = BeautifulSoup(html2.text,"html.parser")
    if len(soup2.select(".top-list-btns a")[0]['href'])>0:
        url2 = soup2.select(".top-list-btns a")[0]['href']
        return url2

import re
def awoidUrl(url):
    if len(url) > 0:
        m = re.search('http://(.*).com',url)
        newurl = m.group(1)#1获取括号内的内容
        #print (newurl)
        if newurl == "www.360kan":
            return 1
        else:
            return 0
    else:
        return 0


import urllib.request
import urllib.parse
f = open("保存内容.txt",'wb')
i=0
for news in soup.select('.w-newfigure'):
    if awoidUrl(news.select('a ')[0]['href']):
        url1 = news.select('a ')[0]['href']
        try:
            url2 = getUrl(url1)
            year = news.select('span')[0].text#无年份的直接去掉
            name = news.select('span')[1].text#无名称的直接去掉
            comment = news.select('span')[2].text#无评分的直接去掉
            str = ("序号:%s\t电影名称:%s\t\t\t上映年份:%s\t评分:%s\n网址:%s\n\n"%(i,name,year,comment,url2))

            fo = open("保存内容.txt", "r+")
            fo.seek(0, 2)
            fo.write( str )
            
            i=i+1
            print ("%s保存完毕"%i)
        except:
            continue

fo.close()
print("over")

14.至此，我们可以通过上述代码成功抓取360影视首页上的电影资料，但由于有些影视的链接等信息有些特殊，所以本人选择了直接忽略，例如没有评分的电影就选择不进行抓取等，所以抓取结果不全，仅供学习参考，如文章有错误或有什么好的意见，欢迎提出，谢谢。