Scraping movie download links with Python 3.7

Article link: https://blog.csdn.net/baidu_41628693/article/details/85260087

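This project uses Python to scrape movie download links from a website, wraps the scraper in a graphical interface, and packages the result as a Windows executable (.exe).

It consists of two .py files: movie.py, which does the scraping, and movie_ui.py, which defines the program's UI.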

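1. movie_ui.py

We start with the file that defines the UI.

The finished window looks like this.

Broken down into its parts, the layout is as follows.

Required packages: QtCore, QtGui and QtWidgets from PyQt5.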

from PyQt5 import QtCore,QtGui,QtWidgets
from PyQt5.QtWidgets import *
from PyQt5.QtCore import *
from PyQt5.QtGui import *

Create a class Ui_Window with a method setupUi that places the widgets.

class Ui_Window(object):
    def setupUi(self,MainWindow):
        # Set the window size
        MainWindow.resize(800,600)
        # Disable resizing
        MainWindow.setFixedSize(MainWindow.width(), MainWindow.height())
        # Window icon
        MainWindow.setWindowIcon(QIcon(':/icon.png'))

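resize(width, height) sets the window size in pixels.

setFixedSize(width, height) fixes the window at that size by setting both its minimum and maximum size. Passing the size just set with resize() means the window can be neither resized nor maximized. The widgets inside are placed with absolute coordinates, so resizing would leave them misaligned or unattractive; resizing is therefore disabled.

setWindowIcon(QIcon('image path')) changes the window icon. The image path here starts with ':/', which is explained later.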

Create the central widget

        self.centralwidget=QWidget(MainWindow) # Central widget that holds the other widgets

Add the logo

        # Logo
        self.label_logo=QLabel(self.centralwidget)
        self.label_logo.setGeometry(QRect(270,20,80,80))  # Position and size
        self.label_logo.setStyleSheet('border-image:url(:/icon.png);') # Load the image

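setGeometry(QRect(a,b,c,d)) sets a widget's position and size: a is the distance from the left edge of the parent widget, b is the distance from its top edge, c is the widget's width, and d is its height.

setStyleSheet() sets the widget's style. Here the border-image property loads an image to serve as the logo.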

Add the text to the right of the logo

        # Logo title
        self.label_title=QLabel(self.centralwidget)
        self.label_title.setGeometry(QRect(370,20,200,80))
        font=QFont()
        font.setFamily('华文彩云') # Font family
        font.setPointSize(18) # Size
        font.setBold(True)  # Bold
        self.label_title.setFont(font) # Apply the font

font=QFont() creates a font object; setFamily() sets the font family, setPointSize() sets the size, setBold(True) makes it bold, and setFont() applies the font to the widget.

Add the horizontal line below the logo

        # Divider line
        self.line=QFrame(self.centralwidget)
        self.line.setGeometry(QRect(0,90,800,20))
        self.line.setFrameShape(QFrame.HLine) # Horizontal line
        self.line.setFrameShadow(QFrame.Sunken) # Shadow style (sunken)

setFrameShape(QFrame.HLine) makes the QFrame a horizontal line.

setFrameShadow(QFrame.Sunken) gives the QFrame a sunken shadow.

Add the label to the left of the search box

        # Search label
        self.search_title=QLabel(self.centralwidget)
        self.search_title.setGeometry(QRect(45,105,220,80))
        font=QFont()
        font.setFamily('仿宋')
        font.setPointSize(13)
        font.setBold(True)
        self.search_title.setFont(font)

Add the search box

        # Search box
        self.search_edit=QLineEdit(self.centralwidget) # Single-line edit box
        self.search_edit.setGeometry(QRect(230,125,410,40))
        font=QFont()
        font.setFamily('仿宋')
        font.setPointSize(11)
        font.setBold(True)
        self.search_edit.setFont(font)

Add the search button

        # Search button
        self.search_button=QPushButton(self.centralwidget)
        self.search_button.setGeometry(QRect(660,125,80,40))
        font=QFont()
        font.setFamily('仿宋')
        font.setPointSize(13)
        font.setBold(True)
        self.search_button.setFont(font)

Add the results box at the bottom

        # Results
        self.groupBox=QGroupBox(self.centralwidget) # QGroupBox: a group box with a title
        self.groupBox.setGeometry(QRect(20,180,760,400))
        self.result=QTextEdit(self.groupBox) # Multi-line edit box
        self.result.setReadOnly(True)
        self.result.setGeometry(QtCore.QRect(10,20,740,360))
        font=QFont()
        font.setFamily("仿宋")
        font.setPointSize(12)
        font.setBold(True)
        self.groupBox.setFont(font)
        self.result.setFont(font)

setReadOnly(True) makes the text box read-only.

Install the central widget

        MainWindow.setCentralWidget(self.centralwidget)  # Set it as the main window's central widget

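Set the widgets' display text in a separate method, retranslateUi, and call it from setupUi.

    def retranslateUi(self,MainWindow):
        MainWindow.setWindowTitle('Movie')
        self.label_title.setText('Movie Resource Search')
        self.search_title.setText('Title or keyword')
        self.search_button.setText('Search')
        self.groupBox.setTitle('Movie links')

The call goes at the end of setupUi:

        self.retranslateUi(MainWindow) # Load the display text

That completes the UI.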

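2. movie.py

Next is the file that does the scraping.

The target is the movie download links on bd-film.cc.

Required packages: requests, re, sys and selenium. movie_ui.py and the PyQt5 packages are imported as well.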

# Crawler
import requests
import re
import sys
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# UI
from movie_ui import Ui_Window
from PyQt5 import QtCore,QtGui,QtWidgets
from PyQt5.QtWidgets import *
from PyQt5.QtCore import *
from PyQt5.QtGui import *

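Create a class mywindow that inherits from QMainWindow and Ui_Window.

class mywindow(QMainWindow,Ui_Window):
    def __init__(self):
        super(mywindow,self).__init__() # Initialize the parent classes
        self.setupUi(self) # Build the UI
        self.search_button.clicked.connect(self.result_data) # Connect the button to its handler

clicked.connect(method) sets the method that is called when the button is clicked.

Define the result_data method that collects the download links.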

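    def result_data(self):
        text=self.search_edit.text()  # Read the input
        all_result=0
        page=1 # Page number
        self.result.setText('') # Clear the output box
        while True:
            url='https://www.bd-film.cc/search_'+str(page)+'.jspx?q='+text # Search URL
            html=requests.get(url)     # Request the page
            html.encoding='utf-8'      # Set the encoding
            # Extract result links with a regular expression; re.escape keeps
            # special characters in the keyword from being read as regex syntax
            film_html=re.findall('<a href="(.*?)" title=.*'+re.escape(text)+'.*"',html.text)
            if len(film_html)<1:
                if page==1: # Only report "no match" when the first page is empty
                    self.result.setText('No matching results')
                break # No results, so stop

            # Remove duplicates while keeping the order
            film_html_qc=[]
            for f_h in film_html:
                if f_h not in film_html_qc:
                    film_html_qc.append(f_h)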

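First, search for 憨豆特工 (Johnny English) on bd-film.cc.

Then look at the search URL: http://www.bd-film.cc/search_1.jspx?q=憨豆特工. It is made up of four parts: http://www.bd-film.cc/search_, 1, .jspx?q=, and 憨豆特工. So two variables are enough: page for the page number and text for the value of the input box.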
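The four-part URL pattern can be sketched as a small helper. The function name build_search_url is hypothetical and not part of the original program. The sketch also percent-encodes the keyword explicitly with urllib.parse.quote, a step the original code does not perform itself.

```python
from urllib.parse import quote

BASE = 'https://www.bd-film.cc/search_'

def build_search_url(keyword, page=1):
    """Join the four parts: base, page number, '.jspx?q=', and the keyword."""
    # quote() percent-encodes non-ASCII characters and spaces in the keyword.
    return BASE + str(page) + '.jspx?q=' + quote(keyword)

print(build_search_url('憨豆特工', 1))
```

Keeping the URL construction in one place makes it easy to change the page number inside the paging loop without touching the rest of the request code.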

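requests.get() requests the URL and returns the page, and re.findall() then uses a regular expression to extract every result link from the search page. If there are no results, the loop exits immediately. If there are, the list must be deduplicated; otherwise it would contain repeated entries.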
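The extraction and deduplication step can be tried without any network access. The sketch below is only an illustration: the function name find_result_links and the sample markup are made up, and the real search page may be laid out differently. It escapes the keyword with re.escape and uses dict.fromkeys as a compact way to drop duplicates while keeping the original order.

```python
import re

def find_result_links(html_text, keyword):
    """Return hrefs whose title contains the keyword, first occurrence only."""
    pattern = '<a href="(.*?)" title=.*' + re.escape(keyword) + '.*"'
    links = re.findall(pattern, html_text)
    # dict.fromkeys keeps insertion order, so duplicates drop out
    # without reordering the results.
    return list(dict.fromkeys(links))

# Made-up markup for illustration; the real page layout may differ.
sample = '''
<a href="/a/1.htm" title="Johnny English (2003)">
<a href="/a/1.htm" title="Johnny English (2003)">
<a href="/a/2.htm" title="Johnny English Reborn">
<a href="/a/3.htm" title="Something else">
'''
print(find_result_links(sample, 'Johnny English'))
```

Escaping matters when the keyword contains characters such as + or (, which would otherwise be interpreted as regular expression syntax.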

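Start headless Chrome

            # Start Chrome in headless mode
            chrome_options=Options()
            chrome_options.add_argument('--headless')
            chrome_options.add_argument('--disable-gpu')
            driver=webdriver.Chrome(options=chrome_options)

The pages that carry the download links are dynamic, so requests alone cannot retrieve the complete HTML. The program therefore drives Chrome through chromedriver to load each page, and runs it in headless mode (no visible window) to speed things up.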

Scrape the download links

            count=0
            for fh in film_html_qc:
                if count==len(film_html_qc): # Stop once every link has been visited
                    break
                driver.get(fh) # Load the page
                count+=1
                #time.sleep(5)
                html2=driver.page_source # Get the page source

                # Magnet links
                magnet_url=re.findall('"magnet(.*?)"',html2)
                magnet_url=list(set(magnet_url)) # Remove duplicates
                # Thunder (Xunlei) links
                thunder_url=re.findall('"thunder(.*?)"',html2)
                thunder_url=list(set(thunder_url)) # Remove duplicates
                # ed2k links
                ed2k_url=re.findall('"ed2k(.*?)"',html2)
                ed2k_url=list(set(ed2k_url)) # Remove duplicates
                # Baidu Netdisk links (dots escaped so they match literally)
                pan_url=re.findall(r'pan\.baidu\.com(.*?)"',html2)
                pan_password=re.findall('</a><span>(.*?)</span>',html2) # Password

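driver.get() loads the page and page_source returns its source. Regular expressions then pick out the magnet, thunder, ed2k and Baidu Netdisk links, and the magnet, thunder and ed2k lists are deduplicated.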
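The three near-identical regular expressions can be folded into one loop. This is a minimal sketch that runs on a made-up HTML fragment instead of a live page; the function name extract_links is hypothetical, and sorted(set(...)) is used here in place of list(set(...)) so that the output order is repeatable.

```python
import re

def extract_links(page_source):
    """Collect magnet, thunder and ed2k links from rendered HTML, deduplicated."""
    found = {}
    for scheme in ('magnet', 'thunder', 'ed2k'):
        # Match the scheme right after an opening quote, up to the closing quote.
        tails = re.findall('"' + scheme + '(.*?)"', page_source)
        # sorted(set(...)) removes duplicates and gives a stable order.
        found[scheme] = [scheme + t for t in sorted(set(tails))]
    return found

# Made-up page fragment for illustration only.
sample = ('<a href="magnet:?xt=urn:btih:abc">m</a>'
          '<a href="magnet:?xt=urn:btih:abc">m</a>'
          '<a href="ed2k://|file|demo.mkv|1|0|/">e</a>')
print(extract_links(sample))
```

In the real program the argument would be driver.page_source; the returned lists already include the scheme prefix, so they can be displayed as they are.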

Output the download links

                result=0
                # Output the title
                if magnet_url or thunder_url or ed2k_url or pan_url or pan_password:
                    result+=1
                    self.result.append(driver.title)
                # Output magnet links
                if magnet_url:
                    zy_count=1
                    for mu in magnet_url:
                        self.result.append('Magnet link '+str(zy_count)+': magnet'+mu)
                        zy_count+=1

                # Output Thunder links
                if thunder_url:
                    zy_count=1
                    for tu in thunder_url:
                        self.result.append('Thunder link '+str(zy_count)+': thunder'+tu)
                        zy_count+=1

                # Output ed2k links
                if ed2k_url:
                    zy_count=1
                    for eu in ed2k_url:
                        self.result.append('ed2k link '+str(zy_count)+': ed2k'+eu)
                        zy_count+=1

                # Output the Baidu Netdisk link
                if pan_url:
                    password=pan_password[0] if pan_password else '' # Avoid IndexError when no password is found
                    self.result.append('Baidu Netdisk link: pan.baidu.com'+pan_url[0]+' Password: '+password)
                all_result+=result # Add this movie to the running total
                self.result.append('\n')

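append() writes the download links to the QTextEdit in the UI, and the number of movies found is accumulated as it goes. As shown earlier, every click of the button first clears the QTextEdit with setText('').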
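The numbered output lines for each link type follow the same pattern, so they can be produced by one helper. The sketch below is an illustration only; format_links is a hypothetical name, and it uses enumerate in place of a manually incremented counter.

```python
def format_links(label, links):
    """Return one display line per link, numbered from 1."""
    return [label + ' ' + str(i) + ': ' + link
            for i, link in enumerate(links, start=1)]

lines = format_links('Magnet link', ['magnet:?xt=urn:btih:abc',
                                     'magnet:?xt=urn:btih:def'])
for line in lines:
    print(line)  # in the GUI each line would go to self.result.append(line)
```

An empty list simply yields no lines, so the separate "if magnet_url:" style checks are not needed with this approach.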

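Deciding whether to continue

            driver.quit() # Quit the browser
            # Stop if there are fewer than 24 results; otherwise go to the next page
            if len(film_html_qc)<24:
                break
            else:
                page+=1
        self.result.append('Found '+str(all_result)+' results in total') # Report the total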

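After each page is scraped, the browser quits. The program then checks how many links the regular expression found on the search page: a full page holds 24 links, so fewer than 24 means there is no next page to scrape. Otherwise page is incremented and scraping continues. When it finishes, the total is printed.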
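The stopping rule can be isolated from the networking code and checked with a stand-in for the real request. In this sketch, crawl_all_pages and fetch_page are hypothetical names; fetch_page represents whatever returns the deduplicated links of one search page.

```python
PAGE_SIZE = 24  # results per full search page, as described in the article

def crawl_all_pages(fetch_page):
    """Call fetch_page(page) for page 1, 2, ... until a short page is returned."""
    collected = []
    page = 1
    while True:
        links = fetch_page(page)
        collected.extend(links)
        if len(links) < PAGE_SIZE:  # a short (or empty) page is the last one
            break
        page += 1
    return collected

# Stand-in for the real request: 24 + 24 + 5 fake results.
fake_site = {1: ['a'] * 24, 2: ['b'] * 24, 3: ['c'] * 5}
print(len(crawl_all_pages(lambda p: fake_site.get(p, []))))  # 53
```

One consequence of this rule is that when the last page happens to hold exactly 24 links, one extra request is made and returns an empty page before the loop ends.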

Main entry point

if __name__=='__main__':
    # Create the window
    app=QApplication(sys.argv)
    win=mywindow()
    win.show()
    sys.exit(app.exec())

At this point the scraper can be used through its graphical interface.

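3. Building the executable

To turn the .py file into an executable (.exe), use the pyinstaller package. Once it is installed, change to the directory containing the .py file on the command line and run pyinstaller -F filename.py to bundle it into a single executable.

A dist folder is created in the same directory, and the .exe inside it is the finished program. (The other generated files and folders can be deleted.)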

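4. Handling images with qrc

When the executable is opened, the images are missing, because they were not bundled into it. The fix is to embed the image data in a .py file and import that file.

Create a file named images.qrc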

<!DOCTYPE RCC>

<RCC version="1.0">
<qresource>

<file alias="icon.png">icon.png</file>

</qresource>

</RCC>

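The alias attribute is the name used to refer to the image in a ':/' resource path, and the text inside <file></file> is the path to the actual image file. Then run pyrcc5 -o images.py images.qrc on the command line.

pyrcc5 ships with PyQt5 and converts a .qrc file into a .py file.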
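When a project has more than one image, the .qrc text can be generated instead of written by hand. This is a minimal sketch under that assumption: make_qrc is a hypothetical helper, not part of PyQt5, and it only builds the text, which would then be saved as images.qrc before running pyrcc5.

```python
from xml.sax.saxutils import quoteattr, escape

def make_qrc(files):
    """Build .qrc text; files maps alias -> path relative to the .qrc file."""
    lines = ['<!DOCTYPE RCC>', '<RCC version="1.0">', '<qresource>']
    for alias, path in files.items():
        # quoteattr adds the surrounding quotes and escapes special characters.
        lines.append('<file alias=' + quoteattr(alias) + '>' + escape(path) + '</file>')
    lines += ['</qresource>', '</RCC>']
    return '\n'.join(lines) + '\n'

qrc_text = make_qrc({'icon.png': 'icon.png'})
print(qrc_text)
```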

After the conversion, add import images to movie_ui.py, and change every image path in the UI code from a './' prefix to ':/'.

That completes all the steps.

Result: