Scraping Douban's Top 250 movies into a txt file with Python multithreading (requests, multiprocessing, and os)

Today I couldn't hold out any longer and borrowed a classmate's computer to program — one day without coding and I feel restless all over.

The code is as follows:

import os
import re
import requests
from multiprocessing.dummy import Pool  # despite the module name, this is a thread pool

"""
#encoding="utf-8"
#Author:Mr.Pan_学狂
#finish_time:2022/2/21 23:39
"""

url_ls = []
for n in range(0, 226, 25):  # start offsets 0, 25, ..., 225 — one per page of 25 movies
    url = 'https://movie.douban.com/top250?start={}&filter='.format(n)
    url_ls.append(url)
print(url_ls)
def spider(url):
    # url = "https://movie.douban.com/top250?start={}&filter=".format(0)
    headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
    }
    response = requests.get(url,headers=headers)
    response.encoding="utf-8"
    html = response.text
    # print(html)
    reg1 = '<span class="title">(.*?)</span>'
    movie_name = re.findall(reg1,html)
    reg2 = """<p class="">
                            (.*?)...<br>"""
    person = re.findall(reg2,html)
    person_ls = []
    for p in person:
        # strip the &nbsp; entities; append every entry so the three lists stay aligned
        person_ls.append(p.replace('&nbsp;', ''))

    reg3 = """<br>
                            (.*?)
                        </p>"""
    movie_info = re.findall(reg3,html)
    info_ls = []
    for info in movie_info:
        # same cleanup for the year/country/genre line
        info_ls.append(info.replace('&nbsp;', ''))
    print(person_ls)
    print(info_ls)
    movie_name_ls = []
    for name in movie_name:
        # skip the alternate (foreign) titles, which start with '&nbsp;/&nbsp'
        if '&nbsp;/&nbsp' in name:
            continue
        movie_name_ls.append(name)
    print(movie_name_ls)

    os.makedirs('E:/movie/', exist_ok=True)  # create the folder if it does not exist yet
    with open('E:/movie/movie_data.txt', 'a+', encoding="utf-8") as f:
        for n in range(len(movie_name_ls)):
            f.write(movie_name_ls[n] + "\n" + person_ls[n] + "\n" + info_ls[n] + "\n")

    # return person_ls,info_ls,movie_name_ls

if __name__ == '__main__':
    pool = Pool(2)  # start a pool of two threads
    pool.map(spider, url_ls)  # scrape the pages concurrently
    pool.close()
    pool.join()
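As a quick sanity check, the title regex used above can be exercised on a small HTML fragment. The fragment below only mimics the structure of a Douban list page — it is illustrative, not fetched from the site:

```python
import re

# A minimal fragment shaped like a Douban list page (illustrative only).
sample = ('<span class="title">肖申克的救赎</span>'
          '<span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>')

reg1 = '<span class="title">(.*?)</span>'
titles = re.findall(reg1, sample)
# Keep only the primary titles, the same filter the spider applies.
primary = [t for t in titles if '&nbsp;/&nbsp' not in t]
print(primary)  # ['肖申克的救赎']
```

This shows why the filter on `'&nbsp;/&nbsp'` is needed: each movie has a second `<span class="title">` holding the alternate title, which would otherwise be mixed into the name list.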

Running results:

The folder and the txt file were generated automatically on drive E, containing the scraped movie data.

My monitor has gone on the fritz, and the programming itch struck, so I borrowed a classmate's computer to write some code and publish this post as a small apology. The new monitor is shipping via SF Express and will take two or three days to arrive, so I may not post anything over the next few days — after all, my classmate needs his computer too. Thanks for your understanding.

Finally, thank you all for reading. There are surely shortcomings in this article; I would appreciate your corrections and your forbearance.
