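Today I couldn't hold out any longer, so I borrowed a classmate's computer to write some code. A single day without programming leaves me restless all over.
The code is as follows: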
# -*- coding: utf-8 -*-
# Author: Mr.Pan_学狂
# finish_time: 2022/2/21 23:39
import os
import re
import threading
from multiprocessing.dummy import Pool  # thread pool with the same API as multiprocessing.Pool

import requests

# Douban Top 250 lists 25 movies per page: start = 0, 25, ..., 225 (10 pages)
url_ls = []
for n in range(0, 226, 25):
    url = 'https://movie.douban.com/top250?start={}&filter='.format(n)
    url_ls.append(url)
print(url_ls)

# Both worker threads append to the same file, so serialize the writes
write_lock = threading.Lock()


def spider(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
    }
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = "utf-8"
    html = response.text

    # Movie titles; the second title of each movie looks like "&nbsp;/&nbsp;Original Title"
    reg1 = r'<span class="title">(.*?)</span>'
    movie_name = re.findall(reg1, html)

    # Director/cast line: text after <p class=""> up to the "..." before <br>
    reg2 = r'<p class="">\s*(.*?)...<br>'
    person = re.findall(reg2, html)
    person_ls = []
    for p in person:
        person_ls.append(p.replace('&nbsp;', ' ').strip())  # turn &nbsp; entities into spaces

    # Year/country/genre line: text between <br> and </p>
    reg3 = r'<br>\s*(.*?)\s*</p>'
    movie_info = re.findall(reg3, html)
    info_ls = []
    for info in movie_info:
        info_ls.append(info.replace('&nbsp;', ' ').strip())
    print(person_ls)
    print(info_ls)

    movie_name_ls = []
    for name in movie_name:
        if '&nbsp;/&nbsp;' in name:  # skip the alternate (original-language) title
            continue
        movie_name_ls.append(name)
    print(movie_name_ls)

    os.makedirs('E:/movie/', exist_ok=True)  # create the folder only if it does not exist yet
    with write_lock:
        with open('E:/movie/movie_data.txt', 'a+', encoding="utf-8") as f:
            # zip() stops at the shortest list instead of raising IndexError
            for name, staff, info in zip(movie_name_ls, person_ls, info_ls):
                f.write(name + "\n" + staff + "\n" + info + "\n")


if __name__ == '__main__':
    pool = Pool(2)  # start two threads
    try:
        pool.map(spider, url_ls)  # crawl the pages with the two threads
    except Exception as e:
        print('Crawl failed:', e)  # report errors instead of silently ignoring them
    finally:
        pool.close()
        pool.join()
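The three regular expressions can be checked offline before sending any requests to the live site. The sketch below runs the same patterns against a short hand-written fragment. The fragment, the helper name `parse_page`, and the assumption that the page marks the alternate title with `&nbsp;/&nbsp;` and separates fields with `&nbsp;` are illustrations of the markup the regexes expect, not a copy of Douban's actual page.

```python
import re

# Hand-written sample, NOT real Douban output: one movie entry laid out the
# way the spider's regexes assume (an assumption for illustration only).
SAMPLE_HTML = '''
<span class="title">Movie A</span>
<span class="title">&nbsp;/&nbsp;Original Title A</span>
<p class="">
    Director: X&nbsp;&nbsp;&nbsp;Starring: Y /...<br>
    1994&nbsp;/&nbsp;USA&nbsp;/&nbsp;Drama
</p>
'''


def parse_page(html):
    """Hypothetical helper: return (titles, staff lines, info lines) for one page."""
    # Keep only the main titles; alternate titles contain "&nbsp;/&nbsp;"
    titles = [t for t in re.findall(r'<span class="title">(.*?)</span>', html)
              if '&nbsp;/&nbsp;' not in t]
    # Director/cast line: text after <p class=""> minus the three chars before <br>
    staff = [s.replace('&nbsp;', ' ').strip()
             for s in re.findall(r'<p class="">\s*(.*?)...<br>', html)]
    # Year/country/genre line: text between <br> and </p>
    info = [i.replace('&nbsp;', ' ').strip()
            for i in re.findall(r'<br>\s*(.*?)\s*</p>', html)]
    return titles, staff, info


titles, staff, info = parse_page(SAMPLE_HTML)
print(titles)  # ['Movie A']
print(staff)   # ['Director: X   Starring: Y /']
print(info)    # ['1994 / USA / Drama']
```

If the page layout changes, running the patterns in the same way on a saved copy of a real page is a quick way to see which one stopped matching.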
Output:
A folder and a file were created automatically on drive E. The file contents are as follows:
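In the run described above, `Pool(2)` has two threads append to the same `movie_data.txt`. A minimal sketch of serializing those appends with a `threading.Lock` is below. It writes to a temporary directory instead of `E:/movie/`, and the helper `save_records` and the placeholder records are hypothetical, not crawl results.

```python
import os
import tempfile
import threading
from multiprocessing.dummy import Pool  # thread-based Pool, so lambdas are fine

write_lock = threading.Lock()


def save_records(path, records):
    """Hypothetical helper: append (title, staff, info) records as 3-line blocks."""
    with write_lock:  # only one thread appends at a time
        with open(path, 'a', encoding='utf-8') as f:
            for title, staff, info in records:
                f.write(title + "\n" + staff + "\n" + info + "\n")


# Placeholder records standing in for two crawled pages (not real data)
pages = [
    [('Movie A', 'Director: X', '1994 / USA / Drama')],
    [('Movie B', 'Director: Z', '2001 / Japan / Animation')],
]

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'movie_data.txt')

pool = Pool(2)
pool.map(lambda recs: save_records(out_path, recs), pages)
pool.close()
pool.join()

with open(out_path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(len(lines))  # 6: three lines per movie
```

Holding the lock for a whole page keeps each page's records contiguous in the file, although the order of the pages still depends on which thread finishes first.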
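My own monitor has gone haywire, and my urge to code kicked in, so I borrowed a classmate's computer to write this program and, while I was at it, publish a post as a small apology. The new monitor I ordered is being shipped by SF Express and will take another two or three days to arrive, so I probably won't post anything over the next few days; after all, my classmate needs the computer too. Thanks for your understanding.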
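Finally, thank you all for reading. This article probably has a number of shortcomings; please point them out, and thank you for your patience.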