Multithreaded C-language crawler, Python multithreaded crawler

Latest recommended article published on 2023-06-26 15:40:04

小儿外科裴医生

Article tags: multithreaded C-language crawler

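This is a script that batch-downloads the images on a web page, written in Python with the urllib2, re, and threading libraries. It first fetches the HTML of each configured URL, then extracts the image links with a regular expression and downloads them in parallel, one thread per image. Each saved file is named after its position in the link list followed by its original file name. The whole process illustrates how Python can be used for web crawling and multithreaded downloading.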

Summary generated by CSDN using AI technology.
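The extraction and renaming steps described in the summary can be tried without any network access. The following is a minimal sketch, assuming a hard-coded HTML snippet (`SAMPLE_HTML`, hypothetical) in place of a fetched page. The helper names `extract_jpg_urls` and `numbered_name` are illustrative and do not appear in the script; the two regular expressions are the ones the script uses.

```python
import re

# Same patterns as the script: a .jpg URL inside src="...", and its last path component.
RE_JPG_URL = re.compile(r'src="(.+?\.jpg)"')
RE_JPG_NAME = re.compile(r'/([^/]+\.jpg)')

# Hypothetical page content, used here instead of a real download.
SAMPLE_HTML = (
    '<img src="http://example.com/a/cat.jpg">'
    '<img src="http://example.com/b/dog.jpg">'
    '<img src="http://example.com/logo.png">'
)


def extract_jpg_urls(html):
    """Return every .jpg URL referenced by a src attribute."""
    return RE_JPG_URL.findall(html)


def numbered_name(index, url):
    """Build a file name from the list position and the original file name."""
    name = RE_JPG_NAME.search(url).group(1)
    return '%03d-%s' % (index, name)


urls = extract_jpg_urls(SAMPLE_HTML)
names = [numbered_name(i, u) for i, u in enumerate(urls)]
print(urls)   # the two .jpg URLs; the .png is skipped
print(names)  # ['000-cat.jpg', '001-dog.jpg']
```

The numeric prefix keeps file names unique even when two images on different paths share the same original name.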

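# coding=utf-8

# urllib2 exists only on Python 2; fall back to urllib.request on Python 3.
try:
    import urllib2 as request
except ImportError:
    import urllib.request as request

import re
import os
import threading

####

# Pages to scan for image links.
config_url_paths = [
    r'''http://image.baidu.com/''',
]

# Directory where the images are saved. It must already exist and end with a separator.
config_save_path = 'D:\\video\\t\\web\\image_cool\\52\\'

# Full URL of every .jpg referenced by a src="..." attribute.
re_fliter_jpg_full_path = re.compile(r'src="(.+?\.jpg)"')

# Last path component of a .jpg URL, used as the local file name.
re_filter_jpg_name = re.compile(r'/([^/]+\.jpg)')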

class jpg_downloader(threading.Thread):
    """Thread that downloads one image URL and writes it to disk."""

    def __init__(self, url, filename):
        global cnt_threads
        threading.Thread.__init__(self)
        # Instances are created one after another in downloads(),
        # so incrementing the shared counter here needs no lock.
        cnt_threads = cnt_threads + 1
        savepath = config_save_path + filename
        self._url = url
        self._savepath = savepath
        self._id = cnt_threads
        print('cnt:' + str(self._id) + ' url:' + url + ' path:' + savepath)

    def run(self):
        jpg = request.urlopen(self._url).read()
        print(str(self._id) + ' download finish')
        # The with block flushes and closes the file even if write() fails.
        with open(self._savepath, 'wb') as f:
            f.write(jpg)
        print(str(self._id) + ' thread_end')

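def get_html(url):
    """Fetch url and return the page as text."""
    page = request.urlopen(url)
    # read() returns bytes on Python 3, so decode before matching in getImg.
    # UTF-8 is an assumption; undecodable bytes are dropped.
    html = page.read().decode('utf-8', 'ignore')
    return html


def getImg(html):
    """Return every .jpg URL found in a src attribute of html."""
    imglist = re.findall(re_fliter_jpg_full_path, html)
    return imglist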

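def downloads(urls):
    """Download every URL in urls, one thread per image, and wait for all of them."""
    global cnt_threads, mutex
    threads = []
    cnt_threads = 0
    mutex = threading.Lock()  # created for shared state, but never acquired in this script
    for cnt, url in enumerate(urls):
        match = re.search(re_filter_jpg_name, url)
        if match is None:
            # No "/name.jpg" component (for example a relative link), so skip it.
            continue
        # Prefix the original file name with its list position so names stay unique.
        filename = '%03d' % cnt + "-" + match.group(1)
        threads.append(jpg_downloader(url, filename))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reached only after every download thread has finished.
    print('join')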

print('hello ready to start')

# Collect the image links from every configured page.
img_list = []
for url in config_url_paths:
    html = get_html(url)
    img_targets = getImg(html)
    for img in img_targets:
        img_list.append(img)

print(len(img_list))
downloads(img_list)
print("finish")
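The script starts one thread per image, so a page with many images opens many connections at once. One common alternative, which the original does not use, is a bounded pool from `concurrent.futures` in the Python 3 standard library. The sketch below is only an illustration under stated assumptions: `fake_fetch` is a hypothetical stand-in that returns bytes for a URL so the example runs offline, and `save_one` and `download_all` are illustrative names. In real use, `urllib.request.urlopen(url).read()` would take the place of `fake_fetch`.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    # Hypothetical stand-in for urllib.request.urlopen(url).read().
    return ('bytes of ' + url).encode('utf-8')


def save_one(fetch, url, path):
    """Fetch one URL and write the result to path."""
    data = fetch(url)
    with open(path, 'wb') as f:
        f.write(data)
    return path


def download_all(urls, save_dir, fetch=fake_fetch, max_workers=4):
    """Download urls with at most max_workers threads; return paths in input order."""
    jobs = []
    for index, url in enumerate(urls):
        # Same naming scheme as the script: list position plus original file name.
        name = '%03d-%s' % (index, url.rsplit('/', 1)[-1])
        jobs.append((url, os.path.join(save_dir, name)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order and re-raises a worker's exception here.
        return list(pool.map(lambda job: save_one(fetch, job[0], job[1]), jobs))


demo_dir = tempfile.mkdtemp()
saved = download_all(
    ['http://example.com/a/cat.jpg', 'http://example.com/b/dog.jpg'],
    demo_dir,
)
print([os.path.basename(p) for p in saved])
```

Capping `max_workers` limits the number of simultaneous connections, and leaving the `with` block waits for all downloads, which plays the role of the `join()` loop in the script.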
