Python web crawler code

Latest recommended article published 2024-04-27 16:19:17

VIP article by 开水


Category: Python. Tags: web crawler, python, url, input, download, class

Article link: https://blog.csdn.net/cashey1991/article/details/6262704


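The crawler is encapsulated in the WebCrawler class. Test.py calls the crawler's Craw function to download web pages.

Algorithm used: breadth-first traversal.

For more details on web crawlers, see Baidu Baike.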
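
Since the full WebCrawler source is not shown here, the following is only a minimal sketch of how a breadth-first traversal with a depth limit might look. It is written for Python 3, and the names `bfs_crawl`, `fetch_links`, `FAKE_WEB` and `fake_fetch_links` are hypothetical and not taken from the author's code. To keep it runnable without network access, link extraction is replaced by a lookup in an in-memory dictionary.

```python
from collections import deque


def bfs_crawl(start_url, max_depth, fetch_links):
    """Discover URLs level by level, at most max_depth links away from start_url.

    fetch_links is a hypothetical callable (url -> iterable of linked URLs).
    Returns the URLs in the order they were discovered.
    """
    discovered = [start_url]      # discovery order
    seen = {start_url}            # set for fast "already queued?" checks
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()   # FIFO order gives breadth-first traversal
        if depth >= max_depth:
            continue                   # do not follow links past the depth limit
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                queue.append((link, depth + 1))
    return discovered


# Assumed in-memory stand-in for real pages and their links.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "a"],
    "d": ["e"],
}


def fake_fetch_links(url):
    return FAKE_WEB.get(url, [])


order = bfs_crawl("a", 2, fake_fetch_links)
print(order)
```

With a depth limit of 2, page "e" is never reached because it is three links away from "a". A real crawler would replace `fake_fetch_links` with code that downloads the page and extracts its links.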

Test.py
-------------------------------------------------------------------------

# -*- coding: cp936 -*-
import WebCrawler

url = raw_input('Enter the entry URL (e.g. http://www.baidu.com): \n')
thNumber = int(raw_input('Number of threads: '))    # an earlier version had a bug because the type was not converted
Maxdepth = int(raw_input('Maximum search depth: '))

wc = WebCrawler.WebCrawler(thNumber, Maxdepth)
wc.Craw(url)

WebCrawler.py
-------------------------------------------------------------------------

# -*- coding: cp936 -*-
import threading
import GetUrl
import urllib

g_mutex = threading.Lock()
g_pages = []      # after a thread downloads a page, the page content is appended to this list
g_dledUrl = []    # all URLs that have already been downloaded
g_toDlUrl = []    #当
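
The rest of WebCrawler.py is not available here, so how the author's threads actually use these globals is unknown. As a rough illustration only, the sketch below shows one common way worker threads could append results to a shared list under a lock. It is written for Python 3, and `pages_lock`, `pages`, `download_worker` and `fake_fetch` are hypothetical names that merely play roles similar to `g_mutex` and `g_pages`.

```python
import threading

pages_lock = threading.Lock()   # plays a role similar to g_mutex
pages = []                      # plays a role similar to g_pages


def download_worker(url, fetch):
    """Fetch one page, then append it to the shared list while holding the lock."""
    content = fetch(url)        # slow work happens outside the lock
    with pages_lock:            # lock is released automatically on exit
        pages.append((url, content))


def fake_fetch(url):
    # Hypothetical stand-in for a real download, so the sketch needs no network.
    return "<html>%s</html>" % url


urls = ["u%d" % i for i in range(8)]
threads = [threading.Thread(target=download_worker, args=(u, fake_fetch))
           for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()                    # wait until every worker has finished

print(len(pages))
```

Keeping the download outside the `with` block means threads only wait on each other for the brief append, not for the whole fetch.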
