Scraping 58同城 with Python: a Python crawler for 58同城 second-hand listings

Latest recommended article published 2023-12-26 15:39:04

头像收藏家


Article link: https://blog.csdn.net/weixin_33288893/article/details/113984274


This script is divided into five parts:

spider_main: the main program

url_manager: the URL manager

html_downloader: the page downloader

html_parser: the page parser

html_outputer: the page outputer
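The source for url_manager is not shown in this post, so the following is only a minimal sketch of what such a module might look like. The class name UrlManager comes from the main program; the method names (add_new_url, add_new_urls, has_new_url, get_new_url) and the two-set design are assumptions made for illustration.

```python
class UrlManager(object):
    """Tracks URLs waiting to be crawled and URLs already handed out.

    Hypothetical sketch: only the class name appears in the original post.
    """

    def __init__(self):
        self.new_urls = set()  # URLs not yet crawled
        self.old_urls = set()  # URLs already returned by get_new_url

    def add_new_url(self, url):
        # Skip empty values and anything seen before.
        if not url:
            return
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        # Accept None or an empty list without failing.
        for url in urls or []:
            self.add_new_url(url)

    def has_new_url(self):
        return len(self.new_urls) > 0

    def get_new_url(self):
        # Move one pending URL into the crawled set and return it.
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url
```

Keeping two sets means a URL that has already been crawled is never queued again, which matters when the same item shows up on several list pages.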

spider_main source code:

    import html_downloader
    import html_outputer
    import html_parser
    import url_manager


    class SpiderMain(object):

        # Initialize the URL manager, downloader, parser and outputer
        def __init__(self):
            self.urls = url_manager.UrlManager()
            self.downloader = html_downloader.HtmlDownloader()
            self.parser = html_parser.HtmlParser()
            self.outputer = html_outputer.HtmlOutputer()

        def craw(self, start, end):
            for i in range(start, end):
                url = 'http://bj.58.com/pbdn/0/pn{}/'.format(i)
                print('Crawling list page {}, URL: {}'.format(i, url))
                html_cont = self.downloader.download(url)
                # Extract links
                new_urls = self.parser.parser_url(html_cont)
                # Put the extracted URLs to be crawled into the URL manager
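For each list page, the loop above builds the page URL and then pulls links out of the downloaded HTML. The sketch below illustrates both steps offline, using only the standard library. The URL template is taken from the source; list_page_urls, LinkCollector and extract_links are hypothetical names, the sample HTML is made up, and the real parser_url may well filter links differently.

```python
from html.parser import HTMLParser

# URL template taken from the craw() method above.
LIST_URL = 'http://bj.58.com/pbdn/0/pn{}/'


def list_page_urls(start, end):
    """Build list-page URLs for pages start..end-1 (range excludes end)."""
    return [LIST_URL.format(i) for i in range(start, end)]


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag; a stand-in for parser_url."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def extract_links(html_cont):
    collector = LinkCollector()
    collector.feed(html_cont)
    return collector.links


# Made-up HTML for illustration only; this is not real 58同城 markup.
sample = ('<ul><li><a href="http://example.com/item/1">A</a></li>'
          '<li><a href="http://example.com/item/2">B</a></li></ul>')

print(list_page_urls(1, 3))
print(extract_links(sample))
```

Note that craw(1, 3) visits pages 1 and 2 only, because range(start, end) stops before end.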
