Scraping Ganji.com (赶集网) with Python's BeautifulSoup

Author: 小明同学的程序笔记

Category: python爬虫初体验 (first steps with Python web scraping)

Article link: https://blog.csdn.net/qq_38792076/article/details/80077043

1. channel_extracing.py

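from bs4 import BeautifulSoup
import requests


start_url = 'http://bj.ganji.com/wu/'
url_host = 'http://bj.ganji.com'

def get_index_url(url):
    # Fetch the category index page and return the full URL of every channel.
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    # Each channel link is an <a> directly inside a <dt> under .fenlei.
    links = soup.select('.fenlei > dt > a')
    page_urls = []
    for link in links:
        # The href values are relative paths, so prepend the host.
        page_urls.append(url_host + link.get('href'))
    return page_urls

# Uncomment to print the channel URLs, one per line:
# print('\n'.join(get_index_url(start_url)))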
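
The function above builds each channel URL by concatenating url_host with the href taken from the page. As a possible alternative, the standard library's urljoin resolves an href against a base URL and also copes with hrefs that are already absolute. The sketch below is only an illustration: the helper name to_absolute and the sample href values are hypothetical and do not come from the original script.

```python
from urllib.parse import urljoin

# Same host as in the listing above.
url_host = 'http://bj.ganji.com'


def to_absolute(href, base=url_host):
    """Resolve an href taken from the page against the site host."""
    # urljoin resolves relative paths and leaves absolute URLs untouched.
    return urljoin(base, href)


# Hypothetical href values, for illustration only.
sample_hrefs = ['/jiaju/', 'http://bj.ganji.com/shuma/']
absolute_urls = [to_absolute(h) for h in sample_hrefs]
print(absolute_urls)
```

With plain concatenation, an href that already contains the host would yield a doubled prefix; urljoin avoids that case, at the cost of one extra import.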

channel_list = '''
http://bj.ganji.com/jiaju/
http://bj.ganji.com/rirongbaihuo/
http://bj.ganji.com/shouji/
http://bj.ganji.com/bangong/
http://bj.ganji.com/nongyongpin/
http://bj.ganji.com/jiadian/
http://bj.ganji.com/ershoubijibendiannao/
http://bj.ganji.com/ruanjiantushu/
http://bj.ganji.com/yingyouyunfu/
http://bj.ganji.com/diannao/
http://bj.ganji.com/xianzhilipin/
http://bj.ganji.com/fushixiaobaxuemao/
http://bj.ganji.com/meironghuazhuang/
http://bj.ganji.com/shuma/
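
A multi-line string such as channel_list is usually turned into a list of URLs before it is looped over. The sketch below shows one way to do that with str.split(). It uses a shortened stand-in containing only three of the URLs listed above, and it assumes the string is closed with ''' after its last entry; how the original script consumes channel_list is not shown here.

```python
# Shortened stand-in for channel_list; the URLs are taken from the
# listing above, and the closing quotes are an assumption.
channel_list = '''
http://bj.ganji.com/jiaju/
http://bj.ganji.com/rirongbaihuo/
http://bj.ganji.com/shouji/
'''

# str.split() with no arguments splits on any whitespace and drops the
# empty strings that the leading and trailing newlines would produce.
channels = channel_list.split()

for channel in channels:
    print(channel)
```

Using split() rather than split('\n') avoids having to filter out the empty first and last items.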
