A Python 3 web scraper example

[Account deactivated]

Category: Study notes

Article link: https://blog.csdn.net/hujinlong6930/article/details/96641532

import requests
from bs4 import BeautifulSoup

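# Download the page at the given URL (timeout in seconds; raise an error on HTTP failures)
response = requests.get(url='https://www.autohome.com.cn/news/', timeout=10)
response.raise_for_status()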

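# The body arrives as raw bytes (response.content); response.text decodes them
# using response.encoding, so set it to the encoding detected from the content
response.encoding = response.apparent_encoding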

# Print the downloaded HTML
#print(response.text)

# Parse the HTML into a BeautifulSoup object for further processing.
# features selects the parser: html.parser is built in, while lxml must be installed separately
soup = BeautifulSoup(response.text, features='html.parser')

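# Find the block with the given id
target = soup.find(id='auto-channel-lazyload-article')
if target is None:
    # find returns None when nothing matches, e.g. if the page layout has changed
    raise SystemExit('No element with id auto-channel-lazyload-article found')

#print(target)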

# Find the li tags
#li_list = target.find('li')  # find returns only the first match

li_list = target.find_all('li')  # find_all returns every li

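# Loop over every item and print it
for i in li_list:
    a = i.find('a')
    if a:  # some li items contain no a tag, so skip them
        print(a.attrs.get('href'))  # the href attribute of the a tag
        txt = a.find('h3')  # the h3 tag inside the a tag
        print(txt)  # prints the whole h3 tag; use txt.get_text() for just its text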
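
The line `response.encoding = response.apparent_encoding` matters because `response.text` is built by decoding the raw bytes in `response.content` with `response.encoding`. The offline sketch below does not touch the network. It shows what happens when bytes are decoded with the wrong codec. GBK is only an assumed example of a non-UTF-8 page encoding; the post does not say which encoding the site uses.

```python
# Bytes as a server might send them for a page encoded in GBK
# (GBK is an assumed example encoding, not taken from the original post).
raw = "汽车新闻".encode("gbk")

# Decoding with the wrong codec: invalid byte sequences become
# U+FFFD replacement characters instead of the original text.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the matching codec recovers the original text.
right = raw.decode("gbk")

print(wrong)
print(right)
```

In the script, `apparent_encoding` plays the role of picking the matching codec. It guesses the encoding from the bytes themselves rather than trusting the HTTP headers.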
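
The comments in the script contrast `find`, which stops at the first match, with `find_all`, which collects every match. A minimal offline sketch of that difference follows. The HTML fragment is hand-written and hypothetical, for illustration only.

```python
from bs4 import BeautifulSoup

# A small hand-written HTML fragment (hypothetical, for illustration only).
html = """
<ul id="list">
  <li>first</li>
  <li>second</li>
  <li>third</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find(id="list")

first = ul.find("li")       # a single Tag: the first <li>, or None if absent
every = ul.find_all("li")   # a list-like ResultSet of every matching <li>

print(first.get_text())                  # first
print([li.get_text() for li in every])   # ['first', 'second', 'third']
print(ul.find("table"))                  # None: find returns None when nothing matches
print(ul.find_all("table"))              # []: find_all returns an empty result
```

The two calls fail differently when nothing matches: `find` returns `None`, while `find_all` returns an empty result. This is why code that uses `find` should check for `None` before calling methods on the result.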
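
The loop at the end of the script can also be pulled into a function that takes an HTML string. That makes it possible to try it on saved or sample markup without downloading anything. This is only a sketch: `extract_articles` is a hypothetical helper name, and the sample markup is invented to match the structure the script expects. The real page may be laid out differently.

```python
from bs4 import BeautifulSoup


def extract_articles(html, container_id="auto-channel-lazyload-article"):
    """Return (href, title) pairs from <li> items inside the given container.

    Hypothetical helper mirroring the script's loop; the default id comes
    from the post, but the surrounding page structure is assumed.
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        return []  # container missing: nothing to extract
    results = []
    for li in container.find_all("li"):
        a = li.find("a")
        if a is None:
            continue  # some <li> items contain no <a> tag
        h3 = a.find("h3")
        title = h3.get_text(strip=True) if h3 else None
        results.append((a.get("href"), title))
    return results


# Hypothetical sample markup shaped like what the script expects.
sample = """
<div id="auto-channel-lazyload-article">
  <ul>
    <li><a href="/news/1.html"><h3>Title one</h3></a></li>
    <li class="spacer"></li>
    <li><a href="/news/2.html"><h3>Title two</h3></a></li>
  </ul>
</div>
"""

for href, title in extract_articles(sample):
    print(href, title)
```

Returning data instead of printing it keeps the parsing step separate from the download step. To use it on the live page, pass `response.text` from the script to `extract_articles`.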