An Old-Timer's First Python Crawler: Half a Day Playing with Requests and BeautifulSoup4

Latest recommended article published on 2021-01-26 17:29:01

绳结

Category column: An Old-Timer's First Python Crawler

Article link: https://blog.csdn.net/a450479378/article/details/105752391

An Old-Timer's First Python Crawler

Installing PyCharm
Installing the Requests and BeautifulSoup4 libraries
A Requests example
BeautifulSoup4 basics
Going further

Installing PyCharm

Skipped.

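Installing the Requests and BeautifulSoup4 libraries

Example: installing Requests (an HTTP request library). BeautifulSoup4 (a library for parsing and searching HTML) is installed the same way.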
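Once both libraries are installed, a quick check from Python can confirm that the interpreter actually sees them. This is only a minimal sketch: `missing_packages` is a helper name made up for this illustration, and it relies on the fact that Requests is imported as `requests` and BeautifulSoup4 as `bs4`.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# "requests" and "bs4" are the import names of Requests and BeautifulSoup4.
missing = missing_packages(["requests", "bs4"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("Requests and BeautifulSoup4 are both importable.")
```

If a name shows up as missing, the project is probably using a different interpreter than the one the library was installed into.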

A Requests example

Following the tutorial, the URL is 'https://bj.xiaozhu.com/'.
The simplest way to print the page is:

import requests

# Browser-like request header, in case the site blocks crawlers
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'
}
# Send a GET request and print the returned HTML
res = requests.get('https://bj.xiaozhu.com/', headers=header)
print(res.text)

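Here header is a browser request header, sent in case the site blocks crawlers. To get one, taking Chrome as an example:
F12 -> Network -> select the page request -> the user-agent entry is the value to copy.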
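To see what Requests will actually send, the request can be built without sending it. The sketch below does that with `requests.Request(...).prepare()` and then wraps the fetch in a small helper; `HEADERS` and `fetch_html` are names made up for this illustration, and the 10-second timeout is an assumed value, not something from the tutorial.

```python
import requests

# Same User-Agent string as in the example above.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
    )
}

# Build the request without sending it, so the headers can be inspected offline.
prepared = requests.Request("GET", "https://bj.xiaozhu.com/", headers=HEADERS).prepare()

# Header names are case-insensitive in Requests.
print(prepared.headers["User-Agent"])


def fetch_html(url, timeout=10):
    """Hypothetical helper: fetch a page and fail loudly on HTTP errors."""
    res = requests.get(url, headers=HEADERS, timeout=timeout)
    res.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return res.text
```

Nothing is sent over the network until `fetch_html` is called. Raising on error status codes makes a blocked request easier to notice than silently printing an error page.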

BeautifulSoup4 basics

import requests
from bs4 import BeautifulSoup

# Request header
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'
}
# Fetch the page
res = requests.get('https://bj.xiaozhu.com/', headers=header)
# Parse the HTML with BeautifulSoup4
soup = BeautifulSoup(res.text, 'html.parser')
# find_all() returns every matching tag; here, all div tags whose class is group_item
# print(soup.find_all('div', "group_item"))
# select() finds tags by CSS selector, which can be copied straight from the browser; the result is always a list
prices = soup.select('#page_list > ul > li > div.result_btm_con.lodgeunitname > div > span.result_price > i')
# Loop over the results
for price in prices:
    # get_text() returns the text inside a tag, e.g. <li>nnn</li> prints nnn
    print(price.get_text())
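To try find_all() and select() without hitting the site, the same calls can be run on a small inline document. The HTML below is made up for illustration; it only mimics the nesting that the selector above expects and is not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics the structure targeted by the selector above.
html = """
<div id="page_list">
  <ul>
    <li>
      <div class="result_btm_con lodgeunitname">
        <div><span class="result_price"><i>328</i></span></div>
      </div>
    </li>
    <li>
      <div class="result_btm_con lodgeunitname">
        <div><span class="result_price"><i>450</i></span></div>
      </div>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching tag; class_ filters by CSS class.
items = soup.find_all("div", class_="lodgeunitname")
print(len(items))

# select() takes a CSS selector and also returns a list of tags.
prices = soup.select(
    "#page_list > ul > li > div.result_btm_con.lodgeunitname > div > span.result_price > i"
)
price_texts = [tag.get_text() for tag in prices]
print(price_texts)
```

Working on a saved or inline copy of the HTML is also a convenient way to debug a selector without triggering the site's anti-crawler checks.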

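Notes:
Use Inspect in the browser to locate an element quickly.
Use Copy selector to get its selector quickly: '#page_list > ul > li:nth-child(1) > div.result_btm_con.lodgeunitname > div:nth-child(1) > span > i'
Be aware that nth-child pins the selector to the first item in the list. To get every item in the list, remove it.
Also note that the copied selector sometimes starts with body. As far as I can tell, BeautifulSoup(res.text) pulls out and formats what is inside body, so a selector whose first element is body fails. Just drop that leading body.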
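The two clean-up steps described above (removing nth-child and dropping a leading body) can be sketched as a small function. `generalize_selector` is a name made up for this illustration, and the regular expressions only cover the simple `:nth-child(n)` form that the browser produced here.

```python
import re


def generalize_selector(selector):
    """Turn a selector copied from the browser into one that matches every list item."""
    # Drop ":nth-child(n)" so the selector is no longer pinned to one element.
    selector = re.sub(r":nth-child\(\d+\)", "", selector)
    # Drop a leading "body > ", as recommended in the notes above.
    selector = re.sub(r"^\s*body\s*>\s*", "", selector)
    return selector


copied = (
    "#page_list > ul > li:nth-child(1) > div.result_btm_con.lodgeunitname "
    "> div:nth-child(1) > span > i"
)
print(generalize_selector(copied))
```

The cleaned string can be passed straight to soup.select().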

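Going further

I played with this for a whole afternoon.
I looped over the first page of https://bj.xiaozhu.com/ and collected the id values,
then opened each detail page at https://bj.xiaozhu.com/fangzi/{id}.html and scraped some data for practice.
The site uses slider verification to guard against abuse. If the script raises an error or suddenly prints nothing, open the page in a browser and complete the slider verification.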
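The post does not say how the id values were pulled out, so the following is only one possible sketch. It assumes the ids are numeric and appear in links shaped like the detail URL above; `extract_ids`, `detail_url`, and the sample hrefs are all made up for illustration.

```python
import re

DETAIL_URL = "https://bj.xiaozhu.com/fangzi/{id}.html"
# Assumption: listing ids are numeric and appear in links shaped like DETAIL_URL.
ID_PATTERN = re.compile(r"/fangzi/(\d+)\.html")


def extract_ids(hrefs):
    """Collect listing ids from hrefs, keeping first-seen order and skipping duplicates."""
    ids = []
    for href in hrefs:
        match = ID_PATTERN.search(href)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


def detail_url(listing_id):
    """Build the detail page URL for one listing id."""
    return DETAIL_URL.format(id=listing_id)


# Made-up hrefs for illustration only.
sample_hrefs = [
    "https://bj.xiaozhu.com/fangzi/1001.html",
    "https://bj.xiaozhu.com/fangzi/1002.html",
    "https://bj.xiaozhu.com/fangzi/1001.html",
    "https://bj.xiaozhu.com/about.html",
]
for listing_id in extract_ids(sample_hrefs):
    print(detail_url(listing_id))
```

In a real run the hrefs would come from the a tags on the list page, and each detail URL would then be fetched and parsed the same way as the list page.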
That's it for now. I'll look into what else can be done with this when I have time.
