Web Scraping 5: A Brief Introduction to the BeautifulSoup Module

Latest recommended article published 2023-02-09 18:13:27

chde2Wang

Categories: Teaching Python, Web Scraping

Article link: https://blog.csdn.net/weixin_38383877/article/details/116035976

1. A quick look at the HTML markup language

<html>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<h1>My Motherland</h1>
<h1 align="center">My Motherland</h1>
# h1      is the tag
# align   is the attribute
# center  is the attribute value
<tag attribute="attribute value">marked-up content</tag>
<img src="xxx.jpg"/>
<a href="http://www.baidu.com">Baidu</a>
</html>
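
To connect the tag / attribute / attribute value vocabulary to the parser used in the next section, here is a minimal sketch that reads those three pieces back out with BeautifulSoup. The markup is an English rendering of the sample above, trimmed to the elements being inspected; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the snippet above (illustrative only).
html = """
<html>
<h1 align="center">My Motherland</h1>
<img src="xxx.jpg"/>
<a href="http://www.baidu.com">Baidu</a>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

h1 = soup.find("h1")
tag_name = h1.name      # the tag
align = h1["align"]     # the value of the align attribute
content = h1.text       # the marked-up content

href = soup.find("a")["href"]      # where the link points
img_src = soup.find("img")["src"]  # the image source

print(tag_name, align, content, href, img_src)
```

Attribute values are read with dictionary-style indexing on the tag, while `.name` and `.text` give the tag name and the text it wraps.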

2. Introducing the BeautifulSoup module

# 1. Fetch the page source
# 2. Parse it with bs4 to extract the data
import csv

import requests
from bs4 import BeautifulSoup

url = "http://www.xinfadi.com.cn/marketanalysis/0/list/1.shtml"
resp = requests.get(url)

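# Parse the data
# 1. Hand the page source to BeautifulSoup, which builds a bs object
# page = BeautifulSoup(resp.text)  # also works, but warns because no parser is named
page = BeautifulSoup(resp.text, "html.parser")
# 2. Search the bs object
# find(tag_name, attribute=value)      returns the first match, or None
# find_all(tag_name, attribute=value)  returns every match
table = page.find("table", class_="hq_table")  # class is a Python keyword, so bs4 spells it class_
# table = page.find("table", attrs={"class": "hq_table"})  # equivalent to the line above, without class_
# print(table)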
# Get all data rows

trs = table.find_all("tr")[1:]  # [1:] drops the first row (the table header)
# newline="" keeps the csv module from writing blank lines between rows on Windows
with open("菜价.csv", mode="w", encoding="utf-8", newline="") as f:
    csvwriter = csv.writer(f)
    for tr in trs:
        tds = tr.find_all("td")  # every <td> cell in this row
        print(tds)
        name = tds[0].text
        low = tds[1].text
        average = tds[2].text
        high = tds[3].text
        gui = tds[4].text
        kind = tds[5].text
        # Originally tds[5], which repeated kind; the date is assumed to be the next column
        date = tds[6].text
        print(name, low, average, high, gui, kind, date)
        csvwriter.writerow([name, low, average, high, gui, kind, date])
resp.close()
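
The same find / find_all / csv steps can be tried without network access. The following is a minimal sketch under stated assumptions: the `hq_table` class name comes from the script above, but `SAMPLE_HTML`, its rows and values, and the `extract_rows` helper are all invented for illustration and do not reflect the real page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup: the class name matches the script above,
# but every row and value here is made up.
SAMPLE_HTML = """
<table class="hq_table">
  <tr><td>name</td><td>low</td><td>average</td><td>high</td><td>spec</td><td>unit</td><td>date</td></tr>
  <tr><td>cabbage</td><td>0.5</td><td>0.6</td><td>0.7</td><td>bulk</td><td>jin</td><td>2021-04-22</td></tr>
  <tr><td>potato</td><td>1.0</td><td>1.2</td><td>1.4</td><td>bulk</td><td>jin</td><td>2021-04-22</td></tr>
</table>
"""


def extract_rows(html):
    """Return the cell texts of each data row in the hq_table table."""
    page = BeautifulSoup(html, "html.parser")
    table = page.find("table", class_="hq_table")
    if table is None:  # find() returns None when nothing matches
        return []
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        rows.append([td.text.strip() for td in tr.find_all("td")])
    return rows


rows = extract_rows(SAMPLE_HTML)

# The class_ keyword and the attrs dictionary select the same element.
page = BeautifulSoup(SAMPLE_HTML, "html.parser")
same_table = (
    page.find("table", class_="hq_table")
    is page.find("table", attrs={"class": "hq_table"})
)

# Write to an in-memory buffer instead of a file so the sketch has no side effects.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Checking for `None` before calling `find_all` matters in practice: if the page layout changes and the table is no longer found, the helper returns an empty list instead of raising an `AttributeError`.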