4_bs4解析

宋建国

已于 2022-03-09 17:36:51 修改

阅读量191

点赞数

分类专栏：自用参考文章标签： python 开发语言

于 2022-03-09 08:27:18 首次发布

本文链接：https://blog.csdn.net/hot7732788/article/details/123367882

版权

自用参考专栏收录该内容

14 篇文章 0 订阅

订阅专栏

# 安装
# pip install bs4 -i 清华

# 1. 拿到页面源代码
# 2. 使用bs4进行解析. 拿到数据
import requests
from bs4 import BeautifulSoup
import csv

url = "http://www.xinfadi.com.cn/marketanalysis/0/list/1.shtml"
resp = requests.get(url)

f = open("菜价.csv", mode="w")
csvwriter = csv.writer(f)

# 解析数据
# 1. 把页面源代码交给BeautifulSoup进行处理, 生成bs对象
page = BeautifulSoup(resp.text, "html.parser")  # 指定html解析器
# 2. 从bs对象中查找数据
# find(标签, 属性=值)
# find_all(标签, 属性=值)
# table = page.find("table", class_="hq_table")  # class是python的关键字
table = page.find("table", attrs={"class": "hq_table"})  # 和上一行是一个意思. 此时可以避免class
# 拿到所有数据行
trs = table.find_all("tr")[1:]
for tr in trs:  # 每一行
    tds = tr.find_all("td")  # 拿到每行中的所有td
    name = tds[0].text  # .text 表示拿到被标签标记的内容
    low = tds[1].text  # .text 表示拿到被标签标记的内容
    avg = tds[2].text  # .text 表示拿到被标签标记的内容
    high = tds[3].text  # .text 表示拿到被标签标记的内容
    gui = tds[4].text  # .text 表示拿到被标签标记的内容
    kind = tds[5].text  # .text 表示拿到被标签标记的内容
    date = tds[6].text  # .text 表示拿到被标签标记的内容
    csvwriter.writerow([name, low, avg, high, gui, kind, date])

f.close()
print("over1!!!!")


#############################################################################
# 1.拿到主页面的源代码. 然后提取到子页面的链接地址, href
# 2.通过href拿到子页面的内容. 从子页面中找到图片的下载地址 img -> src
# 3.下载图片
import requests
from bs4 import BeautifulSoup
import time

url = "https://www.umei.cc/bizhitupian/weimeibizhi/"
resp = requests.get(url)
resp.encoding = 'utf-8'  # 处理乱码

# print(resp.text)
# 把源代码交给bs
main_page = BeautifulSoup(resp.text, "html.parser")
alist = main_page.find("div", class_="TypeList").find_all("a")

for a in alist:
    href = a.get('href')  # 直接通过get就可以拿到属性的值
    # 拿到子页面的源代码
    href = "https://www.umei.cc" + href
    child_page_resp = requests.get(href)

    child_page_resp.encoding = 'utf-8'
    child_page_text = child_page_resp.text
    # 从子页面中拿到图片的下载路径
    child_page = BeautifulSoup(child_page_text, "html.parser")
    p = child_page.find("p", align="center")
    img = p.find("img")
    src = img.get("src")
    # 下载图片
    img_resp = requests.get(src) #将获得到的所有字节写入文件就出来图片了，因为图片就是字节组成的
    # img_resp.content  # 这里拿到的是字节
    img_name = src.split("/")[-1]  # 拿到url中的最后一个/以后的内容
    with open("img1/"+img_name, mode="wb") as f:
        f.write(img_resp.content)  # 图片内容写入文件

    print("over!!!", img_name)
    time.sleep(1)

resp.close()
print("all over!!!")

宋建国

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
4_bs4解析

# 安装# pip install bs4 -i 清华# 1. 拿到页面源代码# 2. 使用bs4进行解析. 拿到数据import requestsfrom bs4 import BeautifulSoupimport csvurl = "http://www.xinfadi.com.cn/marketanalysis/0/list/1.shtml"resp = requests.get(url)f = open("菜价.csv", mode="w")csvwriter = csv
复制链接

扫一扫

专栏目录