[Python Share] Parse a local HTML document and replace the links in all of its img tags

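Note: this script parses a local HTML document, downloads the remote images it references into a local directory, and rewrites those remote image addresses to local ones.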

import requests
import os
from bs4 import BeautifulSoup

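def getContent(url):
    """Download url and return the response body as bytes (b"" on failure)."""
    try:
        r = requests.get(url, timeout=20)
        r.raise_for_status()
        return r.content
    except requests.RequestException:
        # Network error, bad HTTP status, or malformed URL: return empty bytes
        # so the caller always gets the same type back.
        return b""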

def writeFile(path, content):
    """Write content to path, skipping empty downloads and existing files."""
    if content and not os.path.exists(path):
        with open(path, "wb") as file:
            file.write(content)

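def dealUrl(url, out_dir):
    """Build the local path for url: out_dir plus the last URL segment."""
    # Avoid naming variables str or dir, which would shadow the built-ins.
    parts = url.split("/")
    img = parts[-1]
    return os.path.join(out_dir, img)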

def readUrl():
    """Print the list of image URLs saved by findSrc()."""
    file = "D:\\pa_chong\\new.txt"
    if os.path.exists(file):
        with open(file) as fi:
            url_img = fi.read()
        print(url_img)

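def findSrc(html):
    """Collect the unique image URLs referenced by the HTML file."""
    srcs = []
    with open(html, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    for img in soup.find_all("img"):
        src = img.get("src")  # an <img> tag may have no src attribute
        if src and src not in srcs:
            srcs.append(src)
    # The welcome section keeps its background image in data-image-src.
    bgimg = soup.find("section", {"class": "section section_welcome"})
    if bgimg is not None:
        bg_src = bgimg.get("data-image-src")
        if bg_src and bg_src not in srcs:
            srcs.append(bg_src)

    # Save the collected URLs, one per line.
    with open("D:\\pa_chong\\new.txt", "w") as f:
        for src in srcs:
            f.write(src + "\n")

    return srcs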

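def changehtml(html, srcs):
    """Rewrite every image reference in the HTML file to its bare file name.

    srcs is accepted to match the call in main() but is not used here.
    """
    with open(html, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    for img in soup.find_all("img"):
        if img.get("src"):
            img["src"] = img["src"].split("/")[-1]
    bgimg = soup.find("section", {"class": "section section_welcome"})
    if bgimg is not None and bgimg.get("data-image-src"):
        bgimg["data-image-src"] = bgimg["data-image-src"].split("/")[-1]
    # No global is needed: soup is still in scope after the read block.
    with open(html, "w", encoding="utf-8") as f:
        f.write(str(soup))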

def main():
    html = "D:\\pa_chong\\index.html"
    srcs = findSrc(html)  # collect the image URLs
    changehtml(html, srcs)  # point the HTML at the local file names
    out_dir = "D:\\pa_chong\\"
    for url in srcs:
        content = getContent(url)
        path = dealUrl(url, out_dir)
        writeFile(path, content)
    readUrl()


if __name__ == "__main__":
    main()
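
One thing to watch: the script takes the file name with split("/")[-1], so a URL such as .../logo.png?v=3 would be saved with "?v=3" in its name, which Windows does not allow. Below is a minimal sketch of a more tolerant way to derive the name, using only the standard library. The function name local_name_from_url and the fallback name "image" are my own assumptions and are not part of the script above; if you use this, the same helper would have to be applied both where the file is saved and where the HTML is rewritten, so the two names stay in sync.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def local_name_from_url(url, default="image"):
    """Return a file name for url, ignoring any query string or fragment.

    Hypothetical helper for illustration; `default` is an assumed fallback
    for URLs whose path has no usable last segment.
    """
    path = urlsplit(url).path                  # drops ?query and #fragment
    name = posixpath.basename(unquote(path))   # decode %20 etc., keep last segment
    if name in ("", ".", ".."):
        return default
    return name


if __name__ == "__main__":
    print(local_name_from_url("https://example.com/img/logo.png?v=3"))
    print(local_name_from_url("https://example.com/a%20b.jpg#top"))
    print(local_name_from_url("https://example.com/"))
```

This only cleans up the name. Two different URLs that end in the same file name would still collide on disk, exactly as in the original script.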
