Processing HTML with Python

from urllib.parse import urljoin

from Common.PSoup import PSoup


class HtmlCommon:
  def handleHtmlString(self, htmlString, url, dic=None):
      # Avoid a mutable default argument; treat a missing dic as empty
      if dic is None:
          dic = {}
      psoup=PSoup()
      docBody=psoup.getPSoup(htmlString)
      bodyElement = docBody.find("body")

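      # <editor-fold desc="Append the dictionary entries to the body">

      # Build one <div id='key'>value</div> per entry (values must be strings)
      extraHtml = ""
      for key, value in dic.items():
          extraHtml = extraHtml + "<div id='" + key + "'>" + value + "</div>"

      if bodyElement is not None:
          bodyElement.append(extraHtml)
      else:
          # No <body> found: wrap the fragment in one and parse it again
          htmlString = "<body>" + htmlString + "</body>"
          docBody = psoup.getPSoup(htmlString)
          bodyElement = docBody.find("body")
          bodyElement.append(extraHtml)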

      htmlString = docBody.html()

      # </editor-fold>

      # <editor-fold desc="Resolve the paths of a and img tags against url">

      # Turn every link target into an absolute URL
      docA = psoup.getPSoup(htmlString)
      elesA = docA.find("a")
      for da in elesA.items():
          href = da.attr("href")
          if href is not None:
              nhref = urljoin(url, href)
              da.attr("href", nhref)

      htmlString = docA.html()

      # Do the same for every image source
      docI = psoup.getPSoup(htmlString)
      elesI = docI.find("img")
      for ds in elesI.items():
          src = ds.attr("src")
          if src is not None:
              nsrc = urljoin(url, src)
              ds.attr("src", nsrc)

      htmlString = docI.html()

      # </editor-fold>


      return htmlString
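
The path replacement in the method above relies entirely on `urljoin` from the standard library. As a rough illustration of what that call does to different kinds of `href` and `src` values, here is a small standalone sketch. The base URL and the `resolve` helper are made up for this example and are not part of the original class.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for this illustration
BASE_URL = "https://example.com/blog/post.html"


def resolve(link, base=BASE_URL):
    """Resolve link against base, the same call the method makes per attribute."""
    return urljoin(base, link)


# Relative path: resolved against the directory of the page
print(resolve("img/a.png"))                # https://example.com/blog/img/a.png
# Root-relative path: resolved against the host
print(resolve("/static/a.png"))            # https://example.com/static/a.png
# Parent directory
print(resolve("../index.html"))            # https://example.com/index.html
# Already absolute: returned unchanged
print(resolve("https://other.example/x"))  # https://other.example/x
# Fragment only: appended to the page URL
print(resolve("#top"))                     # https://example.com/blog/post.html#top
```

Because absolute URLs pass through unchanged, it should be safe to run the method over a page whose links are already absolute.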

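The method builds the extra `<div>` elements by plain string concatenation, so a key or value containing `<`, `&` or a quote would end up in the markup as-is. If that matters for your data, one possible variation is to escape both parts first. The sketch below is only an assumption about how that could look; `build_divs` is a hypothetical helper and does not exist in the original code.

```python
from html import escape


def build_divs(params):
    """Return one <div id="key">value</div> per entry, escaping key and value."""
    parts = []
    for key, value in params.items():
        # quote=True also escapes quotes, which is needed inside an attribute
        safe_key = escape(str(key), quote=True)
        # Element text only needs &, < and > escaped
        safe_value = escape(str(value), quote=False)
        parts.append('<div id="{}">{}</div>'.format(safe_key, safe_value))
    return "".join(parts)


print(build_divs({"title": "a < b", "count": 3}))
# <div id="title">a &lt; b</div><div id="count">3</div>
```

Converting with `str()` also lets non-string values such as numbers through, which the concatenation in the original would reject with a `TypeError`.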





