Chrome Extension Development in Practice: Scraping a Page's Image Resources

FortheOne

Last modified: 2024-04-10 10:41:55

Tags: chrome, front end

First published: 2024-03-13 17:42:14

Article link: https://blog.csdn.net/FortheOne/article/details/136684888

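Feature Description

The extension collects every image resource on the current page with one click and displays them all in the extension's UI, so they are easy to browse and download.

Approach and Design

A page's image resources come in two kinds: img tags and background images on page elements. A content script injected into the page can access document. Call document.getElementsByTagName('img') to get every img tag and read each image's URL. For background images, call document.querySelectorAll('*') to get every element, use window.getComputedStyle to get each node's computed CSS style, read the background image value with .getPropertyValue('background-image'), and finally extract the image URL from that value with a regular expression. The code is as follows: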

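function getTagImgs(doc) {
  // currentSrc is the URL the browser actually selected (it accounts for srcset).
  // It is an empty string when no image has been chosen, so filter those out.
  const htmlCol = [...doc.getElementsByTagName('img')]
  return htmlCol.filter(img => img.currentSrc).map(img => img.currentSrc)
}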

function getBgImgs(doc) {
  // Regular expression that matches a url(...) token in a background-image value
  // and captures the path, e.g. the path in url("path/to/image").
  const srcChecker = /url\(\s*?['"]?\s*?(\S+?)\s*?["']?\s*?\)/i

  // Walk every element in the document and collect de-duplicated URLs in a Set.
  return Array.from(
    Array.from(doc.querySelectorAll('*'))
      .reduce((collection, node) => {
        // window.getComputedStyle returns the node's computed CSS style.
        const prop = window.getComputedStyle(node, null)
          .getPropertyValue('background-image')
        const match = srcChecker.exec(prop)
        if (match) {
          collection.add(match[1])
        }
        return collection
      }, new Set())
  )
}
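
A background-image value can list several layers, for example two url(...) entries mixed with a gradient, and srcChecker in getBgImgs has no g flag, so exec returns only the first match. The following is a minimal sketch of one way to pull out every URL from such a value. The helper name extractBgUrls and its regular expression are assumptions for illustration, not part of the original code, and escaped quotes inside a URL are not handled.

```javascript
// Hypothetical helper: return every url(...) target found in a
// computed background-image value such as 'url("a.png"), url("b.png")'.
function extractBgUrls(value) {
  // Group 1 captures an optional opening quote, \1 requires the same closing quote,
  // group 2 is the URL itself. The g flag lets matchAll find every occurrence.
  const pattern = /url\(\s*(['"]?)(.*?)\1\s*\)/gi
  const urls = []
  for (const match of value.matchAll(pattern)) {
    if (match[2]) {
      urls.push(match[2])
    }
  }
  return urls
}
```

Inside getBgImgs, the single exec call could then be replaced by a loop such as extractBgUrls(prop).forEach(url => collection.add(url)).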

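Once the content script has collected all the image URLs, it can send them with the official API chrome.runtime.sendMessage, and the popup page receives them through chrome.runtime.onMessage.addListener. I tested this and it works.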
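
The message hand-off between the content script and the popup might look like the sketch below. This is not the author's code: the helper names sendImages and listenForImages and the message type 'PAGE_IMAGES' are hypothetical, and the runtime object is passed in as a parameter (chrome.runtime in a real extension) so that both halves can be exercised outside the browser. Keep in mind that the popup can only receive messages while it is open.

```javascript
// Content script side: send the collected image URLs.
// `runtime` is expected to be chrome.runtime; 'PAGE_IMAGES' is a made-up message type.
function sendImages(runtime, urls) {
  const result = runtime.sendMessage({ type: 'PAGE_IMAGES', urls })
  // Under Manifest V3, sendMessage returns a promise that rejects when nothing
  // is listening (for example, the popup is closed); ignore that case here.
  if (result && typeof result.catch === 'function') {
    result.catch(() => {})
  }
}

// Popup side: register a listener and pass matching messages to a callback.
function listenForImages(runtime, onImages) {
  runtime.onMessage.addListener(message => {
    if (message && message.type === 'PAGE_IMAGES') {
      onImages(message.urls)
    }
  })
}
```

In the content script this could be called as sendImages(chrome.runtime, [...getTagImgs(document), ...getBgImgs(document)]), and in the popup as listenForImages(chrome.runtime, urls => { /* render the images */ }).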

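Optimization

Problem: image resources inside nested iframes are not collected.

Solution: get all iframes with document.getElementsByTagName('iframe'), iterate over them, and read each iframe's document through contentWindow.document (this only works when the iframe is same-origin with its parent), then process every iframe recursively.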

function findImages() {
  let srcList = []
  const loopIframe = doc => {
    const imgList = getTagImgs(doc)
    const imgBgList = getBgImgs(doc)
    srcList = [...new Set([...srcList, ...imgList, ...imgBgList])]

    // Recurse into every iframe nested in this document.
    for (const item of doc.getElementsByTagName('iframe')) {
      let curDoc = null
      try {
        curDoc = item.contentWindow?.document
      } catch (err) {
        // Reading the document of a cross-origin iframe throws a SecurityError; skip it.
      }
      if (curDoc) {
        loopIframe(curDoc)
      }
    }
  }

  loopIframe(document)
  return srcList
}

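Further Optimization

Problem: pseudo-elements cannot be selected directly, so their background images are missed.

Solution: call window.getComputedStyle(node, '::after') and window.getComputedStyle(node, '::before') to get the pseudo-element styles of each element, then read the value with getPropertyValue('background-image').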

function getBgImgs(doc) {
  // Regular expression that matches a url(...) token in a background-image value
  // and captures the path, e.g. the path in url("path/to/image").
  const srcChecker = /url\(\s*?['"]?\s*?(\S+?)\s*?["']?\s*?\)/i

  // Check the element itself (null) plus its ::after and ::before pseudo-elements.
  const pseudoList = [null, '::after', '::before']

  // Walk every element in the document and collect de-duplicated URLs in a Set.
  return Array.from(
    Array.from(doc.querySelectorAll('*'))
      .reduce((collection, node) => {
        pseudoList.forEach(pseudo => {
          // window.getComputedStyle returns the computed CSS style of the node
          // or of the given pseudo-element.
          const prop = window.getComputedStyle(node, pseudo)
            .getPropertyValue('background-image')
          const match = srcChecker.exec(prop)
          if (match) {
            collection.add(match[1])
          }
        })
        return collection
      }, new Set())
  )
}

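Yet Another Optimization

Another option is to monitor every resource load with PerformanceObserver and then filter the image resources you need out of the full list.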
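
The PerformanceObserver idea could be sketched roughly as follows. The original post gives no code for it, so everything here is an assumption: the helper names isImageEntry and observeImages are hypothetical, and the rule for deciding what counts as an image (initiatorType of 'img' or 'image', or a URL path ending in a common image extension) is only a heuristic. The observer constructor is injectable so the logic can be tried outside a browser.

```javascript
// Heuristic (an assumption of this sketch): common image file extensions.
const IMAGE_EXT = /\.(png|jpe?g|gif|webp|svg|avif|bmp|ico)$/i

// Decide whether a resource timing entry looks like an image.
function isImageEntry(entry) {
  // 'img' is reported for <img> elements, 'image' for SVG <image> elements.
  if (entry.initiatorType === 'img' || entry.initiatorType === 'image') {
    return true
  }
  try {
    // Covers images requested from CSS, whose initiatorType is 'css'.
    return IMAGE_EXT.test(new URL(entry.name).pathname)
  } catch (err) {
    return false // entry.name was not a parseable absolute URL
  }
}

// Start observing resource loads; onImage is called once per distinct image URL.
// ObserverCtor defaults to the browser's PerformanceObserver.
function observeImages(onImage, ObserverCtor = PerformanceObserver) {
  const seen = new Set()
  const observer = new ObserverCtor(list => {
    for (const entry of list.getEntries()) {
      if (isImageEntry(entry) && !seen.has(entry.name)) {
        seen.add(entry.name)
        onImage(entry.name)
      }
    }
  })
  // buffered: true also delivers entries recorded before the observer was created.
  observer.observe({ type: 'resource', buffered: true })
  return observer
}
```

A content script might call observeImages(url => srcList.push(url)), where srcList is whatever array the extension uses to hold results. Compared with walking the DOM, this also catches images that load after the scan, but the extension-based filter will miss image URLs that carry no file extension.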
