Demos of several JS rendering approaches

Latest recommended article published on 2024-04-21 09:31:40

doyus

Category: python. Tags: selenium, python

Article link: https://blog.csdn.net/weixin_42658996/article/details/108215461

Demos of several JS rendering approaches

Using selenium
Using requests_html (based on pyppeteer)
Using splash
Using pyppeteer

Using selenium

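import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service


class Spider(object):
    def __init__(self):
        self.spidername = "xxx_com"
        # Chrome_path is the author's own helper (not shown here) that returns the chromedriver path.
        self.chromedriver_path = Chrome_path.chrome_path()
        self.url = 'http://www.xxx.com'
        # self.chromedriver_path = r"D:\cz_project\singleSpider\tool\chromedriver_win32\chromedriver.exe"
        options = webdriver.ChromeOptions()
        # options.add_experimental_option("excludeSwitches", ["enable-automation"])
        # options.add_experimental_option('useAutomationExtension', False)
        options.add_argument('--window-size=1200,2000')  # set the browser window size
        options.add_argument('--disable-gpu')  # Google's docs mention this flag as a workaround for a bug
        # options.add_argument('--hide-scrollbars')  # hide scrollbars, for some special pages
        # options.add_argument('blink-settings=imagesEnabled=false')  # skip images to speed things up
        # options.add_argument('--headless')
        # Selenium 4 takes the driver path through Service; on Selenium 3 pass executable_path=... instead.
        self.browser = webdriver.Chrome(service=Service(self.chromedriver_path), options=options)
        # Register the script so it runs before each page's own scripts; a plain
        # execute_script() call here would only affect the initial blank page.
        script = 'Object.defineProperty(navigator, "webdriver", {get:()=>false,});'
        self.browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': script})

    def get_html(self, url):
        self.browser.set_window_size(1200, 1000)
        self.browser.get(url)
        time.sleep(0.8)  # crude fixed wait for the page's JavaScript to finish
        detail_html = self.browser.page_source
        print(detail_html)
        return detail_html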
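
The fixed time.sleep(0.8) in get_html is only a guess at how long the page's JavaScript takes. As a rough illustration of the usual alternative, the sketch below polls a condition until it holds or a timeout expires. Note that `wait_until` is a hypothetical helper written for this example and is not part of selenium (selenium ships its own `WebDriverWait` for this purpose); the `states` iterator merely stands in for a real page.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.2):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration only; not part of selenium.
    Returns the truthy value, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page that becomes ready on the third check.
# With a real driver the predicate could be, for example:
#   lambda: browser.execute_script("return document.readyState") == "complete"
states = iter(["loading", "interactive", "complete"])
ready = wait_until(lambda: next(states) == "complete", timeout=2.0, interval=0.01)
print(ready)
```

Polling returns as soon as the condition holds, so fast pages are not slowed down and slow pages are not cut off early.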

Using requests_html (based on pyppeteer)

from requests_html import HTMLSession


class Spider:

    def __init__(self):
        self.session = HTMLSession()

    def get_html(self, url):
        obj = self.session.get(url)
        obj.encoding = 'utf-8'
        # render() runs the page's JavaScript in headless Chromium (downloaded on first use)
        obj.html.render(sleep=0.1)
        print(obj.html.html)
        return obj.html.html

Using splash

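import requests


class Spider:
    def __init__(self):
        # The /run endpoint wraps this snippet in "function main(splash, args) ... end".
        self.script = """
        splash:go(args.url)
        splash:wait(1)
        return splash:html()
        """

    def get_html(self):
        # Replace xxx.xxx.xxx.xxx with the address of your Splash server.
        resp = requests.post('http://xxx.xxx.xxx.xxx:8050/run', json={
            'lua_source': self.script,
            'url': 'https://www.yuncaitong.cn/publish/demand.shtml'
        })
        html = resp.text  # rendered HTML returned by splash:html()
        return html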
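
The Splash demo hard-codes both the wait time and the target URL. A minimal sketch of one way to parameterize the request is shown below, relying on the fact that fields in the JSON body are exposed to the Lua script through `args`. The names `build_run_request` and `LUA_TEMPLATE`, and the host `localhost:8050`, are assumptions made for this example and do not come from the original code.

```python
import json

# Lua snippet for Splash's /run endpoint; url and wait are read from the request body.
LUA_TEMPLATE = """
assert(splash:go(args.url))
assert(splash:wait(args.wait))
return splash:html()
"""


def build_run_request(splash_host, url, wait=1.0):
    """Return (endpoint, payload) for Splash's /run endpoint.

    Hypothetical helper; `splash_host` is something like "localhost:8050".
    """
    endpoint = f"http://{splash_host}/run"
    payload = {"lua_source": LUA_TEMPLATE, "url": url, "wait": wait}
    return endpoint, payload


endpoint, payload = build_run_request(
    "localhost:8050", "https://www.yuncaitong.cn/publish/demand.shtml", wait=2
)
print(endpoint)
print(json.dumps({k: v for k, v in payload.items() if k != "lua_source"}))
# The pair would then be sent with: requests.post(endpoint, json=payload, timeout=30)
```

Wrapping `splash:go` and `splash:wait` in `assert` makes Splash report an error when navigation fails, rather than returning the HTML of a page that never loaded.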

Using pyppeteer

import asyncio
from pyppeteer import launch


class Spider:

    async def get_html(self, url):
        self.browser = await launch()  # starts headless Chromium
        try:
            page = await self.browser.newPage()
            await page.goto(url)
            content = await page.content()  # HTML after JavaScript has run
            print(content)
            # Evaluate a JavaScript function in the page and get the result back as a dict
            dimensions = await page.evaluate('''() => {
                return {
                    width: document.documentElement.clientWidth,
                    height: document.documentElement.clientHeight,
                    deviceScaleFactor: window.devicePixelRatio,
                }
            }''')
            print(dimensions)
            return content
        finally:
            await self.browser.close()  # always shut the browser down


if __name__ == "__main__":
    spider = Spider()
    asyncio.get_event_loop().run_until_complete(spider.get_html("http://www.baidu.com"))
