Python Web Crawler: Daily Stock Trading Information

Latest recommended article published on 2022-06-23 00:11:46

Lin_Zhenyu

Article link: https://blog.csdn.net/Lin_Zhenyu/article/details/118928616

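This code example shows how to use the Pyppeteer library in Python to apply an anti-anti-crawler strategy: it sets the user agent and overrides the navigator.webdriver property so that the target site is less likely to identify the browser as a crawler. It then scrapes stock codes from a given stock list page and, for the first three entries, fetches details for each stock such as name and price. The whole process runs in a visible (non-headless) Chromium browser.

Summary generated by CSDN using intelligent technology.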
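
The script splits each list entry into a name and a code by searching for the parentheses. A minimal sketch of that step in isolation is given here, assuming entries have the form "name(code)" with ASCII parentheses and a six-digit code. The helper `parse_stock_entry` and the sample entry are hypothetical and do not appear in the original script.

```python
import re

# Hypothetical helper, not part of the original script.
# Assumption: an entry looks like "name(code)" with a six-digit code.
ENTRY_PATTERN = re.compile(r"^(?P<name>.+?)\((?P<code>\d{6})\)$")


def parse_stock_entry(text):
    """Return (name, code) for an entry like 'Name(600000)', or None."""
    match = ENTRY_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group("name").strip(), match.group("code")


# Invented sample entries, for illustration only.
print(parse_stock_entry("Example Bank(600000)"))
print(parse_stock_entry("About us"))
```

Returning None for text that does not match makes it easy to skip navigation links and other list items that are not stocks.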

import re
import asyncio
from bs4 import BeautifulSoup
import pyppeteer as pyp
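async def antiAntiCrawler(page):   # apply anti-anti-crawler measures to the page
    # Present a regular desktop browser user agent
    await page.setUserAgent('Mozilla/5.0 (Windows NT 6.1; Win64; x64)')
    # Make navigator.webdriver report false on every new document
    await page.evaluateOnNewDocument('() => { Object.defineProperties(navigator, { webdriver: { get: () => false } }) }')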
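async def getStockCodes(page):
    # Collect the link text of every <li> entry that looks like "name(code)"
    codes = []
    html = await page.content()
    soup = BeautifulSoup(html, "html.parser")
    for x in soup.find_all("li"):
        a = x.find("a")
        # Skip <li> items without a link; find() returns None for them
        if a is not None and "(" in a.text and ")" in a.text:
            codes.append(a.text)
    return codes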
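async def getStockInfo(url):
    # Launch Chromium with a visible window; adjust executablePath to your own installation
    browser = await pyp.launch(headless=False, executablePath=r'D:\chromium\chrome-win\chrome.exe')
    try:
        page = await browser.newPage()   # open a new page in the browser
        await antiAntiCrawler(page)
        await page.goto(url)   # load the stock list page
        codes = await getStockCodes(page)
        # A plain <td> label followed by a <td> whose id starts with "gt"
        pt = r'<td>([^<]*)</td>.*?<td[^>]*id="gt\d*?"[^>]*>([^<]*)</td>'
        for entry in codes[:3]:   # only the first three stocks
            print("-----", entry)
            pos1, pos2 = entry.index("("), entry.index(")")
            code = entry[pos1+1:pos2]
            detail_url = "http://quote.eastmoney.com/sh" + code + ".html"
            await page.goto(detail_url)
            html = await page.content()
            for label, value in re.findall(pt, html, re.DOTALL):
                print(label, value)
    finally:
        await browser.close()   # close the browser even if a step fails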
url = "https://www.banban.cn/gupiao/list_sh.html"
# asyncio.run creates and closes the event loop; calling asyncio.get_event_loop() without a running loop is deprecated
asyncio.run(getStockInfo(url))
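
The regular expression in getStockInfo is easier to follow when run against a small fragment. The sketch below applies the same pattern to invented markup; `SAMPLE_HTML`, its labels and values, and the helper `extract_pairs` are assumptions for illustration, and the real quote page may be structured differently.

```python
import re

# Same pattern as in getStockInfo: a plain <td> label, then a <td>
# whose id starts with "gt" and whose text is the value.
PATTERN = r'<td>([^<]*)</td>.*?<td[^>]*id="gt\d*?"[^>]*>([^<]*)</td>'

# Hypothetical markup, invented for illustration only.
SAMPLE_HTML = """
<table>
  <tr><td>Open</td><td id="gt1" class="txtl">10.50</td></tr>
  <tr><td>High</td><td id="gt2" class="txtl">10.80</td></tr>
</table>
"""


def extract_pairs(html):
    """Return a list of (label, value) tuples found in the HTML text."""
    # re.DOTALL lets ".*?" span line breaks between the two cells.
    return re.findall(PATTERN, html, re.DOTALL)


for label, value in extract_pairs(SAMPLE_HTML):
    print(label, value)
```

Because `.*?` can span any amount of text under re.DOTALL, a label cell may pair with a value cell further down the page if no "gt" cell follows it directly, so the output is worth checking against the actual page.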