[Python Crawler] Scraping a forum or blog user's visit count and similar data

Abby_QI

Category: QQbot. Tags: python

Article link: https://blog.csdn.net/qq_34916678/article/details/106958093

This article uses the CSDN forum as an example:

Source code:

# Only these two modules are used by the script below
import re
import requests

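def MidString(content, startStr, endStr):
    """Return the text between startStr and the next endStr, or None if either is missing."""
    startIndex = content.find(startStr)            # find() returns -1 instead of raising
    if startIndex < 0:
        return None
    startIndex += len(startStr)
    endIndex = content.find(endStr, startIndex)    # search only after the start marker
    if endIndex < 0:
        return None
    return content[startIndex:endIndex]

def get_visit_count():
    # Pretend to be a browser; CSDN rejects requests sent without a browser User-Agent
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    Ques_Url = 'https://blog.csdn.net/qq_34916678'
    r = requests.get(Ques_Url, headers=headers, timeout=10)
    a = r.text
    # Non-greedy match so the pattern stops at the first closing quote
    pattern = 'style="min-width:58px" title=".*?">'
    Res_1 = re.search(pattern, a)
    if Res_1 is None:
        print('Pattern not found; the page layout may have changed')
        return
    # group(0) is the matched text; str(Res_1) would be the Match object's repr
    Res_2 = MidString(Res_1.group(0), "title=\"", "\">")
    print(Res_2)

if __name__ == "__main__":
    get_visit_count()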
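
The extraction step can also be done with a single capture group, which avoids slicing the matched string by hand. The following is only a minimal sketch: `SAMPLE_HTML` is a made-up fragment shaped like the markup the pattern above targets (the real CSDN page may differ or change), and `extract_title` is a hypothetical helper name, not part of the original script.

```python
import re

# Hypothetical sample markup (an assumption); the live page may look different.
SAMPLE_HTML = (
    '<dl class="text-center" style="min-width:58px" title="12345">'
    '<dt>12345</dt></dl>'
)

# The capture group returns the title value directly;
# [^"]* cannot run past the closing quote of the attribute.
TITLE_RE = re.compile(r'style="min-width:58px" title="([^"]*)">')

def extract_title(page_text):
    """Return the first captured title value, or None when the pattern is absent."""
    match = TITLE_RE.search(page_text)
    return match.group(1) if match else None

print(extract_title(SAMPLE_HTML))
print(extract_title('<p>no match here</p>'))
```

To use it against the live page, pass in the `r.text` returned by `requests.get`, and treat a `None` result as a sign that the markup no longer matches the pattern.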

This program is for learning and exchange only. Do not use it for illegal, commercial, or other profit-making purposes!
