Blog Page Views v1.0 - An Entry-Level Crawler, Essential Python Practice

Latest recommended article published on 2024-08-26 13:41:43

ambit_tsai-微信

Category: Python大法
Tags: page views, read count, inflating, increase, boost

Article link: https://blog.csdn.net/ambit_tsai/article/details/80840410

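I came across a post asking: "I've written a few CSDN blog posts, but the read counts aren't going up. What should I do?"
Someone answered: "Hire shills to inflate them."
For a code monkey, hiring shills feels "beneath one's dignity". Yours truly happens to be studying Python, and as the saying goes, "real knowledge comes from practice", so I decided to practice on my own blog.
(PS: fake page views are not actually good for anything.)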

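Approach

Request the blog's article list page, pull every article link out of the HTML, request each link in turn, then sleep for a fixed interval and repeat.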

Implementation

urllib.request: sends the HTTP requests
BeautifulSoup: extracts data from HTML or XML documents
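
As a small aside on the urllib.request side: some sites reject requests that carry the default Python-urllib user agent. Whether CSDN does so is an assumption here, not something this post states. The sketch below shows one way to attach an explicit User-Agent header and a timeout. The names USER_AGENT, TIMEOUT_SECONDS, build_request and fetch_html are hypothetical and do not appear in the script that follows.

```python
from urllib import request

# Hypothetical values for illustration; neither appears in the original script.
USER_AGENT = 'Mozilla/5.0 (compatible; blog-practice-script)'
TIMEOUT_SECONDS = 10


def build_request(url):
    """Return a Request object that carries an explicit User-Agent header."""
    return request.Request(url, headers={'User-Agent': USER_AGENT})


def fetch_html(url):
    """Fetch a URL and return the decoded body (raises URLError on failure)."""
    with request.urlopen(build_request(url), timeout=TIMEOUT_SECONDS) as res:
        return res.read().decode('utf-8', errors='replace')


# Building the request does not touch the network, so it is safe to inspect.
# fetch_html is defined but deliberately not called here.
req = build_request('https://blog.csdn.net/ambit_tsai')
print(req.full_url)
# Request stores header names capitalized as 'User-agent'
print(req.get_header('User-agent'))
```

Passing the Request object to request.urlopen works the same way as passing a plain URL string, so a helper like this could stand in wherever the script calls request.urlopen(url).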

# -*-coding:utf-8-*-
"""
Blog page views
@version 1.0
@requires Python 3.6.4
@author 范围兄 <ambit_tsai@qq.com>
"""
from urllib import request
from bs4 import BeautifulSoup
from time import sleep

# Blog username (the path segment after https://blog.csdn.net/)
BLOG = 'ambit_tsai'
# Crawl interval, in seconds
CRAWL_INTERVAL = 40

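def access_article(soup):
    """Request every article linked from the list page."""
    print('>> Visiting articles')
    # The selector depends on the page layout and may need updating
    tags = soup.select('#mainBox h4.text-truncate > a')
    for tag in tags:
        href = tag['href']
        # Print the tail of the URL and the link text
        print('*', href[-25:], tag.get_text(' ', strip=True))
        try:
            # The body is not needed; open and close the response right away
            with request.urlopen(href):
                pass
        except Exception as ex:
            # Stop this pass on the first failure; the next pass retries
            print('!', ex)
            return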

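def crawl_blog(blog):
    """Fetch the blog's list page and visit every article on it."""
    print('>> Crawling blog:', blog)
    url = 'https://blog.csdn.net/' + blog
    print('*', url)
    try:
        res = request.urlopen(url)
    except Exception as ex:
        print('!', ex)
        return
    if res.status != 200:
        print('!', res.status, 'URL request failed')
        return
    # Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(res.read().decode(), 'html.parser')
    access_article(soup)    # Visit the articles on the list page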

# Start crawling
while True:
    print('=========================')
    crawl_blog(BLOG)
    print('>> Sleeping for', CRAWL_INTERVAL, 'seconds')
    sleep(CRAWL_INTERVAL)
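
To try the link-extraction step without network access or third-party packages, the sketch below swaps BeautifulSoup for the standard library's html.parser.HTMLParser. It is only an approximation of the selector '#mainBox h4.text-truncate > a': it matches any <a> nested inside an <h4> with class text-truncate and ignores the #mainBox container. SAMPLE_HTML is hypothetical markup written for this example, not a copy of a real CSDN page, and ArticleLinkParser and extract_links are made-up names.

```python
from html.parser import HTMLParser

# Hypothetical markup imitating the structure the selector expects.
SAMPLE_HTML = """
<div id="mainBox">
  <h4 class="text-truncate">
    <a href="https://blog.csdn.net/ambit_tsai/article/details/80840410">
      <span class="article-type">original</span>
      Blog Page Views v1.0
    </a>
  </h4>
  <h4 class="other"><a href="https://example.com/ignored">ignored</a></h4>
</div>
"""


class ArticleLinkParser(HTMLParser):
    """Collect href values of <a> tags inside <h4 class="text-truncate">."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while inside a matching <h4>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h4':
            classes = (attrs.get('class') or '').split()
            self.in_title = 'text-truncate' in classes
        elif tag == 'a' and self.in_title and attrs.get('href'):
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'h4':
            self.in_title = False


def extract_links(html):
    """Return the article links found in an HTML string."""
    parser = ArticleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links(SAMPLE_HTML)
print(links)
```

For real pages the BeautifulSoup selector in the script is the more convenient tool; this version is mainly useful for checking the extraction logic offline.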