A Small Python Crawler Exercise: Scraping Basic Profile Information from a CSDN Blog with BeautifulSoup + Request


HW140701


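Category: Python

This is an original article by the author. Please do not repost it without permission; if you would like to repost it, ask the author first. If you have questions, contact stubbornhuang@qq.com. You are also welcome to join the computer graphics and imaging group 526867211 or to visit my personal site at www.stubbornhuang.com. Thank you.

Article link: https://blog.csdn.net/HW140701/article/details/55048364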


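I haven't touched Python in a long time. Ever since I bought the book 《Python网络数据采集》 (Web Scraping with Python) online, I haven't had time to write any small demos of my own. Today I happened to come across this post:

http://www.cnblogs.com/mfryf/p/3695844.html

"Development log: teaching myself Python to write a crawler that scrapes CSDN personal blog information"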

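After reading it, I figured I might as well spend an hour brushing up on BeautifulSoup and then implement exactly the same functionality as that post. I had actually wanted to write something like this before. Like that author, I just want to check each day whether my blog's view count has gone up. Hahaha.

So I analyzed the page source and wrote a BeautifulSoup + Request version. The New Year holiday has only just ended and my brain isn't fully up to speed, so I wrote it fairly casually, reusing some code I had written earlier and trimming and modifying it as needed.

The code is attached below.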

#__author__ = 'Administrat
#coding=utf-8
import io
import sys
from urllib import request
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Re-wrap stdout so Chinese text prints correctly on a GBK-family (Windows) console
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')

# Send a browser-like User-Agent, since some sites reject the default urllib one
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = request.Request("http://blog.csdn.net/hw140701", headers=headers)
html = urlopen(req)
bsObj = BeautifulSoup(html.read(), "html.parser")

# find() returns None when no element has the given id, so check the
# container itself before calling find_all() on it
for block_id in ("blog_rank", "blog_statistics"):
    block = bsObj.find(id=block_id)
    if block is not None:
        for li in block.find_all(name='li'):
            print(li.get_text())
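The lookup step in the script can be tried without any network access. The following is only a minimal sketch: `SAMPLE_HTML` and the helper `list_item_texts` are hypothetical names made up for illustration, and the sample markup merely assumes that the page has `<li>` items under elements with the ids `blog_rank` and `blog_statistics`. The real CSDN page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div>
  <ul id="blog_rank">
    <li>Views: <span>100</span></li>
    <li>Rank: <span>42</span></li>
  </ul>
  <ul id="blog_statistics">
    <li>Original: <span>3</span></li>
  </ul>
</div>
"""


def list_item_texts(soup, block_id):
    """Return the text of every <li> under the element with the given id.

    Returns an empty list when no element has that id, because
    soup.find() returns None in that case.
    """
    block = soup.find(id=block_id)
    if block is None:
        return []
    # Join the stripped text fragments of each <li> with a single space
    return [li.get_text(" ", strip=True) for li in block.find_all("li")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for block_id in ("blog_rank", "blog_statistics", "no_such_block"):
    print(block_id, list_item_texts(soup, block_id))
```

Returning an empty list for a missing id keeps the calling loop simple and avoids the `AttributeError` that would otherwise be raised if the page layout changes and `find()` returns `None`.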

The output of the run is shown below.

I used my own blog as the test example.