Python Web Crawler Example

Latest recommended article published 2024-06-19 17:27:45

iteye_10717

Category: Python. Tags: python, crawler

Article link: https://blog.csdn.net/iteye_10717/article/details/82600130

Video URL:

http://edu.51cto.com/lesson/id-12393.html

Example: downloading blog articles

Source code:

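import os
import time
import urllib.request  # Python 3; the original listing used Python 2's urllib.urlopen

# Download every article linked from the blog's article list page
LIST_URL = 'http://blog.sina.com.cn/s/articlelist_3973495073_0_1.html'
MAX_ARTICLES = 50

# Only ASCII markers are searched for, so undecodable bytes can be ignored
con = urllib.request.urlopen(LIST_URL).read().decode('utf-8', 'ignore')

# Step 1: collect article URLs by scanning for <a title=... href="....html">
url = []
title = con.find('<a title=')
href = con.find('href=', title)
html = con.find('.html', href)

while title != -1 and href != -1 and html != -1 and len(url) < MAX_ARTICLES:
    # href + 6 skips 'href="'; html + 5 keeps the '.html' suffix
    url.append(con[href + 6:html + 5])
    print(url[-1])
    title = con.find('<a title=', html)
    href = con.find('href=', title)
    html = con.find('.html', href)
print('find end!')

# Step 2: download only the URLs actually found (the original always looped
# 50 times and failed on empty entries), pausing one second between requests
os.makedirs('hanhan', exist_ok=True)
for link in url:
    print('downloading', link)
    content = urllib.request.urlopen(link).read()
    # The last 26 characters of the URL serve as the file name, as in the original
    with open('hanhan/' + link[-26:], 'wb') as f:
        f.write(content)
    time.sleep(1)
print('download articles finished!')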
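
The listing above locates links by string offsets, which depends on the exact attribute order and quoting in the page. As a hedged alternative, not the author's method, the same extraction step can be sketched with the standard library's html.parser. The names ArticleLinkParser, extract_article_links and local_name, and the example.com URLs in the sample markup, are made up for illustration; the real article list page may be structured differently.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ArticleLinkParser(HTMLParser):
    """Collects href values of <a> tags that have a title and end in .html."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attr_map = dict(attrs)
        href = attr_map.get('href')
        # Mirror the original filter: a titled anchor pointing at a .html page
        if 'title' in attr_map and href and href.endswith('.html'):
            self.links.append(href)


def extract_article_links(page, limit=50):
    """Return at most `limit` article links found in the HTML text `page`."""
    parser = ArticleLinkParser()
    parser.feed(page)
    return parser.links[:limit]


def local_name(link):
    """Derive a file name from the URL path instead of a fixed 26-character slice."""
    return basename(urlparse(link).path)


# Hypothetical sample markup, used so the sketch runs without network access
sample = (
    '<a title="First" target="_blank" href="http://example.com/s/blog_0001.html">First</a>'
    '<a href="http://example.com/about.html">no title</a>'
    '<a title="Second" href="http://example.com/s/blog_0002.html">Second</a>'
)
links = extract_article_links(sample)
print(links)
print([local_name(link) for link in links])
```

Parsing attributes this way keeps working when an extra attribute sits between title and href, and taking the file name from the URL path avoids assuming that every article URL ends in a name of the same length.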
