My First Little Web Crawler

Latest recommended article published 2024-07-27 12:20:46

roosterhpf

Category: python. Tags: python

Article link: https://blog.csdn.net/dapeng0112/article/details/32335859

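# Python 3 port of the original Python 2 script.
from urllib.request import urlopen

# Download the blog's index page (returned as bytes).
con = urlopen('http://blog.sina.com.cn/twocold').read()
tail = 0

while True:
    # Locate the next post title; find() returns -1 when none is left.
    title = con.find(b'class="blog_title"', tail)
    if title == -1:
        break
    # The post URL runs from the next 'http://' through '.html'.
    start = con.find(b'http://', title)
    end = con.find(b'.html', start) if start != -1 else -1
    if end == -1:
        break
    tail = end + len(b'.html')
    url = con[start:tail].decode('utf-8')
    # Name the local file after the last path segment, e.g. blog_xxx.html.
    filename = url.rsplit('/', 1)[-1]
    content = urlopen(url).read()
    # Write the raw bytes; the with-block closes the file for us.
    with open(filename, 'wb') as f:
        f.write(content)

print("it is end!")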
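
The scanning loop above mixes link extraction with downloading, which makes it hard to try without network access. Below is a minimal sketch, written for Python 3, that pulls the same find-based scan into a function that works on a plain string. The names `extract_post_urls`, `filename_for`, and `sample`, and the `example.com` URLs, are made up for illustration and are not part of the original script.

```python
def extract_post_urls(html, marker='class="blog_title"'):
    """Return the URL that follows each marker, in page order."""
    urls = []
    tail = 0
    while True:
        # find() returns -1 when no further marker exists.
        title = html.find(marker, tail)
        if title == -1:
            break
        # A post URL is assumed to run from 'http://' through '.html'.
        start = html.find('http://', title)
        end = html.find('.html', start) if start != -1 else -1
        if end == -1:
            break
        tail = end + len('.html')
        urls.append(html[start:tail])
    return urls


def filename_for(url):
    """Use the last path segment of the URL as the local file name."""
    return url.rsplit('/', 1)[-1]


# Hypothetical page fragment standing in for the downloaded index page.
sample = (
    '<div class="blog_title"><a href="http://example.com/s/blog_001.html">A</a></div>'
    '<div class="blog_title"><a href="http://example.com/s/blog_002.html">B</a></div>'
)

for url in extract_post_urls(sample):
    print(url, '->', filename_for(url))
```

Keeping the scan separate from the download step means the marker and URL assumptions can be checked against a saved copy of the page before any requests are made.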
