Fetching the full article list

import urllib
import time
## Read each article-list page (Python 2: urllib.urlopen and the print statement)
url = []
page = 1
while page <= 11:
    url_con = urllib.urlopen('http://blog.sina.com.cn/s/articlelist_1193111400_0_'+str(page)+'.html').read()
    print 'con' ,url_con

    i = 0
    title = url_con.find(r'<a title=')

    print "title",title
    href = url_con.find(r'href=',title)
    print "href",href

    html = url_con.find(r'.html',href)
    print "html",html


    while title != -1 and href != -1 and html != -1 and i < 40:
        url.append(url_con[href+6:html+5])
        # i restarts on every page while url keeps growing, so print the newest entry
        print page,url[-1]
        title = url_con.find(r'<a title=',html)
        href = url_con.find(r'href=',title)
        html = url_con.find(r'.html',href)
        i = i + 1
    else:
        print page, 'find end'
    page = page + 1
else:
    print 'all find end !'
j = 0
k = len(url)
print "url sum:",k
while j < k:
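    content = urllib.urlopen(url[j]).read()
    # The last 26 characters of the article URL serve as the file name
    filename = url[j][-26:]
    # The blog/ directory must already exist; the with block closes the file
    with open(r'blog/'+ filename,'w') as f:
        f.write(content)
    j = j + 1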
    time.sleep(5)

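The code above walks the 11 article-list pages, collects the URL of every blog post, then downloads each post and saves it as an HTML file under blog/, pausing 5 seconds between requests.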
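
For readers on Python 3, the same approach can be sketched roughly as follows. This is not the original author's code: the function names `extract_article_links` and `download_all`, the `LIST_URL` template, and the choice to decode the list pages as UTF-8 are all assumptions made for illustration. The link extraction keeps the original `str.find` scanning (`<a title=`, then `href=`, then `.html`), and the file name is still the last 26 characters of the URL.

```python
import os
import time
import urllib.request

# Same list-page address as in the script above, with the page number as a placeholder.
LIST_URL = 'http://blog.sina.com.cn/s/articlelist_1193111400_0_{}.html'


def extract_article_links(page_html, limit=40):
    """Return the href of each `<a title=... href="....html">` anchor, in page order."""
    links = []
    pos = page_html.find('<a title=')
    while pos != -1 and len(links) < limit:
        href = page_html.find('href=', pos)
        if href == -1:
            break
        end = page_html.find('.html', href)
        if end == -1:
            break
        # Skip `href=` plus the opening quote (6 characters); keep `.html` (5 characters).
        links.append(page_html[href + 6:end + 5])
        pos = page_html.find('<a title=', end)
    return links


def download_all(first_page=1, last_page=11, out_dir='blog', delay=5):
    """Fetch every list page, then save each linked article into out_dir."""
    os.makedirs(out_dir, exist_ok=True)  # create blog/ if it is missing
    for page in range(first_page, last_page + 1):
        with urllib.request.urlopen(LIST_URL.format(page)) as resp:
            # Assumption: the list pages are UTF-8; undecodable bytes are replaced.
            page_html = resp.read().decode('utf-8', errors='replace')
        for link in extract_article_links(page_html):
            with urllib.request.urlopen(link) as resp:
                data = resp.read()
            # Write the raw bytes so the article's own encoding is preserved.
            with open(os.path.join(out_dir, link[-26:]), 'wb') as f:
                f.write(data)
            time.sleep(delay)  # pause between requests, as the original does
```

Keeping the extraction in its own function means it can be tried on a saved copy of a list page without touching the network; `download_all()` is the only part that issues requests.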

Reposted from: https://www.cnblogs.com/y15821933792/p/7797211.html
