doubanzhaofang

Larsongo

Published 2018-03-30 00:05:23


Category: python

Article link: https://blog.csdn.net/lisheninasiainfo/article/details/79751291


# -*- coding: utf-8 -*-
# Python 2 script: urllib2 and the print statement exist only in Python 2.
import re
import urllib2

# ID of the Douban group topic to fetch
page = 113566835
url = 'https://www.douban.com/group/topic/' + str(page)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
try:
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    content = response.read().decode('utf-8')
    # Capture the post text between the rich-text container and the first centered image
    pattern = re.compile('<div.*?richtext">.*?<p>(.*?)</p><div.*?image-float-center">', re.S)
    items = re.findall(pattern, content)
    for item in items:
        # Turn paragraph boundaries into line breaks
        info = re.sub('</p><p>', "\n", item)
        print info
except urllib2.URLError, e:
    # HTTPError carries a status code; a plain URLError carries only a reason
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
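
The script above targets Python 2. As a rough sketch of how the same steps might look in Python 3, the block below splits the request setup and the regex extraction into small functions so the extraction can be tried offline. The function names (`build_request`, `fetch`, `extract_posts`) and the `SAMPLE_HTML` fragment are illustrative assumptions, not part of the original script, and the sample markup is hand-written to fit the pattern rather than copied from a real Douban page.

```python
import re
import urllib.error
import urllib.request

# Same regular expression as in the original listing.
POST_PATTERN = re.compile(
    r'<div.*?richtext">.*?<p>(.*?)</p><div.*?image-float-center">', re.S
)


def build_request(topic_id):
    """Build (but do not send) the request for one group topic."""
    url = 'https://www.douban.com/group/topic/' + str(topic_id)
    headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    return urllib.request.Request(url, headers=headers)


def fetch(topic_id):
    """Download the topic page; return None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(topic_id)) as response:
            return response.read().decode('utf-8')
    except urllib.error.URLError as e:
        # HTTPError (a URLError subclass) has .code; URLError has .reason
        if hasattr(e, 'code'):
            print(e.code)
        if hasattr(e, 'reason'):
            print(e.reason)
        return None


def extract_posts(html):
    """Return each captured post with paragraph boundaries turned into newlines."""
    return [item.replace('</p><p>', '\n') for item in POST_PATTERN.findall(html)]


# Hypothetical markup written to fit the pattern; the real page may differ.
SAMPLE_HTML = (
    '<div class="richtext"><p>Two-bedroom flat near the subway</p>'
    '<p>Contact by message</p>'
    '<div class="image-float-center"><img src="x.jpg"></div></div>'
)

if __name__ == '__main__':
    for post in extract_posts(SAMPLE_HTML):
        print(post)
```

Because the pattern requires an `image-float-center` div directly after the closing `</p>`, a post without an image would presumably produce no match; checking for an empty result from `extract_posts` before printing is one way to notice that case.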
