URL Crawler 1


APusasa

Category: Learning Python

Article link: https://blog.csdn.net/APusasa/article/details/48829347


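# encoding: utf-8
'''
Module purpose:

Date: 2015-09-25
Version: 1.0
'''
import re
from urllib.request import urlopen  # Python 3; the original post targeted Python 2's urllib

# Base URL prepended to the relative links found on the page.
# The original post uses `head` without defining it, so this empty string is
# only a placeholder: set it to the root of the site being crawled.
head = ''


#------------ Filtering helper ---------------
def get_informtion(html_code, regular):
    # Example pattern:
    # regular = '<h2>位置信息</h2>.*?</div>.*?<ul>.*?<li>.*?<strong>(.*?)</strong>.*?<span>(.*?)</span>'
    # re.S lets '.' match newlines, so one pattern can span several lines of HTML.
    pattern = re.compile(regular, re.S)
    return pattern.findall(html_code)


#---------- Property detail information -------
def houses_more_infor(url):
    html_code = get_html_code(url)
    # Get the floor-plan page URL from the property's home page
    regular = '<li class="">.*?<a class="statis-log" data-statis="FCAS-14-011" href="(.*?)">户型</a>.*?</li>'
    for item in get_informtion(html_code, regular):
        s = head + item
        print(s)


#---------- Download a page -------
def get_html_code(url):
    with urlopen(url) as f:
        # Assumes the page is UTF-8 encoded; change the codec if the site uses another one.
        return f.read().decode('utf-8')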

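The text matched by the capture group (.*?) is the value of item. Because the pattern contains a single group, re.findall returns a list of strings, and each string is what that group captured in one match.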
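
To make the note about (.*?) concrete, here is a minimal sketch of how re.findall behaves with one capture group versus two. The HTML snippet, the href values, and the helper name find_groups are made up for illustration; they are not taken from the site the post crawls.

```python
import re

# Hypothetical markup, invented for this example only.
SAMPLE_HTML = '''
<li class="">
  <a class="statis-log" href="/loupan/1001/huxing/">户型</a>
</li>
<li class="">
  <a class="statis-log" href="/loupan/1002/huxing/">户型</a>
</li>
'''


def find_groups(html_code, regular):
    """Run re.findall with re.S so that '.' also matches newlines."""
    return re.findall(regular, html_code, re.S)


# One capture group: findall returns a list of strings, one per match.
one_group = find_groups(SAMPLE_HTML, r'<a class="statis-log" href="(.*?)">户型</a>')

# Two capture groups: findall returns a list of tuples, one tuple per match.
two_groups = find_groups(SAMPLE_HTML, r'<a class="(.*?)" href="(.*?)">户型</a>')

for item in one_group:
    # Each item is exactly the text captured by (.*?).
    print(item)
```

With a single group the loop variable is a plain string, which is why the post can concatenate it directly onto head. A pattern with two groups, like the commented-out example in get_informtion, would yield tuples and need unpacking instead.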
