Learn a Little Every Day: Python 100 Examples (27~28)


MayCoding



Tags: python

Article link: https://blog.csdn.net/shishanqing920610/article/details/80813786


Problem 27: Fetch a web page

Analysis: 1. connect to the remote web server; 2. send an HTTP request for the page; 3. read the HTML code returned by the server.

  demoCode: 

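    #! /usr/bin/python3

    import urllib.request

    def getHtml(url):
        # urlopen() connects to the server and sends an HTTP GET request;
        # read() returns the response body (the HTML) as raw bytes.
        # The with-block closes the connection afterwards.
        with urllib.request.urlopen(url) as page:
            html = page.read()
        return html

    if __name__ == '__main__':
        html = getHtml("http://www.baidu.com")
        # Decode the bytes before printing (UTF-8 is assumed here)
        print(html.decode('utf-8', errors='replace'))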
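In the demo above, the three steps from the analysis (connect, send the request, read the HTML) are all hidden inside the single `urlopen()` call. The sketch below swaps in the lower-level standard-library `http.client` module so that each step is a separate call. `fetch_page` and its parameters are hypothetical names chosen for illustration, and the `User-Agent` value is an assumption (some servers reject requests that lack one).

```python
import http.client

def fetch_page(host, path="/", port=80, timeout=10):
    """Hypothetical helper: fetch one page over plain HTTP, step by step."""
    # Step 1: open a connection to the remote web server
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Step 2: send an HTTP GET request for the page
        conn.request("GET", path, headers={"User-Agent": "Mozilla/5.0"})
        response = conn.getresponse()
        # Step 3: read the HTML returned by the server (raw bytes)
        return response.status, response.read()
    finally:
        # Close the connection whether or not the request succeeded
        conn.close()
```

For example, `status, html = fetch_page("www.baidu.com")` would return the status code and the raw body. Unlike `urlopen()`, this sketch does not follow redirects and only speaks plain HTTP; an HTTPS site would need `http.client.HTTPSConnection` instead.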

Problem 28: Use the re module to scrape images from a web page

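Analysis: first request the page by its URL, then write a regular expression that matches the image file format, use compile() and findall() from the re module to extract the image URLs, and finally download each image with urlretrieve().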
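The compile()/findall() step can be tried offline before touching the network. The snippet below runs the same pattern used in the demo on a small hypothetical HTML string (the example.com URLs are made up). Only `src` values ending in `.jpg` and immediately followed by ` pic_ext="jpeg"` are captured, so the `.png` logo is skipped.

```python
import re

# Hypothetical HTML fragment imitating the <img> tags the demo targets
sample_html = (
    '<img class="BDE_Image" src="http://example.com/a1.jpg" pic_ext="jpeg">'
    '<img src="http://example.com/logo.png">'
    '<img class="BDE_Image" src="http://example.com/b2.jpg" pic_ext="jpeg">'
)

# Same pattern as the demo: the group captures the .jpg URL inside src="..."
imagereg = re.compile(r'src="([.\S]*\.jpg)" pic_ext="jpeg"')
imgurls = imagereg.findall(sample_html)
print(imgurls)  # → ['http://example.com/a1.jpg', 'http://example.com/b2.jpg']
```

Because the pattern contains exactly one group, findall() returns the captured URLs rather than the whole matched text.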

  demoCode: 

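    #! /usr/bin/python3

    import os
    import re
    import urllib.request

    class wormhtml(object):
        def __init__(self):
            pass

        def getHtml(self, url):
            # Send the HTTP request and return the raw page bytes
            with urllib.request.urlopen(url) as page:
                html = page.read()
            return html

    worm_html = wormhtml()

    class wormimage(object):
        def __init__(self):
            pass

        def getimage(self):
            html_other = worm_html.getHtml("http://tieba.baidu.com/p/3205263090")
            # Decode the bytes; str(bytes) would give "b'...'" with escaped characters
            html_text = html_other.decode('utf-8', errors='ignore')
            # Capture the .jpg URL in src="..." of <img> tags marked pic_ext="jpeg"
            reg = r'src="([.\S]*\.jpg)" pic_ext="jpeg"'
            imagereg = re.compile(reg)
            imgurls = imagereg.findall(html_text)
            # Save directory from the original post; created if it does not exist
            save_dir = '/home/ssq/work/python/samplecode/img'
            os.makedirs(save_dir, exist_ok=True)
            # Download each image, naming the files 1.jpg, 2.jpg, ...
            for x, imgurl in enumerate(imgurls, start=1):
                urllib.request.urlretrieve(imgurl, os.path.join(save_dir, '%s.jpg' % x))

    if __name__ == '__main__':
        wormimg = wormimage()
        wormimg.getimage()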
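The regular expression only works while the page keeps `src` and `pic_ext` in exactly that order, separated by a single space. As an alternative (not the author's method), the standard library's `html.parser` reads attributes regardless of order or spacing. `JpegSrcCollector` is a hypothetical helper name, and the rule it applies (an `<img>` with `pic_ext="jpeg"` and a `src` ending in `.jpg`) is assumed to mirror the regex in the demo.

```python
from html.parser import HTMLParser

class JpegSrcCollector(HTMLParser):
    """Hypothetical helper: collect src of <img pic_ext="jpeg"> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; order in the markup does not matter
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src") or ""
        if attr_map.get("pic_ext") == "jpeg" and src.endswith(".jpg"):
            self.urls.append(src)

collector = JpegSrcCollector()
# Attributes appear in reverse order here, which the regex above would miss
collector.feed('<img pic_ext="jpeg" src="http://example.com/a1.jpg">'
               '<img src="http://example.com/logo.png">')
print(collector.urls)  # → ['http://example.com/a1.jpg']
```

The collected list can then be passed to the same urlretrieve() loop used in the demo.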
