Python Web Crawler Example


波小澜


Article link: https://blog.csdn.net/lyq2013/article/details/49281299


This article uses a simple example to show how to write a web crawler in Python.

Environment:
Python 2.7.5
Ubuntu 14.04

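Required Python libraries

urllib: fetches web pages and downloads files
re: handles regular expressions (here, extracting the image URLs from the HTML)
os: creates the output folder and builds file paths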
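To see what the re module contributes, the following Python 3 sketch runs two patterns on a small hypothetical HTML snippet. The URLs are placeholders, not taken from the Tieba page. The first pattern is the lazy `src="(.+?\.jpg)" pic_ext`; the second replaces `.+?` with `[^"]+?`. Because `.` also matches a double quote, the lazy version can run past an image that has no `pic_ext` attribute and swallow the next tag, while the `[^"]` version cannot.

```python
import re

# Hypothetical HTML: the middle image has no pic_ext attribute
html = ('<img src="http://example.com/a.jpg" pic_ext="jpeg">'
        '<img src="http://example.com/logo.jpg">'
        '<img src="http://example.com/b.jpg" pic_ext="jpeg">')

lazy = r'src="(.+?\.jpg)" pic_ext'       # "." may run past a closing quote
strict = r'src="([^"]+?\.jpg)" pic_ext'  # the captured URL may not contain a quote

print(re.findall(lazy, html))
# ['http://example.com/a.jpg', 'http://example.com/logo.jpg"><img src="http://example.com/b.jpg']
print(re.findall(strict, html))
# ['http://example.com/a.jpg', 'http://example.com/b.jpg']
```

Making the pattern lazy only means it stops at the first place where the rest of the pattern matches; it does not stop it from crossing tag boundaries, so restricting the character class is the safer choice.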

Code

The following example uses a Python crawler to download the images from a web page (a Baidu Tieba post) and save them locally.

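import urllib
import re
import os

def getHtml(url):
    # Download the page and return its raw HTML
    page = urllib.urlopen(url)
    html = page.read()
    page.close()
    return html

def getImg(html):
    # Match .jpg URLs in src attributes that are immediately followed by pic_ext;
    # [^"] keeps a match from running past the closing quote into the next tag
    reg = r'src="([^"]+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)

    # Save the pictures to a "pics" folder, creating it only if it does not exist yet
    local = os.path.join(os.getcwd(), 'pics')
    if not os.path.isdir(local):
        os.mkdir(local)

    # Name the files 0.jpg, 1.jpg, ... in the order they appear on the page
    for x, imgurl in enumerate(imglist):
        urllib.urlretrieve(imgurl, os.path.join(local, '%s.jpg' % x))

    # Return the image URLs so the caller can see what was downloaded
    return imglist

html = getHtml("http://tieba.baidu.com/p/2460150866")

print getImg(html)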
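urllib.urlopen and urllib.urlretrieve exist only in Python 2. In Python 3 they live in urllib.request, and print is a function. The following is a minimal Python 3 sketch of the same script, not the author's code. It assumes the page can be decoded as UTF-8 (undecodable bytes are replaced, which does not affect the ASCII image URLs), saves the images to a pics folder under the current directory, and uses `[^"]+?` instead of `.+?` in the regex so a match cannot cross a closing quote. The names get_html and get_img are snake_case renamings chosen for this sketch.

```python
import os
import re
import urllib.request

# Same idea as the Python 2 pattern, but [^"] stops a match
# from running past a closing quote into the next tag
IMG_RE = re.compile(r'src="([^"]+?\.jpg)" pic_ext')


def get_html(url):
    """Download a page and return it as text (UTF-8 decoding is an assumption)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")


def get_img(html, folder="pics"):
    """Save every matched image as folder/0.jpg, folder/1.jpg, ...

    Returns the list of local paths that were written.
    """
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    saved = []
    for i, img_url in enumerate(IMG_RE.findall(html)):
        path = os.path.join(folder, "%d.jpg" % i)
        urllib.request.urlretrieve(img_url, path)
        saved.append(path)
    return saved
```

To run it against the post used above, call print(get_img(get_html("http://tieba.baidu.com/p/2460150866"))). The page markup may have changed since the article was written, so the pic_ext pattern may no longer match anything there.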