BeautifulSoup: A Super Simple HTML Parsing Framework for Python

BeautifulSoup: A Super Simple Web Scraping Framework for Python

Example

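Suppose we want to scrape every list entry in the left sidebar of this website, as shown in the figure.
[Figure: the left sidebar of the site]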

Let's inspect the HTML structure of the page's left sidebar, as shown in the figure.
[Figure: HTML structure of the left sidebar]

It turns out that the data we want is in all the a tags under the element whose id is leftcolumn. So how do we write the Python code for that?

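# coding: utf-8

# Written for Python 2; on Python 3 the import falls back to urllib.request.
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
from bs4 import BeautifulSoup


# Download the page and decode it, ignoring any bytes that are not valid UTF-8.
url_request = urlopen('http://www.runoob.com/python/python-tutorial.html')
html_doc = url_request.read().decode('utf-8', 'ignore')

soup = BeautifulSoup(html_doc, 'html.parser')
# print(soup.prettify())

# Collect every <a> tag inside the element whose id is "leftcolumn".
anchor_list = soup.find(id='leftcolumn').find_all('a')
for anchor in anchor_list:
    astring = "title: " + anchor.get('title') + ", href=http://www.runoob.com/" + anchor.get('href')
    print(astring)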

The output is:

title: Python 基础教程, href=http://www.runoob.com//python/python-tutorial.html
title: Python 简介, href=http://www.runoob.com//python/python-intro.html
title: Python 环境搭建, href=http://www.runoob.com//python/python-install.html
title: Python 中文编码, href=http://www.runoob.com/python-chinese-encoding.html
title: Python 基础语法, href=http://www.runoob.com//python/python-basic-syntax.html
title: Python 变量类型, href=http://www.runoob.com//python/python-variable-types.html
title: Python 运算符, href=http://www.runoob.com//python/python-operators.html
title: Python 条件语句, href=http://www.runoob.com//python/python-if-statement.html
title: Python 循环语句, href=http://www.runoob.com//python/python-loops.html
title: Python While 循环语句, href=http://www.runoob.com//python/python-while-loop.html
title: Python for 循环语句, href=http://www.runoob.com//python/python-for-loop.html
title: Python 循环嵌套, href=http://www.runoob.com//python/python-nested-loops.html
title: Python break 语句, href=http://www.runoob.com//python/python-break-statement.html
title: Python continue  语句, href=http://www.runoob.com//python/python-continue-statement.html
title: Python pass 语句, href=http://www.runoob.com//python/python-pass-statement.html
title: Python Number(数字), href=http://www.runoob.com//python/python-numbers.html
title: Python 字符串, href=http://www.runoob.com//python/python-strings.html
title: Python 列表(List), href=http://www.runoob.com//python/python-lists.html
title: Python 元组, href=http://www.runoob.com//python/python-tuples.html
title: Python 字典(Dictionary), href=http://www.runoob.com//python/python-dictionary.html
title: Python 日期和时间, href=http://www.runoob.com//python/python-date-time.html
title: Python 函数, href=http://www.runoob.com//python/python-functions.html
title: Python 模块, href=http://www.runoob.com//python/python-modules.html
title: Python 文件I/O, href=http://www.runoob.com//python/python-files-io.html
title: Python File 方法, href=http://www.runoob.com/file-methods.html
title: Python 异常处理, href=http://www.runoob.com//python/python-exceptions.html
title: Python OS 文件/目录方法, href=http://www.runoob.com/os-file-methods.html
title: Python 内置函数, href=http://www.runoob.com/python-built-in-functions.html
title: Python 面向对象, href=http://www.runoob.com//python/python-object.html
title: Python正则表达式, href=http://www.runoob.com//python/python-reg-expressions.html
title: Python CGI编程, href=http://www.runoob.com//python/python-cgi.html
title: python 操作MySQL数据库, href=http://www.runoob.com//python/python-mysql.html
title: Python 网络编程, href=http://www.runoob.com/python-socket.html
title: Python SMTP发送邮件, href=http://www.runoob.com//python/python-email.html
title: Python 多线程, href=http://www.runoob.com//python/python-multithreading.html
title: Python XML解析, href=http://www.runoob.com//python/python-xml.html
title: Python GUI 编程(Tkinter), href=http://www.runoob.com//python/python-gui-tkinter.html
title: Python2.x3.x版本区别, href=http://www.runoob.com//python/python-2x-3x.html
title: Python IDE, href=http://www.runoob.com//python/python-ide.html
title: Python JSON, href=http://www.runoob.com//python/python-json.html
title: Python 100例, href=http://www.runoob.com//python/python-100-examples.html
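
The output above shows two side effects of building each link by string concatenation: hrefs that already start with `/` end up with a double slash, and hrefs without a leading slash (such as `python-chinese-encoding.html`) lose the `/python/` directory. One way to avoid both is `urljoin` from the standard library. The sketch below assumes Python 3, and assumes that relative links should resolve against the URL of the page they were scraped from (the usual browser behavior when the page sets no `<base>` tag). The names `PAGE_URL` and `absolute_url` are hypothetical, chosen only for illustration:

```python
from urllib.parse import urljoin

# URL of the page the anchors were scraped from (taken from the example above).
PAGE_URL = 'http://www.runoob.com/python/python-tutorial.html'


def absolute_url(href, page_url=PAGE_URL):
    """Resolve an href against the page URL, the way a browser would."""
    return urljoin(page_url, href)


# The two href shapes that appear in the output above.
print(absolute_url('/python/python-intro.html'))
print(absolute_url('python-chinese-encoding.html'))
```

In the scraping loop, this would replace the `"http://www.runoob.com/" + anchor.get('href')` concatenation with `absolute_url(anchor.get('href'))`.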

This is only a simple example. If you want to do richer HTML analysis, let your imagination run wild!
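
As one small step in that direction, here is a hedged sketch (Python 3, run against an inline HTML string rather than the live site) that uses a CSS selector to reach the same anchors and tolerates tags with no `title` attribute. The sample markup and the helper name `sidebar_links` are made up for illustration; `select` and `get_text` are standard BeautifulSoup methods.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup imitating the sidebar structure described above.
SAMPLE_HTML = """
<div id="leftcolumn">
  <a title="Python intro" href="/python/python-intro.html">Intro</a>
  <a href="python-json.html">JSON</a>
</div>
<div id="other"><a href="/ignored.html">Ignored</a></div>
"""


def sidebar_links(html):
    """Return (title, href) pairs for every <a> inside #leftcolumn."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    # The CSS selector '#leftcolumn a' matches the same tags as
    # soup.find(id='leftcolumn').find_all('a'), but yields an empty
    # list instead of raising if the container is missing.
    for anchor in soup.select('#leftcolumn a'):
        # Fall back to the visible link text when there is no title attribute.
        title = anchor.get('title') or anchor.get_text(strip=True)
        links.append((title, anchor.get('href')))
    return links


for title, href in sidebar_links(SAMPLE_HTML):
    print(title, href)
```

Returning plain tuples keeps the parsing step separate from any formatting or printing, so the same helper can feed a file writer or a further crawl.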
