BeautifulSoup

Environment: Python 3.5.1
(I installed the latest release at the time, Python 3.6.1.)

urllib
from urllib.request import urlopen

BeautifulSoup4
from bs4 import BeautifulSoup

Installing BeautifulSoup4
Linux:
sudo apt-get install python3-bs4
macOS:
sudo easy_install pip
pip install beautifulsoup4
Windows:
pip install beautifulsoup4
pip3 install beautifulsoup4

3
3.1 Basic urllib usage

urllib is a set of modules shipped with Python 3.x for working with URLs. It makes it easy to simulate a user visiting a web page with a browser.

Simulating a real browser:
Send a User-Agent header
from urllib import request
req = request.Request(url)
req.add_header(key, value)  # e.g. key = "User-Agent"
resp = request.urlopen(req)
print(resp.read().decode("utf-8"))
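The steps above can be put together into a small runnable sketch. The URL and the User-Agent string below are placeholders chosen for illustration, not values from the original notes, and build_request and fetch are hypothetical helper names. The request is only built and inspected here. Calling fetch() would actually send it and needs network access.

```python
from urllib import request

# Hypothetical values, chosen only for illustration.
URL = "http://example.com/"
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    req = request.Request(url)
    req.add_header("User-Agent", user_agent)
    return req


def fetch(url):
    """Send the request and return the decoded body (needs network access)."""
    with request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read().decode("utf-8")


req = build_request(URL)
# urllib stores header names via str.capitalize(), hence "User-agent" here.
print(req.get_header("User-agent"))
```

Without the extra header, urllib identifies itself as Python-urllib/3.x, which some sites reject. That is the usual reason for setting User-Agent explicitly.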

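Using POST:
Import parse from the urllib package
from urllib import parse
Use urlencode to build the POST data
postData = parse.urlencode([
    (key1, val1),
    (key2, val2),
    (keyn, valn)
])

Send the POST request with postData
resp = request.urlopen(req, data=postData.encode('utf-8'))
Get the HTTP status code
resp.status
Get the reason phrase returned by the server (for example "OK")
resp.reason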
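As a minimal sketch of the POST steps above: the endpoint and the form fields are made up for illustration, and send is a hypothetical helper. The code builds the encoded body and the request object without contacting any server. Calling send() would perform the request and return the status code, the reason phrase and the body.

```python
from urllib import parse, request

# Hypothetical endpoint and form fields, for illustration only.
URL = "http://example.com/search"
fields = [("key1", "val1"), ("key2", "val2")]

# urlencode returns a str; urlopen expects the body as bytes.
post_data = parse.urlencode(fields).encode("utf-8")

# Supplying data makes urllib issue a POST instead of a GET.
req = request.Request(URL, data=post_data)


def send(req):
    """Send the request (needs network access)."""
    with request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.reason, resp.read().decode("utf-8")


print(post_data)         # b'key1=val1&key2=val2'
print(req.get_method())  # POST
```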

www.thsrc.com.tw/tw/TimeTable/SearchResult

3.3 BeautifulSoup
https://www.crummy.com/software/BeautifulSoup/#Download

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#id4

beautiful.py

from bs4 import BeautifulSoup as bs
import re
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
soup = bs(html_doc,'html.parser')

#print(soup.prettify())

#print(soup.title)

# print(soup.find(id="link3").string)

# print(soup.find(id="link3").get_text())

# for link in soup.find_all("a"):
#     print(link.string)

# print(soup.find("p",{"class":"story"}).string) # does not work: prints None, because this <p> has more than one child
# print(soup.find("p",{"class":"story"}).get_text())

data = soup.find_all("a", href=re.compile(r"^http://example\.com"))
print(data)
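The difference between .string and get_text() noted in the comments above can be seen in a shorter, self-contained sketch. The HTML snippet is a shortened stand-in for html_doc, written for this illustration, and the sketch assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for html_doc, for illustration only.
html = (
    '<p class="story">Once upon a time '
    '<a id="link1" href="http://example.com/elsie">Elsie</a> lived.</p>'
)
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p", {"class": "story"})
link = soup.find(id="link1")

# .string is only defined when a tag has exactly one child.
print(p.string)      # None: <p> contains text, an <a> tag, and more text
print(link.string)   # Elsie

# get_text() concatenates every string inside the tag.
print(p.get_text())  # Once upon a time Elsie lived.
```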