The simplest web page scraping in Python

1. Install two Python packages:

$ sudo pip install requests
$ sudo pip install beautifulsoup4


2. Create test.py

# coding=utf-8
import requests
from bs4 import BeautifulSoup


# Fetch the page and return its HTML as text
def get_html(url):
    response = requests.get(url)
    response.encoding = 'utf-8'
    return response.text


# Return the text of the page's <title> element
def get_title(html):
    soup = BeautifulSoup(html, 'html.parser')
    title_content = soup.select('title')[0].get_text()
    return title_content


# Print the text of every <p> element
def print_p(html):
    soup = BeautifulSoup(html, 'html.parser')
    for p in soup.select('p'):
        print(p.get_text())


url = "http://www.cityu.edu.hk/"
html = get_html(url)
title_content = get_title(html)
print(title_content)
print_p(html)
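
To see what the parsing calls return without touching the network, the same BeautifulSoup selectors can be tried on a small inline HTML string. This is only a sketch: SAMPLE_HTML and the helper names extract_title and extract_paragraphs are made up for illustration and are not part of test.py.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p>First paragraph.</p>
    <p>Second <b>paragraph</b>.</p>
  </body>
</html>
"""


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.select_one('title')
    return title.get_text() if title is not None else None


def extract_paragraphs(html):
    """Return the text of every <p> element as a list of strings."""
    soup = BeautifulSoup(html, 'html.parser')
    return [p.get_text() for p in soup.select('p')]


if __name__ == '__main__':
    print(extract_title(SAMPLE_HTML))
    for text in extract_paragraphs(SAMPLE_HTML):
        print(text)
```

Returning None for a missing title, rather than indexing [0] directly, is a design choice of this sketch: it avoids an IndexError on pages that have no title element.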


3. Go to the folder containing test.py, then run:

$ python test.py
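
If the run fails with a network or HTTP error, a slightly more defensive fetch function may help. The sketch below is an assumption-laden variant, not the original script: the name fetch_html and the 10-second default timeout are illustrative choices, and it returns None instead of raising when the request fails.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page text, or None if the request fails.

    fetch_html and the 10-second default timeout are illustrative choices.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Turn HTTP error statuses (4xx/5xx) into exceptions.
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors.
        print('Request failed: %s' % exc)
        return None
    response.encoding = 'utf-8'
    return response.text
```

In test.py this could stand in for get_html, with a check such as "if html is not None:" before the parsing step.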


4. Output


