RoboBrowser: Your friendly neighborhood web scraper
RoboBrowser is a simple, Pythonic library for browsing the web without a
standalone web browser. RoboBrowser can fetch a page, click on links and
buttons, and fill out and submit forms. If you need to interact with web
services that don't have APIs, RoboBrowser can help.
import re
from robobrowser import RoboBrowser
# Browse to Genius
browser = RoboBrowser(history=True)
browser.open('http://genius.com/')
# Search for Porcupine Tree
form = browser.get_form(action='/search')
form # the search form, with a field named 'q'
form['q'].value = 'porcupine tree'
browser.submit_form(form)
# Look up the first song
songs = browser.select('.song_link')
browser.follow_link(songs[0])
lyrics = browser.select('.lyrics')
lyrics[0].text # \nHear the sound of music ...
# Back to results page
browser.back()
# Look up my favorite song
song_link = browser.get_link('trains')
browser.follow_link(song_link)
# Can also search HTML using regex patterns
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \nTrain set and match spied under the blind...
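
The `\b` anchors in the pattern above are word boundaries, so the pattern matches `lyrics` as a whole word rather than as a fragment of a longer name. Here is a minimal standard-library sketch of that behavior, using made-up class strings rather than anything taken from Genius:

```python
import re

# Same pattern as in the example above
pattern = re.compile(r'\blyrics\b')

# Hypothetical class attribute values, for illustration only
candidates = ['lyrics', 'song lyrics', 'lyrics-body', 'lyricsheet', 'mylyrics']

# Keep only the values in which "lyrics" appears as a whole word
matches = [value for value in candidates if pattern.search(value)]
print(matches)
```

Note that a hyphen counts as a word boundary, so a value such as `lyrics-body` matches as well.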
RoboBrowser combines the best of two excellent Python libraries:
Requests and BeautifulSoup. RoboBrowser represents browser sessions
using Requests and HTML responses using BeautifulSoup, transparently
exposing methods of both libraries:
import re
from robobrowser import RoboBrowser
browser = RoboBrowser(user_agent='a python robot')
browser.open('https://github.com/')
# Inspect the browser session
browser.session.cookies['_gh_sess'] # BAh7Bzo...
browser.session.headers['User-Agent'] # a python robot
# Search the parsed HTML
browser.select('div.teaser-icon') # list of matching elements
browser.find(class_=re.compile(r'column', re.I)) # first matching element
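
Since the session and the parsed page come from Requests and BeautifulSoup, the calls above should behave as they do in those libraries. The sketch below uses the two libraries directly, with no network access, to show roughly what the `user_agent` argument and the two search calls amount to. The HTML snippet is invented for illustration and is not GitHub's markup, and how RoboBrowser wires these pieces together internally is an assumption here.

```python
import re

import requests
from bs4 import BeautifulSoup

# A Requests session carrying a custom User-Agent header
session = requests.Session()
session.headers['User-Agent'] = 'a python robot'

# Invented HTML standing in for a fetched page (not real GitHub markup)
html = """
<div class="one-third column"><div class="teaser-icon">first</div></div>
<div class="one-third Column"><div class="teaser-icon">second</div></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# CSS selector search returns every matching element
teasers = soup.select('div.teaser-icon')

# Regex search on the class attribute returns only the first match
first_column = soup.find(class_=re.compile(r'column', re.I))

print(session.headers['User-Agent'])
print([tag.text for tag in teasers])
print(first_column['class'])
```

The difference between the two search calls is the return shape: `select` gives a list of elements, while `find` gives a single element (or `None` when nothing matches).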