lxml.cssselect
lxml supports a number of interesting languages for tree traversal and element
selection. The most important is obviously XPath, but there is also
ObjectPath in the lxml.objectify module. The newest child of this family
is CSS selection, which is made available in form of the lxml.cssselect
module.
Although it started its life in lxml, cssselect is now an independent project.
It translates CSS selectors to XPath 1.0 expressions that can be used with
lxml's XPath engine. lxml.cssselect adds a few convenience shortcuts into
that package.
To install cssselect, run
pip install cssselect
lxml will then import and use it automatically.
The CSSSelector class
The most important class in the lxml.cssselect module is CSSSelector. It
provides the same interface as the XPath class, but accepts a CSS selector
expression as input:
>>>from lxml.cssselect import CSSSelector
>>>sel = CSSSelector('div.content')
>>>sel #doctest: +ELLIPSIS
>>>sel.css
'div.content'
The selector actually compiles to XPath, and you can see the
expression by inspecting the object:
>>>sel.path
"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' content ')]"
To use the selector, simply call it with a document or element
object:
>>>from lxml.etree import fromstring
>>>h = fromstring('''
...
...text
...
>>>[e.get('id') for e in sel(h)]
['inner']
Using CSSSelector is equivalent to translating with cssselect
and using the XPath class:
>>>from cssselect import GenericTranslator
>>>from lxml.etree import XPath
>>>sel = XPath(GenericTranslator().css_to_xpath('div.content'))
CSSSelector takes a translator parameter to let you choose which
translator to use. It can be 'xml' (the default), 'xhtml', 'html'
or a Translator object.
The cssselect method
lxml Element objects have a cssselect convenience method.
>>>h.cssselect('div.content') == sel(h)
True
Note however that pre-compiling the expression with the CSSSelector or
XPath class can provide a substantial speedup.
The method also accepts a translator parameter. On HtmlElement
objects, the default is changed to 'html'.
Supported Selectors
Most Level 3 selectors are supported. The details are in the
cssselect documentation.
Namespaces
In CSS you can use namespace-prefix|element, similar to
namespace-prefix:element in an XPath expression. In fact, it maps
one-to-one, and the same rules are used to map namespace prefixes to
namespace URIs: the CSSSelector class accepts a dictionary as its
namespaces argument.