I'm using the lxml.html library to parse an HTML document.
I located a specific tag, which I call content_tag, and I want to change its content (i.e. the text between its opening and closing tags).
How do I do that? I tried content_tag.text = 'Hello <b>world!</b>' but then it escapes all the HTML tags, replacing < with &lt; etc.
I want to inject the text without escaping any HTML. How can I do that?
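To make the problem concrete, here is a minimal reproduction. It is only a sketch, written for Python 3, and the sample markup is an assumption rather than my real document:

```python
import lxml.html

# Hypothetical element standing in for content_tag.
content_tag = lxml.html.fromstring('<div>old text</div>')

# Assigning to .text stores a plain string, so any markup characters
# in it are escaped when the element is serialized.
content_tag.text = 'Hello <b>world!</b>'

result = lxml.html.tostring(content_tag, encoding='unicode')
print(result)
```

The printed markup contains &lt;b&gt; instead of a real b element, which is the behaviour I want to avoid.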
Solution
This is one way:
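#!/usr/bin/env python2.6
from lxml.html import fromstring, tostring
from lxml.html import builder as E

# Sample markup (assumed for illustration): a wrapper div with a child whose id is "inner".
fragment = """\
<div id="outer">
  <div id="inner">Old content</div>
</div>"""

div = fromstring(fragment)
print tostring(div)
# <div id="outer">
#   <div id="inner">Old content</div>
# </div>

# Swap the inner element for a newly built <div>Hello <b>world!</b></div>.
# Note that the replacement does not carry over the old element's id or tail text.
div.replace(div.get_element_by_id('inner'), E.DIV('Hello ', E.B('world!')))
print tostring(div)
# <div id="outer">
#   <div>Hello <b>world!</b></div></div>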
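In case the builder calls look opaque, here is a small sketch of how lxml.html.builder assembles an element. It targets Python 3, and the class name 'greeting' is just an illustrative value:

```python
from lxml.html import tostring
from lxml.html import builder as E

# Plain strings become text. A string that follows a child element becomes
# that child's tail. E.CLASS(...) supplies the class attribute.
new_div = E.DIV(E.CLASS('greeting'), 'Hello ', E.B('world'), '!')

markup = tostring(new_div, encoding='unicode')
print(markup)
```

The upside of this route is that nothing is parsed from a string, so there is no escaping to worry about. The downside is that you have to build the tree by hand instead of pasting in an HTML snippet.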
Edit: I should have confessed earlier that I'm not all that familiar with lxml. I looked at the docs and source briefly, but didn't find a clean solution. Perhaps someone more familiar will stop by and set us both straight.
In the meantime, this seems to work, but is not well tested:
import lxml.html

# Sample element (assumed for illustration); it holds only text to begin with.
content_tag = lxml.html.fromstring('<div>Goodbye.</div>')
content_tag.text = ''  # assumes only text to start
for elem in lxml.html.fragments_fromstring('Hello <b>world!</b>'):
    if isinstance(elem, str):  # only the first item can be a string (the leading text)
        content_tag.text += elem
    else:
        content_tag.append(elem)
print lxml.html.tostring(content_tag)
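To show why the loop checks for a string, here is a quick look at what fragments_fromstring hands back. This is a sketch for Python 3 with a made-up input string:

```python
import lxml.html

fragments = lxml.html.fragments_fromstring('Hello <b>world!</b> and <i>more</i>')

# Only the first item can be a plain string (the leading text).
# Text that follows an element is stored in that element's .tail instead.
leading = fragments[0]
elements = fragments[1:]

print(repr(leading))
for el in elements:
    print(el.tag, repr(el.text), repr(el.tail))
```

Because the text after each element travels with it as its tail, appending the elements to content_tag carries that in-between text along too.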
Edit again: this version clears the element's existing text and children first:
somehtml = 'Hello <b>world!</b>'

# purge element contents
content_tag.text = ''
for child in list(content_tag):  # copy the child list so removal is safe while looping
    content_tag.remove(child)

fragments = lxml.html.fragments_fromstring(somehtml)
# The first fragment may be leading text rather than an element.
if fragments and isinstance(fragments[0], str):
    content_tag.text = fragments.pop(0)
content_tag.extend(fragments)