html2text is a Python script that converts a page of HTML into clean,
easy-to-read plain ASCII text. Better yet, that ASCII also happens to be
valid Markdown (a text-to-HTML format).
Usage: html2text [(filename|url) [encoding]]
Option
Description
--version
Show program’s version number and
exit
-h, --help
Show this help message and exit
--ignore-links
Don’t include any formatting for
links
--escape-all
Escape all special characters.
Output is less readable, but
avoids corner case formatting
issues.
--reference-links
Use reference links instead of
links to create markdown
--mark-code
Mark preformatted and code blocks
with [code]…[/code]
For a complete list of options see the
docs
Or you can use it from within Python:
>>> import html2text
>>>
>>> print(html2text.html2text("
Zed's dead baby, Zed's dead.
"))**Zed's** dead baby, _Zed's_ dead.
Or with some configuration options:
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print h.handle("
Hello, world!")
Hello, world!
>>> print(h.handle("
Hello, world!"))
Hello, world!
>>> # Don't Ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("
Hello, world!"))
Hello, [world](http://earth.google.com/)!
Originally written by Aaron Swartz. This code is distributed under the
GPLv3.
How to install
$ pip install html2text
How to run unit tests
PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v
To see the coverage results:
coverage combine
coverage html
then open the ./htmlcov/index.html file in your browser.
Documentation
Documentation lives
here