Let's say I have a file a.txt which contains HTML-encoded HTML, something like this:
HTML preview...
</script></body></html>
In PHP I can do:
$content = file_get_contents('a.txt');
$start = strpos ($content, '
') + 6;
$end = strpos ($content, '');
$html = html_entity_decode(substr($content, $start, $end-$start));
file_put_contents('b.html', $html);
And it works perfectly. The file b.html becomes proper HTML.
My question is: How can I do that in Python, assuming the file and the encoded content are in UTF-8?
Edit: I experimented a bit with HTMLParser and BeautifulStoneSoup, but they corrupted the UTF-8 encoding. I also tried UnicodeDammit, but printing the resulting string to the console or writing it to a file raises an exception saying that characters are out of range.
Edit 2: Please answer with code examples that work in a similar manner.
Solution 1
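A minimal sketch of one way to mirror the PHP snippet in Python 3.4+, using only the standard library. The marker strings in the question were lost, so `start_marker` and `end_marker` are hypothetical parameters here, as are the function names. Unlike the PHP version, which adds a fixed offset of 6, this sketch skips the full length of the start marker. Opening both files with an explicit `encoding="utf-8"` is what should keep non-ASCII characters intact.

```python
import html
from pathlib import Path


def extract_and_decode(text, start_marker, end_marker):
    """Return the HTML-unescaped text found between the two markers.

    Both markers are assumptions; pass whatever delimiters the file uses.
    Raises ValueError if either marker is missing.
    """
    start = text.index(start_marker) + len(start_marker)
    end = text.index(end_marker, start)
    # html.unescape turns entities such as &lt; and &eacute; into characters.
    return html.unescape(text[start:end])


def convert_file(src, dst, start_marker, end_marker):
    """Read src as UTF-8, decode the marked section, write it to dst as UTF-8."""
    content = Path(src).read_text(encoding="utf-8")
    decoded = extract_and_decode(content, start_marker, end_marker)
    Path(dst).write_text(decoded, encoding="utf-8")
```

Usage would look like `convert_file('a.txt', 'b.html', start_marker, end_marker)` with the two markers taken from the actual file. If printing the result to a console raises an encoding error, that usually points at the console's encoding rather than at the decoded string, so writing to a file with an explicit encoding is the safer check.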