http://ai-depot.com/articles/the-easy-way-to-extract-useful-text-from-arbitrary-html/
This article shows you how to write a relatively simple script that extracts text paragraphs from large chunks of HTML code without knowing its structure or which tags it uses. It works on news articles and blog pages with worthwhile text content, among others…
Do you want to find out how statistics and machine learning can save you time and effort mining text?