When using Beautiful Soup, what is the difference between "lxml", "html.parser", and "html5lib"? When would you use one over the other, and what are the benefits of each? In the times I've used each, they seemed interchangeable, but people on here sometimes correct me and say I should be using a different one. I'd like to strengthen my understanding of these. I have read a couple of posts on here about this, but they barely cover when to use each one, if at all.
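Example:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'lxml')  # `response` is assumed to come from an earlier requests.get(...) call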
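The parsers do feel interchangeable because they all build the same kind of BeautifulSoup tree, so for well-formed markup the searching code gives the same answer regardless of the parser string. A minimal sketch, assuming bs4 is installed; `lxml` and `html5lib` may or may not be, so a missing parser (bs4 raises `FeatureNotFound`, a `ValueError` subclass) is recorded as `None`. The `item_texts` helper and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Well-formed markup: every tag is opened, closed, and nested correctly.
html = "<ul><li class='item'>one</li><li class='item'>two</li></ul>"

def item_texts(parser):
    """Parse `html` with the named parser and return the text of each <li class="item">."""
    try:
        soup = BeautifulSoup(html, parser)
    except ValueError:  # FeatureNotFound: this parser library is not installed
        return None
    return [li.get_text() for li in soup.find_all("li", class_="item")]

results = {p: item_texts(p) for p in ("html.parser", "lxml", "html5lib")}
print(results)
```

Every installed parser should yield `['one', 'two']` here; the differences only show up once the markup is broken.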
Solution
From the docs' summary table of advantages and disadvantages:
html.parser - BeautifulSoup(markup, "html.parser")
Advantages: Batteries included, Decent speed, Lenient (as of Python 2.7.3 and 3.2)
Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)
lxml - BeautifulSoup(markup, "lxml")
Advantages: Very fast, Lenient
Disadvantages: External C dependency
html5lib - BeautifulSoup(markup, "html5lib")
Advantages: Extremely lenient, Parses pages the same way a web browser does, Creates valid HTML5
Disadvantages: Very slow, External Python dependency
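"Lenient" is where the parsers really diverge: given invalid HTML, each one repairs it differently. The Beautiful Soup docs use the fragment `<a></p>` to illustrate this and report `<a></a>` for html.parser, `<html><body><a></a></body></html>` for lxml, and `<html><head></head><body><a><p></p></a></body></html>` for html5lib (browser-style repair). A minimal sketch to check this on your own install, assuming bs4 is present and treating uninstalled parsers as `None`; the `parse_with_each` helper is hypothetical, and exact output may vary with library versions.

```python
from bs4 import BeautifulSoup

def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: serialized tree}, or None for parsers that aren't installed."""
    out = {}
    for parser in parsers:
        try:
            out[parser] = str(BeautifulSoup(markup, parser))
        except ValueError:  # FeatureNotFound: parser library missing
            out[parser] = None
    return out

# Invalid markup: a closing </p> with no matching opening <p>.
results = parse_with_each("<a></p>")
for parser, tree in results.items():
    print(f"{parser:12} -> {tree if tree is not None else 'not installed'}")
```

In practice this matches the table: html.parser when you want nothing extra to install, lxml when speed matters, and html5lib when a page is badly broken and you need it interpreted the way a browser would.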