If your document starts a declaration and never finishes it,
Beautiful Soup assumes the rest of your document is part of the
declaration. If the document ends in the middle of the declaration,
Beautiful Soup ignores the declaration entirely. A couple of examples:
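from BeautifulSoup import BeautifulSoup

# The document ends in the middle of the declaration, so it is ignored.
BeautifulSoup("foo<!bar")
# foo

# The declaration never finishes, so it swallows the rest of the document.
soup = BeautifulSoup("<html>foo<!bar</html>")
print soup.prettify()
# <html>
#  foo<!bar</html>
# </html>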
There are a couple of ways to fix this; one is detailed here.
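As a rough illustration of one kind of fix (not necessarily the one referred to above), the markup can be pre-processed before it reaches the parser so that an unfinished declaration is escaped into plain text. The helper name escape_unfinished_declarations and its regular expression are assumptions made for this sketch; they are not part of Beautiful Soup.

```python
import re

# Hypothetical helper, not part of Beautiful Soup.
# Matches "<!" when the next "<" or the end of the text is reached
# before any closing ">", i.e. the declaration is never finished.
_UNFINISHED_DECLARATION = re.compile(r"<!(?=[^<>]*(?:<|\Z))")


def escape_unfinished_declarations(markup):
    """Replace the "<" of each unfinished "<!" with "&lt;"."""
    return _UNFINISHED_DECLARATION.sub("&lt;!", markup)


print(escape_unfinished_declarations("<html>foo<!bar</html>"))
# <html>foo&lt;!bar</html>
```

The cleaned string can then be handed to the parser as usual. This sketch is deliberately simple: a comment whose body contains "<" would also be escaped, so real pages may need a more careful pattern.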
Beautiful Soup also ignores an entity reference that's not finished
by the end of the document:
BeautifulSoup("&lt;foo&gt")
# &lt;foo
I've never seen this in real web pages, but it's probably out there
somewhere.
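If such a document does turn up, one possible workaround is to close the dangling entity reference before parsing. The sketch below is only an assumption about how that might be done: close_trailing_entity is a hypothetical helper, not a Beautiful Soup function, and it only looks at the very end of the markup.

```python
import re

try:
    from html.entities import name2codepoint   # Python 3
except ImportError:
    from htmlentitydefs import name2codepoint  # Python 2

# "&name", "&#123" or "&#x7B" sitting at the very end of the text,
# with no terminating semicolon.
_TRAILING_ENTITY = re.compile(
    r"&(#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*)\Z")


def close_trailing_entity(markup):
    """Append ';' if the markup ends in an unfinished entity reference."""
    match = _TRAILING_ENTITY.search(markup)
    if match is None:
        return markup
    name = match.group(1)
    # Only complete numeric references and known entity names, so that
    # text such as "AT&T" is left alone.
    if name.startswith("#") or name in name2codepoint:
        return markup + ";"
    return markup


print(close_trailing_entity("&lt;foo&gt"))
# &lt;foo&gt;
```

Checking the name against the standard entity table is a design choice to avoid rewriting ordinary text that merely ends in an ampersand followed by letters.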