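In this case, I need to save a web page's source code as an HTML file. But the page has many sections I don't need; I only want to save the source code of the article itself.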
Code:
from urllib.request import urlopen

# Download the page and write the raw bytes to disk unchanged
page = urlopen('http://www.abcde.com')
page_content = page.read()
with open('page_content.html', 'wb') as f:
    f.write(page_content)
My code saves the entire source, but how can I save only the part I want?
To explain:
^{pr2}$
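I need to save the source code inside this tag (the markup itself), not just extract the text it contains.
The result I want to save looks like this: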
Apple
The apple tree (Malus pumila, commonly and erroneously called Malus domestica) is a deciduous tree in the rose family best known for its sweet, pomaceous fruit, the apple.
It is cultivated worldwide as a fruit tree, and is the most widely grown species in the genus Malus.
Appe is red
Germanic paganism
Greek mythology
【Jane】
Credit : Wiki
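
One possible way to keep only a single element's markup is sketched below, using only the standard library's html.parser. The tag name "div" and the id "article" are placeholders, since the actual tag is not shown in the question, and extract_fragment and save_fragment are hypothetical helper names. The sketch assumes the page is UTF-8 and that non-void tags are properly closed; a third-party parser such as Beautiful Soup would likely be shorter and more tolerant of messy HTML.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FragmentExtractor(HTMLParser):
    """Collect the markup of the first element matching a tag name and id."""

    def __init__(self, tag, element_id):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.element_id = element_id
        self.depth = 0      # nesting depth inside the target element
        self.parts = []     # pieces of markup collected so far
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Still searching for the target element.
            if tag == self.tag and dict(attrs).get("id") == self.element_id:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied without changing depth.
        if self.depth > 0 and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or self.done or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # the target element has been closed

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0 and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0 and not self.done:
            self.parts.append(f"&#{name};")


def extract_fragment(html, tag, element_id):
    """Return the markup of the matching element, or '' if none is found."""
    parser = FragmentExtractor(tag, element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def save_fragment(url, tag, element_id, out_path):
    """Download url and write only the matching element's markup to out_path."""
    html = urlopen(url).read().decode("utf-8", errors="replace")  # assumes UTF-8
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(extract_fragment(html, tag, element_id))
```

With these helpers, something like save_fragment('http://www.abcde.com', 'div', 'article', 'article.html') would write just that element, tags included, rather than the whole page. Comments inside the element are dropped in this sketch.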