You can simply strip out all the tags:
>>> import re
>>> txt = """
...
...
... Everyday Italian
... Giada De Laurentiis
... 2005
... 300.00
...
...
...
...
... Harry Potter
... J K. Rowling
... 2005
... 625.00
...
... """
>>> exp = re.compile(r'<.*?>')  # non-greedy: match one tag at a time
>>> text_only = exp.sub('', txt).strip()
>>> text_only
'Everyday Italian
Giada De Laurentiis
2005
300.00
Harry Potter
J K. Rowling
2005
625.00'
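As a self-contained sketch of the same idea, the snippet below wraps the substitution in a small helper. The sample markup and its element names are assumptions made for illustration, and `strip_tags` is a hypothetical name, not something from a library.

```python
import re

# Hypothetical sample markup; the element names are assumed for illustration.
SAMPLE = """<bookstore>
  <book>
    <title>Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>300.00</price>
  </book>
</bookstore>"""

# Non-greedy match so each tag is removed individually.
TAG = re.compile(r'<.*?>')


def strip_tags(markup):
    """Remove anything that looks like a tag and return the non-empty text lines."""
    text = TAG.sub('', markup)
    return [line.strip() for line in text.splitlines() if line.strip()]


print(strip_tags(SAMPLE))
# ['Everyday Italian', 'Giada De Laurentiis', '2005', '300.00']
```

A regex like this is fine for quick text extraction, but it does not understand comments, CDATA sections, or tags that span several lines.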
However, if you just want to search files for some text on Linux, you can use grep:
burhan@sandbox:~$ grep "Harry Potter" file.xml
Harry Potter
If you want to search a file, either use the grep command above, or open the file and search it in Python:
>>> import re
>>> exp = re.compile(r'<.*?>')  # non-greedy: match one tag at a time
>>> with open('file.xml') as f:
...     text_only = exp.sub('', f.read()).strip()
...
>>> if 'Harry Potter' in text_only:
...     print('It exists')
... else:
...     print('It does not')
...
It exists
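If the file is well-formed XML, a parser-based variant is an alternative to the regex. This is a rough sketch using the standard library's `xml.etree.ElementTree` instead of the approach above. The helper name `xml_contains` and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def xml_contains(xml_text, needle):
    """Return True if needle occurs in the text content (not the tag names)."""
    root = ET.fromstring(xml_text)
    # itertext() yields the text of every element in document order.
    text = ''.join(root.itertext())
    return needle in text


# Hypothetical sample document; the element names are assumed.
sample = (
    "<bookstore><book><title>Harry Potter</title>"
    "<author>J K. Rowling</author></book></bookstore>"
)

print(xml_contains(sample, 'Harry Potter'))  # True
print(xml_contains(sample, 'title'))         # False: 'title' is only a tag name
```

For a file on disk, `ET.parse('file.xml').getroot()` would take the place of `ET.fromstring`. Unlike the regex, the parser raises an error on malformed input rather than silently returning partial text.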