This is the sample XML document:
Everyday Italian
Giada De Laurentiis
2005
300.00
Harry Potter
J K. Rowling
2005
625.00
I want to extract the text without specifying the elements. How can I do this? I have 10 such documents. My problem is that the user enters some word that I don't know in advance, and it has to be searched for in the text portions of all 10 XML documents. For that, I need to know where the text lies without knowing the elements. One more thing: all these documents are different.
Please Help!!
Solution
You could simply strip out any tags:
>>> import re
>>> txt = """
...
...
... Everyday Italian
... Giada De Laurentiis
... 2005
... 300.00
...
...
...
...
... Harry Potter
... J K. Rowling
... 2005
... 625.00
...
... """
>>> exp = re.compile(r'<[^>]+>')  # any tag: "<", one or more non-">" characters, ">"
>>> text_only = exp.sub('',txt).strip()
>>> text_only
'Everyday Italian\n Giada De Laurentiis\n 2005\n 300.00\n\n\n \n Harry Potter\n J K. Rowling \n 2005\n 625.00'
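Regex stripping works for simple files like the sample, but it can trip over comments, CDATA sections, or a `>` inside an attribute value, and it leaves entities such as `&amp;` undecoded. A sketch of a parser-based alternative using Python's standard `xml.etree.ElementTree`: `itertext()` yields every piece of text in the document, so no element names need to be known. The helper name `xml_text` and the tag names in `sample` are made up for illustration.

```python
import xml.etree.ElementTree as ET

def xml_text(xml_string):
    """Return the non-blank text fragments of an XML document, in document order.

    xml_text is a hypothetical helper, not a library function.
    itertext() walks every element and yields its text and the tail text
    of its children, so the element names never have to be known.
    """
    root = ET.fromstring(xml_string)
    # Drop whitespace-only fragments (the indentation between tags)
    return [t.strip() for t in root.itertext() if t.strip()]

# Hypothetical document: the tag names are invented for this example
sample = """<bookstore>
  <book>
    <title>Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>300.00</price>
  </book>
</bookstore>"""

print(xml_text(sample))
# ['Everyday Italian', 'Giada De Laurentiis', '2005', '300.00']
```

Because the parser resolves entities and CDATA itself, the fragments it returns are the text a reader would actually see.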
But if you just want to search files for some text on Linux, you can use grep:
burhan@sandbox:~$ grep "Harry Potter" file.xml
Harry Potter
If you want to search in a file, use the grep command above, or open the file and search for it in Python:
>>> import re
>>> exp = re.compile(r'<[^>]+>')  # matches any tag
>>> with open('file.xml') as f:
...     text_only = exp.sub('', f.read()).strip()  # read the whole file, drop the tags
...
>>> if 'Harry Potter' in text_only:
...     print('It exists')
... else:
...     print('It does not')
...
It exists
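The question involves 10 different documents and also asks where the text lies. A minimal sketch, assuming the files can be matched by one glob pattern such as `*.xml` (the function name `find_word` and the default pattern are hypothetical): it parses each file with `xml.etree.ElementTree`, visits every element with `iter()`, and reports which element's own text contains the word, so the structure of each document is discovered rather than known in advance.

```python
import glob
import xml.etree.ElementTree as ET

def find_word(word, pattern="*.xml"):
    """Return (filename, tag) pairs for every element whose own text contains word.

    find_word is a hypothetical helper, not a library function.
    Element names are discovered while walking each tree, so nothing about
    the documents' structure has to be known beforehand. Matching is
    case-insensitive.
    """
    needle = word.lower()
    hits = []
    for filename in sorted(glob.glob(pattern)):
        root = ET.parse(filename).getroot()
        # iter() with no tag argument visits every element in the document
        for elem in root.iter():
            # Text directly inside elem: its .text plus the .tail after each child
            pieces = [elem.text] + [child.tail for child in elem]
            if any(p and needle in p.lower() for p in pieces):
                hits.append((filename, elem.tag))
    return hits
```

For example, `find_word(user_word, "books/*.xml")` would return pairs such as `('books/store.xml', 'title')`, where `user_word` and the `books/` directory are hypothetical. In documents that use namespaces, the tag comes back in `{uri}localname` form.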