I am new to Python and have a question.
I have checked similar Stack Overflow questions, the Dive Into Python tutorial, the Python documentation, Google and Bing searches, and a dozen other tutorials.
I have a section of Python code that reads a text file containing 20 tweets. I am able to extract these 20 tweets using the following code:
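    import json

    data = []
    with open('output.txt') as fp:
        # Each line of output.txt holds one tweet as a JSON object.
        for line in iter(fp.readline, ''):
            Tweets = json.loads(line)
            # .get() returns None when a tweet has no 'text' key.
            data.append(Tweets.get('text'))

    i = 0
    while i < len(data):
        print(data[i])
        i = i + 1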
The above while loop iterates perfectly and prints out the 20 tweets (lines) from output.txt.
However, these 20 lines contain non-English text like "Los ladillo a los dos, soy maaaala o maloooooooooooo", URLs like "http://t.co/57LdpK", the string "None", and photos with a URL like "Photo: http://t.co/kxpaaaaa" (I have edited this for privacy).
I would like to purge the output of this (which is a list), and exclude the following:
The None entries
Anything beginning with the string "Photo:"
It would also be a bonus if I could exclude non-Unicode data
I have tried the following:
Using data.remove("None:"), but I get the error "ValueError: list.remove(x): x not in list".
Reading the items I do not want into a set and then doing a comparison on the output but no luck.
Researching into list comprehensions, but wonder if I am looking at the right solution here.
I come from an Oracle background, where there are functions to chop out any wanted or unwanted section of output, so I have really gone round in circles on this for the last 2 hours. Any help is greatly appreciated!
Solution
Try something like this:
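    def legit(string):
        # Reject missing entries: None itself (what .get() returns when a
        # tweet has no 'text' key) or the literal string "None".
        if string is None or string == "None":
            return False
        # Reject photo posts.
        return not string.startswith("Photo:")

    whatyouwant = [x for x in data if legit(x)]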
I'm not sure if this will work out of the box for your data, but you get the idea. If you're not familiar with it, [x for x in data if legit(x)] is called a list comprehension: it builds a new list containing only the items of data for which legit(x) returns True.
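As for why data.remove("None:") failed: list.remove() deletes the first item equal to its argument and raises ValueError when nothing matches, and no entry equals the string "None:" (with the colon). The missing tweets are most likely the None object rather than text. Here is a rough, self-contained sketch of the filtering idea that also covers the bonus request. The names is_ascii and keep and the sample list are made up for illustration, and treating "non-English" as "contains non-ASCII characters" is only an assumption:

```python
def is_ascii(text):
    """Return True if every character in text is plain ASCII."""
    return all(ord(ch) < 128 for ch in text)


def keep(text):
    """Decide whether one extracted 'text' value stays in the list."""
    # Missing tweets show up as the None object; also handle the literal
    # string "None" in case the data really contains it.
    if text is None or text == "None":
        return False
    # Drop photo posts.
    if text.startswith("Photo:"):
        return False
    # Bonus rule (assumption): treat any non-ASCII character as non-English.
    return is_ascii(text)


# Hypothetical sample standing in for the list built from output.txt.
data = [
    "Just a plain tweet",
    None,
    "Photo: http://t.co/kxpaaaaa",
    u"ma\u00f1ana ser\u00e1 otro d\u00eda",
    "None",
    "Link http://t.co/57LdpK",
]

cleaned = [t for t in data if keep(t)]
print(cleaned)
```

Note that the ASCII check is crude: a Spanish tweet typed without accents, like the "soy maaaala" example, is pure ASCII and would be kept. Filtering by language properly would need a language-detection step, which is beyond this sketch.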