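Iterating over data in Python and printing the results: what is the best way to iterate over a Python list, exclude certain values, and print the results...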

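The author describes a problem encountered while processing text data in Python: extracting information from a text file containing 20 tweets. Despite trying several approaches, such as searching for similar questions, reading the documentation, and looking into list comprehensions, they could not filter out non-English characters, strings beginning with 'Photo:', and 'None' values. They want to remove this unwanted data and, ideally, exclude non-Unicode data as well. The solution filters the list with a function named `legit`, applied through a list comprehension.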

I am new to Python and have a question.

I have checked similar questions, the Dive Into Python tutorial, the Python documentation, Google and Bing, similar Stack Overflow questions, and a dozen other tutorials.

I have a section of Python code that reads a text file containing 20 tweets. I am able to extract these 20 tweets using the following code:

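    import json

    data = []
    # Each line of output.txt holds one tweet as a JSON object
    with open('output.txt') as fp:
        for line in iter(fp.readline, ''):
            Tweets = json.loads(line)
            # .get() returns None when the tweet has no 'text' key
            data.append(Tweets.get('text'))

    i = 0
    while i < len(data):
        print(data[i])
        i = i + 1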

The above while loop iterates perfectly and prints out the 20 tweets (lines) from output.txt.

However, these 20 lines contain non-English character data like "Los ladillo a los dos, soy maaaala o maloooooooooooo", URLs like "http://t.co/57LdpK", the string "None", and photos with a URL like "Photo: http://t.co/kxpaaaaa" (I have edited this for privacy).

I would like to purge the output of this (which is a list), and exclude the following:

The None entries

Anything beginning with the string "Photo:"

It would also be a bonus if I could exclude non-Unicode data.

I have tried the following:

Using data.remove("None:") but I get the error list.remove(x): x not in list.

Reading the items I do not want into a set and then doing a comparison on the output but no luck.

Researching into list comprehensions, but wonder if I am looking at the right solution here.

I am from an Oracle background where there are functions to chop out any wanted/unwanted section of output, so really gone round in circles in the last 2 hours on this. Any help greatly appreciated!
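A note on the data.remove attempt above, as a minimal sketch rather than a diagnosis of the actual file: assuming the entries that print as None are the Python None object (which is what Tweets.get('text') returns when a tweet has no 'text' key), there is no string "None:" in the list for remove to find. The `sample` list below is hypothetical and only stands in for the real tweets.

```python
# Hypothetical stand-in for the tweet list; None marks a tweet with no 'text' key.
sample = ["hello world", None, "Photo: http://t.co/kxpaaaaa", None]

# list.remove() needs an exact match and raises ValueError when there is none.
try:
    sample.remove("None:")
    error = None
except ValueError as exc:
    error = str(exc)

# Removing the None object itself works, but only the first occurrence is dropped.
sample.remove(None)
```

Because remove only drops one matching item per call, building a new filtered list is usually simpler than removing items one by one.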

Solution

Try something like this:

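    def legit(string):
        # Entries may be the None object rather than text; guard before calling string methods
        if string is None:
            return False
        # Drop photo links and any text containing the word "None"
        if string.startswith("Photo:") or "None" in string:
            return False
        else:
            return True

    # Keep only the items for which legit() returns True
    whatyouwant = [x for x in data if legit(x)]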

I'm not sure if this will work out of the box for your data, but you get the idea. If you're not familiar with it, [x for x in data if legit(x)] is called a list comprehension: it builds a new list containing only the items for which legit(x) returns True.
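For the "bonus" part of the question, one possible extension is sketched below. It is not part of the original answer: the helper names `is_ascii` and `keep` and the `sample` list are hypothetical, and the ASCII check is only a rough proxy for "English". It would not catch the Spanish example from the question, which happens to be plain ASCII, so real language detection would need a different approach.

```python
def is_ascii(text):
    """Return True if every character in text is plain ASCII."""
    return all(ord(ch) < 128 for ch in text)


def keep(tweet):
    """Return True for tweets that should stay in the list."""
    if tweet is None:                # tweet had no 'text' key
        return False
    if tweet.startswith("Photo:"):   # photo links
        return False
    return is_ascii(tweet)           # rough proxy for "English only"


# Hypothetical sample data standing in for the 20 tweets
sample = [
    "Just landed in Boston",
    None,
    "Photo: http://t.co/kxpaaaaa",
    "caf\u00e9 con leche",
    "http://t.co/57LdpK",
]

cleaned = [t for t in sample if keep(t)]
print(cleaned)  # ['Just landed in Boston', 'http://t.co/57LdpK']
```

Testing for None with `is None` avoids dropping legitimate tweets that merely contain the word "None", at the cost of keeping any tweet whose text is literally the string "None".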
