Large Python lists: efficient lookups across multiple large lists in Python

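This post looks at how to speed up CSV processing by using `itertools.islice` and by building a dictionary. It compares the `filter`, `ifilter`, and dictionary approaches. The timings show that with larger amounts of data a dictionary used as a lookup index has a clear performance advantage, and the gap widens as the loop covers more rankings.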

You can use itertools.islice instead of reading in all the rows, and use itertools.ifilter for the search:

import csv
from itertools import islice, ifilter  # ifilter exists in Python 2 only

MAINDIR = "../"

with open(MAINDIR + "atp_players.csv") as pf, open(MAINDIR + "atp_rankings_current.csv") as rf:
    players = list(csv.reader(pf))
    rankings = csv.reader(rf)
    # only get the first ten rows using islice
    for i in islice(rankings, None, 10):
        # ifilter won't create a list, it yields values on the fly
        player = next(ifilter(lambda x: x[0] == i[2], players), "")
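The snippet above targets Python 2. As a rough sketch of the same idea on Python 3, where the built-in `filter` is already lazy, something like the following should work. The in-memory sample data, the column layout, and the helper name `first_matches` are assumptions made for illustration; they are not taken from the real ATP files.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data standing in for the two CSV files.
PLAYERS_CSV = "p1,Ann,Alpha,R\np2,Bob,Beta,L\np3,Cy,Gamma,R\n"
RANKINGS_CSV = "20150101,1,p2,900\n20150101,2,p3,800\n20150101,3,p9,700\n"


def first_matches(players_file, rankings_file, limit):
    """Return the first matching player row for each of the first `limit` rankings."""
    players = list(csv.reader(players_file))
    rankings = csv.reader(rankings_file)
    found = []
    # islice stops after `limit` rows without reading the rest of the file.
    for ranking in islice(rankings, limit):
        # filter() is lazy in Python 3, so next() stops at the first match.
        player = next(filter(lambda row: row[0] == ranking[2], players), "")
        found.append(player)
    return found


result = first_matches(io.StringIO(PLAYERS_CSV), io.StringIO(RANKINGS_CSV), 3)
print(result)
```

The third ranking refers to an id that is not in the sample players, so its entry is the `""` default passed to `next`.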

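I'm not quite sure what filter(lambda x: x[0]==i[2],players)[0] is meant to do. You seem to be searching the whole players list every time and keeping only the first match. It may pay to sort the list once with the first element as the key and use a bisection search, or to build a dict with the first element as the key and the row as the value, then simply do lookups.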
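The sort-once-then-bisect option could be sketched roughly as below, using the standard `bisect` module. The sample rows and the helper name `find_player` are hypothetical, and the ids are compared as strings.

```python
import bisect

# Hypothetical player rows: [player_id, first_name, last_name].
players = [
    ["p3", "Cy", "Gamma"],
    ["p1", "Ann", "Alpha"],
    ["p2", "Bob", "Beta"],
]

# Sort once by the first column and keep a parallel list of keys.
players_sorted = sorted(players, key=lambda row: row[0])
keys = [row[0] for row in players_sorted]


def find_player(player_id, default=None):
    """Binary-search the sorted keys; return the matching row or `default`."""
    pos = bisect.bisect_left(keys, player_id)
    if pos < len(keys) and keys[pos] == player_id:
        return players_sorted[pos]
    return default


print(find_player("p2"))
print(find_player("p9", default=""))
```

Each lookup is then O(log n) instead of a full scan. Because `sorted` is stable and `bisect_left` returns the leftmost position, rows with a repeated id resolve to the one that came first in the original list.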


What default value to return when a key is missing, or whether you need one at all, is something you will have to decide.
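As a minimal sketch of the dict approach with an explicit default, assuming the player id is the first column of the players file and the third column of the rankings file, the lookup could look like this. The sample data and the `MISSING` placeholder row are made up for the example.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data standing in for the two CSV files.
PLAYERS_CSV = "p1,Ann,Alpha\np2,Bob,Beta\np3,Cy,Gamma\n"
RANKINGS_CSV = "20150101,1,p2,900\n20150101,2,p9,800\n20150101,3,p1,700\n"

# Build the index once: first column -> whole row.
# Note: with repeated keys a dict comprehension keeps the LAST row seen.
players = {row[0]: row for row in csv.reader(io.StringIO(PLAYERS_CSV))}

# Placeholder returned for ids that are not in the players file (an assumption).
MISSING = ["?", "unknown", "unknown"]

matched = []
for ranking in islice(csv.reader(io.StringIO(RANKINGS_CSV)), 10):
    # Average constant-time lookup instead of scanning the whole list.
    matched.append(players.get(ranking[2], MISSING))

print(matched)
```

Without the second argument, `players.get(key)` returns `None` for a missing id, and `players[key]` raises `KeyError`, which may be preferable if a missing player should count as an error.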

If there are repeated elements at the start of the rows and you only want to return the first occurrence:

with open(MAINDIR + "atp_players.csv") as pf, open(MAINDIR + "atp_rankings_current.csv") as rf:
    players = {}
    for row in csv.reader(pf):
        key = row[0]
        # keep only the first row seen for each key
        if key in players:
            continue
        players[key] = row
    rankings = csv.reader(rf)
    for i in islice(rankings, None, 10):
        player = players.get(i[2])

Output:

Djokovic(SRB),(R) Points: 11360
Federer(SUI),(R) Points: 9625
Nadal(ESP),(L) Points: 6585
Wawrinka(SUI),(R) Points: 5120
Nishikori(JPN),(R) Points: 5025
Murray(GBR),(R) Points: 4675
Berdych(CZE),(R) Points: 4600
Raonic(CAN),(R) Points: 4440
Cilic(CRO),(R) Points: 4150
Ferrer(ESP),(R) Points: 4045
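The keep-the-first-occurrence rule used when building the dict can also be expressed with `dict.setdefault`, which only stores a value when the key is not present yet. This is just an alternative spelling of the same logic, shown on hypothetical sample rows.

```python
import csv
import io

# Hypothetical sample data: the id "p1" appears twice.
PLAYERS_CSV = "p1,Ann,Alpha\np2,Bob,Beta\np1,Zed,Omega\n"

players = {}
for row in csv.reader(io.StringIO(PLAYERS_CSV)):
    # setdefault stores the row only if the key is not already in the dict,
    # so later duplicates are ignored.
    players.setdefault(row[0], row)

print(players["p1"])
```

Whether this reads better than the explicit `if key in players: continue` is a matter of taste; both do one hash lookup per row on average.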

Timing the code for ten players shows ifilter to be the fastest, but as we increase the number of rankings we will see the dict win, and just how your original code scales:

In [33]: %%timeit
MAINDIR = "tennis_atp-master/"
pf = open("/tennis_atp-master/atp_players.csv")
players = [p for p in csv.reader(pf)]
rf = open("/tennis_atp-master/atp_rankings_current.csv")
rankings = [r for r in csv.reader(rf)]
for i in rankings[:10]:
    player = filter(lambda x: x[0]==i[2],players)[0]
   ....:
10 loops, best of 3: 123 ms per loop

In [34]: %%timeit
with open("/tennis_atp-master/atp_players.csv") as pf, open("/tennis_atp-master/atp_rankings_current.csv") as rf:
    players = list(csv.reader(pf))
    rankings = csv.reader(rf)
    # only get the first ten rows using islice
    for i in islice(rankings, None, 10):
        # ifilter won't create a list, it yields values on the fly
        player = next(ifilter(lambda x: x[0] == i[2], players), "")
   ....:
10 loops, best of 3: 43.6 ms per loop

In [35]: %%timeit
with open("/tennis_atp-master/atp_players.csv") as pf, open("/tennis_atp-master/atp_rankings_current.csv") as rf:
    players = {}
    for row in csv.reader(pf):
        key = row[0]
        if key in players:
            continue
        players[row[0]] = row
    rankings = csv.reader(rf)
    for i in islice(rankings, None, 10):
        player = players.get(i[2])
        pass
   ....:
10 loops, best of 3: 50.7 ms per loop

Now with 100 players you will see that the dict is just as fast as it was for 10. The cost of building the dict is offset by the constant-time lookups:

In [38]: %%timeit
with open("/tennis_atp-master/atp_players.csv") as pf, open("/tennis_atp-master/atp_rankings_current.csv") as rf:
    players = list(csv.reader(pf))
    rankings = csv.reader(rf)
    # only get the first 100 rows using islice
    for i in islice(rankings, None, 100):
        # ifilter won't create a list, it yields values on the fly
        player = next(ifilter(lambda x: x[0] == i[2], players), "")
   ....:
10 loops, best of 3: 120 ms per loop

In [39]: %%timeit
with open("/tennis_atp-master/atp_players.csv") as pf, open("/tennis_atp-master/atp_rankings_current.csv") as rf:
    players = {}
    for row in csv.reader(pf):
        key = row[0]
        if key in players:
            continue
        players[row[0]] = row
    rankings = csv.reader(rf)
    for i in islice(rankings, None, 100):
        player = players.get(i[2])
        pass
   ....:
10 loops, best of 3: 50.7 ms per loop

In [40]: %%timeit
MAINDIR = "tennis_atp-master/"
pf = open("/tennis_atp-master/atp_players.csv")
players = [p for p in csv.reader(pf)]
rf = open("/tennis_atp-master/atp_rankings_current.csv")
rankings = [r for r in csv.reader(rf)]
for i in rankings[:100]:
    player = filter(lambda x: x[0]==i[2],players)[0]
   ....:
1 loops, best of 3: 806 ms per loop

For 250 players:

# your code
1 loops, best of 3: 1.86 s per loop
# dict
10 loops, best of 3: 50.7 ms per loop
# ifilter
10 loops, best of 3: 483 ms per loop

Looping over the whole rankings file:

# your code
1 loops, best of 3: 2min 40s per loop
# dict
10 loops, best of 3: 67 ms per loop
# ifilter
1 loops, best of 3: 1min 3s per loop

So you can see that as we loop over more rankings, the dict option is by far the most efficient at runtime and scales extremely well.
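To try the comparison without the ATP files, a small harness along these lines can be run on synthetic data with the standard `timeit` module. The sizes, the generated rows, and the function names are arbitrary assumptions, and the absolute numbers will differ from the IPython timings above.

```python
import timeit

# Synthetic stand-ins for the players list and the ids found in the rankings.
N_PLAYERS = 2000
N_LOOKUPS = 200
players = [[str(i), "name%d" % i] for i in range(N_PLAYERS)]
wanted = [str(i * 7 % N_PLAYERS) for i in range(N_LOOKUPS)]


def scan_lookup():
    """Linear scan per lookup, stopping at the first match."""
    return [next((row for row in players if row[0] == key), None) for key in wanted]


def dict_lookup():
    """Build a first-occurrence index once, then do hash lookups."""
    index = {}
    for row in players:
        index.setdefault(row[0], row)
    return [index.get(key) for key in wanted]


# Both strategies should return exactly the same rows.
assert scan_lookup() == dict_lookup()

scan_time = timeit.timeit(scan_lookup, number=5)
dict_time = timeit.timeit(dict_lookup, number=5)
print("scan: %.4f s, dict: %.4f s" % (scan_time, dict_time))
```

The scan does work proportional to players times lookups, while the dict version does work proportional to players plus lookups, so raising `N_LOOKUPS` should widen the gap in the same way the rankings count did above.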
