Python random selection from files: choosing a random file from a directory (containing a large number of files) in Python...

I have a directory with a large number of files (~1mil). I need to choose a random file from this directory. Since there are so many files, os.listdir naturally takes an eternity to finish.

Is there a way I can circumvent this problem? Maybe somehow get to know the number of files in the directory (without listing it) and choose the 'n'th file where n is randomly generated?

The files in the directory are randomly named.

Solution

Alas, I don't think there is a solution to your problem. First, I don't know of a portable API that returns the number of entries in a directory without enumerating them first. Second, I don't think there is an API that returns a directory entry by number rather than by name.

So overall, a program has to enumerate O(n) directory entries to get a single random one. The trivial approach of determining the number of entries and then picking one either requires enough RAM to hold the full listing (os.listdir()) or has to enumerate the directory a second time to reach the random(n)-th item: n + n/2 operations on average overall.
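The two-pass variant described above can be sketched roughly as follows. This is only an illustration, not part of the original answer: it assumes Python 3.6+ so that os.scandir() can be used as a lazy iterator and context manager, and the function name pick_random_two_pass is hypothetical. It also assumes the directory does not change between the two passes.

```python
import os
import random
from itertools import islice


def pick_random_two_pass(path):
    """Count entries in one pass, then walk to a random index in a second pass.

    Hypothetical helper for illustration; returns the name of one entry
    (files and subdirectories are treated alike).
    """
    # First pass: count the entries without building a list in memory.
    with os.scandir(path) as it:
        count = sum(1 for _ in it)
    if count == 0:
        raise ValueError("directory is empty")

    target = random.randrange(count)

    # Second pass: skip `target` entries and take the next one.
    # Assumes the directory has not shrunk since the first pass.
    with os.scandir(path) as it:
        entry = next(islice(it, target, None))
        return entry.name
```

Memory use stays constant, but the cost is still the full first pass plus, on average, half of a second one.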

There is a slightly better approach, but only slightly: see randomly-selecting-lines-from-files. In short, there is a way to pick a random item from a list or iterator of unknown length while reading one item at a time, ensuring that every item is picked with equal probability. But this won't help with os.listdir(), because it returns a list that already holds all 1M+ entries in memory, so you may as well ask it for len() ...
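For reference, the single-pass selection mentioned above (commonly called reservoir sampling with a reservoir of size one) might look like the sketch below. It is an assumption-laden illustration rather than the answer's own code: it relies on os.scandir() (Python 3.6+ for the context-manager form) to yield entries lazily, which the original answer does not mention, and the names reservoir_pick and pick_random_entry are hypothetical. It still reads all n entries; it only avoids holding them in memory or enumerating twice.

```python
import os
import random


def reservoir_pick(iterable, rng=random):
    """Return one uniformly random item from an iterable of unknown length.

    Returns None if the iterable is empty.
    """
    chosen = None
    for i, item in enumerate(iterable, start=1):
        # Keep the i-th item with probability 1/i; this leaves every
        # item seen so far equally likely to be the current choice.
        if rng.randrange(i) == 0:
            chosen = item
    return chosen


def pick_random_entry(path):
    """Pick a random entry name from `path` in a single pass."""
    with os.scandir(path) as it:
        entry = reservoir_pick(it)
    if entry is None:
        raise ValueError("directory is empty")
    return entry.name
```

Calling pick_random_entry("/some/dir") returns one entry name; filtering with entry.is_file() inside the loop would be needed if subdirectories should be skipped.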
