Choosing Between uuid1 and uuid4 for Generating UUIDs in Python

UUID stands for Universally Unique Identifier and is an Internet standard.


UUID RFC: http://tools.ietf.org/html/rfc4122.html

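This RFC defines the UUID and specifies five versions, each with its own generation method. UUID libraries in languages such as Python and Ruby all follow this RFC, although version 2 is usually left unimplemented.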

UUID python: http://docs.python.org/library/uuid.html

Bug: # 828390


If all you want is a unique ID, you should probably call uuid1() or uuid4(). Note that uuid1() may compromise privacy since it creates a UUID containing the computer’s network address. uuid4() creates a random UUID.
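As a minimal sketch of the two calls the documentation mentions (standard-library `uuid` module only; the variable names are illustrative):

```python
import uuid

# Version 1: built from the host's node ID (normally the MAC address) and a timestamp.
time_based = uuid.uuid1()

# Version 4: built from random bits.
random_based = uuid.uuid4()

# The .version attribute reports which generation method produced the UUID.
print(time_based, time_based.version)
print(random_based, random_based.version)
```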


Below is a question on Stack Overflow:

http://stackoverflow.com/questions/1785503/when-should-i-use-uuid-uuid1-vs-uuid-uuid4-in-python

uuid1() is guaranteed to not produce any collisions. I wouldn't use it if it's important that there's no connection between the uuid and the computer.

uuid4() generates, as you said, a random UUID. The chance of a collision is really, really, really small. Small enough, that you shouldn't worry about it. The problem is, that a bad random-number generator makes it more likely to have collisions.

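In short, uuid1 embeds some information about the computer (the network card's MAC address). In my experiments, the trailing segments of the UUIDs it generated were exactly identical, whereas uuid4 offers greater anonymity.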
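The experiment can be reproduced with a short sketch like the one below. It assumes the default behaviour of `uuid.uuid1()`, where the node field is taken from the host; `last_segment` is a helper introduced here purely for illustration.

```python
import uuid

def last_segment(u):
    """Return the final hyphen-separated group, which holds the 48-bit node field."""
    return str(u).split("-")[-1]

v1_ids = [uuid.uuid1() for _ in range(3)]
v4_ids = [uuid.uuid4() for _ in range(3)]

for u in v1_ids:
    print("uuid1", u)
for u in v4_ids:
    print("uuid4", u)

# Distinct trailing segments seen in each group.
v1_nodes = {last_segment(u) for u in v1_ids}
v4_nodes = {last_segment(u) for u in v4_ids}
print(len(v1_nodes), len(v4_nodes))
```

On a typical machine the uuid1 values should share one trailing segment (the node ID), while the uuid4 values should not share any.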


Someone raised a deeper question:

http://stackoverflow.com/questions/703035/when-are-you-truly-forced-to-use-uuid-as-part-of-the-design/786541#786541

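The asker does not see the point of UUIDs: if you need a non-repeating random number, there are many ways to get one that truly never repeats, whereas a UUID can still repeat (that is, collide).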


Below is the answer from the author of a UUID library for Ruby:

I wrote the UUID generator/parser for Ruby, so I consider myself to be reasonably well-informed on the subject. There are four major UUID versions:

Version 4 UUIDs are essentially just 16 bytes of randomness pulled from a cryptographically secure random number generator, with some bit-twiddling to identify the UUID version and variant. These are extremely unlikely to collide, but it could happen if a PRNG is used or if you just happen to have really, really, really, really, really bad luck.

Version 5 and Version 3 UUIDs use the SHA1 and MD5 hash functions respectively, to combine a namespace with a piece of already unique data to generate a UUID. This will, for example, allow you to produce a UUID from a URL. Collisions here are only possible if the underlying hash function also has a collision.

Version 1 UUIDs are the most common. They use the network card's MAC address (which unless spoofed, should be unique), plus a timestamp, plus the usual bit-twiddling to generate the UUID. In the case of a machine that doesn't have a MAC address, the 6 node bytes are generated with a cryptographically secure random number generator. If two UUIDs are generated in sequence fast enough that the timestamp matches the previous UUID, the timestamp is incremented by 1. Collisions should not occur unless one of the following happens: The MAC address is spoofed. One machine running two different UUID generating applications produces UUIDs at the exact same moment. Two machines without a network card or without user level access to the MAC address are given the same random node sequence, and generate UUIDs at the exact same moment. We run out of bytes to represent the timestamp and rollover back to zero.
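The three descriptions above can be sketched with Python's standard `uuid` module. This is an illustration rather than how any particular library builds its UUIDs: `make_v4` and `v1_timestamp` are hypothetical helpers, and the URL is a made-up name.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone

def make_v4(random_bytes=None):
    """Build a version 4 UUID by hand: 16 random bytes plus version/variant bits."""
    raw = bytearray(os.urandom(16) if random_bytes is None else random_bytes)
    raw[6] = (raw[6] & 0x0F) | 0x40  # high nibble of byte 6 = version 4
    raw[8] = (raw[8] & 0x3F) | 0x80  # top two bits of byte 8 = RFC 4122 variant
    return uuid.UUID(bytes=bytes(raw))

def v1_timestamp(u):
    """Decode a version 1 timestamp (100 ns ticks since 1582-10-15) as UTC."""
    epoch = datetime(1582, 10, 15, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=u.time // 10)

# Version 4: randomness with the version and variant bits stamped in.
hand_made = make_v4()
print(hand_made, hand_made.version)

# Versions 3 and 5: the same namespace and name always give the same UUID.
url = "http://example.com/page"  # illustrative name, not from the article
by_md5 = uuid.uuid3(uuid.NAMESPACE_URL, url)
by_sha1 = uuid.uuid5(uuid.NAMESPACE_URL, url)
print(by_md5, by_sha1)

# Version 1: the node ID and timestamp can be read back out of the UUID.
v1 = uuid.uuid1()
print(hex(v1.node), v1_timestamp(v1))
```

Being able to read the node and timestamp back out of a version 1 UUID is exactly why it is less anonymous than version 4.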

Realistically, none of these events occur by accident within a single application's ID space. Unless you're accepting IDs on, say, an Internet-wide scale, or with an untrusted environment where malicious individuals might be able to do something bad in the case of an ID collision, it's just not something you should worry about. It's critical to understand that if you happen to generate the same version 4 UUID as I do, in most cases, it doesn't matter. I've generated the ID in a completely different ID space from yours. My application will never know about the collision so the collision doesn't matter. Frankly, in a single application space without malicious actors, the extinction of all life on earth will occur long before you have a collision, even on a version 4 UUID, even if you're generating quite a few UUIDs per second.

Also, 2^64 * 16 is 256 exabytes. As in, you would need to store 256 exabytes worth of IDs before you had a 50% chance of an ID collision in a single application space.
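The storage figure can be sanity-checked with the usual birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the number of possible IDs. The sketch below assumes 122 random bits per version 4 UUID (128 bits minus the 4 version bits and 2 variant bits) and 16 bytes of storage per ID; the function name is illustrative.

```python
import math

RANDOM_BITS = 122  # assumption: 128 bits minus 4 version bits and 2 variant bits

def ids_for_collision_probability(p, bits=RANDOM_BITS):
    """Approximate number of random IDs needed to reach collision probability p."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))

n_half = ids_for_collision_probability(0.5)
storage_bytes = n_half * 16    # 16 bytes per stored UUID
quoted_bytes = 2 ** 64 * 16    # the figure used in the answer

print(f"IDs for a 50% collision chance: {n_half:.3e}")
print(f"storage for those IDs: {storage_bytes / 2**60:.1f} EiB")
print(f"quoted figure: {quoted_bytes / 2**60:.0f} EiB")
```

Under this approximation the 50% point comes out lower than the quoted 2^64 IDs, but the required storage is still tens of exabytes, so the practical conclusion of the answer is unchanged.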


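Bob argues that within a typical application space, with no malicious software or individuals involved, a UUID collision is extremely unlikely. uuid1() relies on the MAC address, together with a timestamp, to avoid duplicates, since MAC addresses are in theory globally unique. uuid3 and uuid5 rely on the collision resistance of MD5 and SHA1 respectively. uuid4 draws its bits from a cryptographically secure random number generator, with bit-twiddling used only to stamp the version and variant into the UUID, so it has only a minuscule chance of repeating.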


That leaves me with one question of my own: which has the lower probability of collision, uuid1 or uuid4?
