Flickr's Solution for Generating Globally Unique IDs in Distributed Systems

Original article:

http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/

Ticket Servers: Distributed Unique Primary Keys on the Cheap

This is the first post in the Using, Abusing and Scaling MySQL at Flickr series.

Ticket servers aren’t inherently interesting, but they’re an important building block at Flickr. They are core to topics we’ll be talking about later, like sharding and master-master. Ticket servers give us globally (Flickr-wide) unique integers to serve as primary keys in our distributed setup.

Why?

Sharding (aka data partitioning) is how we scale Flickr’s datastore. Instead of storing all our data on one really big database, we have lots of databases, each with some of the data, and spread the load between them. Sometimes we need to migrate data between databases, so we need our primary keys to be globally unique. Additionally, our MySQL shards are built as master-master replicant pairs for resiliency. This means we need to be able to guarantee uniqueness within a shard in order to avoid key collisions. We’d love to go on using MySQL auto-incrementing columns for primary keys like everyone else, but MySQL can’t guarantee uniqueness across physical and logical databases.

GUIDs?

Given the need for globally unique ids the obvious question is, why not use GUIDs? Mostly because GUIDs are big, and they index badly in MySQL. One of the ways we keep MySQL fast is we index everything we want to query on, and we only query on indexes. So index size is a key consideration. If you can’t keep your indexes in memory, you can’t keep your database fast. Additionally, ticket servers give us sequentiality, which has some really nice properties including making reporting and debugging more straightforward, and enabling some caching hacks.
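
To make "GUIDs are big" concrete, here is a small Python sketch that compares the raw size of one key in each form. It is only an illustration: the variable names are made up, the random UUID stands in for whatever GUID scheme one might pick, and real index size also depends on the storage engine and column type.

```python
import struct
import uuid

guid = uuid.uuid4()            # a random 128-bit GUID, used only for illustration
ticket_id = 72157623227190423  # the sample id from the Tickets64 output later in this post

guid_text_bytes = len(str(guid))                  # size when stored as hyphenated text
guid_binary_bytes = len(guid.bytes)               # size when stored as raw binary
bigint_bytes = len(struct.pack("<Q", ticket_id))  # size of an unsigned 64-bit integer

print("GUID as text:   ", guid_text_bytes, "bytes")
print("GUID as binary: ", guid_binary_bytes, "bytes")
print("64-bit integer: ", bigint_bytes, "bytes")
```

Even in its most compact binary form the GUID is twice the width of a 64-bit ticket, and a random GUID carries no ordering, so it offers none of the sequentiality mentioned above.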

Consistent Hashing?

Some projects like Amazon’s Dynamo provide a consistent hashing ring on top of the datastore to handle the GUID/sharding issue. This is better suited for write-cheap environments (e.g. LSMTs), while MySQL is optimized for fast random reads.

Centralizing Auto-Increments

If we can’t make MySQL auto-increments work across multiple databases, what if we just used one database? If we inserted a new row into this one database every time someone uploaded a photo we could then just use the auto-incrementing ID from that table as the primary key for all of our databases.

Of course at 60+ photos a second that table is going to get pretty big. We can get rid of all the extra data about the photo, and just have the ID in the centralized database. Even then the table gets unmanageably big quickly. And there are comments, and favorites, and group postings, and tags, and so on, and those all need IDs too.
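
As a rough, back-of-the-envelope illustration of that growth, the following Python sketch assumes a steady 60 inserts per second for photos alone (the post only says "60+"). The constant names are made up for the example.

```python
# Back-of-the-envelope growth of a one-row-per-photo ID table.
# Assumption: a constant 60 inserts per second; the post only says "60+".
PHOTOS_PER_SECOND = 60
SECONDS_PER_DAY = 24 * 60 * 60

rows_per_day = PHOTOS_PER_SECOND * SECONDS_PER_DAY
rows_per_year = rows_per_day * 365

print(f"rows per day:  {rows_per_day:,}")
print(f"rows per year: {rows_per_year:,}")
```

Under that assumption the table gains a little over five million rows a day from photos alone, before counting comments, favorites, tags, and the rest.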

REPLACE INTO

A little over a decade ago MySQL shipped with a non-standard extension to the ANSI SQL spec, “REPLACE INTO”. Later “INSERT ON DUPLICATE KEY UPDATE” came along and solved the original problem much better. However, REPLACE INTO is still supported.

REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.

This allows us to atomically update a single row in a database in place, and get a new auto-incremented primary ID.

Putting It All Together

A Flickr ticket server is a dedicated database server, with a single database on it, and in that database there are tables like Tickets32 for 32-bit IDs, and Tickets64 for 64-bit IDs.

The Tickets64 schema looks like:

CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM

SELECT * from Tickets64 returns a single row that looks something like:

+-------------------+------+
| id                | stub |
+-------------------+------+
| 72157623227190423 |    a |
+-------------------+------+

When I need a new globally unique 64-bit ID I issue the following SQL:

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
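
The same pattern can be tried without a MySQL server. The sketch below is an approximation, not Flickr's code: it uses Python's built-in sqlite3 module as a stand-in, since SQLite also accepts REPLACE INTO. The helper name next_ticket is hypothetical, and SQLite's AUTOINCREMENT keyword is used so that ids are never reused, which is close to how a MySQL auto_increment column behaves here.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs without a server.
# AUTOINCREMENT keeps the counter from reusing an id after the old row is
# deleted by REPLACE.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Tickets64 (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        stub TEXT NOT NULL DEFAULT '' UNIQUE
    )
    """
)


def next_ticket(conn: sqlite3.Connection) -> int:
    """Replace the single 'a' row and return its freshly assigned id."""
    cur = conn.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
    conn.commit()
    # lastrowid plays the role of MySQL's LAST_INSERT_ID() in this sketch.
    return cur.lastrowid


ids = [next_ticket(conn) for _ in range(3)]
row_count = conn.execute("SELECT COUNT(*) FROM Tickets64").fetchone()[0]
print("ids handed out:", ids)
print("rows in table: ", row_count)
```

Each call hands out a larger id than the last, while the table itself never grows beyond one row. That is the property that keeps the ticket table small no matter how many IDs are issued.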

SPOFs

You really, really don’t want provisioning your IDs to be a single point of failure. We achieve “high availability” by running two ticket servers. At this write/update volume replicating between the boxes would be problematic, and locking would kill the performance of the site. We divide responsibility between the two boxes by dividing the ID space down the middle, evens and odds, using:

TicketServer1:
auto-increment-increment = 2
auto-increment-offset = 1

TicketServer2:
auto-increment-increment = 2
auto-increment-offset = 2

We round robin between the two servers to load balance and deal with down time. The sides do drift a bit out of sync; I think we have a few hundred thousand more odd-numbered objects than even-numbered objects at the moment, but this hurts no one.
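
The odd/even split and the round-robin failover can be modelled in a few lines of Python. This is an in-memory sketch under stated assumptions, not Flickr's client code: the TicketServer and TicketClient classes are hypothetical, and each "server" is just a counter that starts at its offset and steps by its increment, mirroring the two settings above.

```python
import itertools
from typing import List


class TicketServer:
    """In-memory stand-in for one ticket server (hypothetical class)."""

    def __init__(self, increment: int, offset: int) -> None:
        self.increment = increment  # mirrors auto-increment-increment
        self.next_id = offset       # mirrors auto-increment-offset
        self.up = True

    def next_ticket(self) -> int:
        if not self.up:
            raise ConnectionError("ticket server is down")
        ticket = self.next_id
        self.next_id += self.increment
        return ticket


class TicketClient:
    """Round-robins between servers, skipping any that are down."""

    def __init__(self, servers: List[TicketServer]) -> None:
        self.servers = servers
        self._order = itertools.cycle(range(len(servers)))

    def next_ticket(self) -> int:
        # Try each server at most once per request, in round-robin order.
        for _ in range(len(self.servers)):
            server = self.servers[next(self._order)]
            try:
                return server.next_ticket()
            except ConnectionError:
                continue
        raise RuntimeError("no ticket server available")


odd = TicketServer(increment=2, offset=1)   # TicketServer1: odd ids
even = TicketServer(increment=2, offset=2)  # TicketServer2: even ids
client = TicketClient([odd, even])

first = [client.next_ticket() for _ in range(4)]
even.up = False                              # simulate an outage
during_outage = [client.next_ticket() for _ in range(3)]
even.up = True
after = [client.next_ticket() for _ in range(2)]
print(first, during_outage, after)
```

While the even server is down, only odd ids are issued, so the two sides drift apart. The ids stay unique regardless, because the two counters can never produce the same value.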

More Sequences

We actually have more tables than just Tickets32 and Tickets64 on the ticket servers. We have sequences for Photos, for Accounts, for OfflineTasks, and for Groups, etc. OfflineTasks get their own sequence because we burn through so many of them we don’t want to unnecessarily run up the counts on other things. Groups and Accounts get their own sequence because we get comparatively so few of them. Photos have their own sequence that we made sure to sync to our old auto-increment table when we cut over because it’s nice to know how many photos we’ve had uploaded, and we use the ID as a shorthand for keeping track.

So There’s That

It’s not particularly elegant, but it works shockingly well for us, having been in production since Friday the 13th, January 2006, and is a great example of the Flickr engineering “dumbest possible thing that will work” design principle.

More soon.

Belorussian translation provided by PC.
