TinyURL design

What is TinyURL?

TinyURL is a URL-shortening service: a user enters a long URL and the service returns a shorter, unique URL such as "http://tiny.me/5ie0V2". The last path segment ("5ie0V2") can be any 6-character string over [0-9, a-z, A-Z], which gives 62^6 ≈ 56.8 billion unique strings.


General process

  1. Ask questions; understand the constraints and use cases.
  2. Abstract design; draw diagrams.
  3. Identify bottlenecks, both current ones and those that appear as traffic and data volume grow.
  4. Address these bottlenecks using scalable system design.


1. Discuss use cases with the interviewer

  • shortening: take a URL => return a shorter one
  • redirection: take a short URL => redirect to the original one
  • high availability
2. Estimate the system's scale
At minimum: the amount of traffic the system handles and the amount of data it stores

3. Abstract design
Work layer by layer; there is no BI layer here.
(1) application layer
shortening service, redirecting service
(2) data storage layer
database schema: id, original_url, short_url

Implementing the conversion algorithm:

Suppose we have a database table with three columns: id (auto-increment), actual URL, and shortened URL. An alternative way to generate the id is md5(original_url + random_salt).
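The md5(original_url + random_salt) alternative can be sketched roughly as follows. This is only an illustration under assumptions that are not in the original notes: the class name Md5Id, the 8-byte salt, and the choice to keep the first 8 digest bytes and reduce them modulo 62^6 (so the id still fits a 6-character short URL) are all hypothetical. A reduced hash can collide, so real code would need to check the database and retry with a fresh salt.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class Md5Id {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final long ID_SPACE = 56_800_235_584L; // 62^6, assumed id range

    // Hypothetical helper: returns a random 8-byte salt as a hex string.
    static String randomSalt() {
        byte[] salt = new byte[8];
        RANDOM.nextBytes(salt);
        return new BigInteger(1, salt).toString(16);
    }

    // Computes md5(originalUrl + salt) and folds the first 8 digest bytes into [0, 62^6).
    static long idFor(String originalUrl, String salt) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest((originalUrl + salt).getBytes(StandardCharsets.UTF_8));
            long id = 0;
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (digest[i] & 0xFF);
            }
            // Clear the sign bit, then reduce into the assumed id range.
            return (id & Long.MAX_VALUE) % ID_SPACE;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A caller would do something like Md5Id.idFor(longUrl, Md5Id.randomSalt()) and store the result as the row id. The same URL and salt always give the same id, while a new salt gives a different one.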


Intuitively, we could design a hash function that maps the actual URL to the shortened URL. But a direct string-to-string mapping is not easy to compute.

Notice that in the database, each record has a unique id associated with it. What if we convert the id to a shortened URL?


Basically, we need a bijective function f(x) = y such that
  • Each x must be associated with one and only one y;
  • Each y must be associated with one and only one x.
In our case, the x's are integers and the y's are 6-character strings. Each 6-character string can itself be read as a number in base 62 if we map each digit value to a distinct character,
e.g. 0-0, ..., 9-9, 10-a, 11-b, ..., 35-z, 36-A, ..., 61-Z.
The problem then becomes a base conversion problem, which is a bijection as long as the id does not overflow 6 digits (0 <= id < 62^6).
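import java.util.Map;

// id is a long because 62^6 (about 56.8 billion) does not fit in a 32-bit int.
// map holds digit value -> character, e.g. 0 -> '0', 10 -> 'a', 61 -> 'Z'.
public String shorturl(long id, int base, Map<Integer, Character> map) {
  StringBuilder res = new StringBuilder();
  while (id > 0) {
    int digit = (int) (id % base);   // least significant digit first
    res.append(map.get(digit));
    id /= base;
  }
  // Pad with the zero digit; after reverse() these become leading zeros.
  while (res.length() < 6)  res.append(map.get(0));
  return res.reverse().toString();
}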
For each input long URL, the corresponding id is auto-generated (in O(1) time). The base conversion algorithm runs in O(k) time, where k is the number of digits (here k = 6).
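Redirection needs the inverse direction as well: given the 6-character string, recover the id. The following is a minimal sketch of both directions, assuming the digit order 0-9, a-z, A-Z from the mapping above. The class name Base62 and the range check are illustrative choices, not part of the original notes.

```java
class Base62 {
    // Digit order follows the mapping in the text: 0-9, then a-z, then A-Z.
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final int BASE = 62;
    private static final int LENGTH = 6;
    private static final long LIMIT = 56_800_235_584L; // 62^6

    // id -> fixed-width 6-character string, padded with leading '0' digits.
    static String encode(long id) {
        if (id < 0 || id >= LIMIT) {
            throw new IllegalArgumentException("id out of range: " + id);
        }
        char[] out = new char[LENGTH];
        for (int i = LENGTH - 1; i >= 0; i--) {
            out[i] = ALPHABET.charAt((int) (id % BASE));
            id /= BASE;
        }
        return new String(out);
    }

    // 6-character string -> id, reading digits from most to least significant.
    static long decode(String shortUrl) {
        if (shortUrl.length() != LENGTH) {
            throw new IllegalArgumentException("expected 6 characters: " + shortUrl);
        }
        long id = 0;
        for (int i = 0; i < LENGTH; i++) {
            int digit = ALPHABET.indexOf(shortUrl.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character in: " + shortUrl);
            }
            id = id * BASE + digit;
        }
        return id;
    }
}
```

For example, Base62.encode(125) yields "000021" (125 = 2 * 62 + 1), and Base62.decode("000021") returns 125. Because every id in range maps to exactly one string and back, the round trip is lossless.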


4. Analyze bottlenecks, then scale

(1) traffic

10% from shortening, 90% from redirection

requests per second: 400 (shortening: 40, redirection: 360)

Both operations are computationally light, so traffic is not a bottleneck.

About 6 billion URLs in total over 5 years (at 40 new URLs per second)

original URL: 500 bytes

short URL: works out to 6 bytes

data written per second: 40 * (500 + 6) bytes ≈ 20 KB

data read per second: 360 * 506 bytes ≈ 180 KB

So the data going in and out of the pipe is small, and I/O is not a bottleneck.

(2) data

3 TB for all original URLs, 36 GB for short URLs (5 years)

This is the bottleneck, so storage needs to scale.
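The estimates in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given above (40 writes and 360 reads per second, 500-byte original URLs, 6-byte short URLs) and assumes 365-day years. The class and method names are illustrative.

```java
class CapacityEstimate {
    static final long WRITES_PER_SECOND = 40;
    static final long READS_PER_SECOND = 360;
    static final long ORIGINAL_URL_BYTES = 500;
    static final long SHORT_URL_BYTES = 6;
    // Assumption: 5 years of 365 days each.
    static final long SECONDS_IN_FIVE_YEARS = 5L * 365 * 24 * 60 * 60;

    static long bytesWrittenPerSecond() {
        return WRITES_PER_SECOND * (ORIGINAL_URL_BYTES + SHORT_URL_BYTES);
    }

    static long bytesReadPerSecond() {
        return READS_PER_SECOND * (ORIGINAL_URL_BYTES + SHORT_URL_BYTES);
    }

    static long urlsInFiveYears() {
        return WRITES_PER_SECOND * SECONDS_IN_FIVE_YEARS;
    }

    static long originalUrlStorageBytes() {
        return urlsInFiveYears() * ORIGINAL_URL_BYTES;
    }

    static long shortUrlStorageBytes() {
        return urlsInFiveYears() * SHORT_URL_BYTES;
    }
}
```

The exact results are 20,240 bytes written and 182,160 bytes read per second, about 6.3 billion URLs, about 3.15 TB of original URLs, and about 37.8 GB of short URLs. The figures quoted in the text round the URL count down to 6 billion, which gives 3 TB and 36 GB.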


5. Scalable design

(1) application service layer

Put a load balancer in front of a cluster of machines and resize the cluster over time: add machines when traffic spikes and remove them when it returns to normal (e.g. Amazon ELB).

(2) data storage

1) billions of objects

2) each object is small, < 1 KB

3) no relationships between objects

4) reads are 9x more frequent than writes (360 vs. 40 per second)

5) 3 TB of original URLs, 36 GB of short URLs


Approach 1: MySQL

1) use one table: short_url: varchar(6), original_url: varchar(512)

2) unique index on short_url (36 GB + index overhead), held in memory

3) sharding: first character of short_url mod the number of partitions

4) master-slave replication, master-master replication
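The sharding rule in item 3 could look roughly like this. The notes do not say how the first character is turned into a number, so this sketch assumes its base-62 digit value (0-9, a-z, A-Z). Using the raw character code instead would be an equally valid reading. ShardRouter and shardFor are hypothetical names.

```java
class ShardRouter {
    // Assumed digit order, matching the base-62 mapping: 0-9, a-z, A-Z.
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Returns the partition index for a short URL: digit value of the first character
    // modulo the number of partitions.
    static int shardFor(String shortUrl, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        if (shortUrl.isEmpty()) {
            throw new IllegalArgumentException("shortUrl must not be empty");
        }
        int digit = ALPHABET.indexOf(shortUrl.charAt(0));
        if (digit < 0) {
            throw new IllegalArgumentException("invalid first character: " + shortUrl);
        }
        return digit % partitionCount;
    }
}
```

With 4 partitions, "5ie0V2" goes to partition 1 (5 mod 4) and "a00000" to partition 2 (10 mod 4). One caveat with this rule: if ids are assigned sequentially and padded with leading zeros, the first character changes very slowly, so the load may be uneven until the id space fills up.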


Approach 2

We can use a distributed database. But maintaining such a database would be much more complicated (replicating data across servers, syncing among servers to get a unique id, etc.).

Alternatively, we can use a distributed key-value datastore.
Some distributed datastores (e.g. Amazon's Dynamo) use consistent hashing: servers and inputs are both hashed to integers, and the server responsible for an input is located from the input's hash value. We can apply the base conversion algorithm to the hash value of the input.

The basic process can be:
Insert
  1. Hash an input long URL into a single integer, which serves as the key;
  2. Locate a server on the ring and store the key-longUrl pair on that server;
  3. Compute the shortened URL from the key using base conversion (from base 10 to base 62) and return it to the user.
Retrieve
  1. Convert the shortened URL back to the key using base conversion (from base 62 to base 10);
  2. Locate the server containing that key and return the longUrl.
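The insert and retrieve steps above can be sketched with an in-memory ring. This is a toy model under stated assumptions, not how Dynamo is implemented: each "server" is just a HashMap, the hash is 64-bit FNV-1a reduced into [0, 62^6) so the key fits in 6 base-62 digits, and hash collisions between different URLs, virtual nodes, and data migration when servers change are all ignored. The names UrlRing, addServer, insert, and retrieve are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class UrlRing {
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final long KEY_SPACE = 56_800_235_584L; // 62^6

    // Ring position -> that server's local key-value store (an in-memory stand-in).
    private final TreeMap<Long, Map<Long, String>> ring = new TreeMap<>();

    // Assumed hash: 64-bit FNV-1a, reduced into [0, 62^6).
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        return (h & Long.MAX_VALUE) % KEY_SPACE;
    }

    void addServer(String serverName) {
        ring.put(hash(serverName), new HashMap<>());
    }

    // The owner of a key is the first server at or after it on the ring, wrapping around.
    private Map<Long, String> serverFor(long key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, Map<Long, String>> entry = ring.ceilingEntry(key);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Insert: hash the long URL, store key -> longUrl, return the base-62 form of the key.
    String insert(String longUrl) {
        long key = hash(longUrl);
        serverFor(key).put(key, longUrl);
        return encode(key);
    }

    // Retrieve: convert the short URL back to the key, then look it up on its server.
    String retrieve(String shortUrl) {
        long key = decode(shortUrl);
        return serverFor(key).get(key);
    }

    private static String encode(long key) {
        char[] out = new char[6];
        for (int i = 5; i >= 0; i--) {
            out[i] = ALPHABET.charAt((int) (key % 62));
            key /= 62;
        }
        return new String(out);
    }

    private static long decode(String shortUrl) {
        long key = 0;
        for (char c : shortUrl.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character in: " + shortUrl);
            }
            key = key * 62 + digit;
        }
        return key;
    }
}
```

A typical use is to call addServer for each node, then insert(longUrl) to get the 6-character string and retrieve(shortUrl) to get the long URL back. In this sketch, adding a server after URLs have been inserted can change which server owns an existing key, so a real system would also have to move the affected keys.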