tiny url design

proudmore

What is tinyurl?

TinyURL is a URL-shortening service: a user enters a long URL and the service returns a shorter, unique URL such as "http://tiny.me/5ie0V2". The trailing part ("5ie0V2") can be any 6-character string drawn from [0-9, a-z, A-Z], which gives 62^6 ≈ 56.8 billion unique strings.

General process

Ask questions; understand the constraints and use cases.
Abstract design: draw a diagram.
Bottlenecks: identify them both for the current load and for when traffic and data volume grow.
Address these bottlenecks using scalable system design.

1. Discuss use cases with the interviewer

shortening: take a URL => return a shorter one
redirection: take a short URL => redirect to the original one
high availability

2. Estimate the system's scale

At a minimum: the amount of traffic the system handles and the amount of data it stores.

3. Abstract design

Proceed layer by layer; there is no BI layer here.

(1)application layer

shortening service, redirecting service

(2)data storage layer

database schema: id, original_url, short_url
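As a rough illustration of how the two services sit on top of this schema, here is a minimal in-memory sketch. The names `UrlTable`, `Row`, `insert`, and `resolve` are hypothetical, the `HashMap` merely stands in for a real database with a unique index on short_url, and the short-code generator is passed in as a parameter because the conversion algorithm is a separate concern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

class UrlTable {
    // One row of the table: id, original_url, short_url.
    static final class Row {
        final long id;
        final String originalUrl;
        final String shortUrl;

        Row(long id, String originalUrl, String shortUrl) {
            this.id = id;
            this.originalUrl = originalUrl;
            this.shortUrl = shortUrl;
        }
    }

    private long nextId = 1; // stands in for an auto-increment column
    private final Map<String, Row> byShortUrl = new HashMap<>(); // stands in for a unique index

    // Shortening service: allocate an id, derive the short URL, store the row.
    Row insert(String originalUrl, LongFunction<String> shortUrlForId) {
        long id = nextId++;
        Row row = new Row(id, originalUrl, shortUrlForId.apply(id));
        byShortUrl.put(row.shortUrl, row);
        return row;
    }

    // Redirecting service: look up the original URL, or null if unknown.
    String resolve(String shortUrl) {
        Row row = byShortUrl.get(shortUrl);
        return row == null ? null : row.originalUrl;
    }
}
```

A caller would do something like `table.insert(longUrl, id -> encode(id))`, where `encode` is whatever id-to-string conversion the system settles on.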

Implementing the conversion algorithm:

Suppose we have a database table with three columns: id (auto-increment), actual URL, and shortened URL. An alternative way to generate the id is md5(original_url + random_salt).
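The md5(original_url + random_salt) alternative could look roughly like the sketch below, using the standard `java.security.MessageDigest` API. The class name `Md5Id`, the helper `randomSalt`, and the 8-byte salt length are assumptions made for illustration. Note that an MD5 digest is 128 bits, so it would have to be truncated or otherwise reduced before it could fit a 6-character code; the original text does not say how.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class Md5Id {
    // Returns md5(originalUrl + salt) as a 32-character lowercase hex string.
    static String md5Hex(String originalUrl, String salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((originalUrl + salt).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: 8 random bytes, hex-encoded, used as the salt.
    static String randomSalt() {
        byte[] bytes = new byte[8];
        new SecureRandom().nextBytes(bytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String url = "http://example.com/some/long/path";
        System.out.println(md5Hex(url, randomSalt()));
    }
}
```

Because the salt is random, the same long URL gets a different id on each call, so the salt (or the resulting id) has to be stored with the record.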

Intuitively, we could design a hash function that maps the actual URL to the shortened URL. But a direct string-to-string mapping is not easy to compute.

Notice that each record in the database has a unique id. What if we convert the id to a shortened URL?

Basically, we need a bijective function f(x) = y such that

Each x must be associated with one and only one y;
Each y must be associated with one and only one x.

In our case, the x's are integers and the y's are 6-character strings. Each 6-character string can itself be treated as a number, namely a base-62 number, if we map each distinct character to a digit value,

e.g. 0-0, ..., 9-9, 10-a, 11-b, ..., 35-z, 36-A, ..., 61-Z.

Then the problem becomes a base conversion problem, which is a bijection (as long as the id does not overflow 6 digits :).

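// Requires: import java.util.Map;
// Converts a non-negative id to a 6-character code in the given base.
// id is a long because 62^6 (about 56.8 billion) does not fit in an int.
public String shorturl(long id, int base, Map<Integer, Character> map) {
  StringBuilder res = new StringBuilder();
  while (id > 0) {
    int digit = (int) (id % base);  // least significant digit first
    res.append(map.get(digit));
    id /= base;
  }
  // Pad with the zero digit so the code is always 6 characters long.
  while (res.length() < 6)  res.append(map.get(0));
  return res.reverse().toString();
}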

For each input long URL, the corresponding id is auto-generated (in O(1) time). The base conversion algorithm runs in O(k) time, where k is the number of digits (i.e. k = 6).
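Since the mapping is meant to be a bijection, the redirect path needs the inverse conversion from short code back to id. The sketch below pairs an encoder with a decoder; the class name `Base62` and the `ALPHABET` string are illustrative choices that follow the digit mapping described above (0-9, then a-z, then A-Z).

```java
class Base62 {
    // Digit alphabet: value 0-9 -> '0'-'9', 10-35 -> 'a'-'z', 36-61 -> 'A'-'Z'.
    static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Encodes a non-negative id as base 62, left-padded with '0' to 6 characters.
    static String encode(long id) {
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        }
        while (sb.length() < 6) {
            sb.append('0');
        }
        return sb.reverse().toString();
    }

    // Inverse of encode: reads the code as a base-62 number.
    static long decode(String code) {
        long id = 0;
        for (char c : code.toCharArray()) {
            id = id * 62 + ALPHABET.indexOf(c);
        }
        return id;
    }

    public static void main(String[] args) {
        long id = 125;
        String code = encode(id);
        System.out.println(code + " -> " + decode(code));
    }
}
```

For example, 125 = 2 * 62 + 1, so `encode(125)` yields "000021" and `decode("000021")` returns 125. Ids of 62^6 or more produce codes longer than 6 characters, which is the overflow case mentioned earlier.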

4. Analyze bottlenecks, then scale

(1)traffic

10% from shortening, 90% from redirection

requests per second: 400 (shortening: 40, redirection: 360)

Both operations are computationally light, so traffic is not a bottleneck.

In total, about 6 billion URLs need to be stored over 5 years

original url: 500 bytes

short url: works out to 6 bytes

data written per second: 40 * (500 + 6) bytes ≈ 20 KB

data read per second: 360 * 506 bytes ≈ 180 KB

So the data going in and out of the pipe is not much; I/O is not a bottleneck.

(2)data

3 TB for all URLs, 36 GB for short URLs (5 years)

This is the bottleneck, so the storage needs to scale.
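The estimates in this step can be re-derived with a few lines of arithmetic. The sketch below assumes a 365-day year and the figures quoted above (40 writes/s, 360 reads/s, 500-byte original URLs, 6-byte short URLs); the class and method names are illustrative.

```java
class CapacityEstimate {
    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60;

    // Number of new URLs created over the given period.
    static long totalUrls(long writesPerSecond, int years) {
        return writesPerSecond * SECONDS_PER_YEAR * years;
    }

    // Bytes moved per second for a given request rate and record size.
    static long bytesPerSecond(long requestsPerSecond, int recordBytes) {
        return requestsPerSecond * recordBytes;
    }

    // Number of distinct 6-character base-62 codes, i.e. 62^6.
    static long keyspace() {
        long k = 1;
        for (int i = 0; i < 6; i++) {
            k *= 62;
        }
        return k;
    }

    public static void main(String[] args) {
        int originalUrlBytes = 500;
        int shortUrlBytes = 6;
        long urls = totalUrls(40, 5);
        System.out.println("URLs in 5 years:        " + urls);
        System.out.println("write bytes per second: " + bytesPerSecond(40, originalUrlBytes + shortUrlBytes));
        System.out.println("read bytes per second:  " + bytesPerSecond(360, originalUrlBytes + shortUrlBytes));
        System.out.println("original URL bytes:     " + urls * originalUrlBytes);
        System.out.println("short URL bytes:        " + urls * shortUrlBytes);
        System.out.println("6-char keyspace:        " + keyspace());
    }
}
```

The exact figures come out slightly above the rounded ones: about 6.3 billion URLs, roughly 3.15 TB of original URLs and 37.8 GB of short URLs. The 6-character keyspace of about 56.8 billion codes comfortably covers that volume.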

5. Scalable design

(1)application service layer

Add a load balancer in front of a machine cluster that grows and shrinks over time: add machines when traffic spikes and remove them when traffic returns to normal (e.g. Amazon ELB).
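To make the idea concrete, here is a toy round-robin balancer whose server pool can grow and shrink at runtime. It is only a sketch of the general pattern: round-robin is an assumed policy, the class and method names are hypothetical, and this says nothing about how Amazon ELB works internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

class RoundRobinBalancer {
    private final List<String> servers = new CopyOnWriteArrayList<>();
    private final AtomicLong counter = new AtomicLong();

    // Called when traffic spikes and a machine is added to the cluster.
    void addServer(String server) {
        servers.add(server);
    }

    // Called when traffic is back to normal and a machine is retired.
    void removeServer(String server) {
        servers.remove(server);
    }

    // Picks the next server in rotation.
    String next() {
        List<String> snapshot = new ArrayList<>(servers);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no servers available");
        }
        int index = (int) Math.floorMod(counter.getAndIncrement(), (long) snapshot.size());
        return snapshot.get(index);
    }
}
```

Because the shortening and redirecting services keep no per-request state of their own, any machine in the pool can serve any request, which is what makes this kind of scaling straightforward.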

(2)data storage

1)billions of objects

2)each object is small, < 1 KB

3)no relationships between objects

4)reads are 9x more frequent than writes (360, 40)

5)3TB original urls, 36GB short urls

Approach 1: MySQL

1)use one table: short_url: varchar(6), original_url: varchar(512)

2)unique index on the short URL (36 GB + index overhead), held in memory

3)sharding: take the first character of short_url modulo the number of partitions

4)master-slave replication, master-master replication
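The sharding rule in item 3) can be sketched as a one-line routing function. The name `shardFor` is hypothetical, and using the character's numeric code as the value to take the modulus of is an assumption, since the text does not say how the character is turned into a number.

```java
class ShardRouter {
    // Routes a short URL to a partition using its first character.
    static int shardFor(String shortUrl, int partitions) {
        if (shortUrl == null || shortUrl.isEmpty() || partitions <= 0) {
            throw new IllegalArgumentException("need a non-empty short URL and at least one partition");
        }
        // charAt(0) is promoted to its non-negative character code before the modulo.
        return shortUrl.charAt(0) % partitions;
    }

    public static void main(String[] args) {
        System.out.println(shardFor("5ie0V2", 4));
    }
}
```

One thing to watch if codes come from zero-padded auto-increment ids: every id below 62^5 (about 916 million) produces a code starting with '0', so all of those records would land on the same partition. Sharding on the last character, or on a hash of the whole code, would spread them more evenly.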

Approach 2

We can use a distributed database. But maintaining such a database would be much more complicated (replicating data across servers, syncing among servers to get a unique id, etc.).

Alternatively, we can use a distributed key-value datastore.
Some distributed datastores (e.g. Amazon's Dynamo) use consistent hashing to hash both servers and inputs to integers, and locate the responsible server from the hash value of the input. We can apply the base conversion algorithm to the hash value of the input.
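A minimal consistent-hashing ring can be built on a sorted map: servers and keys are hashed onto the same integer space, and a key belongs to the first server at or after its hash, wrapping around at the end. This is a sketch of the general technique only, not Dynamo's implementation; CRC32 is an arbitrary hash chosen for brevity, there are no virtual nodes or replicas, and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

class HashRing {
    // Ring position -> server name, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Assumption: CRC32 as the hash; a real system would likely use a stronger one.
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addServer(String server) {
        ring.put(hash(server), server);
    }

    void removeServer(String server) {
        ring.remove(hash(server));
    }

    // Finds the first server clockwise from the key's position on the ring.
    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

The useful property is that adding or removing a server only moves the keys adjacent to it on the ring; keys owned by other servers stay where they are.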

The basic process can be:
Insert

Hash an input long URL into a single integer;
Locate a server on the ring and store the (key, longUrl) pair on that server;
Compute the shortened URL using base conversion (from base 10 to base 62) and return it to the user.

Retrieve