This article implements short URLs on the Android platform.
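With the rise and popularity of microblogging (Weibo), short URLs have become very widely used.
I. The main purposes are probably the following: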
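1. A microblog post is usually limited to 140 characters, a limit borrowed from Twitter, which in turn derived it from the 160-character limit of SMS messages. An original URL can be very long and may take up half of a post, which hurts readability and leaves less room for the rest of the post.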
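2. Short URLs are easier to manage and to collect statistics on.
Let's first look at the theory behind the short-URL mapping algorithm (from material found online):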
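① Hash the long URL with MD5 to get a 32-character signature string, and split it into 4 segments of 8 characters each;
② Loop over the 4 segments: treat the 8 characters of each segment as a hexadecimal number and bitwise-AND it with 0x3fffffff (30 one bits), discarding everything above 30 bits;
③ Split the resulting 30 bits into 6 groups of 5 bits, and use each group as an index into an alphabet to pick a character, which yields a 6-character string;
④ One MD5 string therefore yields four 6-character strings, and any one of them can serve as the short URL for the long URL.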
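The theory is simple. The resulting URL is not guaranteed to be unique, but since four candidates are available for each long URL, collisions should be rare in practice.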
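Steps ① to ④ can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class name `ShortCodeSketch`, its method names, and the 32-character `ALPHABET` are hypothetical, chosen so that every 5-bit group is a valid index. The sketch does not mix a key into the input, so it will not reproduce the short URLs produced by the Android code in this article.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ShortCodeSketch {
    // Hypothetical 32-character alphabet: every 5-bit value (0..31) is a valid index.
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz012345";

    // Steps 2 and 3 for a single 8-character hex segment.
    static String segmentToCode(String hexSegment) {
        // Keep only the low 30 bits of the 32-bit segment.
        long bits = Long.parseLong(hexSegment, 16) & 0x3FFFFFFFL;
        StringBuilder code = new StringBuilder();
        for (int k = 0; k < 6; k++) {
            int index = (int) (bits & 0x1F); // lowest 5 bits
            code.append(ALPHABET.charAt(index));
            bits >>= 5;                      // move on to the next 5-bit group
        }
        return code.toString();
    }

    // Steps 1 and 4: hash the URL and derive the four 6-character candidates.
    static String[] candidates(String longUrl) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5")
                    .digest(longUrl.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        String[] result = new String[4];
        for (int i = 0; i < 4; i++) {
            result[i] = segmentToCode(hex.substring(i * 8, (i + 1) * 8));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String code : candidates("http://songyuanlin1101.blog.163.com")) {
            System.out.println(code);
        }
    }
}
```

Because the two highest bits of each segment are masked away, segments such as `c0000000` and `00000000` map to the same code, which is one reason uniqueness is not guaranteed.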
II. Implementation:
1. MD5 implementation (MD5.java):
package com.syl.shorturl;

import java.security.MessageDigest;

public class MD5 {
    private static final String[] hexDigits = {
            "0", "1", "2", "3", "4", "5", "6", "7",
            "8", "9", "a", "b", "c", "d", "e", "f"};

    // Converts a byte array to a lowercase hexadecimal string.
    public static String byteArrayToHexString(byte[] b) {
        StringBuilder resultSb = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            resultSb.append(byteToHexString(b[i]));
        }
        return resultSb.toString();
    }

    // Converts one byte to two hexadecimal digits.
    private static String byteToHexString(byte b) {
        int n = b & 0xFF; // treat the byte as unsigned (0..255)
        return hexDigits[n / 16] + hexDigits[n % 16];
    }

    // Returns the 32-character hexadecimal MD5 digest of the UTF-8 encoded input.
    public static String MD5Encode(String originalStr) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // String.trim() returns a new string, so its result must be used.
            String input = originalStr.trim();
            return byteArrayToHexString(md.digest(input.getBytes("UTF-8")));
        } catch (Exception ex) {
            // Fail loudly instead of silently returning a wrong value.
            throw new IllegalStateException("MD5 hashing failed", ex);
        }
    }
}
2. shortUrl implementation (MainActivity.java):
package com.syl.shorturl;

import android.os.Bundle;
import android.app.Activity;
import android.view.Menu;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.EditText;
import android.widget.TextView;

public class MainActivity extends Activity {
    private static final String longUrl = "http://songyuanlin1101.blog.163.com";
    private static final String longUrlHead = "http://";
    private static final String shortUrlHead = "http://syl.cn/";
    private String str = "";
    private TextView tv;
    private EditText et;
    private Button bt;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        tv = (TextView) findViewById(R.id.text);
        et = (EditText) findViewById(R.id.original);
        bt = (Button) findViewById(R.id.CovertBT);
        bt.setOnClickListener(new OnClickListener() {
            @Override
            public void onClick(View v) {
                // Compute the four candidates once instead of on every loop iteration.
                String[] shortUrls = ShortUrl(et.getText().toString());
                for (int i = 0; i < shortUrls.length; i++) {
                    str = shortUrls[i];
                    System.out.println(str);
                }
                // Only the last candidate is shown in the TextView.
                tv.setText(str);
            }
        });
    }

    public static String[] ShortUrl(String string) {
        String key = "syl"; // custom key mixed into the input before MD5 hashing
        String[] chars = new String[]{ // characters used to build the short URL
                "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
                "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
                "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
                "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"};
        String hex = MD5.MD5Encode(key + string);
        int subHexLen = hex.length() / 8; // 32 hex characters give 4 segments
        String[] ShortStr = new String[subHexLen];
        for (int i = 0; i < subHexLen; i++) {
            StringBuilder outChars = new StringBuilder();
            String subHex = hex.substring(i * 8, (i + 1) * 8);
            // Keep the low 30 bits of the segment.
            long idx = 0x3FFFFFFFL & Long.parseLong(subHex, 16);
            for (int k = 0; k < 6; k++) {
                // The mask 0x3D keeps 6 bits with bit 1 cleared, so the index
                // never exceeds 61 and stays inside the 62-entry table.
                int index = (int) (0x0000003DL & idx);
                outChars.append(chars[index]);
                idx = idx >> 5;
            }
            ShortStr[i] = shortUrlHead + outChars;
        }
        return ShortStr;
    }

    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.main, menu);
        return true;
    }
}

The code above generates the short URLs; one of them is http://syl.cn/meYZzq, as shown in the figure below:
[Figure: generated short URL]
3. Storing the URLs
For storing the URL data, TTServer is one option, and many people online recommend it.
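This database reads and writes very quickly. Insert: 0.4 sec per 1,000,000 records (2,500,000 qps), so writing one million records takes only 0.4 seconds. Search: 0.33 sec per 1,000,000 records (3,000,000 qps), so reading one million records takes only 0.33 seconds.
For dictionary-style key/value lookups, this is among the most efficient databases I have seen so far, and it is also very small, which makes it an excellent fit for pairing short URLs with long URLs.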
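In the database, each short URL maps to exactly one long URL, but one long URL may map to several short URLs. This is easy to understand: for example, the same long URL gets different short URLs from Tencent and from Sina, and even the algorithm above can generate four short URLs for a single long URL.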
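The mapping described above, combined with the four candidates per long URL, can be sketched with an in-memory map. This is a minimal sketch under stated assumptions: a `HashMap` stands in for the key/value database, and the class `ShortUrlStore` with its `register` and `lookup` methods is hypothetical, not part of TTServer or of the Android code in this article.

```java
import java.util.HashMap;
import java.util.Map;

class ShortUrlStore {
    // Key: short URL, value: long URL. Each short URL maps to exactly one long URL,
    // while several short URLs may point at the same long URL.
    private final Map<String, String> shortToLong = new HashMap<>();

    // Registers the first candidate that is either free or already bound to the
    // same long URL. Returns null when every candidate belongs to another URL.
    String register(String longUrl, String[] candidates) {
        for (String candidate : candidates) {
            String existing = shortToLong.get(candidate);
            if (existing == null) {
                shortToLong.put(candidate, longUrl);
                return candidate;
            }
            if (existing.equals(longUrl)) {
                return candidate; // already registered earlier
            }
        }
        return null;
    }

    // Returns the long URL for a short URL, or null if it is unknown.
    String lookup(String shortUrl) {
        return shortToLong.get(shortUrl);
    }
}
```

Trying the candidates in order is one possible way to use the spare candidates when the first one is already taken by a different long URL; a real service would also need a fallback for the case where all four are taken.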
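Of course, a short URL generated by the method above cannot be opened by simply typing it into the browser's address bar, because the browser has no way to get from this short URL to the original long URL: no service behind it performs that lookup.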
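So why can the short URLs generated by Sina Weibo and Tencent Weibo be opened directly from the address bar, when they too are simply stored in those companies' own databases? My understanding is that the browser resolves the domain name in the short URL and sends the request to their servers, which look up the original long URL in their databases and redirect the browser to it.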
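The lookup-and-redirect idea can be sketched with the small HTTP server that ships with the JDK (`com.sun.net.httpserver`). This is an illustrative sketch, not how Sina or Tencent actually implement their services: the class `RedirectSketch`, the in-memory `SHORT_TO_LONG` map, and the choice of a 302 response are all assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

class RedirectSketch {
    // Hypothetical in-memory table: short code -> long URL.
    static final Map<String, String> SHORT_TO_LONG = new HashMap<>();

    // Maps a request path such as "/abc123" to the stored long URL, or null.
    static String resolve(String path) {
        String code = path.startsWith("/") ? path.substring(1) : path;
        return SHORT_TO_LONG.get(code);
    }

    // Starts a server that answers known short codes with an HTTP redirect.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            String target = resolve(exchange.getRequestURI().getPath());
            if (target == null) {
                exchange.sendResponseHeaders(404, -1); // unknown code, no body
            } else {
                exchange.getResponseHeaders().set("Location", target);
                exchange.sendResponseHeaders(302, -1); // redirect, no body
            }
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

After `RedirectSketch.SHORT_TO_LONG.put("abc123", "http://example.com/a/long/path")` and `RedirectSketch.start(8080)`, a browser that opens `http://localhost:8080/abc123` would be sent on to the long URL. The short domain only needs to point at a server that does this lookup.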