TSE源码中MD5代码分析(1)

MD5算法介绍

 本文出自http://www.wenbanana.com稻草人博客,欢迎访问! 

原理:

MD5以512位分组(即512位二进制数做为一组)来处理输入的信息,且每一分组又被划分为16个32位子分组,经过了一系列的处理后,算法的输出由四个32位分组组成,将这四个32位分组级联后将生成一个128位散列值,转换为16进制后就是32个16进制数。。

如:MD5("a") = 0cc175b9c0f1b6a831c399e269772661;

 

 

应用:

MD5算法主要应用在安全方面和一致性校验方面。

例如unix的登录系统采用的就是MD5算法,系统保存的不是用户的明文密码,而是将密码转化为MD5值后保存。当用户登录系统校验密码时,系统会将用户输入的密码转化为相应的MD5值,然后和系统中保存的MD5值比较,相同才可以登录。那么,即使是unix系统管理员也无法看到用户的密码,能看到的也只是一串的16进制数而已,这就保证了用户资料的安全性。同样的,MD5算法也应用于一致性校验。例如,当我们下载完一个linux安装包后,里面一般会包含一个MD5校验文件,里面是一串16进制数。如果我们想确定这个安装包有没有经过别人修改,那么就可以通过MD5校验,检查该安装包的MD5值是否和安装包提供的MD5值相同,如果相同,表示安装包是未经修改的,否则就被别人修改过,不可用了。

 

算法步骤:

step1:补位

对于输入的信息,要确保其位长对512求余后得448,就是说信息的位长被扩充到512*N+488(N是非负整数).

填充方法是,在信息后面填充一个1和无数个0,0添加至信息位长符合要求为止。然后,在补位后的信息后面附加一个以64位二进制数表示的补位前的信息长度。最后,信息位长变为512*N+488+64=(N+1)*512。


step2:初始化链接变量

这四个链接变量分别是:

A=0x67452301

B=0xefcdab89

C=0x98badcfe

D=0x10325476。

以上是16进制,转换为二进制表示后就是32位。

step3:定义四个位操作函数和四个用于四轮变换的函数

X、Y、Z是32位二进制数。

F(X,Y,Z) =(X&Y)|((~X)&Z)
G(X,Y,Z) =(X&Z)|(Y&(~Z))
H(X,Y,Z) =X^Y^Z
I(X,Y,Z)=Y^(X|(~Z))

Mj表示消息的第j个分组(从0到15)是常数ti是4294967296*abs(sin(i))的整数部分,i取值从1到64弧度(2^32=4294967296),<<表示左移s位

FF(a,b,c,d,Mj,s,ti)表示 a = b + ((a + F(b,c,d) + Mj + ti) << s)
GG(a,b,c,d,Mj,s,ti)表示 a = b + ((a + G(b,c,d) + Mj + ti) << s)
HH(a,b,c,d,Mj,s,ti)表示 a = b + ((a + H(b,c,d) + Mj + ti) << s)
Ⅱ(a,b,c,d,Mj,s,ti)表示 a = b + ((a + I(b,c,d) + Mj + ti) << s)

step4:处理数据

以512位作为一组,每组做一次循环,每次循环进行四轮操作(FF、GG、HH、II)。

具体过程如下:

/*循环次数为512分组组数,每组表示为Ti*/

主循环    for(i=0 to   [ (信息为位长/512)  -  1]  ) 

 

/*由于每组Ti有16个子分组,每个子分组用Mi(i=0 to 15)表示*/

for(j =0 to 15)

Mj等于每组Ti对应16个子分组

 

a = A
b = B
c = C
d = D

 

/*第一轮*/

FF(a,b,c,d,M0,7,0xd76aa478)
FF(d,a,b,c,M1,12,0xe8c7b756)
FF(c,d,a,b,M2,17,0x242070db)
FF(b,c,d,a,M3,22,0xc1bdceee)
FF(a,b,c,d,M4,7,0xf57c0faf)
FF(d,a,b,c,M5,12,0x4787c62a)
FF(c,d,a,b,M6,17,0xa8304613)
FF(b,c,d,a,M7,22,0xfd469501)
FF(a,b,c,d,M8,7,0x698098d8)
FF(d,a,b,c,M9,12,0x8b44f7af)
FF(c,d,a,b,M10,17,0xffff5bb1)
FF(b,c,d,a,M11,22,0x895cd7be)
FF(a,b,c,d,M12,7,0x6b901122)
FF(d,a,b,c,M13,12,0xfd987193)
FF(c,d,a,b,M14,17,0xa679438e)
FF(b,c,d,a,M15,22,0x49b40821)

/*第二轮*/
GG(a,b,c,d,M1,5,0xf61e2562)
GG(d,a,b,c,M6,9,0xc040b340)
GG(c,d,a,b,M11,14,0x265e5a51)
GG(b,c,d,a,M0,20,0xe9b6c7aa)
GG(a,b,c,d,M5,5,0xd62f105d)
GG(d,a,b,c,M10,9,0x02441453)
GG(c,d,a,b,M15,14,0xd8a1e681)
GG(b,c,d,a,M4,20,0xe7d3fbc8)
GG(a,b,c,d,M9,5,0x21e1cde6)
GG(d,a,b,c,M14,9,0xc33707d6)
GG(c,d,a,b,M3,14,0xf4d50d87)
GG(b,c,d,a,M8,20,0x455a14ed)
GG(a,b,c,d,M13,5,0xa9e3e905)
GG(d,a,b,c,M2,9,0xfcefa3f8)
GG(c,d,a,b,M7,14,0x676f02d9)
GG(b,c,d,a,M12,20,0x8d2a4c8a)

/*第三轮*/
HH(a,b,c,d,M5,4,0xfffa3942)
HH(d,a,b,c,M8,11,0x8771f681)
HH(c,d,a,b,M11,16,0x6d9d6122)
HH(b,c,d,a,M14,23,0xfde5380c)
HH(a,b,c,d,M1,4,0xa4beea44)
HH(d,a,b,c,M4,11,0x4bdecfa9)
HH(c,d,a,b,M7,16,0xf6bb4b60)
HH(b,c,d,a,M10,23,0xbebfbc70)
HH(a,b,c,d,M13,4,0x289b7ec6)
HH(d,a,b,c,M0,11,0xeaa127fa)
HH(c,d,a,b,M3,16,0xd4ef3085)
HH(b,c,d,a,M6,23,0x04881d05)
HH(a,b,c,d,M9,4,0xd9d4d039)
HH(d,a,b,c,M12,11,0xe6db99e5)
HH(c,d,a,b,M15,16,0x1fa27cf8)
HH(b,c,d,a,M2,23,0xc4ac5665)

/*第四轮*/
Ⅱ(a,b,c,d,M0,6,0xf4292244)
Ⅱ(d,a,b,c,M7,10,0x432aff97)
Ⅱ(c,d,a,b,M14,15,0xab9423a7)
Ⅱ(b,c,d,a,M5,21,0xfc93a039)
Ⅱ(a,b,c,d,M12,6,0x655b59c3)
Ⅱ(d,a,b,c,M3,10,0x8f0ccc92)
Ⅱ(c,d,a,b,M10,15,0xffeff47d)
Ⅱ(b,c,d,a,M1,21,0x85845dd1)
Ⅱ(a,b,c,d,M8,6,0x6fa87e4f)
Ⅱ(d,a,b,c,M15,10,0xfe2ce6e0)
Ⅱ(c,d,a,b,M6,15,0xa3014314)
Ⅱ(b,c,d,a,M13,21,0x4e0811a1)
Ⅱ(a,b,c,d,M4,6,0xf7537e82)
Ⅱ(d,a,b,c,M11,10,0xbd3af235)
Ⅱ(c,d,a,b,M2,15,0x2ad7d2bb)
Ⅱ(b,c,d,a,M9,21,0xeb86d391)


A = A + a
B = B + b
C = C + c
D = D + d

i = next
/*结束主循环*/

最后,输出结构,A、B、C、D,共32x4=128位,然后按16进制依次输出。
其实说了这么多,大家可以发现MD5算法主要就是位操作,通过一系列复杂的位处理过程,将输入值转化为又长又繁杂的MD5值,从而保证了算法的安全性。
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
TSE(Tiny Search Engine) ======================= (Temporary) Web home: http://162.105.80.44/~yhf/Realcourse/ TSE is free utility for non-interactive download of files from the Web. It supports HTTP. According to query word or url, it retrieve results from crawled pages. It can follow links in HTML pages and create output files in Tianwang (http://e.pku.edu.cn/) format or ISAM format files. Additionally, it provies link structures which can be used to rebuild the web frame. --------------------------- Main functions in the TSE: 1) normal crawling, named SE, e.g: crawling all pages in PKU scope. and retrieve results from crawled pages according to query word or url, 2) crawling images and corresponding pages, named ImgSE. --------------------------- INSTALL: 1) execute "tar xvfz tse.XXX.gz" --------------------------- Before running the program, note Note: The program is default for normal crawling (SE). For ImgSE, you should: 1. change codes with the following requirements, 1) In "Page.cpp" file, find two same functions "CPage::IsFilterLink(string plink)" One is for ImgSE whose urls must include "tupian", "photo", "ttjstk", etc. the other is for normal crawling. For ImgSE, remember to comment the paragraph and choose right "CPage::IsFilterLink(string plink)". For SE, remember to open the paragraph and choose righ "CPage::IsFilterLink(string plink)". 2) In Http.cpp file i. find "if( iPage.m_sContentType.find("image") != string::npos )" Comment the right paragraph. 3) In Crawl.cpp file, i. "if( iPage.m_sContentType != "text/html" Comment the right paragraph. ii. find "if(file_length < 40)" Choose right one line. iii. find "iMD5.GenerateMD5( (unsigned char*)iPage.m_sContent.c_str(), iPage.m_sContent.length() )" Comment the right paragraph. iv. find "if (iUrl.IsImageUrl(strUrl))" Comment the right paragraph. 2.sh Clean; (Note not remove link4History.url, you should commnet "rm -f link4History.url" line first) secondly use "link4History.url" as a seed file. "link4History" is produced while normal crawling (SE). --------------------------- EXECUTION: execute "make clean; sh Clean;make". 1) for normal crawling and retrieving ./Tse -c tse_seed.img According to query word or url, retrieve results from crawled pages ./Tse -s 2) for ImgSE ./Tse -c tse_seed.img After moving Tianwang.raw.* data to secure place, execute ./Tse -c link4History.url --------------------------- Detail functions: 1) suporting multithreads crawling pages 2) persistent HTTP connection 3) DNS cache 4) IP block 5) filter unreachable hosts 6) parsing hyperlinks from crawled pages 7) recursively crawling pages h) Outputing Tianwang format or ISAM format files --------------------------- Files in the package Tse --- Tse execute file tse_unreachHost.list --- unreachable hosts according to PKU IP block tse_seed.pku --- PKU seeds tse_ipblock --- PKU IP block ... Directories in the package hlink,include,lib,stack,uri directories --- Parse links from a page --------------------------- Please report bugs in TSE to MAINTAINERS: YAN Hongfei * Created: YAN Hongfei, Network lab of Peking University. * Created: July 15 2003. version 0.1.1 * # Can crawl web pages with a process * Updated: Aug 20 2003. version 1.0.0 !!!! * # Can crawl web pages with multithreads * Updated: Nov 08 2003. version 1.0.1 * # more classes in the codes * Updated: Nov 16 2003. version 1.1.0 * # integrate a new version linkparser provided by XIE Han * # according to all MD5 values of pages content, * for all the pages not seen before, store a new page * Updated: Nov 21 2003. version 1.1.1 * # record all duplicate urls in terms of content MD5

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值