ELFHash Algorithm

残阙的歌

Article link: https://blog.csdn.net/u010666884/article/details/50237405

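Recently, while optimizing the threading strategy of Heritrix (threads used to be created per hostname; now they are created per key computed by a hash function), I needed the ELFHash algorithm, so I looked up some material online and summarized it here.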

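ELFHash works well for both long and short strings, and every character contributes to the result: the function operates directly on the characters' ASCII codes and uses bit operations (shifts, XOR and masking) so that each character influences the final hash value. As a result, ELFhash tends to spread strings fairly evenly across a hash table.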
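To make these bit operations concrete, here is a small C++ sketch. The helper `elf_trace` is hypothetical (it is not from the original post): it returns the 32-bit hash state after each character, so you can watch the oldest character reach bits 28-31 and get folded back into bits 4-7. It assumes 8-bit characters and uses `std::uint32_t` so the result does not depend on how wide `unsigned long` is on a given platform.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: the ELFHash state after each character of s.
// The last element (if any) is the final hash value.
std::vector<std::uint32_t> elf_trace(const std::string& s) {
    std::vector<std::uint32_t> states;
    std::uint32_t h = 0;
    for (unsigned char c : s) {
        h = (h << 4) + c;                   // shift in 4 bits, add the character code
        std::uint32_t x = h & 0xF0000000u;  // top four bits (28-31)
        if (x != 0) {
            h ^= x >> 24;                   // fold them into bits 4-7
            h &= ~x;                        // then clear bits 28-31
        }
        states.push_back(h);
    }
    return states;
}
```

For "abcdefghij" the states begin 0x61, 0x672, 0x6783, 0x67894, 0x6789a5, 0x6789ab6. The 7th character `g` produces 0x6789abc7, whose top nibble 0x6 is XORed into bits 4-7 and then cleared, leaving 0x0789aba7; the final value is 0x0abaa66a. Because bits 28-31 are cleared after every step, every state (and therefore the final hash) is below 2^28.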

ELFHash in Java:

private long ELFHash(String str) {
    long hash = 0;
    long x = 0;
    for (int i = 0; i < str.length(); i++) {
        // Shift hash left by 4 bits, then add the current character's code to the low bits.
        hash = (hash << 4) + str.charAt(i);
        if ((x = hash & 0xF0000000L) != 0) {
            // The top four bits (28-31) are nonzero: the oldest character has reached the top
            // (this starts with the 7th character). The next shift would push these bits out,
            // so fold them back into the hash first.
            // x contains only bits 28-31 and is never negative, so x >> 24 lands exactly on
            // bits 4-7; no sign bits are shifted in. Shifting by 24 rather than 28 keeps the
            // folded bits off the lowest nibble, which has just received the new character.
            hash ^= x >> 24;
            // The XOR above leaves bits 28-31 unchanged (they still equal x),
            // so &= ~x clears bits 28-31.
            hash &= ~x;
        }
    }
    // Keep only the low 31 bits so the result is non-negative and fits in an int
    // (bits 28-31 are already clear; the mask also drops anything carried above bit 31).
    return hash & 0x7FFFFFFFL;
}
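The introduction mentions assigning Heritrix threads by a hash key rather than by hostname, but the post does not show that code. As a purely hypothetical illustration (the names `elf_hash` and `worker_for` and the modulo scheme are assumptions, not Heritrix APIs), the non-negative hash can be reduced modulo the number of workers so that the same host always maps to the same worker. `elf_hash` follows the same steps as the Java method on a 32-bit unsigned state; for ASCII input it yields the same value.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// ELFHash over the bytes of s with a 32-bit state; the result is always below 2^28.
std::uint32_t elf_hash(const std::string& s) {
    std::uint32_t h = 0;
    for (unsigned char c : s) {
        h = (h << 4) + c;                   // shift in the next character
        std::uint32_t x = h & 0xF0000000u;  // top four bits
        if (x != 0) {
            h ^= x >> 24;                   // fold them into bits 4-7
            h &= ~x;                        // clear bits 28-31
        }
    }
    return h;
}

// Hypothetical: choose a worker in [0, workers) for a host name.
// workers must be nonzero; the same host always maps to the same worker.
std::size_t worker_for(const std::string& host, std::size_t workers) {
    return elf_hash(host) % workers;
}
```

For example, `elf_hash("printf")` is 0x077905a6, so `worker_for("printf", 16)` returns 6. Note that with plain modulo, changing the number of workers reassigns most hosts.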

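Use case: ELFHash is typically used to place strings at predictable positions so they can be found quickly. For example, to store a collection of strings and look them up fast, allocate a large array and compute each string's index from its ELF hash value; strings whose indices collide are chained together in a linked list.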

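C++ version (a dictionary lookup: it reads "English foreign" word pairs, then translates each query word, printing "eh" when the word is not found):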

#include <stdio.h>   // printf, sscanf, fgets, freopen
#include <string.h>  // strcpy, strcmp, strcspn

#define N 100001     // number of buckets in the hash table
#define strSize 15   // maximum word length + 1

struct hash{
    bool used;
    char fn[strSize],en[strSize];   // fn: foreign word (the key), en: English translation
    hash* next;    // links entries whose keys collide into a chain
    hash(){used=false; next=NULL;}
    hash(char *f,char *e)
    {
        strcpy(fn,f);
        strcpy(en,e);
        used=false;
        next=NULL;
    }
}h[N];

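int ELFhash(char *key){

    unsigned long h=0;
    unsigned long x=0;

    while(*key)
    {
        h=(h<<4)+(unsigned char)(*key++);  // shift h left 4 bits and add the current character code to the low bits
        if( (x=h & 0xF0000000L)!=0)
        { // The top four bits (28-31) are nonzero: the next shift would push the oldest
          // character's bits out of a 32-bit hash, so fold them back in first.
          h^=(x>>24);
          // clear bits 28-31
          h&=~x;
        }
    }
    return h % N;   // bucket index in [0, N)
}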


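int main()
{
    freopen("acm.txt","r",stdin);   // read input from acm.txt instead of the console
    char str[30],en[strSize],fn[strSize];
    hash* p;
    int sign=1,key;

    while(fgets(str,sizeof str,stdin))   // read one line; fgets never writes past the buffer
    {
        str[strcspn(str,"\r\n")]='\0';   // fgets keeps the line break, so strip it
        if(str[0]=='\0')   // a blank line ends the dictionary; query words follow
        {
            sign=0;
            continue;
        }
        if(sign)   // dictionary line: "<English word> <foreign word>"
        {
            sscanf(str,"%14s %14s",en,fn);   // at most strSize-1 characters per word
            key=ELFhash(fn);    // hash the foreign word
            if(!h[key].used)    // bucket is free: store the entry in place
            {
                h[key].used=true;
                strcpy(h[key].en,en);
                strcpy(h[key].fn,fn);
            }
            else   // collision: append a new node at the end of the chain
            {
                p=&h[key];
                while(p->next != NULL) p=p->next;
                p->next=new hash(fn,en);
            }

        }
        else  // query line: a foreign word to translate
        {
            key=ELFhash(str);
            if(!h[key].used) printf("eh\n");
            else
            {
                p=&h[key];
                while(p!=NULL)
                {
                    if(!strcmp(str,p->fn))
                    {
                        printf("%s\n",p->en);
                        break;
                    }
                    else
                    {
                        p=p->next;
                    }
                }
                if(p==NULL) printf("eh\n");  // end of chain without a match: this case must not be omitted
            }

        }

    }
    return 0;
}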
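The program above uses a fixed global array and raw `new`. For comparison, here is a hedged sketch of the same idea (ELFHash picks the bucket, colliding keys share a chain) built on standard containers. The class name `ElfDictionary` and its methods are hypothetical, not part of the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <forward_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: separate chaining with ELFHash choosing the bucket.
class ElfDictionary {
public:
    explicit ElfDictionary(std::size_t buckets) : table_(buckets) {
        assert(buckets > 0);  // the bucket index is computed with % buckets
    }

    // Insert key, or overwrite its value if it is already present.
    void insert(const std::string& key, const std::string& value) {
        auto& chain = table_[bucket_of(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        chain.emplace_front(key, value);  // new key: prepend to the bucket's chain
    }

    // Walk the bucket's chain; nullptr plays the role of "eh" (not found).
    const std::string* find(const std::string& key) const {
        for (const auto& entry : table_[bucket_of(key)]) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    static std::uint32_t elf_hash(const std::string& s) {
        std::uint32_t h = 0;
        for (unsigned char c : s) {
            h = (h << 4) + c;
            std::uint32_t x = h & 0xF0000000u;
            if (x != 0) {
                h ^= x >> 24;  // fold the top nibble into bits 4-7
                h &= ~x;       // clear bits 28-31
            }
        }
        return h;
    }

    std::size_t bucket_of(const std::string& key) const {
        return elf_hash(key) % table_.size();
    }

    std::vector<std::forward_list<std::pair<std::string, std::string>>> table_;
};
```

Prepending keeps insertion of a new key cheap, whereas the original program appends at the end of the chain; either order works because a lookup scans the whole chain.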
