Multi-level hash design based on shared memory

小雄哥

Feature list:

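1. Supports Set/Get/Replace/Update. With optimized compilation and the default 16 conflict levels, a single process reaches about 1.7 million Set/Get operations per second. (In theory, as long as memory bandwidth is not saturated, throughput on a multi-core machine multiplies with the number of processes.)

2. Supports use from multiple processes (with a low, predictable error probability).

3. Supports CAS-based weak consistency checking (must be enabled at compile time).

4. Supports export to and import from binary files.

5. Supports traversal (via a callback).

6. The maximum number of conflict levels and the maximum length of a single bucket can be customized (requires recompiling the library).

Note: the version below was implemented on a 32-bit machine; some compatibility changes are needed for 64-bit machines.

Operations tools provided:

a) Bindump/Binrestore: save the shared memory to a binary file, and restore it from a binary file into memory (works across machines, as long as both sides use the same layout).

b) Show current usage and per-bucket usage (not implemented).

c) Switch to read-only or write-only mode (not implemented).

d) Remote restore and synchronization (not implemented).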

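Implementation

1. Preset N large primes (PRIMER_TABLE; currently 100 primes from 1000037 to 1001323, giving at most about 1,000,000 × 100 = 100 million elements; a 16-level hash is initialized by default).

2. Each prime is the length of one bucket; a hashed key is placed in a bucket by taking it modulo that bucket's prime.

3. The shared memory used for storage is one linear space, so the offset at which each bucket starts must be computed in advance:

$$\mathrm{PRIMER\_TABLE\_OFFSET}[i] = \sum_{j=0}^{i-1} \mathrm{PRIMER\_TABLE}[j], \qquad \mathrm{PRIMER\_TABLE\_OFFSET}[0] = 0$$

(the listing below stores these offsets in PRIMER_TABLE_TATAL).

4. The value type must have a fixed length; it cannot contain a pointer to another memory block, because a pointer is meaningless in another process's address space.

5. Hash computation: (key + Factor*k) % L, where L is the bucket's prime. Factor is initially a prime used as a hash factor to spread hash values and reduce conflicts (for example, 1 and 11 collide under % 10). A better hash algorithm could be considered later. In the listing, k is the template parameter lines, so the added term is the same at every level.

6. Supports char* keys (time33 hash algorithm).

7. Every Set first scans the k levels to check whether the key already exists, so insertion is O(k).

8. For performance, the shared memory segment can be locked so that it is never swapped out.

9. Size limit: on a 32-bit machine the table is bounded by the per-process address space, at most 4 GB. The shared memory must be mapped into that address space, and the process needs memory of its own, so keep the segment under 2 GB where possible.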
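Item 8 does not say how the segment is kept in RAM. On Linux, one option is `shmctl` with the Linux-specific `SHM_LOCK` command; the helpers below are a minimal sketch under that assumption (`LockSegmentInRam` and `UnlockSegment` are hypothetical names, and the listing below never calls them). According to the Linux `shmctl(2)` man page, `SHM_LOCK` only prevents swapping: it does not fault the pages in, and an unprivileged caller needs a large enough `RLIMIT_MEMLOCK`.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/ipc.h>
#include <sys/shm.h>

// Hypothetical helper (Linux-specific SHM_LOCK): ask the kernel not to swap
// out the System V segment `shm_id`, e.g. right after shmget() succeeds.
// Returns 0 on success, otherwise the errno from shmctl(), such as EPERM
// (no CAP_IPC_LOCK and RLIMIT_MEMLOCK too small) or EINVAL (no such id).
int LockSegmentInRam(int shm_id) {
    assert(shm_id >= 0 && "pass the id returned by shmget()");
    if (shmctl(shm_id, SHM_LOCK, nullptr) == -1)
        return errno;
    // SHM_LOCK does not fault pages in; touch them if they must be resident.
    return 0;
}

// Counterpart, e.g. when the table is being torn down.
int UnlockSegment(int shm_id) {
    assert(shm_id >= 0);
    return shmctl(shm_id, SHM_UNLOCK, nullptr) == -1 ? errno : 0;
}
```

In the listing, `getShm()` is where `shm_id` is available, so that would be the natural call site if this behavior were wanted.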

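Design principles: 1) the highest possible performance; 2) as much multi-thread safety as possible; 3) data persistence.

Conflict handling

a) First hash into the first bucket, at (key + maxline*factor) % 1000037. If that position is already occupied,

b) hash into the next bucket, this time taking the remainder modulo the next bucket's prime.

c) The maximum number of conflicts equals the number of hash levels (more levels mean a higher fill rate, but cost performance).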
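The implementation steps and the conflict rules a) to c) can be modeled in ordinary process memory to see how keys move from bucket to bucket. This is a minimal sketch, not the listing's code: `MultiLevelTable`, `Insert`, `Find` and `Offset` are hypothetical names, only keys are stored, the primes used below are the small example values 2, 3, 5, 7, 11, 13, 17 from the header comment, and `factor * levels` mirrors the listing's `lines * FACTOR` term.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-process model of the multi-level layout: every bucket sits back to back
// in one linear array, and key 0 marks an empty slot (as in the listing).
class MultiLevelTable {
public:
    MultiLevelTable(const std::vector<unsigned long>& primes, unsigned long factor)
        : primes_(primes), offsets_(primes.size(), 0), factor_(factor) {
        assert(!primes_.empty());
        // offset[i] = primes[0] + ... + primes[i-1]
        for (std::size_t i = 1; i < primes_.size(); ++i)
            offsets_[i] = offsets_[i - 1] + primes_[i - 1];
        keys_.assign(offsets_.back() + primes_.back(), 0);
    }

    // a) try bucket 0; b) on a conflict move to the next bucket and its prime;
    // c) give up after the last level. Fails on key 0 or a duplicate key.
    bool Insert(unsigned long key) {
        if (key == 0 || Find(key)) return false;
        for (std::size_t level = 0; level < primes_.size(); ++level) {
            unsigned long& slot = keys_[Slot(key, level)];
            if (slot == 0) { slot = key; return true; }
        }
        return false;  // conflicted at every level
    }

    // All levels are scanned: a removed key leaves a hole, so an empty slot
    // does not prove the key is absent from a later level.
    bool Find(unsigned long key) const {
        if (key == 0) return false;
        for (std::size_t level = 0; level < primes_.size(); ++level)
            if (keys_[Slot(key, level)] == key) return true;
        return false;
    }

    unsigned long Offset(std::size_t level) const { return offsets_.at(level); }

private:
    // Column (key + factor * levels) % prime, shifted to the bucket's offset.
    std::size_t Slot(unsigned long key, std::size_t level) const {
        const unsigned long col = (key + factor_ * primes_.size()) % primes_[level];
        return offsets_[level] + col;
    }

    std::vector<unsigned long> primes_, offsets_, keys_;
    unsigned long factor_;
};
```

With primes {2, 3} and factor 5381, keys 1, 3, 5 and 7 all find a slot, but key 9 is rejected: its slot in bucket 0 is held by 1 and its slot in bucket 1 is held by 3. That is the insert failure listed under the disadvantages below.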

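Advantages:

1. Very fast, and since it is built on shared memory it can be shared by multiple processes.

2. Because it is a single one-dimensional linear space, the data can be copied out for backup and copied back for restore.

3. High space utilization, up to about 90%.

4. Set/Get are fast when compiled with optimization.

Disadvantages:

1. If a key conflicts at every one of the configured hash levels, the insert fails. This is rare, but choose the number of levels carefully (16 levels is recommended as a starting point; see the data below).

2. Reading a key's value is not atomic. With very low probability, a Set on a key can run while the same key is being read, so the reader gets an incomplete value (this requires two operations on the same key within about 1/1,700,000 of a second). It matters only when there are few keys that are accessed very frequently; for that access pattern, locking is recommended.

3. A single process's shared memory must stay within 4 GB (preferably 2 GB).

4. A key cannot be 0, because key 0 marks an empty node.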

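Test data

(4-byte keys and 12-byte values, keys taken directly from rand(); the generated random numbers may not be uniformly distributed.)

Test environment:

Intel(R) Xeon(R) CPU E5405 @ 2.00GHz (4 cores)

16 GB DDR3 memory; memory bandwidth = (1066/8) × 64 × 8 = 68224 Mbit/s, i.e. 68224/8 ≈ 8528 MB/s, roughly 8 GB/s

During the tests only one CPU core was used, at 100%.

Hash levels	10	16	20	32
Total capacity	10M entries (~100 MB)	16M entries (~200 MB)	20M entries (~250 MB)	32M entries (~400 MB)
rand() calls at first failed insert	6,539,556	12,786,141	16,604,767	28,811,470
Elements inserted	6,529,720	12,748,311	16,540,688	28,618,666
Time (without -O3)	15,536,487 us	49,602,499 us	81,243,290 us	210,149,843 us
Throughput (without -O3)	~420k/s	~257k/s	~200k/s	~130k/s
Time (with -O3)	2,910,194 us	7,379,382 us	15,267,184 us	35,694,707 us
Throughput (with -O3)	~2.25M/s	~1.75M/s	~1.09M/s	~800k/s
Utilization	0.6529	0.796655	0.826898	0.894123
Capacity limit reached	Yes	Yes	Yes	Yes

16 levels (timing from 10 test runs)

Conclusions from the test data:

1) Compile with -O3 for production use; the code can be built as a standalone library.

2) More levels give higher utilization, at the cost of lower performance.

3) Since optimized builds are very fast, use as many conflict levels as practical, to reduce inserts that fail because of conflicts and to raise the utilization of the hash space.

4) Buckets are initialized with primes starting from 1000037; longer buckets reduce the number of levels needed.

5) Use 16 levels by default and raise the level count as needed; 16 levels can be expected to handle about 12 million entries.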

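Conflict handling (string keys)

Problem: different strings may hash to the same value.

With the Time33 algorithm, roughly 100 to 150 collisions occur per 1 million entries.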

Test:

try set times:1000000

conflict times:105

conflict rate:0.000105

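Test results for different hash functions:

Test conditions: random 32-bit numbers formatted into strings of the form %lu-%i-%j, 1 million strings (excluding inserts that failed because there were not enough levels). Profiling (gprof/-g) was enabled, which strongly affects the timings.

Hash function	Source	Result	Notes
time33_hash	gcc/redis	hash total size:10001108 use time: 9940793us try set times:1000000 conflict times:112 conflict rate:0.000112	Simplest implementation
bobjenkins_hash	memcache	hash total size:10001108 use time: 9407349us try set times:1000000 conflict times:128 conflict rate:0.000128	Scrambles the key with several rotations, but in this test the effect is not noticeable
blizzard_hash	A one-way hash found online	hash total size:10001108 use time: 9491830us try set times:1000000 conflict times:125 conflict rate:0.000125	Builds the hash from a precomputed lookup table, but the extra function call costs a little speed when called very frequently
murmur_hash	levelDB	hash total size:10001108 use time: 8885562us try set times:1000000 conflict times:120 conflict rate:0.00012	Accepts a seed, so several different hash values can be generated; fairly concise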

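Conclusions:

1. Testing hash functions taken from several open-source projects on the same set of strings shows that their collision rates are about the same (a collision here means two different strings hashing to the same 32-bit value).

2. Overall, murmur_hash is the most suitable choice.

3. One problem remains: how to handle two different strings that hash to the same value?

Solution:

1) Hash the same string str three times, with different seeds: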

h1 = murmur_hash( str, 0 );

h2 = murmur_hash( str, 1 );

h3 = murmur_hash( str, 2 );

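2) h1 locates the key, i.e. computes the position where it is stored. h2 and h3 are comparison values: a stored entry counts as the key being looked up only when all three hash values match.

3) For different strings str1 and str2, treating 150/1,000,000 as the collision rate of each hash and the three hashes as independent, the probability that all three values match is (150/1000000)^3 ≈ 0.000000000003375, about 1 in 300 billion, which is negligible.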
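Steps 1) to 3) can be sketched as follows. The text uses levelDB's murmur_hash; to keep the sketch self-contained it substitutes the 32-bit MurmurHash2 algorithm by Austin Appleby, whose output is not guaranteed to match levelDB's own `Hash()`. `Murmur2`, `StrFingerprint`, `MakeFingerprint` and `SameKey` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// MurmurHash2, 32-bit (Austin Appleby). The seed lets one string yield
// several different hash values.
uint32_t Murmur2(const void* key, std::size_t len, uint32_t seed) {
    assert(key != nullptr || len == 0);
    const uint32_t m = 0x5bd1e995;
    const int r = 24;
    uint32_t h = seed ^ static_cast<uint32_t>(len);
    const unsigned char* data = static_cast<const unsigned char*>(key);
    while (len >= 4) {
        uint32_t k;
        std::memcpy(&k, data, 4);  // unaligned-safe read, native byte order
        k *= m; k ^= k >> r; k *= m;
        h *= m; h ^= k;
        data += 4; len -= 4;
    }
    switch (len) {  // remaining 0..3 bytes
        case 3: h ^= static_cast<uint32_t>(data[2]) << 16;  // fall through
        case 2: h ^= static_cast<uint32_t>(data[1]) << 8;   // fall through
        case 1: h ^= data[0]; h *= m;
    }
    h ^= h >> 13; h *= m; h ^= h >> 15;
    return h;
}

// h1 picks the slot (used as the table key); h2/h3 are stored in the value.
struct StrFingerprint { uint32_t h1, h2, h3; };

StrFingerprint MakeFingerprint(const std::string& s) {
    return StrFingerprint{ Murmur2(s.data(), s.size(), 0),
                           Murmur2(s.data(), s.size(), 1),
                           Murmur2(s.data(), s.size(), 2) };
}

// A stored entry is the wanted key only if all three hashes agree.
bool SameKey(const StrFingerprint& stored, const std::string& probe) {
    const StrFingerprint f = MakeFingerprint(probe);
    return stored.h1 == f.h1 && stored.h2 == f.h2 && stored.h3 == f.h3;
}
```

In CHashShm terms, h1 would be passed as the unsigned long key, and h2/h3 would live in the fixed-size value so that Get can compare them before trusting a hit.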

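Consistency under multiple threads/processes (not implemented)

1. CAS mechanism:

a) Each Node gets a cas value on Set (a returned Node carries the same cas value as the stored Node); every update of the Node must check that its cas still equals the value read earlier.

b) If this check is enabled, the value type passed to the template must include an unsigned int cas; field.

c) Only Update of a key's value can cause a multi-thread conflict; plain Set/Get cannot, and neither can a single thread.

d) Add a step: every thread increments cas by the same step from a different starting value, so each thread has its own sequence, and a value written by another thread is guaranteed to differ from any value this thread produces.

i. Each thread/process gets an initial value; with 10 processes these are 0 to 9.

ii. Every cas increment adds the number of processes, e.g. step[0] += 10, so no two processes ever produce the same cas value.

iii. The drawback is the extra initialization.

Scenario: threads T1 and T2 both Get key1, modify the value, and want to Set it back. The later operation may overwrite the earlier one, which matters when the value is transactional. The correct behavior is for T2 to read only after T1's Set completes, i.e. the operations are serialized.

=> CAS solves this: if the cas value has changed, the Set fails, and the caller must read again and retry. (Suppose at some moment T1's cas value is 9 and T2's is 10. Without a step, after T1's update the cas becomes 10, and when T2 then updates it concludes that nothing has changed.)

=> Usually this kind of transactionality is not needed, so the table is only weakly thread-safe (conflicts are rare, and even when one occurs it does not cause serious harm).

2. Weak thread safety

a) In multi-threaded use (one object per thread), with very low probability two different keys may hash to the same value and operate on the same value node at the same time.

b) The operating system does not guarantee that operations on the same memory location are serialized.

c) Probability estimate: since each bucket uses a different prime modulus, the probability of such a conflict is about 1/N (the default bucket length is about 1 million, so about 1 in a million, and our concurrency cannot reach that order of magnitude).

3. Strong thread safety

a) Scenario 1: Write Lock

i. To achieve strong thread safety, introduce locking.

ii. Add a field char bLock; to each Node (or use one bit of a flag field); set it to 1 at the start of every write and back to 0 when the write finishes.

iii. Another thread that wants to operate on this Node must first check whether it is locked.

b) Scenario 2: Update Lock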
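Neither the CAS step scheme from item 1 nor the write lock from item 3 a) is in the listing. The following is a minimal single-threaded sketch of both, assuming C++11 `std::atomic`; `CasNode`, `StepCounter`, `GetWithCas`, `UpdateWithCas`, `LockedNode`, `TryLockNode`, `UnlockNode` and `WriteIfUnlocked` are hypothetical names. It departs from the plain-field description in one deliberate way: the lock uses an atomic exchange, because checking a plain `char bLock` and then setting it in two steps is itself a race. Putting these atomics in a shared segment for cross-process use further assumes they are lock-free, as `std::atomic<char>` and `std::atomic<uint32_t>` are on common x86 and ARM targets.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// ---- Item 1: CAS with a per-thread step ----
// Hypothetical node: the value plus the `unsigned int cas` field that the
// text requires when CAS checking is enabled.
struct CasNode {
    std::atomic<uint32_t> cas{0};
    int value = 0;
};

// Thread t of n hands out t+n, t+2n, ...; values from different threads can
// never coincide.
class StepCounter {
public:
    StepCounter(uint32_t thread_index, uint32_t thread_count)
        : next_(thread_index), step_(thread_count) {
        assert(thread_index < thread_count);
    }
    uint32_t Next() { next_ += step_; return next_; }
private:
    uint32_t next_, step_;
};

// Get: return the value and remember the cas seen at read time.
int GetWithCas(const CasNode& n, uint32_t& cas_out) {
    cas_out = n.cas.load();
    return n.value;
}

// Update succeeds only while the cas is still the one the caller read.
// Only the cas swap is atomic; the value store after it is not.
bool UpdateWithCas(CasNode& n, uint32_t seen_cas, int new_value, StepCounter& mine) {
    uint32_t expected = seen_cas;
    const uint32_t desired = mine.Next();
    if (!n.cas.compare_exchange_strong(expected, desired))
        return false;  // changed by someone else: Get again and retry
    n.value = new_value;
    return true;
}

// ---- Item 3 a): per-node write lock ----
struct LockedNode {
    std::atomic<char> bLock{0};  // the text's `char bLock`, made atomic
    int value = 0;
};

// ii. Test and set in one atomic step; false if another writer holds it.
bool TryLockNode(LockedNode& n) {
    return n.bLock.exchange(1, std::memory_order_acquire) == 0;
}

// ii. Back to 0 once the write is finished.
void UnlockNode(LockedNode& n) {
    n.bLock.store(0, std::memory_order_release);
}

// iii. A writer checks the lock first and backs off if the node is busy.
bool WriteIfUnlocked(LockedNode& n, int new_value) {
    if (!TryLockNode(n)) return false;
    n.value = new_value;
    UnlockNode(n);
    return true;
}
```

With thread indexes 1 and 2 and a step of 10, T1 writes cas values 11, 21, ... and T2 writes 12, 22, ..., so T2 can never mistake T1's write for an unchanged node, which is the 9/10 situation in the scenario above.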

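Scaling

1. With the default 16 levels, about 12 million users can be handled. To increase capacity you can:

a) Lengthen each bucket, e.g. from 1 million to 10 million entries.

b) Add levels.

c) Check that system memory is sufficient: capacity × value length. A single process's shared memory must stay under 2 GB.

d) Distribute the data. A single process on a single machine is limited by memory size, so leave data consistency to the upper layer where possible: for example, partition keys across machines by modulo so that each machine handles only part of the key range, or implement consistent hashing to reduce the impact of scaling out.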
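Points c) and d) come down to simple arithmetic. The sketch below takes node and header sizes as parameters; the example figures assume a 16-byte node (a 4-byte key plus a 12-byte value, as in the test data on a 32-bit build) and a 16-byte header. `EstimateShmBytes`, `FitsUnder2G` and `MachineFor` are hypothetical helpers, and plain modulo routing is used rather than consistent hashing.

```cpp
#include <cassert>
#include <cstdint>

// c) Bytes one table needs: header + levels * bucket_len nodes. Computed in
// 64 bits so that large configurations do not overflow a 32-bit long.
uint64_t EstimateShmBytes(uint64_t levels, uint64_t bucket_len,
                          uint64_t node_size, uint64_t header_size) {
    return header_size + levels * bucket_len * node_size;
}

// The text's practical per-process ceiling: keep the segment under 2 GB.
bool FitsUnder2G(uint64_t bytes) {
    return bytes < (uint64_t(1) << 31);
}

// d) Simplest partitioning: machine i owns every key with key % machines == i.
// Adding a machine remaps most keys; consistent hashing would move fewer.
unsigned MachineFor(unsigned long key, unsigned machine_count) {
    assert(machine_count > 0);
    return static_cast<unsigned>(key % machine_count);
}
```

16 levels × 1,000,000 × 16 bytes is about 256 MB, well under 2 GB, while 16 levels × 10,000,000 × 16 bytes is about 2.56 GB, which exceeds it. Lengthening buckets (point a) therefore quickly pushes a single process past the limit and toward point d).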

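Library-level support for periodic traversal

In one usage scenario, every element in the hash table must be visited and processed. The basic implementation passes in a function pointer with a preset signature:

Initial version: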

Foreach( void (*callback)( const unsigned long _key, valueType& _value) );

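However, such a callback can only work with global variables, not with members of a particular class instance. So a param pointer supplied by the caller was added and passed through to the callback:

Second version: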

Foreach( void (*callback)( const unsigned long _key, valueType& _value, void* param_out ), void* param_in );

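This solves the parameter problem, but a runtime problem remains: if one traversal triggers work for many elements, a single pass can do a great deal of work at once. Users should be aware of this and should record a last-check time per element to decide when to trigger processing.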
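With the second Foreach version, the param pointer can carry `this`, so a static member function can forward each element to an ordinary member function. The sketch below uses a plain array in place of the shared-memory table; `DemoValue`, `DemoNode`, `ForeachDemo` and `OnlineChecker` are hypothetical names, but the callback shape is the one from the second version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the table: key 0 marks an empty node.
struct DemoValue { unsigned int dwUin; unsigned int dwLastCheck; };
struct DemoNode { unsigned long key; DemoValue value; };

// Same shape as the second version's callback.
typedef void (*ForeachCallback)(const unsigned long key, DemoValue& value, void* param_out);

// Visits every used node and hands the caller's param through unchanged.
void ForeachDemo(DemoNode* nodes, std::size_t count, ForeachCallback cb, void* param_in) {
    assert(cb != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        if (nodes[i].key != 0)
            cb(nodes[i].key, nodes[i].value, param_in);  // reference into the array
}

class OnlineChecker {
public:
    int visited = 0;

    void Run(DemoNode* nodes, std::size_t count) {
        // `this` travels through the void* param
        ForeachDemo(nodes, count, &OnlineChecker::Trampoline, this);
    }

private:
    // Static, so its address is an ordinary function pointer.
    static void Trampoline(const unsigned long key, DemoValue& value, void* param) {
        static_cast<OnlineChecker*>(param)->Visit(key, value);
    }

    void Visit(unsigned long /*key*/, DemoValue& value) {
        ++visited;
        value.dwLastCheck = 1;  // modifies the stored node in place
    }
};
```

Because the callback receives a reference, `Visit` writes straight into the table, the same behavior the usage section warns about for the real Foreach.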

Timed scanner

TimerCallBack( fn, timer_id, set_timeout, set_excute_time_interval );

a) Register a callback fn and a handler id timer_id; each time set_timeout fires, traverse all elements and call fn for every Node whose last check is older than set_excute_time_interval.

b) Each Node needs a dwLastCheckTs field.
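One tick of the timed scanner can be sketched as a traversal that skips nodes whose dwLastCheckTs is recent. `TimedNode`, `TimerFn` and `TimerScan` are hypothetical names; the timer registration itself (TimerCallBack and set_timeout) is left out, and stamping the node before calling fn is a design choice of this sketch rather than something the text specifies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node carrying the dwLastCheckTs field from item b).
struct TimedNode { unsigned long key; unsigned int dwLastCheckTs; };

typedef void (*TimerFn)(int timer_id, TimedNode& node, void* param);

// One tick of the scanner, run each time set_timeout fires: for every used
// node not checked within `interval` seconds, stamp it with `now` and call fn.
// Returns the number of nodes handed to fn.
std::size_t TimerScan(TimedNode* nodes, std::size_t count, unsigned int now,
                      unsigned int interval, TimerFn fn, int timer_id, void* param) {
    assert(fn != nullptr);
    std::size_t fired = 0;
    for (std::size_t i = 0; i < count; ++i) {
        TimedNode& node = nodes[i];
        if (node.key == 0) continue;                        // empty slot
        if (now - node.dwLastCheckTs < interval) continue;  // checked recently
        node.dwLastCheckTs = now;  // stamp first so a slow fn is not re-triggered
        fn(timer_id, node, param);
        ++fired;
    }
    return fired;
}
```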

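Future feature ideas

2. Support shared memory, plain memory, and memory-mapped files

3. Dynamic automatic expansion => rehash (as in redis)

4. Operations tools

a) Formatted output of a given key-value pair, or of everything

ChsDB (Cola's hash shm DB): an extended version based on the above (not implemented)

Idea: store data in shared memory first and synchronize it afterwards, implementing a lightweight, high-speed key-value database.

1. Supports read-only and write-only modes (switched with the provided operations tool)

2. Supports periodic data snapshots (written to local files)

3. Supports master-slave (multiple slaves) and master-master data synchronization

4. Library-level support for periodic traversal (callbacks)

Error handling

1. Share memory get failed

1) Check whether the system's shared memory limits are large enough (reduce the number of levels, or ...)

2) Check whether there is enough memory (levels × 1,000,000 × sizeof(TemplateNode))

3) Check whether one object was created per thread (use a global object instead, so the segment is attached into the process address space only once)

Shared memory operations

ipcs -l shows the system IPC limits, including the shared memory limits

ipcs -m lists the shared memory segments on this machine

ipcrm -m shmid removes a shared memory segment by id (ipcrm -M shmkey removes it by key)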

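Usage (with source code)

1. Initialization

1) Define the data structure to store

struct HASH_MSG_INFO
{
    unsigned int dwUin;
    unsigned int dwLastCheck;
    unsigned char ucFlag;
    char ucResvered[7];
};

2) Create the object

CHashShm< HASH_MSG_INFO > ht;
if( !ht.Init( key_t(123) ) )
    return false;

(1) The number of conflict levels is the second template parameter (16 levels is the recommended default; in the listing below the default argument is MAX_LINES = 32): CHashShm< HASH_MSG_INFO,32 > ht; ht.Init( key_t(123) );

(2) Set the step, for weak conflict handling across threads/processes: CHashShm< HASH_MSG_INFO,32,1 > ht; ht.Init( key_t(123) );

(3) Use a global variable, so the segment is attached into the process address space only once.

2. Set/Get/Replace/Update

Set: if the key does not exist, insert it into the hash table; if it exists, return failure (key types: unsigned long, char*+len, string)

HASH_MSG_INFO hMsg;
hMsg.dwUin = 12345;
hMsg.dwLastCheck = 12345;
if( ht.Set( 12345, hMsg ) != HASHSHM_OK )
{…}
else{…}

Get

HASH_MSG_INFO hMsg;
if( ht.Get( 12345, hMsg ) != HASHSHM_OK )

Replace: if the key does not exist, insert it into the hash table; if it exists, replace its value

Update: with the CAS switch enabled at compile time, the data is checked for concurrent changes; you must Get first, and if the data was modified in the meantime the Update fails (Update is not part of the listing below)

Note: since there is no lookup from value to key, the value should contain a copy of the key. For char/string keys, that copy must also be fixed-length and cannot be an STL object.

3. Detach from the shared memory when the process exits (the destructor detaches when this behavior is set)

ht.SetBehavior( HASH_ENABLE_DETACH );

4. Traversal

//traverse all online entries in the hash shm
g_ht.Foreach( ProcMsgCheck, this );

ProcMsgCheck is the callback function:

void ProcMsgCheck( unsigned long key, ST_ON_STATE& value, void* param )

Note that value refers directly to the location in shared memory!

Header file: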

#ifndef _COLALIANG_HASH_SHM_H_
#define _COLALIANG_HASH_SHM_H_

//-------------------------------------------
//Cola's Hash Shm Library 1.03
//colaliang( SNG Instant Messaging Application Department )
//last update: 2012-12-27

//Makefile
//	g++ -O3 -c hash_shm.cpp
// ar cq libchs.a hash_shm.o

//Enable extra function:
//1. CAS
// -DHASH_SHM_ENABLE_CAS

//2. TIMER_CALL_BACK
//-DHASH_SHM_ENABLE_TCB

//Multi-level hash explained
//
//
//					i
//PRIMER_TABLE:	2	3	5	7	11	13	17	( bucket size, using primes )
//PRIMER_TABLE_TATAL:	0	2	5	10	17	28	41	( line address begin position )
//
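//1. conflict resolution: check whether position ( _key + lines*factor ) % PRIMER_TABLE[i] is already set; if so, check the next bucket (i+1)
//2. multi-level modulo hashing; the primes spread the keys as evenly as possible
//3. lines is the maximum number of conflicts (the number of buckets)
//4. maxline (the lines template parameter) sets the maximum number of levels in shared memory
//5. valueType must be given at instantiation; elements have a fixed length and cannot be STL types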

//------------------------------------------

#include<iostream>
#include<cstdlib>
#include<cmath>
#include<cstring>     //memset
#include<sys/types.h>
#include<sys/ipc.h>   //key_t, IPC_CREAT
#include<sys/shm.h>

#include<fstream>
#include<vector>
#include<string>
using  std::cout;
using  std::cerr;
using  std::string;
using  std::vector;
using  std::ifstream;
using  std::ofstream;
using  std::ios;

extern const int PRIMER_TABLE_LEN;
extern const int PRIMER_TABLE[ ] ;

enum HASH_RETURN_CODE
{
    HASHSHM_OK = 0,
    HASHSHM_KEYEXIST,
    HASHSHM_ERROR,
    HASHSHM_NOTFOUND,
    HASHSHM_INSERTERROR,
    HASHSHM_OUTOFMEM,
    HASHSHM_UPDATE_ERROR,
};

enum HASH_STATUS
{
    HASH_STATUS_NORMAL			= 0x1,
    HASH_STATUS_WRITE_ONLY	= 0x2,
    HASH_STATUS_READ_ONLY		= 0x4
};

enum HASH_ENABLE_FLAG
{
    HASH_ENABLE_NONE = 0x0,
    HASH_ENABLE_DETACH = 0x1,		//detach from shm in the destructor; the segment itself persists until removed (IPC_RMID / ipcrm)

};

//max conflict levels, max items = MAX_LINES * BaseBucketLen ( default 100w, i.e. 1,000,000 )
const int MAX_LINES = 32;

//hash factor, used for a better hash distribution
const int FACTOR = 5381;

//use for thread magic id
const int THREAD_STEP = 1;

//string hash time33
unsigned long hash_time33(char const *str, int len =-1 );

template< typename valueType, int lines = MAX_LINES, int thread_step = THREAD_STEP >
class CHashShm
{
    public:

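        //initialize every member so the destructor is safe even if Init() fails
        CHashShm():mem(NULL),m_hashHead(NULL),memSize(0),maxSize(0),lastFound(NULL),bInitOk(false),flag(HASH_ENABLE_NONE){}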
        virtual ~CHashShm();

        //init with the share memory key: attach to the segment, creating it if needed
        //return false on failure
        bool Init( key_t shm_key );

    public:
        //set node into shm
        //	1) if the _key exists,return HASHSHM_KEYEXIST
        //	2) if set success,return HASHSHM_OK
        //	3) if fail return HASHSHM_ERROR
        int Set( const unsigned long _key ,const valueType &_value);
        int Set( const char* skey, const int len ,const valueType &_value );
        int Set( const string& strkey ,const valueType &_value );

        //get node from shm 
        int Get( const unsigned long _key,  valueType& _value );
        int Get( const char* skey, const int len, valueType& _value );
        int Get(  const string& strkey , valueType &_value );

        int Replace( const unsigned long _key ,const valueType& _value );
        int Replace( const char* skey, const int len,const valueType& _value );
        int Replace(  const string& strkey ,const valueType &_value );

        //if _key not in the table,return HASHSHM_NOTFOUND, else remove the node,set the node key 0 and return HASHSHM_OK
        int Remove( const unsigned long _key ); 
        int Remove( const char* skey, const int len );
        int Remove( const string& strkey );

        //callback function/param for execute, param_in will be pass to callback function as param_out
        void Foreach( void (*callback)( const unsigned long _key, valueType& _value, void* param_out ), void* param_in );

        //remove all the data
        void Clear(); 

        //operation enable behavior
        int SetBehavior( unsigned int iflag );
        int UnsetBehavior( unsigned int iflag );

        bool IsInitOK(){ return bInitOk; }
    public:
        bool BinDump( const char* filename = "./chsbin" );
        bool BinRestore( const char* filename = "./chsbin" );

    public:
        //the rate of the space used
        double GetFullRate() const;

        //the bucket size( begin 0 )
        int GetBucketSize( unsigned int index ) const ;

        //get one bucket's item count
        int GetBucketUseSize( unsigned int index ) const ;

        unsigned long GetCurSize() { return m_hashHead->currentSize; }       

        unsigned long GetSize() { return maxSize; };

    private:
        //the start position of the share memory
        //  1) the beginning of the segment holds the runtime header (hash_head: 16 bytes on 32-bit, 32 bytes on LP64)
        //  2) currentSize = (unsigned long *)((long)mem)
        void *mem;

        //current size of the table ,the pointer of the shm begin 
        struct hash_head{
            unsigned long currentSize;  
            unsigned long status;  
            unsigned long reservered2;  
            unsigned long reservered3;  
        };
        hash_head * m_hashHead;

        //the size of the share memory
        unsigned long memSize;   

        //PRIMER_TABLE_TATAL[i] is the sum of PRIMER_TABLE[x] for x<i, i.e. the start of bucket i
        unsigned long PRIMER_TABLE_TATAL[lines];    

        //the size of the table
        unsigned long maxSize;        

        //write by the find function,record the last find place
        void *lastFound;        

        //Init flag
        bool bInitOk;

        //enable operation flag
        unsigned int flag;

        //the node of the hash table
        //	1) when key==0,the node is empty
        //	2) name-value pair
        struct hash_node{        
            unsigned long key;
            valueType value;    
        };

    private:
        //if _key in the table,return HASHSHM_OK,and set lastFound the position,otherwise return HASHSHM_NOTFOUND
        int find( const unsigned long _key );   

        //get share memory, used by Init(); created is set to true when the segment is new
        bool getShm( key_t shm_key, bool& created );    

        //get the positon with the (row,col), map to line pos
        void *getPos( const unsigned int _row,  const unsigned long _col )
        {
            //calculate the positon from the start
            unsigned long pos =  PRIMER_TABLE_TATAL[_row] + _col;

            if ( pos >= maxSize )
                return NULL;

            return (void *)((long)mem+ sizeof(hash_head) + pos*sizeof(hash_node));
        }

};

template< typename vT, int lines, int thread_step >
bool CHashShm<vT,lines,thread_step>::Init( key_t shm_key )
{   
    if( lines > PRIMER_TABLE_LEN )
       return false; 

    //constructor with get share memory
    maxSize=0;

    int i;
    for(i=0;i<lines;i++)
    {    
        //caculate the PRIMER_TABLE_TATAL
        maxSize+=PRIMER_TABLE[i];
        if(i!=0)
            PRIMER_TABLE_TATAL[i] = PRIMER_TABLE_TATAL[i-1]+PRIMER_TABLE[i-1];
        else 
            PRIMER_TABLE_TATAL[i]=0;   
    }

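    //extra sizeof(hash_head) bytes at the front for the runtime header
    memSize=sizeof(hash_node)*maxSize + sizeof(hash_head);     
    bool created = false;
    if(!getShm( shm_key, created ))
        bInitOk = false;
    else
    {
        m_hashHead = (hash_head*)mem;

        //only a newly created segment gets a fresh header; attaching to an
        //existing one must keep its currentSize and status
        if( created )
        {
            m_hashHead->currentSize = 0;
            m_hashHead->status = HASH_STATUS_NORMAL;
        }

        //init operation enable to default none
        flag = HASH_ENABLE_NONE;

        bInitOk = true;
    }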

    return bInitOk;
}    

template< typename vT, int lines, int thread_step >
CHashShm<vT,lines,thread_step>::~CHashShm()
{
    //detach from the shared memory if HASH_ENABLE_DETACH is set
    if( bInitOk && ( flag & HASH_ENABLE_DETACH ) )
        shmdt( mem );
}

template< typename vT, int lines, int thread_step >
void CHashShm<vT,lines,thread_step>::Clear()
{
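    //zero only the node area; the header keeps its status
    memset( (char*)mem + sizeof(hash_head), 0, memSize - sizeof(hash_head) );
    m_hashHead->currentSize=0;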
}

template< typename vT, int lines, int thread_step >
bool CHashShm<vT,lines,thread_step>::getShm( key_t shm_key, bool& created )
{
    created = false;
    int shm_id=shmget(shm_key,memSize,0666);

    //check if the shm exists
    if( shm_id==-1 )   
    {
        //create the shm
        shm_id=shmget(shm_key,memSize,0666|IPC_CREAT);
        if(shm_id==-1){
            cerr<<"Share memory get failed\n";
            return false;
        }
        created = true;
    }

    //attach the shm; shmat returns (void*)-1 on failure
    mem=shmat(shm_id,NULL,0);
    if( mem==(void*)-1 ){
        cerr<<"shmat system call failed\n";
        mem=NULL;
        return false;
    }

    //only clear a segment we just created, never one that holds live data
    if( created )
        memset(mem,0,memSize);

    return true;
}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::find( const unsigned long _key)
{
    //key 0 marks an empty node, so it can never be found
    if( _key == 0 )
        return HASHSHM_NOTFOUND;

    unsigned long hash;
    hash_node *pH=NULL;
    for(int i=0;i<lines;i++)
    {
        //calculate the col position
        hash = ( _key + lines * FACTOR ) % PRIMER_TABLE[i];   
        pH = ( hash_node *)getPos( i, hash );

        //position exceed the shm size, just break
        if( NULL == pH )
            break;

        if( pH->key == _key )
        {
            lastFound=pH;
            return HASHSHM_OK;
        }
    }

    return HASHSHM_NOTFOUND;
}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Set( const unsigned long _key,const vT&_value)
{
    //key 0 marks an empty node, so it cannot be stored
    if( _key == 0 )
        return HASHSHM_ERROR;

    //if the key exists
    if( find(_key )== HASHSHM_OK )
        return HASHSHM_KEYEXIST;

    unsigned long hash;
    hash_node *pH=NULL;

    for(int i=0;i<lines;i++)
    {    
        //minimize conflict using primes
        // 1) first hash pos( calculate the col position )
        hash=( _key + lines * FACTOR ) % PRIMER_TABLE[i];

        // 2) second hash pos( row, col )
        pH=(hash_node *)getPos( i,hash );

        // insert position exceed the shm size
        if( NULL == pH )
            return HASHSHM_OUTOFMEM;

        //find the insert position,insert the value
        if( pH->key== 0 )
        {      

            pH->key = _key;
            pH->value = _value;

            m_hashHead->currentSize++;

            return HASHSHM_OK;
        }
    }

    //all the appropriate position filled
    return HASHSHM_ERROR;   
}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Set( const char* skey, const int len, const vT &_value )
{
    unsigned long ulHashKey = hash_time33( skey, len );
    return Set( ulHashKey, _value );

}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Set( const string& strkey, const vT &_value )
{
    unsigned long ulHashKey = hash_time33( strkey.data(), strkey.size() );
    return Set( ulHashKey, _value );

}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Get( const unsigned long _key, vT& _value )
{
    if( find( _key ) != HASHSHM_OK )
        return HASHSHM_NOTFOUND;

    //plain assignment copies the fixed-size value (no memcpy needed)
    _value = ((hash_node*)lastFound)->value;

    return HASHSHM_OK;
}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Get( const char* skey, const int len, vT&_value )
{
    unsigned long ulHashKey = hash_time33( skey, len );
    return Get( ulHashKey, _value );

}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Get( const string& strkey, vT&_value )
{
    unsigned long ulHashKey = hash_time33( strkey.data(), strkey.size() );
    return Get( ulHashKey, _value );

}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Replace( const unsigned long _key,const vT &_value)
{
    //key 0 marks an empty node, so it cannot be stored
    if( _key == 0 )
        return HASHSHM_ERROR;

    //if the key exists, replace the value
    if( find(_key )== HASHSHM_OK )
    {
        ((hash_node*)lastFound)->value = _value;
        return HASHSHM_OK;
    }

    //if not found, find a free position to insert into
    unsigned long hash;
    hash_node *pH=NULL;

    for(int i=0;i<lines;i++)
    {    
        //minimize conflict using primes
        // 1) first hash pos( calculate the col position )
        hash=( _key + lines * FACTOR ) % PRIMER_TABLE[i];

        // 2) second hash pos( row, col )
        pH=(hash_node *)getPos( i,hash );

        // insert position exceed the shm size
        if( NULL == pH )
            return HASHSHM_OUTOFMEM;

        //find the insert position,insert the value
        if( pH->key== 0 )
        {       
            pH->key = _key;
            pH->value = _value;
            m_hashHead->currentSize++;
            return HASHSHM_OK;
        }
    }

    //all the appropriate position filled
    return HASHSHM_ERROR;   
}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Replace( const char* skey, const int len, const vT&_value )
{
    unsigned long ulHashKey = hash_time33( skey, len );
    return Replace( ulHashKey, _value );

}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Replace( const string& strkey, const vT&_value )
{
    unsigned long ulHashKey = hash_time33( strkey.data(), strkey.size() );
    return Replace( ulHashKey, _value );

}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Remove( const unsigned long _key)
{
    //not found
    if( find(_key) != HASHSHM_OK )
        return HASHSHM_NOTFOUND;

    hash_node *pH=(hash_node *)lastFound;

    //only set the key 0
    pH->key=0; 
    m_hashHead->currentSize--;

    return HASHSHM_OK;
}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Remove( const char* skey, const int len )
{
    unsigned long ulHashKey = hash_time33( skey, len );
    return Remove( ulHashKey );

}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::Remove( const string& strkey )
{
    unsigned long ulHashKey = hash_time33( strkey.data(), strkey.size() );
    return Remove( ulHashKey );

}

template< typename vT, int lines, int thread_step >
void CHashShm<vT,lines,thread_step>::Foreach(void (*callback)( const unsigned long _key,vT &_value,void * param_out ), void* param_in )
{
    typedef  unsigned long u_long;
    u_long beg=(u_long)mem + sizeof(hash_head);
    u_long end=(u_long)mem+ sizeof(hash_head) + sizeof(hash_node)*(PRIMER_TABLE[lines-1]+PRIMER_TABLE_TATAL[lines-1]);

    hash_node *p=NULL;
    for(u_long pos=beg;pos<end;pos+=sizeof(hash_node))
    {
        //directly reference the actual memory place, so value can be modified outside directly
        p=(hash_node *)pos;
        if(p->key!=0)
            callback( p->key,p->value, param_in );
    }
}

template< typename vT, int lines, int thread_step >
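bool CHashShm<vT,lines,thread_step>::BinDump( const char* filename)
{

    ofstream os( filename, ios::out | ios::binary );
    if( !os )
        return false;

    //raw copy of header + nodes; a restore needs the same layout (lines, valueType, word size)
    os.write( (char*)mem, memSize ); 
    bool ok = os.good();
    os.close();

    return ok;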
}

template< typename vT, int lines, int thread_step >
bool CHashShm<vT,lines,thread_step>::BinRestore( const char* filename)
{

    ifstream is( filename, ios::binary );
    if( !is )
        return false;

    // get length of file:
    is.seekg (0, ios::end);
    unsigned long   file_length = is.tellg();
    is.seekg (0, ios::beg);

    //the dump must come from a table with the same layout
    if( file_length != memSize )
        return false;

    is.read( (char*)mem, memSize ); 
    bool ok = ( is.gcount() == (std::streamsize)memSize );

    is.close();
    return ok;
}

//the rate of the space used
template< typename vT, int lines, int thread_step >
double CHashShm<vT,lines,thread_step>::GetFullRate() const
{ 
    return double( m_hashHead->currentSize )/maxSize;
};

//the bucket size( begin 0 )
template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::GetBucketSize( unsigned int index ) const
{ 
    return index>=(unsigned int)lines?0:PRIMER_TABLE[index]; 
}

//get one bucket's item count
template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::GetBucketUseSize( unsigned int index )  const
{
    if(  index>=lines )
        return 0;

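    int sum = 0;
    //bucket `index` starts PRIMER_TABLE_TATAL[index] nodes after the header
    const hash_node * pNode = (const hash_node *)((const char*)mem + sizeof(hash_head) + PRIMER_TABLE_TATAL[index]*sizeof(hash_node));

    for( int i=0; i<PRIMER_TABLE[index]; i++ )
    {
        if( pNode[i].key != 0 )
            ++sum;
    }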

    return sum;
}

//set operation enable flag
template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::SetBehavior( unsigned int iflag ) 
{
    flag |= iflag;
    return flag;
}

template< typename vT, int lines, int thread_step >
int CHashShm<vT,lines,thread_step>::UnsetBehavior( unsigned int iflag ) 
{
    flag &= ~iflag;
    return flag;
}

#endif

Cpp file:

//-------------------------------------------
//Cola's Hash Shm Library 1.03
//colaliang( SNG Instant Messaging Application Department )
//last update: 2012-12-27

#include "hash_shm.h"


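//NOTE: this listing ships 64 small primes (1511..1999), so the table is small;
//the text above describes a table of large primes starting at 1000037
const int PRIMER_TABLE_LEN = 64;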
const int PRIMER_TABLE[ PRIMER_TABLE_LEN ] = { 
    1511,1523,1531,1543,1549,1553,1559,1567,1571,1579,1583,1597,1601,1607,1609,1613,1619,1621,1627,1637,1657,1663,1667,1669,1693,
	1697,1699,1709,1721,1723,1733,1741,1747,1753,1759,1777,1783,1787,1789,1801,1811,1823,1831,1847,1861,1867,1871,1873,1877,1879,
	1889,1901,1907,1913,1931,1933,1949,1951,1973,1979,1987,1993,1997,1999
};

unsigned long hash_time33(char const *str, int len  ) 
{ 
	//taken from PHP (DJBX33A, "times 33")
	unsigned long hash = 5381; 
	
	//variant with the hash loop unrolled eight times
	// if len is not specified (negative), hash up to the terminating NUL
	char const *p = str; 
	if( len < 0 )
	{ 
		for(; *p; p++) 
		{ 
			hash = hash * 33 + *p; 
		}
	
		return hash; 
	}

#define TIME33_HASH_MIXED_CH() hash = ((hash<<5)+hash) + *p++
	//unrolled: eight characters per iteration
	for (; len >= 8; len -= 8) 
	{ 
		TIME33_HASH_MIXED_CH();	// 1
		TIME33_HASH_MIXED_CH(); // 2 
		TIME33_HASH_MIXED_CH();	// 3 
		TIME33_HASH_MIXED_CH(); // 4 
		TIME33_HASH_MIXED_CH(); // 5 
		TIME33_HASH_MIXED_CH(); // 6 
		TIME33_HASH_MIXED_CH(); // 7 
		TIME33_HASH_MIXED_CH(); // 8 
	} 
	//remaining 0..7 characters; the cases fall through on purpose
	switch (len) 
	{ 
		case 7: TIME33_HASH_MIXED_CH();
		case 6: TIME33_HASH_MIXED_CH();
		case 5: TIME33_HASH_MIXED_CH();
		case 4: TIME33_HASH_MIXED_CH();
		case 3: TIME33_HASH_MIXED_CH();
		case 2: TIME33_HASH_MIXED_CH();
		case 1: TIME33_HASH_MIXED_CH(); break; 
		case 0: break; 
	} 
	
	return hash; 
}
