Implementing Atoms in C


chenxugl


Article link: https://blog.csdn.net/chenxugl/article/details/7026403


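In "C Interfaces and Implementations", an atom is defined as a pointer to a unique, immutable sequence of zero or more bytes with arbitrary values. Any given atom occurs only once, so two atoms are identical if they point to the same location. The benefit is saved space, since each sequence is stored only once.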

Following the book's requirements, four interfaces are implemented:

int AtomInit(void);
char *AtomNew(const char *str,size_t len);
char *AtomString(const char *str);
size_t AtomLength(const char *str);

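AtomInit initializes the global variable g_ath, which heads a hash table; collisions are resolved by separate chaining. AtomNew uses the hash value to find the matching bucket and adds the atom to that bucket's chain if it is not already there. AtomString simply calls AtomNew, and AtomLength returns the length of the atom str.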
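Since AtomString only forwards to AtomNew, the relationship between the two can be sketched as follows. This is a minimal sketch, not the article's implementation: the AtomNew shown here is a simplified stand-in that keeps atoms in a small fixed array instead of a hash table, and SKETCH_MAX_ATOMS, g_strs, g_lens and g_count are hypothetical names introduced only for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical storage for this sketch: a small fixed array, no hashing. */
#define SKETCH_MAX_ATOMS 64
static char  *g_strs[SKETCH_MAX_ATOMS];
static size_t g_lens[SKETCH_MAX_ATOMS];
static size_t g_count;

/* Simplified stand-in for AtomNew: returns the unique copy of str[0..len). */
char *AtomNew(const char *str, size_t len)
{
    size_t i;
    char *copy;

    assert(str != NULL);
    for (i = 0; i < g_count; i++)
        if (g_lens[i] == len && memcmp(g_strs[i], str, len) == 0)
            return g_strs[i];            /* these bytes are already an atom */

    if (g_count == SKETCH_MAX_ATOMS)
        return NULL;                     /* sketch limitation: array is full */
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, str, len);
    copy[len] = '\0';
    g_strs[g_count] = copy;
    g_lens[g_count] = len;
    g_count++;
    return copy;
}

/* AtomString: convenience wrapper for NUL-terminated input. */
char *AtomString(const char *str)
{
    return AtomNew(str, strlen(str));
}
```

With this in place, AtomString("hello") called twice yields the same pointer, so two atoms can be compared with == instead of strcmp.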

The basic data structures are as follows:

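#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* malloc */
#include <string.h>   /* memset, memcpy, memcmp, strlen */

/* Number of buckets. It must be a power of two, because the bucket
   index is computed as hash & (size - 1). */
#define MAX_ATOM_TABLE 1

/* One atom: a chain link plus the length and address of its bytes. */
struct Atom
{
   struct Atom *next;
   size_t len;
   char *str;
};

/* Table header: buffer is an array of st bucket heads. */
struct AtomTableHead
{
   int freenode;
   size_t st;
   struct Atom *buffer;
}g_ath;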
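One way to store a chained node together with its bytes is a single allocation that holds the struct followed by the string. The sketch below assumes a struct Atom like the one above (with len declared as size_t) and uses a hypothetical helper, atom_node_new, that is not part of the article's interface. Note that arithmetic on a struct Atom pointer is scaled by sizeof(struct Atom), so node + 1 is the first byte past the header.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Atom
{
    struct Atom *next;
    size_t len;
    char *str;
};

/* Hypothetical helper: allocate the header and the bytes in one block. */
struct Atom *atom_node_new(const char *bytes, size_t len)
{
    struct Atom *node;

    assert(bytes != NULL);
    node = malloc(sizeof *node + len + 1);   /* header + bytes + '\0' */
    if (node == NULL)
        return NULL;
    node->next = NULL;
    node->len = len;
    node->str = (char *)(node + 1);          /* bytes follow the header */
    memcpy(node->str, bytes, len);
    node->str[len] = '\0';
    return node;
}
```

Because the length is stored separately, the bytes may contain embedded '\0' characters; the trailing terminator is only a convenience for callers that treat the atom as a C string. A single free(node) releases both the header and the bytes.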

[Figure: overall design]

1 Initialization

int AtomInit(void)
{
   g_ath.freenode = 0;
   g_ath.st = MAX_ATOM_TABLE;
   g_ath.buffer = (struct Atom *)malloc(sizeof(struct Atom) * g_ath.st);
   if (!g_ath.buffer)
      return -1;   /* out of memory */
   /* Zero every bucket head so that it starts out empty. */
   memset(g_ath.buffer, 0, g_ath.st * sizeof(struct Atom));
   return 0;
}

2 Hash function

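To spread atoms evenly across the buckets, the choice of hash function matters a great deal; in practice it is usually picked based on experience. This article uses the following function: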

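/* Mixes in fewer than 32 bytes of the input: for long inputs only every
   step-th byte, counted from the end, is used. */
unsigned long hash(const char *name,size_t len)
{
	unsigned long h=(unsigned long)len;   /* seed with the length */
	size_t step = (len>>5)+1;
	size_t i;
	for (i=len; i>=step; i-=step)
	    h = h ^ ((h<<5)+(h>>2)+(unsigned long)(unsigned char)name[i-1]);
	return h;
}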
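The hash value is later reduced to a bucket index with k & (g_ath.st - 1). That mask equals k % size only when the table size is a power of two, which the following sketch makes explicit. The hash function is repeated (reading each byte as unsigned char) so the block is self-contained; bucket_index is a hypothetical helper, not part of the article's code.

```c
#include <assert.h>
#include <stddef.h>

/* The article's hash, repeated here with bytes read as unsigned char. */
unsigned long hash(const char *name, size_t len)
{
    unsigned long h = (unsigned long)len;
    size_t step = (len >> 5) + 1;
    size_t i;

    for (i = len; i >= step; i -= step)
        h = h ^ ((h << 5) + (h >> 2) + (unsigned long)(unsigned char)name[i - 1]);
    return h;
}

/* Hypothetical helper: map a hash value to a bucket; size must be 2^n. */
size_t bucket_index(unsigned long k, size_t table_size)
{
    /* A power of two has exactly one bit set. */
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return (size_t)(k & (table_size - 1));
}
```

With MAX_ATOM_TABLE set to 1 the mask is 0, so every atom lands in bucket 0 and the table degenerates into a single chain; a larger power of two, such as 1024, would let the hash actually distribute the atoms.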

3 Adding an atom

char *AtomNew(const char *str, size_t len)
{
   unsigned long k = hash(str, len);
   struct Atom *head = g_ath.buffer + (k & (g_ath.st - 1));
   struct Atom *temp, *newSlot;

   /* Return the existing atom if these bytes are already in the chain. */
   for (temp = head; temp; temp = temp->next)
   {
      if (temp->str && temp->len == len && !memcmp(temp->str, str, len))
         return temp->str;
   }

   /* The bucket head lives inside the table array and has no room for
      the bytes, so an empty head gets a separately allocated copy. */
   if (!head->str)
   {
      head->str = (char *)malloc(len + 1);
      if (!head->str)
         return NULL;
      head->len = len;
      memcpy(head->str, str, len);
      head->str[len] = '\0';
      return head->str;
   }

   /* Otherwise allocate the node and its bytes in one block; the bytes
      start right after the struct. */
   newSlot = (struct Atom *)malloc(sizeof(struct Atom) + len + 1);
   if (!newSlot)
      return NULL;
   newSlot->len = len;
   newSlot->str = (char *)(newSlot + 1);
   memcpy(newSlot->str, str, len);
   newSlot->str[len] = '\0';

   /* Link the new node right after the bucket head. */
   newSlot->next = head->next;
   head->next = newSlot;
   return newSlot->str;
}

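To keep chain management simple, a new atom is inserted immediately after the first atom of its bucket.

[Figure: inserting a new atom after the first atom in the chain]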
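Inserting after the first node takes constant time and leaves the bucket head, which lives inside the table array, where it is. The splice itself can be sketched in isolation as below; struct Link and insert_after_head are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal node type, standing in for struct Atom. */
struct Link
{
    struct Link *next;
    int id;
};

/* Splice node into the chain directly behind head. */
void insert_after_head(struct Link *head, struct Link *node)
{
    assert(head != NULL && node != NULL);
    node->next = head->next;   /* new node adopts the old successor */
    head->next = node;         /* head now points at the new node */
}
```

A consequence of this choice is that, apart from the head, the chain is ordered from newest to oldest: the most recently added atom is always the second node.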

4 Getting the length of an atom

size_t AtomLength(const char *str)
{
   size_t len = strlen(str);
   unsigned long k = hash(str, len);
   struct Atom *temp = g_ath.buffer + (k & (g_ath.st - 1));
   /* Walk the bucket chain looking for an atom with the same bytes. */
   for (; temp; temp = temp->next)
   {
      if (temp->str && temp->len == len && !memcmp(temp->str, str, len))
         return temp->len;
   }
   return 0;   /* not found */
}

Atoms are very useful in practice: whenever arbitrary byte sequences, rather than integers, serve as indices, atoms can be used as the keys.
