Memcached Development (5): Common Data Structures

concisedistinct

Published 2024-07-18 09:52:50

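Memcached is a high-performance, distributed in-memory object caching system, used mainly to take load off the database in dynamic web applications. By keeping data in memory, it cuts the number of database accesses and speeds up reads. Understanding the data structures behind Memcached helps when tuning application performance and making good use of cache resources.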

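1. Overview

Memcached stores and manages its data through a small set of data structures. They determine how cached items are stored, looked up, and evicted, so choosing and using them well has a direct effect on performance and reliability.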

2. Data Structures at a Glance

Memcached relies mainly on the following data structures and techniques to store and manage data:

Hash table
Separate chaining
Doubly linked list
LRU (Least Recently Used) eviction

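3. Hash Table

3.1 Definition and Principle

A hash table stores data as key-value pairs. A hash function maps each key to an index, and that index locates the data in an underlying array. The main advantage is fast access: lookups, insertions, and deletions take O(1) time on average.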

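3.2 Hash Function

The hash function is the core of a hash table. It takes a key and produces a number that is then reduced to an index into the array. An ideal hash function spreads keys evenly across the table and so keeps collisions to a minimum. The function below is a simple shift-and-add hash meant as a teaching example, not the hash Memcached itself ships with.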

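/* Simple shift-and-add string hash (illustrative only).
   Caveat: because it only shifts, roughly the last seven bytes of a key
   are all that affect a 32-bit result; earlier bytes are shifted out. */
unsigned int hash(const char *key) {
    unsigned int h = 0;
    while (*key) {
        /* Cast so that bytes >= 0x80 are not sign-extended. */
        h = (h << 5) + (unsigned char)*key++;
    }
    return h;
}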

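3.3 Collision Handling

Because the table has a finite number of slots, different keys can map to the same index. This is called a collision. Memcached resolves collisions with separate chaining.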
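To make the collision case concrete, the sketch below counts how many keys land in a slot that an earlier key already occupies. `demo_hash`, `count_collisions` and `DEMO_SLOTS` are hypothetical names introduced for this example; the hash follows the same shift-and-add scheme as the function in section 3.2, and the table is kept deliberately tiny. By the pigeonhole principle, any five distinct keys in a four-slot table must produce at least one collision.

```c
#include <stddef.h>

#define DEMO_SLOTS 4   /* deliberately tiny so collisions are guaranteed */

/* Same shift-and-add scheme as the article's hash(), under a demo name. */
static unsigned int demo_hash(const char *key) {
    unsigned int h = 0;
    while (*key) {
        h = (h << 5) + (unsigned char)*key++;
    }
    return h;
}

/* Count the keys that map to a slot already taken by an earlier key. */
static size_t count_collisions(const char *const *keys, size_t n) {
    int used[DEMO_SLOTS] = {0};
    size_t collisions = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int index = demo_hash(keys[i]) % DEMO_SLOTS;
        if (used[index]) {
            collisions++;      /* slot occupied: this key collides */
        } else {
            used[index] = 1;   /* first key to claim this slot */
        }
    }
    return collisions;
}
```

Calling `count_collisions` with the keys "user:1" through "user:5" reports at least one collision, which is exactly the situation that separate chaining has to handle.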

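4. Separate Chaining

4.1 Principle

Separate chaining is a way of handling hash collisions. Each slot of the table holds a pointer to a linked list, and every key-value pair that maps to that slot is stored in the list.

4.2 Implementation

When a collision occurs, the new key-value pair is inserted at the head or the tail of the list. A lookup has to walk the list to find the matching key. The listing below inserts at the head.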

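#include <stdlib.h>
#include <string.h>

#define HASH_TABLE_SIZE 1021   /* assumed bucket count; the original gives no value */

typedef struct entry {
    char *key;
    char *value;
    struct entry *next;        /* next entry in the same bucket */
} entry;

entry *hash_table[HASH_TABLE_SIZE];

/* Insert at the head of the bucket's chain. Duplicate keys are not checked,
   so a later insert of the same key shadows the earlier one. */
void insert(const char *key, const char *value) {
    unsigned int index = hash(key) % HASH_TABLE_SIZE;
    entry *new_entry = malloc(sizeof(entry));
    if (!new_entry) return;    /* out of memory: drop the insert */
    new_entry->key = strdup(key);
    new_entry->value = strdup(value);
    new_entry->next = hash_table[index];
    hash_table[index] = new_entry;
}

/* Walk the bucket's chain; returns the stored value, or NULL if absent. */
char *search(const char *key) {
    unsigned int index = hash(key) % HASH_TABLE_SIZE;
    entry *current = hash_table[index];
    while (current) {
        if (strcmp(current->key, key) == 0) {
            return current->value;
        }
        current = current->next;
    }
    return NULL;
}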
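The listing covers insertion and lookup; deleting a key also requires patching the link that leads to the removed node. A minimal sketch of removal with a pointer-to-pointer walk follows. The names `chain_node`, `chain_push`, `chain_find` and `chain_remove` are hypothetical, and the functions operate on a single bucket so that the example stays self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* One node of a bucket's chain (hypothetical type for this sketch). */
typedef struct chain_node {
    const char *key;            /* borrowed, not copied, to keep the sketch short */
    const char *value;
    struct chain_node *next;
} chain_node;

/* Push a new node on the front of the chain; returns 0 on success. */
static int chain_push(chain_node **bucket, const char *key, const char *value) {
    chain_node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->key = key;
    n->value = value;
    n->next = *bucket;
    *bucket = n;
    return 0;
}

/* Return the value stored under key, or NULL if the key is absent. */
static const char *chain_find(const chain_node *bucket, const char *key) {
    for (; bucket; bucket = bucket->next) {
        if (strcmp(bucket->key, key) == 0) return bucket->value;
    }
    return NULL;
}

/* Unlink and free the first node whose key matches; returns 1 if removed. */
static int chain_remove(chain_node **bucket, const char *key) {
    /* link points at whichever pointer currently leads to the node under test */
    for (chain_node **link = bucket; *link; link = &(*link)->next) {
        if (strcmp((*link)->key, key) == 0) {
            chain_node *victim = *link;
            *link = victim->next;   /* bypass the node */
            free(victim);
            return 1;
        }
    }
    return 0;
}
```

Walking with a pointer to the link, rather than a pointer to the node, means the head of the chain needs no special case: the bucket slot itself is just the first link.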

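5. Doubly Linked List

5.1 Definition and Principle

A doubly linked list is a linked structure in which each node has three parts: the data, a pointer to the previous node, and a pointer to the next node. Given a pointer to a node, that node can be inserted or removed in O(1) time, and the list can be traversed in either direction.

5.2 Use in Memcached

Memcached uses a doubly linked list to keep cached items in LRU order. In the simplified model used here, an item is moved to the head of the list every time it is accessed, so the most recently used items stay at the front.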

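typedef struct node {
    char *key;
    char *value;
    struct node *prev;   /* neighbour closer to the head */
    struct node *next;   /* neighbour closer to the tail */
} node;

node *head = NULL;       /* most recently used */
node *tail = NULL;       /* least recently used */

/* Move n to the head of the list. Works for a node already in the list
   and for a freshly created one (prev == next == NULL). */
void move_to_head(node *n) {
    if (n == head) return;

    /* Detach n from its current position, if it has one. */
    if (n->prev) n->prev->next = n->next;
    if (n->next) n->next->prev = n->prev;
    if (n == tail) tail = n->prev;

    /* Relink n in front of the current head. */
    n->next = head;
    n->prev = NULL;
    if (head) head->prev = n;
    head = n;
    if (tail == NULL) tail = head;
}

/* Allocate a detached node holding copies of key and value; NULL on failure. */
node *create_node(const char *key, const char *value) {
    node *n = malloc(sizeof(node));
    if (!n) return NULL;
    n->key = strdup(key);
    n->value = strdup(value);
    n->prev = n->next = NULL;
    return n;
}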
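Eviction needs the opposite operation as well: detaching a node, typically the tail. A minimal sketch follows, using a hypothetical `dlist` type that carries its own head and tail instead of the globals above, and an integer `id` as a stand-in payload.

```c
#include <stddef.h>

typedef struct dnode {
    int id;                     /* stand-in payload */
    struct dnode *prev;
    struct dnode *next;
} dnode;

typedef struct dlist {
    dnode *head;                /* most recently used */
    dnode *tail;                /* least recently used */
} dlist;

/* Insert a detached node at the head of the list. */
static void dlist_push_head(dlist *l, dnode *n) {
    n->prev = NULL;
    n->next = l->head;
    if (l->head) l->head->prev = n; else l->tail = n;
    l->head = n;
}

/* Detach a node that is currently in the list; O(1), no traversal. */
static void dlist_unlink(dlist *l, dnode *n) {
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

/* Detach and return the tail, or NULL if the list is empty. */
static dnode *dlist_pop_tail(dlist *l) {
    dnode *n = l->tail;
    if (n) dlist_unlink(l, n);
    return n;
}
```

Because each node knows both neighbours, `dlist_unlink` never has to search for a predecessor, which is what makes the list suitable for LRU bookkeeping.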

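6. LRU (Least Recently Used) Eviction

6.1 Principle

LRU is a common cache eviction policy: when memory runs out, the item that has gone unused the longest is removed. It rests on two operations:

Whenever a cached item is accessed, move it to the head of the list.
When the cache is full, remove the item at the tail of the list.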
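The two rules can be traced on a toy cache before the hash table comes into play. The sketch below keeps integer keys in a small array ordered from most to least recently used. `toy_lru`, `lru_access` and `lru_contains` are hypothetical names, and the O(n) array shuffle is there only for readability; it is not how Memcached stores items.

```c
#include <stddef.h>
#include <string.h>

#define LRU_CAPACITY 3

typedef struct toy_lru {
    int keys[LRU_CAPACITY];     /* keys[0] is the most recently used */
    size_t count;
} toy_lru;

/* Return 1 if key is currently cached, without changing the order. */
static int lru_contains(const toy_lru *c, int key) {
    for (size_t i = 0; i < c->count; i++) {
        if (c->keys[i] == key) return 1;
    }
    return 0;
}

/* Access a key: move it to the front, evicting the last key if full. */
static void lru_access(toy_lru *c, int key) {
    size_t pos = c->count;
    for (size_t i = 0; i < c->count; i++) {
        if (c->keys[i] == key) { pos = i; break; }
    }
    if (pos == c->count) {                 /* miss */
        if (c->count < LRU_CAPACITY) c->count++;
        pos = c->count - 1;                /* new slot, or the evicted tail */
    }
    /* Shift everything before pos one step back, then put key in front. */
    memmove(&c->keys[1], &c->keys[0], pos * sizeof c->keys[0]);
    c->keys[0] = key;
}
```

With a capacity of 3, the accesses 1, 2, 3, 1, 4 leave the cache holding 4, 1, 3: touching 1 again saved it, so 2 became the oldest key and was the one evicted.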

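6.2 Implementation

Combining a doubly linked list with a hash table gives Memcached an efficient LRU mechanism: the hash table finds an item in O(1) time on average, and the list allows O(1) reordering and eviction.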

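#include <stdlib.h>
#include <string.h>

#define CACHE_LIMIT 1024             /* assumed capacity in entries; the original gives no value */

/* Each entry sits in two lists at once, the LRU list and its hash bucket's
   chain, so it needs separate links for each. Sharing one next pointer
   between the two would corrupt both. */
typedef struct cache_entry {
    char *key;
    char *value;
    struct cache_entry *prev;        /* LRU list: towards the head */
    struct cache_entry *next;        /* LRU list: towards the tail */
    struct cache_entry *h_next;      /* next entry in the same hash bucket */
} cache_entry;

cache_entry *cache_head = NULL;      /* most recently used */
cache_entry *cache_tail = NULL;      /* least recently used */
cache_entry *cache_hash_table[HASH_TABLE_SIZE];
size_t cache_size = 0;

/* Find a key in its bucket; returns NULL if absent. */
static cache_entry *find_entry(const char *key) {
    cache_entry *e = cache_hash_table[hash(key) % HASH_TABLE_SIZE];
    while (e && strcmp(e->key, key) != 0) e = e->h_next;
    return e;
}

/* Detach an entry that is currently in the LRU list. */
static void lru_unlink(cache_entry *e) {
    if (e->prev) e->prev->next = e->next; else cache_head = e->next;
    if (e->next) e->next->prev = e->prev; else cache_tail = e->prev;
    e->prev = e->next = NULL;
}

/* Put a detached entry at the head of the LRU list. */
static void lru_push_head(cache_entry *e) {
    e->prev = NULL;
    e->next = cache_head;
    if (cache_head) cache_head->prev = e; else cache_tail = e;
    cache_head = e;
}

/* Mark an entry that is already in the list as most recently used. */
static void lru_touch(cache_entry *e) {
    if (e == cache_head) return;
    lru_unlink(e);
    lru_push_head(e);
}

/* Add a key if it is absent; an existing key is only refreshed, not overwritten. */
void add_to_cache(const char *key, const char *value) {
    cache_entry *e = find_entry(key);
    if (e) {
        lru_touch(e);
        return;
    }

    /* Cache full: remove the tail from its bucket chain and from the LRU list. */
    if (cache_size >= CACHE_LIMIT && cache_tail) {
        cache_entry *old = cache_tail;
        cache_entry **link = &cache_hash_table[hash(old->key) % HASH_TABLE_SIZE];
        while (*link && *link != old) link = &(*link)->h_next;
        if (*link) *link = old->h_next;
        lru_unlink(old);
        free(old->key);
        free(old->value);
        free(old);
        cache_size--;
    }

    e = malloc(sizeof *e);
    if (!e) return;                  /* out of memory: drop the insert */
    e->key = strdup(key);
    e->value = strdup(value);
    unsigned int index = hash(key) % HASH_TABLE_SIZE;
    e->h_next = cache_hash_table[index];
    cache_hash_table[index] = e;
    lru_push_head(e);
    cache_size++;
}

7. Implementing Storage and Retrieval

7.1 Storing Data

Storing data is one of Memcached's core operations. A key-value pair is placed in the hash table under the slot chosen by the hash of its key. If that slot already holds other keys, the collision is resolved through the bucket's chain; if the key itself is already present, its value is replaced.

/* Store a value, replacing any existing value for the same key. */
void set(const char *key, const char *value) {
    cache_entry *e = find_entry(key);
    if (e) {
        char *copy = strdup(value);
        if (!copy) return;           /* keep the old value on allocation failure */
        free(e->value);
        e->value = copy;
        lru_touch(e);
        return;
    }
    add_to_cache(key, value);        /* new key: insert, evicting the tail if full */
}

7.2 Retrieving Data

To retrieve data, hash the key to find its slot, then walk the chain to find the matching item. On a hit, the item is moved to the head of the LRU list so that the eviction order stays accurate.

/* Look up a key. The returned string is owned by the cache and stays valid
   only until the entry is overwritten or evicted. */
char *get(const char *key) {
    cache_entry *e = find_entry(key);
    if (!e) return NULL;
    lru_touch(e);
    return e->value;
}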

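8. Choosing and Optimizing Data Structures

In practice, the choice of data structures has a large effect on Memcached's performance. Some suggestions follow.

8.1 Hash Function

A well-chosen hash function reduces collisions and speeds up lookups. A good hash function distributes keys uniformly and has a low collision rate.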
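As one example of a hash with better mixing, the sketch below implements 32-bit FNV-1a, a widely used general purpose string hash. It only illustrates the advice above and is not presented as the function Memcached itself uses; `fnv1a_32` is a name chosen for this example. Unlike a pure shift-and-add loop, every byte of the key influences the result, however long the key is.

```c
#include <stdint.h>

/* 32-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime. */
static uint32_t fnv1a_32(const char *key) {
    uint32_t h = 2166136261u;              /* FNV offset basis */
    while (*key) {
        h ^= (unsigned char)*key++;
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}
```

A bucket index is then obtained the same way as before, for example `fnv1a_32(key) % HASH_TABLE_SIZE`.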

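8.2 Linked Lists

For the linked lists, a doubly linked list makes insertion and removal cheaper: a node can be unlinked in O(1) time given only a pointer to it, whereas a singly linked list first has to find the predecessor. For data that changes frequently, this gives the doubly linked list the edge.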

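8.3 Memory Management

Careful memory management improves both stability and performance. In Memcached, memory use is tuned by choosing a suitable memory allocation strategy and eviction policy.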

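/* Free an entry together with the strings it owns. */
void free_entry(cache_entry *entry) {
    if (!entry) return;
    free(entry->key);
    free(entry->value);
    free(entry);
}

/* Evict the least recently used entry, i.e. the tail of the LRU list.
   Assumes prev/next link the LRU list and a separate h_next pointer
   links each hash bucket's chain. */
void evict_entry(void) {
    cache_entry *old_entry = cache_tail;
    if (!old_entry) return;

    /* Unlink from the hash bucket so no lookup can reach freed memory. */
    cache_entry **link = &cache_hash_table[hash(old_entry->key) % HASH_TABLE_SIZE];
    while (*link && *link != old_entry) link = &(*link)->h_next;
    if (*link) *link = old_entry->h_next;

    /* Unlink from the LRU list. */
    if (old_entry->prev) old_entry->prev->next = NULL;
    else cache_head = NULL;              /* the list is now empty */
    cache_tail = old_entry->prev;

    free_entry(old_entry);
    cache_size--;
}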