redis 从0到1完整学习（六）：Hash 表数据结构

发布时间：2023年12月22日

文章目录

1. 引言
2. redis 源码下载
3. dict 数据结构
4. 哈希表扩容与 rehash
5. 参考

1. 引言

前情提要：
《redis 从0到1完整学习（一）：安装&初识 redis》
《redis 从0到1完整学习（二）：redis 常用命令》
《redis 从0到1完整学习（三）：redis 数据结构》
《redis 从0到1完整学习（四）：字符串 SDS 数据结构》
《redis 从0到1完整学习（五）：集合 IntSet 数据结构》
本文主要结合源码来介绍 hash 表的数据结构

2. redis 源码下载

Redis 源码可以点击这里下载，方便查看其中定义的一些数据结构。
在这里插入图片描述

3. dict 数据结构

Dict 由三部分组成，分别是：哈希表（DictHashTable）、哈希节点（DictEntry）、字典（Dict），数据结构如下：
在这里插入图片描述
dict、dictht、dictEntry 三者的数据结构关系如下：

当 Dict 添加键值对时，首先由 key 计算出 hash 值 h，通过 h & sizemask （等同取模计算，提升计算速度）计算元素对应数组中的索引位置。假设哈希值 h = 5，则 5&7=5，因此键值对存储到数组索引为5的位置。

如下图：第一次插入到下标为5的数组中。
在这里插入图片描述

如果第二次插入的 hash 值计算后的下标也是5，则第二次插入到链表的头部：
在这里插入图片描述

4. 哈希表扩容与 rehash

当哈希表中元素越来越多导致哈希冲突增多时，链表过长后，会查询效率降低，由查询的时间复杂度最开始的 O(1) 向 O(n) 移动。

这部分源码在 Redis 的好几个版本都有所变化，主要是看看扩容的条件：
（1）Redis 6.0
这里是直接判断 ht->used == ht->size。

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *ht) {
    /* If the hash table is empty expand it to the initial size,
     * if the table is "full" dobule its size. */
    if (ht->size == 0)
        return dictExpand(ht, DICT_HT_INITIAL_SIZE);
    if (ht->used == ht->size)
        return dictExpand(ht, ht->size*2);
    return DICT_OK;
}

（2）Redis 6.2
引入哈希表的负载因子：LoadFactor = used/size。在每次新增键值对时都会检查负载因子。

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    // 已经在 rehash， 则返回
    if (dictIsRehashing(d)) return DICT_OK;

    // 如果为空，则初始化 size 为4
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    // 如果负载因子 > dict_force_resize_ratio(定义为5)，则扩容
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio) &&
        dictTypeExpandAllowed(d))
    {
        return dictExpand(d, d->ht[0].used + 1);
    }
    return DICT_OK;
}

（3）Redis 7.2

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    if (DICTHT_SIZE(d->ht_size_exp[0]) == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    if (!dictTypeExpandAllowed(d))
        return DICT_OK;
    if ((dict_can_resize == DICT_RESIZE_ENABLE &&
         d->ht_used[0] >= DICTHT_SIZE(d->ht_size_exp[0])) ||
        (dict_can_resize != DICT_RESIZE_FORBID &&
         d->ht_used[0] / DICTHT_SIZE(d->ht_size_exp[0]) > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht_used[0] + 1);
    }
    return DICT_OK;
}

下面以 Redis 6.2 源码介绍下扩容的核心方法_dictExpand：

/* Expand or create the hash table,
 * when malloc_failed is non-NULL, it'll avoid panic if malloc fails (in which case it'll be set to 1).
 * Returns DICT_OK if expand was performed, and DICT_ERR if skipped. */
int _dictExpand(dict *d, unsigned long size, int* malloc_failed)
{
    if (malloc_failed) *malloc_failed = 0;
    // 如果当前 size 大于要申请的 size，或者正在 rehash，则报错
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;
    dictht n; /* the new hash table */
    // 初始化第一个大于等于 size 的 2^n 数，这个数赋值为 realsize，但是不会低于4
    unsigned long realsize = _dictNextPower(size);
    ...
	
    // 重置 hash 表的大小和掩码，并且分配新内存
    n.size = realsize;
    n.sizemask = realsize-1;
    if (malloc_failed) {
        n.table = ztrycalloc(realsize*sizeof(dictEntry*));
        *malloc_failed = n.table == NULL;
        if (*malloc_failed)
            return DICT_ERR;
    } else
        n.table = zcalloc(realsize*sizeof(dictEntry*));

    n.used = 0;

    // 第一次初始化，则直接返回
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }
	// 如果不是第一次初始化，说明是扩容，需要 rehash，将 rehashidx 置为0，在后续增删改，会触发 rehash
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

上面代码的最后只是说明了要进行 rehash 操作，在 rehash 过程中：

每次增、删、改、查，都会把 dict.ht[0].table[rehashidx] 的值 rehash 到 dict.ht[1] 中，同时 rehashidx++，这样渐进式地 rehash，防止 rehash 阻塞主进程太久，影响效率。
新增操作，直接写入ht[1]
删、改、查会在 dict.ht[0]，dict.ht[1] 依次查找。

5. 参考

《redis 从0到1完整学习（一）：安装&初识 redis》
《redis 从0到1完整学习（二）：redis 常用命令》
《redis 从0到1完整学习（三）：redis 数据结构》
《redis 从0到1完整学习（四）：字符串 SDS 数据结构》
《redis 从0到1完整学习（五）：集合 IntSet 数据结构》

文章来源:https://blog.csdn.net/qq_36803941/article/details/135145078
本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若内容造成侵权/违法违规/事实不符，请联系我的编程经验分享网邮箱：chenni525@qq.com进行投诉反馈，一经查实，立即删除！

redis 从0到1完整学习 （六）：Hash 表数据结构