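How does a hash table work?

I'm looking for an explanation of how a hash table works - in plain English for a simpleton like me!

For example, I know it takes the key, calculates the hash (I am looking for an explanation of how) and then performs some kind of modulo operation to work out where it lies in the array where the value is stored, but that's where my knowledge stops.

Could anyone clarify the process?

Edit: I'm not asking specifically about how hash codes are calculated, but for a general overview of how a hash table works.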

Current answer

Short and sweet:

A hash table wraps an array; let's call it internalArray. Items are inserted into the array like this:

let insert key value =
    internalArray[hash(key) % internalArray.Length] <- (key, value)
    //oversimplified for educational purposes

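Sometimes two keys will hash to the same index in the array, and you'll want to keep both values. I like to store both values at the same index, which is simple to code by making internalArray an array of linked lists: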

let insert key value =
    internalArray[hash(key) % internalArray.Length].AddLast(key, value)

So, if I want to retrieve an item from my hash table, I could write:

let get key =
    let linkedList = internalArray[hash(key) % internalArray.Length]
    for (testKey, value) in linkedList
        if (testKey = key) then return value
    return null

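Delete operations are just as simple to write. As you can tell, inserts, lookups, and removals on our array of linked lists are nearly O(1), as long as the lists stay short.

When our internalArray gets too full, maybe at around 85% capacity, we can resize the internal array and move all of the items from the old array into the new one.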
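Putting the pieces above together, here is a minimal Python sketch of the same idea, including the delete and resize steps that were only described in words. The class name `ChainedHashTable`, the use of plain Python lists in place of linked lists, the doubling growth policy, and reading "85% capacity" as items divided by slots are all assumptions made for illustration; Python's built-in `hash()` stands in for the hash function.

```python
class ChainedHashTable:
    """Toy hash table with separate chaining (illustrative, not production code)."""

    MAX_LOAD = 0.85  # assumed reading of "around 85% capacity": items / slots

    def __init__(self, capacity=8):
        # Each slot holds a list of (key, value) pairs that hashed to it.
        self.buckets = [[] for _ in range(capacity)]
        self.count = 0

    def _index(self, key):
        # In Python, % with a positive divisor never returns a negative number.
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: replace its value
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.MAX_LOAD:
            self._resize(2 * len(self.buckets))

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None

    def remove(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                self.count -= 1
                return True
        return False

    def _resize(self, new_capacity):
        # Every item must be re-hashed, because its index depends on the array length.
        old_items = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(new_capacity)]
        for key, value in old_items:
            self.buckets[hash(key) % new_capacity].append((key, value))


table = ChainedHashTable()
for n in range(20):
    table.insert(f"key{n}", n)
print(table.get("key7"), len(table.buckets))
```

Resizing is the one expensive step, since it touches every stored item, but it happens rarely enough that the average cost per insert stays small.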

2009-04-08 17:24:48

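Other answers

It's even simpler than that.

A hash table is nothing more than an array (usually a sparse one) of vectors that contain key/value pairs. The maximum size of this array is typically smaller than the number of items in the set of possible values for the type of data being stored in the hash table.

The hash algorithm is used to generate an index into that array based on the value of the item that will be stored in the array.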

This is where storing vectors of key/value pairs in the array comes in. Because the set of values that can be indexes in the array is typically smaller than the number of all possible values that the type can have, it is possible that your hash algorithm is going to generate the same value for two separate keys. A good hash algorithm will prevent this as much as possible (which is why it is usually relegated to the type, because the type has specific information that a general hash algorithm can't possibly know), but it's impossible to prevent entirely.

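Because of this, you can have multiple keys that generate the same hash code. When that happens, the items in the vector are iterated through, and a direct comparison is made between each key in the vector and the key being looked up. If it is found, the value associated with that key is returned; otherwise, nothing is returned.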
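To make the collision case concrete, here is a small Python sketch. The hash function `bad_hash` is deliberately weak (it only looks at the key's length) so that collisions are easy to trigger; it and the helper names `put` and `lookup` are assumptions for the demo, not anything from a real library.

```python
SLOTS = 4

# Each slot holds a vector (here a Python list) of (key, value) pairs.
table = [[] for _ in range(SLOTS)]


def bad_hash(key):
    # Deliberately poor hash for demonstration: keys of equal length collide.
    return len(key)


def put(key, value):
    # Simplified: does not check whether the key is already present.
    table[bad_hash(key) % SLOTS].append((key, value))


def lookup(key):
    for stored_key, value in table[bad_hash(key) % SLOTS]:
        if stored_key == key:  # direct key comparison resolves the collision
            return value
    return None  # nothing found


put("cat", 1)
put("dog", 2)  # same length as "cat", so it lands in the same slot
print(lookup("cat"), lookup("dog"), lookup("cow"))
```

Both keys live in the same slot, and only the direct comparison of keys tells them apart.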

2009-04-08 16:04:43

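This is a pretty deep area of theory, but the basic outline is simple.

Essentially, a hash function is just a function that takes things from one space (say, strings of arbitrary length) and maps them to a space useful for indexing (say, unsigned integers).

If you only have a small space of things to hash, you might get away with just interpreting those things as integers, and you're done (e.g. 4-byte strings).

Usually, though, you've got a much larger space. If the space of things you allow as keys is bigger than the space of things you are using to index (your uint32s or whatever), then you can't possibly have a unique value for each key. When two or more things hash to the same result, you have to handle the redundancy in an appropriate way (this is usually called a collision, and how you handle it, or don't, depends a bit on what you are using the hash for).

This implies you want it to be unlikely for two keys to produce the same result, and you would probably also like the hash function to be fast.

Balancing these two properties (and a few others) has kept many people busy!

In practice, you should usually be able to find a function that is known to work well for your application and use that.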
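As a quick illustration of the 4-byte case, the Python sketch below reinterprets the bytes as an unsigned 32-bit integer. The function name `four_byte_key_to_int` and the choice of big-endian byte order are assumptions; any fixed byte order would do.

```python
def four_byte_key_to_int(key: bytes) -> int:
    """Reinterpret exactly four bytes as an unsigned 32-bit integer."""
    if len(key) != 4:
        raise ValueError("expected exactly 4 bytes")
    # Big-endian is an arbitrary choice; what matters is that it is consistent.
    return int.from_bytes(key, "big")


print(hex(four_byte_key_to_int(b"abcd")))  # 0x61626364
```

Because every distinct 4-byte key maps to a distinct integer, there are no collisions to handle in this special case.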

Now to make this work as a hash table: imagine you didn't care about memory usage. Then you can create an array as long as your indexing set (all uint32s, for example). As you add something to the table, you hash its key and look at the array at that index. If there is nothing there, you put your value there. If there is already something there, you add this new entry to a list of things at that address, along with enough information (your original key, or something clever) to find which entry actually belongs to which key.

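So as you go along, every entry in your hash table (the array) is either empty, or contains one entry, or contains a list of entries. Retrieving is as simple as indexing into the array and either returning the value, or walking the list of values and returning the right one.

Of course, in practice you typically can't do this; it wastes too much memory. So you do everything based on a sparse array (where the only entries are the ones you actually use, and everything else is implicitly null).

There are lots of schemes and tricks to make this work better, but those are the basics.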
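Here is a rough Python sketch of that "ignore memory usage" table. To keep it runnable it assumes a 16-bit index space instead of all uint32s, and it uses `zlib.crc32` truncated to 16 bits as a stand-in hash function; the names `slots`, `index_of`, `add`, and `find` are made up for the example.

```python
import zlib

INDEX_BITS = 16                      # assumed small index space for the demo
slots = [None] * (1 << INDEX_BITS)   # one slot per possible index


def index_of(key: str) -> int:
    # Stand-in hash: CRC-32 of the key, truncated to INDEX_BITS bits.
    return zlib.crc32(key.encode("utf-8")) & ((1 << INDEX_BITS) - 1)


def add(key, value):
    # Simplified: adding the same key twice stores two entries.
    i = index_of(key)
    entry = slots[i]
    if entry is None:
        slots[i] = (key, value)            # empty slot: store a single entry
    elif isinstance(entry, tuple):
        slots[i] = [entry, (key, value)]   # second arrival: promote to a list
    else:
        entry.append((key, value))         # already a list: append


def find(key):
    entry = slots[index_of(key)]
    if entry is None:
        return None
    candidates = [entry] if isinstance(entry, tuple) else entry
    for stored_key, value in candidates:
        if stored_key == key:              # the stored key tells entries apart
            return value
    return None


add("alice", 1)
print(find("alice"), find("bob"))
```

Each slot ends up in exactly one of the three states described above: empty, a single entry, or a list of entries.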

2009-04-08 16:11:34

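Here's another way to look at it.

I assume you understand the concept of an array A. It supports the operation of indexing, where you can get to the Ith element, A[I], in one step, no matter how large A is.

So, for example, if you want to store information about a group of people who all happen to have different ages, a simple way would be to have an array that is large enough and use each person's age as an index into the array. That way, you could have one-step access to any person's information.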

But of course there could be more than one person with the same age, so what you put in the array at each entry is a list of all the people who have that age. So you can get to an individual person's information in one step plus a little bit of search in that list (called a "bucket"). It only slows down if there are so many people that the buckets get big. Then you need a larger array, and some other way to get more identifying information about the person, like the first few letters of their surname, instead of using age.
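The age example can be sketched in a few lines of Python. The array size of 130 and the helper names `add_person` and `find_person` are assumptions chosen for the illustration.

```python
MAX_AGE = 130  # assumed upper bound so the array is "large enough"

# One bucket (a list of people) per possible age.
people_by_age = [[] for _ in range(MAX_AGE)]


def add_person(name, age):
    people_by_age[age].append((name, age))


def find_person(name, age):
    # One indexing step to reach the bucket, then a short search inside it.
    for person in people_by_age[age]:
        if person[0] == name:
            return person
    return None


add_person("Ann", 34)
add_person("Bob", 34)  # same age, so same bucket as Ann
add_person("Cid", 51)
print(find_person("Bob", 34), find_person("Dee", 34))
```

Lookups stay fast as long as each bucket holds only a handful of people.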

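That's the basic idea. Instead of using age, you can use any function of the person that produces a good spread of values. That's the hash function. For example, you could take every third bit of the ASCII representation of the person's name, scrambled in some order. All that matters is that you don't want too many people to hash to the same bucket, because the speed depends on the buckets remaining small.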
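For what it's worth, a toy version of the "every third bit" idea might look like the Python sketch below. It is only an illustration: the function name `every_third_bit_hash` is made up, the bits are kept in their original order rather than scrambled, and nothing here suggests this is a good hash function in practice.

```python
def every_third_bit_hash(name: str, num_buckets: int) -> int:
    """Toy hash: keep every third bit of the ASCII name, reduce modulo num_buckets."""
    data = name.encode("ascii")  # raises UnicodeEncodeError for non-ASCII names
    bits = "".join(f"{byte:08b}" for byte in data)
    picked = bits[::3]  # every third bit, starting with the first
    value = int(picked, 2) if picked else 0
    return value % num_buckets


for person in ["Alice", "Bob", "Carol"]:
    print(person, every_third_bit_hash(person, 16))
```

The same name always maps to the same bucket, which is the one property a hash function cannot do without.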

2009-04-08 17:44:33

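For all those looking for programming parlance, here is how it works. The internal implementation of advanced hash tables has many intricacies and optimisations for storage allocation/deallocation and search, but the top-level idea is very much the same.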

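(void) addValue : (object) value
{
   int bucket = calculate_bucket_from_val(value);
   //pseudocode: "if (bucket)" means "if this bucket already exists"
   if (bucket) 
   {
       //do nothing, just overwrite
   }
   else   //create bucket
   {
      create_extra_space_for_bucket();
   }
   put_value_into_bucket(bucket,value);
}

(bool) exists : (object) value
{
   int bucket = calculate_bucket_from_val(value);
   //pseudocode: true if a bucket exists for this value;
   //a real table would also compare the stored value
   return bucket;
}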

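where calculate_bucket_from_val() is the hashing function, where all the uniqueness magic must happen.

The rule of thumb is: for a given value to be inserted, the bucket must be unique and derivable from the value that it is supposed to store.

A bucket is any space where values are stored. Here I have kept it as an int used as an array index, but it could be a memory location as well.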
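A runnable Python take on the pseudocode above might look like this. It is a sketch under a few assumptions: the class name `BucketSet` is invented, Python's built-in `hash()` plays the role of `calculate_bucket_from_val`, and each bucket holds a list of values because a bucket cannot be guaranteed unique for every possible value.

```python
class BucketSet:
    """Toy set of values following the addValue/exists pseudocode."""

    def __init__(self, num_buckets=16):
        # None means "bucket not created yet".
        self.buckets = [None] * num_buckets

    def calculate_bucket_from_val(self, value):
        return hash(value) % len(self.buckets)

    def add_value(self, value):
        bucket = self.calculate_bucket_from_val(value)
        if self.buckets[bucket] is None:
            self.buckets[bucket] = []  # create the bucket on first use
        if value not in self.buckets[bucket]:
            self.buckets[bucket].append(value)

    def exists(self, value):
        bucket = self.calculate_bucket_from_val(value)
        # Compare the stored values too: two values may share a bucket.
        return self.buckets[bucket] is not None and value in self.buckets[bucket]


s = BucketSet()
s.add_value("apple")
print(s.exists("apple"), s.exists("pear"))
```

Checking the bucket's contents rather than just the bucket's existence is what keeps `exists` correct when two different values land in the same bucket.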

2015-10-07 11:11:20
