我正在寻找一个关于哈希表如何工作的解释-用简单的英语为我这样的傻瓜!

例如,我知道它接受键,计算哈希(我正在寻找一个解释),然后执行某种模运算来计算出它在存储值的数组中的位置,但我的知识到此为止。

谁能解释一下过程吗?

编辑:我并不是特别问哈希码是如何计算的,而是对哈希表如何工作的一般概述。


当前回答

Hashtable inside contains cans in which it stores the key sets. The Hashtable uses the hashcode to decide to which the key pair should plan. The capacity to get the container area from Key's hashcode is known as hash work. In principle, a hash work is a capacity which when given a key, creates an address in the table. A hash work consistently returns a number for an item. Two equivalent items will consistently have a similar number while two inconsistent objects may not generally have various numbers. When we put objects into a hashtable then it is conceivable that various objects may have equal/ same hashcode. This is known as a collision. To determine collision, hashtable utilizes a variety of lists. The sets mapped to a single array index are stored in a list and then the list reference is stored in the index.

其他回答

对于所有寻找编程用语的人,下面是它是如何工作的。高级哈希表的内部实现有许多复杂之处,并且对存储分配/释放和搜索进行了优化,但顶层的思想是非常相同的。

(void) addValue : (object) value
{
   int bucket = calculate_bucket_from_val(value);
   if (bucket) 
   {
       //do nothing, just overwrite
   }
   else   //create bucket
   {
      create_extra_space_for_bucket();
   }
   put_value_into_bucket(bucket,value);
}

(bool) exists : (object) value
{
   int bucket = calculate_bucket_from_val(value);
   return bucket;
}

其中calculate_bucket_from_val()是哈希函数,所有的惟一性魔术都必须在这里发生。

经验法则是: 对于要插入的给定值,bucket必须是唯一的,并且派生自它应该存储的值。

Bucket是存储值的任何空间-这里我将它保持int作为数组索引,但它也可能是一个内存位置。

其实比这更简单。

哈希表不过是一个包含键/值对的向量数组(通常是稀疏数组)。此数组的最大大小通常小于哈希表中存储的数据类型的可能值集中的项数。

哈希算法用于根据将存储在数组中的项的值生成该数组的索引。

This is where storing vectors of key/value pairs in the array come in. Because the set of values that can be indexes in the array is typically smaller than the number of all possible values that the type can have, it is possible that your hash algorithm is going to generate the same value for two separate keys. A good hash algorithm will prevent this as much as possible (which is why it is relegated to the type usually because it has specific information which a general hash algorithm can't possibly know), but it's impossible to prevent.

因此,您可以使用多个键来生成相同的散列代码。当这种情况发生时,将遍历向量中的项,并在向量中的键和正在查找的键之间进行直接比较。如果找到,则返回与该键关联的值,否则不返回任何值。

哈希表完全基于这样一个事实,即实际计算遵循随机访问机模型,即内存中任何地址的值都可以在O(1)时间或常数时间内访问。

因此,如果我有一个键的宇宙(我可以在应用程序中使用的所有可能的键的集合,例如,滚动no。对于学生来说,如果它是4位,那么这个宇宙就是从1到9999的一组数字),并且一种将它们映射到有限大小的数字集的方法可以在我的系统中分配内存,理论上我的哈希表已经准备好了。

Generally, in applications the size of universe of keys is very large than number of elements I want to add to the hash table(I don't wanna waste a 1 GB memory to hash ,say, 10000 or 100000 integer values because they are 32 bit long in binary reprsentaion). So, we use this hashing. It's sort of a mixing kind of "mathematical" operation, which maps my large universe to a small set of values that I can accomodate in memory. In practical cases, often space of a hash table is of the same "order"(big-O) as the (number of elements *size of each element), So, we don't waste much memory.

现在,一个大集合映射到一个小集合,映射必须是多对一的。因此,不同的键将被分配相同的空间(?? ?不公平)。有几种方法可以解决这个问题,我只知道其中最流行的两种:

Use the space that was to be allocated to the value as a reference to a linked list. This linked list will store one or more values, that come to reside in same slot in many to one mapping. The linked list also contains keys to help someone who comes searching. It's like many people in same apartment, when a delivery-man comes, he goes to the room and asks specifically for the guy. Use a double hash function in an array which gives the same sequence of values every time rather than a single value. When I go to store a value, I see whether the required memory location is free or occupied. If it's free, I can store my value there, if it's occupied I take next value from the sequence and so on until I find a free location and I store my value there. When searching or retreiving the value, I go back on same path as given by the sequence and at each location ask for the vaue if it's there until I find it or search all possible locations in the array.

CLRS的《算法导论》对这个主题提供了非常好的见解。

Hashtable inside contains cans in which it stores the key sets. The Hashtable uses the hashcode to decide to which the key pair should plan. The capacity to get the container area from Key's hashcode is known as hash work. In principle, a hash work is a capacity which when given a key, creates an address in the table. A hash work consistently returns a number for an item. Two equivalent items will consistently have a similar number while two inconsistent objects may not generally have various numbers. When we put objects into a hashtable then it is conceivable that various objects may have equal/ same hashcode. This is known as a collision. To determine collision, hashtable utilizes a variety of lists. The sets mapped to a single array index are stored in a list and then the list reference is stored in the index.

这是一个相当深奥的理论领域,但基本轮廓很简单。

本质上,哈希函数只是一个函数,它从一个空间(比如任意长度的字符串)获取内容,并将它们映射到一个用于索引的空间(比如无符号整数)。

如果你只有一个小空间的东西来散列,你可能只需要把这些东西解释为整数,你就完成了(例如4字节字符串)

不过,通常情况下,你的空间要大得多。如果你允许作为键的空间大于你用于索引的空间(你的uint32或其他),那么你不可能为每个键都有唯一的值。当两个或多个东西散列到相同的结果时,您必须以适当的方式处理冗余(这通常被称为冲突,如何处理它或不处理它将略微取决于您使用散列的目的)。

这意味着你不希望得到相同的结果,你也可能希望哈希函数是快速的。

平衡这两个属性(以及其他一些属性)让许多人忙得不可开交!

在实践中,您通常应该能够找到一个已知适合您的应用程序的函数并使用它。

Now to make this work as a hashtable: Imagine you didn't care about memory usage. Then you can create an array as long as your indexing set (all uint32's, for example). As you add something to the table, you hash it's key and look at the array at that index. If there is nothing there, you put your value there. If there is already something there, you add this new entry to a list of things at that address, along with enough information (your original key, or something clever) to find which entry actually belongs to which key.

因此,随着时间的推移,哈希表(数组)中的每个条目要么是空的,要么包含一个条目,要么包含一个条目列表。检索很简单,就像在数组中建立索引,然后返回值,或者遍历值列表并返回正确的值。

当然,在实践中你通常不能这样做,它浪费太多的内存。因此,所有操作都基于稀疏数组(其中唯一的条目是实际使用的条目,其他所有内容都隐式为空)。

有很多方案和技巧可以让它更好地工作,但这是最基本的。