在.NET中,GetHashCode方法在整个.NET基类库的许多地方都使用。正确执行它对于在集合中或确定相等时快速查找项目尤为重要。
对于如何为自定义类实现GetHashCode,是否有标准算法或最佳实践,以便不会降低性能?
在.NET中,GetHashCode方法在整个.NET基类库的许多地方都使用。正确执行它对于在集合中或确定相等时快速查找项目尤为重要。
对于如何为自定义类实现GetHashCode,是否有标准算法或最佳实践,以便不会降低性能?
当前回答
我在Helper库中有一个Hashing类,用于此目的。
/// <summary>
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
const int b = 378551;
int a = 63689;
int hash = 0;
// If it overflows then just wrap around
unchecked
{
for (int i = 0; i < input.Length; i++)
{
if (input[i] != null)
{
hash = hash * a + input[i].GetHashCode();
a = a * b;
}
}
}
return hash;
}
然后,只需将其用作:
public override int GetHashCode()
{
return Hashing.RSHash(_field1, _field2, _field3);
}
我没有评估它的表现,所以欢迎任何反馈。
其他回答
这是一个实现Josh Bloch实现的静态助手类;并且提供了显式重载来“防止”装箱,并且还专门为长原语实现哈希。
您可以传递与equals实现匹配的字符串比较。
因为Hash输出始终是int,所以您可以只链接Hash调用。
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;
namespace Sc.Util.System
{
/// <summary>
/// Static methods that allow easy implementation of hashCode. Example usage:
/// <code>
/// public override int GetHashCode()
/// => HashCodeHelper.Seed
/// .Hash(primitiveField)
/// .Hsh(objectField)
/// .Hash(iEnumerableField);
/// </code>
/// </summary>
public static class HashCodeHelper
{
/// <summary>
/// An initial value for a hashCode, to which is added contributions from fields.
/// Using a non-zero value decreases collisions of hashCode values.
/// </summary>
public const int Seed = 23;
private const int oddPrimeNumber = 37;
/// <summary>
/// Rotates the seed against a prime number.
/// </summary>
/// <param name="aSeed">The hash's first term.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static int rotateFirstTerm(int aSeed)
{
unchecked {
return HashCodeHelper.oddPrimeNumber * aSeed;
}
}
/// <summary>
/// Contributes a boolean to the developing HashCode seed.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aBoolean">The value to contribute.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(this int aSeed, bool aBoolean)
{
unchecked {
return HashCodeHelper.rotateFirstTerm(aSeed)
+ (aBoolean
? 1
: 0);
}
}
/// <summary>
/// Contributes a char to the developing HashCode seed.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aChar">The value to contribute.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(this int aSeed, char aChar)
{
unchecked {
return HashCodeHelper.rotateFirstTerm(aSeed)
+ aChar;
}
}
/// <summary>
/// Contributes an int to the developing HashCode seed.
/// Note that byte and short are handled by this method, through implicit conversion.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aInt">The value to contribute.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(this int aSeed, int aInt)
{
unchecked {
return HashCodeHelper.rotateFirstTerm(aSeed)
+ aInt;
}
}
/// <summary>
/// Contributes a long to the developing HashCode seed.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aLong">The value to contribute.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(this int aSeed, long aLong)
{
unchecked {
return HashCodeHelper.rotateFirstTerm(aSeed)
+ (int)(aLong ^ (aLong >> 32));
}
}
/// <summary>
/// Contributes a float to the developing HashCode seed.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aFloat">The value to contribute.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(this int aSeed, float aFloat)
{
unchecked {
return HashCodeHelper.rotateFirstTerm(aSeed)
+ Convert.ToInt32(aFloat);
}
}
/// <summary>
/// Contributes a double to the developing HashCode seed.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aDouble">The value to contribute.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(this int aSeed, double aDouble)
=> aSeed.Hash(Convert.ToInt64(aDouble));
/// <summary>
/// Contributes a string to the developing HashCode seed.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aString">The value to contribute.</param>
/// <param name="stringComparison">Optional comparison that creates the hash.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(
this int aSeed,
string aString,
StringComparison stringComparison = StringComparison.Ordinal)
{
if (aString == null)
return aSeed.Hash(0);
switch (stringComparison) {
case StringComparison.CurrentCulture :
return StringComparer.CurrentCulture.GetHashCode(aString);
case StringComparison.CurrentCultureIgnoreCase :
return StringComparer.CurrentCultureIgnoreCase.GetHashCode(aString);
case StringComparison.InvariantCulture :
return StringComparer.InvariantCulture.GetHashCode(aString);
case StringComparison.InvariantCultureIgnoreCase :
return StringComparer.InvariantCultureIgnoreCase.GetHashCode(aString);
case StringComparison.OrdinalIgnoreCase :
return StringComparer.OrdinalIgnoreCase.GetHashCode(aString);
default :
return StringComparer.Ordinal.GetHashCode(aString);
}
}
/// <summary>
/// Contributes a possibly-null array to the developing HashCode seed.
/// Each element may be a primitive, a reference, or a possibly-null array.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aArray">CAN be null.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(this int aSeed, IEnumerable aArray)
{
if (aArray == null)
return aSeed.Hash(0);
int countPlusOne = 1; // So it differs from null
foreach (object item in aArray) {
++countPlusOne;
if (item is IEnumerable arrayItem) {
if (!object.ReferenceEquals(aArray, arrayItem))
aSeed = aSeed.Hash(arrayItem); // recursive call!
} else
aSeed = aSeed.Hash(item);
}
return aSeed.Hash(countPlusOne);
}
/// <summary>
/// Contributes a possibly-null array to the developing HashCode seed.
/// You must provide the hash function for each element.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aArray">CAN be null.</param>
/// <param name="hashElement">Required: yields the hash for each element
/// in <paramref name="aArray"/>.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash<T>(this int aSeed, IEnumerable<T> aArray, Func<T, int> hashElement)
{
if (aArray == null)
return aSeed.Hash(0);
int countPlusOne = 1; // So it differs from null
foreach (T item in aArray) {
++countPlusOne;
aSeed = aSeed.Hash(hashElement(item));
}
return aSeed.Hash(countPlusOne);
}
/// <summary>
/// Contributes a possibly-null object to the developing HashCode seed.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="aObject">CAN be null.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Hash(this int aSeed, object aObject)
{
switch (aObject) {
case null :
return aSeed.Hash(0);
case bool b :
return aSeed.Hash(b);
case char c :
return aSeed.Hash(c);
case int i :
return aSeed.Hash(i);
case long l :
return aSeed.Hash(l);
case float f :
return aSeed.Hash(f);
case double d :
return aSeed.Hash(d);
case string s :
return aSeed.Hash(s);
case IEnumerable iEnumerable :
return aSeed.Hash(iEnumerable);
}
return aSeed.Hash(aObject.GetHashCode());
}
/// <summary>
/// This utility method uses reflection to iterate all specified properties that are readable
/// on the given object, excluding any property names given in the params arguments, and
/// generates a hashcode.
/// </summary>
/// <param name="aSeed">The developing hash code, or the seed: if you have no seed, use
/// the <see cref="Seed"/>.</param>
/// <param name="aObject">CAN be null.</param>
/// <param name="propertySelector"><see cref="BindingFlags"/> to select the properties to hash.</param>
/// <param name="ignorePropertyNames">Optional.</param>
/// <returns>A hash from the properties contributed to <c>aSeed</c>.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int HashAllProperties(
this int aSeed,
object aObject,
BindingFlags propertySelector
= BindingFlags.Instance
| BindingFlags.Public
| BindingFlags.GetProperty,
params string[] ignorePropertyNames)
{
if (aObject == null)
return aSeed.Hash(0);
if ((ignorePropertyNames != null)
&& (ignorePropertyNames.Length != 0)) {
foreach (PropertyInfo propertyInfo in aObject.GetType()
.GetProperties(propertySelector)) {
if (!propertyInfo.CanRead
|| (Array.IndexOf(ignorePropertyNames, propertyInfo.Name) >= 0))
continue;
aSeed = aSeed.Hash(propertyInfo.GetValue(aObject));
}
} else {
foreach (PropertyInfo propertyInfo in aObject.GetType()
.GetProperties(propertySelector)) {
if (propertyInfo.CanRead)
aSeed = aSeed.Hash(propertyInfo.GetValue(aObject));
}
}
return aSeed;
}
/// <summary>
/// NOTICE: this method is provided to contribute a <see cref="KeyValuePair{TKey,TValue}"/> to
/// the developing HashCode seed; by hashing the key and the value independently. HOWEVER,
/// this method has a different name since it will not be automatically invoked by
/// <see cref="Hash(int,object)"/>, <see cref="Hash(int,IEnumerable)"/>,
/// or <see cref="HashAllProperties"/> --- you MUST NOT mix this method with those unless
/// you are sure that no KeyValuePair instances will be passed to those methods; or otherwise
/// the generated hash code will not be consistent. This method itself ALSO will not invoke
/// this method on the Key or Value here if that itself is a KeyValuePair.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="keyValuePair">The value to contribute.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int HashKeyAndValue<TKey, TValue>(this int aSeed, KeyValuePair<TKey, TValue> keyValuePair)
=> aSeed.Hash(keyValuePair.Key)
.Hash(keyValuePair.Value);
/// <summary>
/// NOTICE: this method is provided to contribute a collection of <see cref="KeyValuePair{TKey,TValue}"/>
/// to the developing HashCode seed; by hashing the key and the value independently. HOWEVER,
/// this method has a different name since it will not be automatically invoked by
/// <see cref="Hash(int,object)"/>, <see cref="Hash(int,IEnumerable)"/>,
/// or <see cref="HashAllProperties"/> --- you MUST NOT mix this method with those unless
/// you are sure that no KeyValuePair instances will be passed to those methods; or otherwise
/// the generated hash code will not be consistent. This method itself ALSO will not invoke
/// this method on a Key or Value here if that itself is a KeyValuePair or an Enumerable of
/// KeyValuePair.
/// </summary>
/// <param name="aSeed">The developing HashCode value or seed.</param>
/// <param name="keyValuePairs">The values to contribute.</param>
/// <returns>The new hash code.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int HashKeysAndValues<TKey, TValue>(
this int aSeed,
IEnumerable<KeyValuePair<TKey, TValue>> keyValuePairs)
{
if (keyValuePairs == null)
return aSeed.Hash(null);
foreach (KeyValuePair<TKey, TValue> keyValuePair in keyValuePairs) {
aSeed = aSeed.HashKeyAndValue(keyValuePair);
}
return aSeed;
}
}
}
直到最近,我的回答都很接近乔恩·斯基特的回答。然而,我最近开始了一个使用两个哈希表的幂的项目,即内部表大小为8、16、32等的哈希表。
而且非常糟糕。因此,经过一点实验和研究后,我开始用以下方法重新散列我的散列:
public static int ReHash(int source)
{
unchecked
{
ulong c = 0xDEADBEEFDEADBEEF + (ulong)source;
ulong d = 0xE2ADBEEFDEADBEEF ^ c;
ulong a = d += c = c << 15 | c >> -15;
ulong b = a += d = d << 52 | d >> -52;
c ^= b += a = a << 26 | a >> -26;
d ^= c += b = b << 51 | b >> -51;
a ^= d += c = c << 28 | c >> -28;
b ^= a += d = d << 9 | d >> -9;
c ^= b += a = a << 47 | a >> -47;
d ^= c += b << 54 | b >> -54;
a ^= d += c << 32 | c >> 32;
a += d << 25 | d >> -25;
return (int)(a >> 1);
}
}
然后我的两个哈希表的能力就不再糟糕了。
但这让我很不安,因为上面的方法不应该奏效。或者更准确地说,除非原始的GetHashCode()以非常特殊的方式很差,否则它不应该工作。
重新混合哈希代码并不能改善一个好的哈希代码,因为唯一可能的效果是我们引入了更多的冲突。
重新混合哈希代码并不能改善糟糕的哈希代码,因为唯一可能的效果是我们将值53上的大量冲突更改为值183487291的大量冲突。
重新混合哈希代码只能改进哈希代码,该哈希代码至少在避免整个范围内的绝对冲突(232个可能值)方面做得相当好,但在为哈希表中的实际使用而进行模化时,在避免冲突方面做得很差。虽然二次幂表的简单模使这一点更加明显,但它对更常见的素数表也有负面影响,这并不是那么明显(重新散列的额外工作将超过好处,但好处仍然存在)。
编辑:我还使用了开放寻址,这也会增加对冲突的敏感度,也许比二的幂更敏感。
好吧,这令人不安的是,.NET(或这里的研究)中的string.GetHashCode()实现可以通过这种方式改进多少(由于较少的冲突,测试运行速度大约快20-30倍),更令人不安我自己的哈希代码可以改进多少(远远不止于此)。
我过去编写的所有GetHashCode()实现,实际上都是这个网站上答案的基础,比我想象的要糟糕得多。很多时候,它对于很多用途来说“足够好”,但我想要更好的东西。
所以我把这个项目放在一边(反正它是一个宠物项目),开始研究如何在.NET中快速生成一个好的、分布良好的哈希代码。
最后,我决定将SpookyHash移植到.NET。实际上,上面的代码是使用SpookyHash从32位输入生成32位输出的快速路径版本。
现在,SpookyHash不是一个好的快速记忆代码。我的端口就更少了,因为我手动内联了很多端口以提高速度*。但这就是代码重用的目的。
然后我把这个项目放在一边,因为正如最初的项目产生了如何产生更好的哈希代码的问题,所以这个项目产生了怎样产生更好的.NET memcpy的问题。
然后我回来了,并生成了大量重载,以便将几乎所有的原生类型(十进制†除外)轻松地输入到哈希代码中。
它速度很快,鲍勃·詹金斯(Bob Jenkins)值得称赞,因为我移植的原始代码速度更快,尤其是在64位机器上,算法经过了优化。
完整的代码可以在https://bitbucket.org/JonHanna/spookilysharp/src但是考虑到上面的代码是它的简化版本。
然而,由于它现在已经写好了,因此可以更容易地使用它:
public override int GetHashCode()
{
var hash = new SpookyHash();
hash.Update(field1);
hash.Update(field2);
hash.Update(field3);
return hash.Final().GetHashCode();
}
它还需要种子值,因此,如果您需要处理不受信任的输入,并希望防止哈希DoS攻击,您可以根据正常运行时间或类似情况设置种子,并使攻击者无法预测结果:
private static long hashSeed0 = Environment.TickCount;
private static long hashSeed1 = DateTime.Now.Ticks;
public override int GetHashCode()
{
//produce different hashes ever time this application is restarted
//but remain consistent in each run, so attackers have a harder time
//DoSing the hash tables.
var hash = new SpookyHash(hashSeed0, hashSeed1);
hash.Update(field1);
hash.Update(field2);
hash.Update(field3);
return hash.Final().GetHashCode();
}
*这方面的一个大惊喜是,手动内联返回(x<<n)|(x>>-n)的旋转方法改进了性能。我本可以确定抖动会为我内联,但评测显示的情况并非如此。
†十进制虽然来自C#,但从.NET角度看不是本机。它的问题是,它自己的GetHashCode()将精度视为重要,而它自己的Equals()则没有。两者都是有效的选择,但不是那样混合。在实现自己的版本时,您需要选择执行一个或另一个,但我不知道您想要哪个。
‡通过比较。如果在字符串上使用,64位的SpookyHash要比32位的string.GetHashCode()快得多,这比64位的string.GetHashCode()要快得多。
可以尝试采用C++Boost库的方法。类似于:
class HashUtil
{
public static int HashCombine(int seed, int other)
{
unchecked
{
return other + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
}
}
然后:
class MyClass
{
private string _field1;
private int _field2;
private AnotherClass _field3;
private YetAnotherClass _field4;
public override int GetHashCode()
{
int result = HashUtil.HashCombine(_field1.GetHashCode(), _field2);
result = HashUtil.HashCombine(result, _field3.GetHashCode());
return HashUtil.HashCombine(result, _field4.GetHashCode());
}
}
微软引领了几种哈希方法。。。
//for classes that contain a single int value
return this.value;
//for classes that contain multiple int value
return x ^ y;
//for classes that contain single number bigger than int
return ((int)value ^ (int)(value >> 32));
//for classes that contain class instance fields which inherit from object
return obj1.GetHashCode();
//for classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode();
我可以猜测,对于多个大整数,您可以使用这个:
int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;
对于多类型也是如此:首先使用GetHashCode()将所有类型转换为int然后int值将被xor'ed,结果是您的哈希值。
对于那些使用哈希作为ID(我的意思是一个唯一的值)的人来说,哈希自然被限制在数字个数,我认为哈希算法是5个字节,至少是MD5。
您可以将多个值转换为哈希值,其中一些值是相同的,因此不要将其用作标识符。(也许有一天我会使用你的组件)
如果我们的财产不超过8处(希望如此),这里还有另一种选择。
ValueTuple是一个结构,似乎有一个可靠的GetHashCode实现。
这意味着我们可以简单地这样做:
// Yay, no allocations and no custom implementations!
public override int GetHashCode() => (this.PropA, this.PropB).GetHashCode();
让我们来看看.NETCore当前对ValueTuple的GetHashCode的实现。
这来自ValueTuple:
internal static int CombineHashCodes(int h1, int h2)
{
return HashHelpers.Combine(HashHelpers.Combine(HashHelpers.RandomSeed, h1), h2);
}
internal static int CombineHashCodes(int h1, int h2, int h3)
{
return HashHelpers.Combine(CombineHashCodes(h1, h2), h3);
}
这来自HashHelper:
public static readonly int RandomSeed = Guid.NewGuid().GetHashCode();
public static int Combine(int h1, int h2)
{
unchecked
{
// RyuJIT optimizes this to use the ROL instruction
// Related GitHub pull request: dotnet/coreclr#1830
uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
return ((int)rol5 + h1) ^ h2;
}
}
英语:
向左旋转(循环移位)h1 5个位置。将结果和h1相加。将结果与h2进行异或运算。首先对{static random seed,h1}执行上述操作。对于每个其他项目,对上一个结果和下一个项目(例如h2)执行操作。
如果能更多地了解这个ROL-5散列代码算法的财产,那就太好了。
遗憾的是,为我们自己的GetHashCode延迟ValueTuple可能不像我们希望的那样快。相关讨论中的这条评论说明了直接调用HashHelpers.Combine更具性能。另一方面,这是内部的,所以我们必须复制代码,牺牲了我们在这里获得的大部分。此外,我们将负责记住首先与随机种子结合。我不知道如果我们跳过这一步会有什么后果。