在.NET中,GetHashCode方法在整个.NET基类库的许多地方都使用。正确执行它对于在集合中或确定相等时快速查找项目尤为重要。
对于如何为自定义类实现GetHashCode,是否有标准算法或最佳实践,以便不会降低性能?
在.NET中,GetHashCode方法在整个.NET基类库的许多地方都使用。正确执行它对于在集合中或确定相等时快速查找项目尤为重要。
对于如何为自定义类实现GetHashCode,是否有标准算法或最佳实践,以便不会降低性能?
当前回答
ReSharper用户可以使用ReSharper->Edit->generate Code->Equality Members生成GetHashCode、Equals等。
// ReSharper's GetHashCode looks like this
public override int GetHashCode() {
unchecked {
int hashCode = Id;
hashCode = (hashCode * 397) ^ IntMember;
hashCode = (hashCode * 397) ^ OtherIntMember;
hashCode = (hashCode * 397) ^ (RefMember != null ? RefMember.GetHashCode() : 0);
// ...
return hashCode;
}
}
其他回答
如果我们的财产不超过8处(希望如此),这里还有另一种选择。
ValueTuple是一个结构,似乎有一个可靠的GetHashCode实现。
这意味着我们可以简单地这样做:
// Yay, no allocations and no custom implementations!
public override int GetHashCode() => (this.PropA, this.PropB).GetHashCode();
让我们来看看.NETCore当前对ValueTuple的GetHashCode的实现。
这来自ValueTuple:
internal static int CombineHashCodes(int h1, int h2)
{
return HashHelpers.Combine(HashHelpers.Combine(HashHelpers.RandomSeed, h1), h2);
}
internal static int CombineHashCodes(int h1, int h2, int h3)
{
return HashHelpers.Combine(CombineHashCodes(h1, h2), h3);
}
这来自HashHelper:
public static readonly int RandomSeed = Guid.NewGuid().GetHashCode();
public static int Combine(int h1, int h2)
{
unchecked
{
// RyuJIT optimizes this to use the ROL instruction
// Related GitHub pull request: dotnet/coreclr#1830
uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
return ((int)rol5 + h1) ^ h2;
}
}
英语:
向左旋转(循环移位)h1 5个位置。将结果和h1相加。将结果与h2进行异或运算。首先对{static random seed,h1}执行上述操作。对于每个其他项目,对上一个结果和下一个项目(例如h2)执行操作。
如果能更多地了解这个ROL-5散列代码算法的财产,那就太好了。
遗憾的是,为我们自己的GetHashCode延迟ValueTuple可能不像我们希望的那样快。相关讨论中的这条评论说明了直接调用HashHelpers.Combine更具性能。另一方面,这是内部的,所以我们必须复制代码,牺牲了我们在这里获得的大部分。此外,我们将负责记住首先与随机种子结合。我不知道如果我们跳过这一步会有什么后果。
截至https://github.com/dotnet/coreclr/pull/14863,有一种生成哈希代码的新方法非常简单!只要写
public override int GetHashCode()
=> HashCode.Combine(field1, field2, field3);
这将生成高质量的哈希代码,而无需担心实现细节。
这是我的哈希码助手。它的优点是它使用泛型类型参数,因此不会导致装箱:
public static class HashHelper
{
public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
{
unchecked
{
return 31 * arg1.GetHashCode() + arg2.GetHashCode();
}
}
public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
{
unchecked
{
int hash = arg1.GetHashCode();
hash = 31 * hash + arg2.GetHashCode();
return 31 * hash + arg3.GetHashCode();
}
}
public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3,
T4 arg4)
{
unchecked
{
int hash = arg1.GetHashCode();
hash = 31 * hash + arg2.GetHashCode();
hash = 31 * hash + arg3.GetHashCode();
return 31 * hash + arg4.GetHashCode();
}
}
public static int GetHashCode<T>(T[] list)
{
unchecked
{
int hash = 0;
foreach (var item in list)
{
hash = 31 * hash + item.GetHashCode();
}
return hash;
}
}
public static int GetHashCode<T>(IEnumerable<T> list)
{
unchecked
{
int hash = 0;
foreach (var item in list)
{
hash = 31 * hash + item.GetHashCode();
}
return hash;
}
}
/// <summary>
/// Gets a hashcode for a collection for that the order of items
/// does not matter.
/// So {1, 2, 3} and {3, 2, 1} will get same hash code.
/// </summary>
public static int GetHashCodeForOrderNoMatterCollection<T>(
IEnumerable<T> list)
{
unchecked
{
int hash = 0;
int count = 0;
foreach (var item in list)
{
hash += item.GetHashCode();
count++;
}
return 31 * hash + count.GetHashCode();
}
}
/// <summary>
/// Alternative way to get a hashcode is to use a fluent
/// interface like this:<br />
/// return 0.CombineHashCode(field1).CombineHashCode(field2).
/// CombineHashCode(field3);
/// </summary>
public static int CombineHashCode<T>(this int hashCode, T arg)
{
unchecked
{
return 31 * hashCode + arg.GetHashCode();
}
}
此外,它还具有扩展方法来提供流畅的界面,因此您可以这样使用它:
public override int GetHashCode()
{
return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}
或者像这样:
public override int GetHashCode()
{
return 0.CombineHashCode(Manufacturer)
.CombineHashCode(PartN)
.CombineHashCode(Quantity);
}
我在Helper库中有一个Hashing类,用于此目的。
/// <summary>
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
const int b = 378551;
int a = 63689;
int hash = 0;
// If it overflows then just wrap around
unchecked
{
for (int i = 0; i < input.Length; i++)
{
if (input[i] != null)
{
hash = hash * a + input[i].GetHashCode();
a = a * b;
}
}
}
return hash;
}
然后,只需将其用作:
public override int GetHashCode()
{
return Hashing.RSHash(_field1, _field2, _field3);
}
我没有评估它的表现,所以欢迎任何反馈。
这是Jon Skeet发布的上述算法的另一个流畅实现,但不包括分配或装箱操作:
public static class Hash
{
public const int Base = 17;
public static int HashObject(this int hash, object obj)
{
unchecked { return hash * 23 + (obj == null ? 0 : obj.GetHashCode()); }
}
public static int HashValue<T>(this int hash, T value)
where T : struct
{
unchecked { return hash * 23 + value.GetHashCode(); }
}
}
用法:
public class MyType<T>
{
public string Name { get; set; }
public string Description { get; set; }
public int Value { get; set; }
public IEnumerable<T> Children { get; set; }
public override int GetHashCode()
{
return Hash.Base
.HashObject(this.Name)
.HashObject(this.Description)
.HashValue(this.Value)
.HashObject(this.Children);
}
}
由于泛型类型约束,编译器将确保不使用类调用HashValue。但是没有编译器支持HashObject,因为添加泛型参数也会添加装箱操作。