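HashMap has two important properties: size and load factor. I went through the Java documentation and it says 0.75f is the default load factor. But I can't find what it is actually used for.
Can someone describe the different scenarios in which we need to set the load factor, and what some sample ideal values are for those cases?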
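Current answer
What is the load factor?
The fraction of its capacity that a HashMap lets fill up before it increases its capacity.
Why have a load factor?
The load factor defaults to 0.75 of the capacity (initially 16), so at least 25% of the buckets are still free when the capacity is increased. After the resize there are many new buckets for new hash codes to map to.
Why keep many free buckets, and what is the impact of keeping free buckets on performance?
If you set the load factor to, say, 1.0, something very interesting can happen.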
Say you add an object x whose hashCode is 888, and the bucket that this hash code maps to is free, so x goes into that bucket. Now say you add another object y whose hash code maps to the same bucket. y will certainly be added, but at the end of the bucket, because a bucket is a linked list of nodes storing key, value and next (since Java 8, a long chain is converted to a balanced tree). This has a performance impact: y is not at the head of the bucket, so a lookup is no longer O(1); the time taken depends on how many items share that bucket. This is called a hash collision, and it can happen even when the load factor is less than 1.
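Correlation between performance, hash collisions and load factor
Lower load factor = more free buckets = lower chance of collision = higher performance = higher space requirement.
Higher load factor = fewer free buckets = higher chance of collision = lower performance = lower space requirement.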
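To make the trade-off above concrete, here is a minimal simulation sketch. It is not how HashMap is implemented: it assumes keys hash uniformly at random, fills a table up to a given load factor, and counts keys that land in an already occupied bucket. The class name `CollisionSketch` and the method `collisions` are hypothetical names chosen for this illustration.

```java
import java.util.Random;

final class CollisionSketch {
    /**
     * Places (int) (capacity * loadFactor) uniformly random keys into
     * `capacity` buckets and counts the keys that hit an occupied bucket.
     * Assumption: an idealized, perfectly uniform hash function.
     */
    static int collisions(int capacity, double loadFactor, long seed) {
        Random random = new Random(seed);
        boolean[] occupied = new boolean[capacity];
        int keys = (int) (capacity * loadFactor);
        int collisions = 0;
        for (int i = 0; i < keys; i++) {
            int bucket = random.nextInt(capacity);
            if (occupied[bucket]) {
                collisions++;          // key shares a bucket with an earlier key
            } else {
                occupied[bucket] = true;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        int capacity = 1 << 16;
        for (double loadFactor : new double[] {0.5, 0.75, 1.0}) {
            int c = collisions(capacity, loadFactor, 42L);
            System.out.printf("loadFactor=%.2f collisions=%d (%.1f%% of keys)%n",
                    loadFactor, c, 100.0 * c / (capacity * loadFactor));
        }
    }
}
```

Under this model the share of colliding keys grows as the load factor rises, which is the time versus space trade-off the answer describes.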
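Other answers
I would pick a table size of n * 1.5 or n + (n >> 1). This gives a load factor of .66666~ without division, which is slow on most systems, especially on portable systems where there is no division in hardware.
The documentation explains it pretty well: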
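The shift-based sizing from this answer can be sketched as follows. This applies to a hand-written hash table; `java.util.HashMap` rounds its capacity up to a power of two, so it would not keep this exact size. `TableSizeSketch` and `tableSizeFor` are hypothetical names.

```java
final class TableSizeSketch {
    /**
     * Returns n + n/2 using a shift instead of a multiply or divide,
     * so n keys fill the table to a load factor of roughly 2/3.
     * Assumption: n is non-negative and small enough not to overflow.
     */
    static int tableSizeFor(int n) {
        return n + (n >> 1);
    }

    public static void main(String[] args) {
        int n = 100;
        int size = tableSizeFor(n);
        System.out.println("keys=" + n + " tableSize=" + size
                + " loadFactor=" + (double) n / size);
    }
}
```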
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets. As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
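The last two sentences of the quoted documentation suggest sizing the map up front. A minimal sketch, assuming you know the expected number of entries in advance; `PresizeSketch`, `initialCapacityFor` and `newPresizedMap` are hypothetical helper names, while the `HashMap(int, float)` constructor is part of the standard library.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeSketch {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** Smallest capacity c such that expectedEntries <= c * loadFactor. */
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    /** Creates a HashMap sized so that expectedEntries fit without a rehash. */
    static <K, V> Map<K, V> newPresizedMap(int expectedEntries) {
        return new HashMap<>(
                initialCapacityFor(expectedEntries, DEFAULT_LOAD_FACTOR),
                DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        Map<Integer, String> map = newPresizedMap(1_000);
        for (int i = 0; i < 1_000; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("entries=" + map.size());
    }
}
```

On JDK 19 and later, `HashMap.newHashMap(int numMappings)` performs a similar calculation, so a helper like this is mainly useful on older releases.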
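As with all performance optimizations, it is a good idea to avoid optimizing prematurely (i.e. without hard data on where the bottlenecks are).
Actually, from my calculations, the "perfect" load factor is closer to log(2) (~ 0.7), although any load factor less than this will yield better performance. I think that .75 was probably pulled out of a hat.
Proof:
Chaining can be avoided, and branch prediction exploited, by predicting whether a bucket is empty or not. A bucket is probably empty if the probability of it being empty exceeds .5.
Let s represent the size (the number of buckets) and n the number of keys added. Using the binomial distribution, the probability of a bucket being empty is: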
P(0) = C(n, 0) * (1/s)^0 * (1 - 1/s)^(n - 0)
Thus, a bucket is probably empty if there are fewer than
log(2)/log(s/(s - 1)) keys
As s approaches infinity, and if the number of keys added is such that P(0) = .5, then n/s rapidly approaches log(2):
lim (log(2)/log(s/(s - 1)))/s as s -> infinity = log(2) ~ 0.693...
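The limit can be checked numerically. The sketch below evaluates the expression from the proof for growing table sizes, taking log as the natural logarithm; `EmptyBucketLimit`, `breakEvenRatio` and `emptyProbability` are hypothetical names introduced for this illustration.

```java
final class EmptyBucketLimit {
    /** n/s at which P(0) = 0.5, i.e. (log(2) / log(s / (s - 1))) / s. */
    static double breakEvenRatio(double s) {
        return Math.log(2) / Math.log(s / (s - 1)) / s;
    }

    /** P(0) = (1 - 1/s)^n, the probability that a given bucket is empty. */
    static double emptyProbability(double s, double n) {
        return Math.pow(1 - 1 / s, n);
    }

    public static void main(String[] args) {
        for (double s : new double[] {16, 1024, 1 << 20}) {
            double ratio = breakEvenRatio(s);
            System.out.printf("s=%.0f  n/s=%.6f  P(0)=%.6f%n",
                    s, ratio, emptyProbability(s, ratio * s));
        }
        System.out.printf("log(2)=%.6f%n", Math.log(2));
    }
}
```

Even for a table of 16 buckets the ratio is already close to log(2), which is what "rapidly approaches" refers to.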
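From the documentation:
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased
It really depends on your particular requirements; there is no "rule of thumb" for specifying an initial load factor.
HashMap's default initial capacity is 16 and its default load factor is 0.75f (i.e. 75% of the current capacity). The load factor determines how full the HashMap may get before its capacity is doubled.
For example, the product of capacity and load factor is 16 * 0.75 = 12. Once the number of key-value pairs exceeds 12, that is, when the 13th pair is stored, the capacity becomes 32.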
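The doubling rule can be modelled in a few lines. This is a simplified model of the behaviour stated in the HashMap documentation (the table is resized when the number of entries exceeds capacity * load factor), not the actual HashMap source; `ResizeModel` and `capacityAfter` are hypothetical names.

```java
final class ResizeModel {
    /**
     * Capacity of a table that starts at initialCapacity and doubles each
     * time the entry count exceeds capacity * loadFactor.
     * Assumption: ignores HashMap's maximum capacity and power-of-two rounding.
     */
    static int capacityAfter(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        while (entries > capacity * loadFactor) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int entries : new int[] {12, 13, 24, 25}) {
            System.out.println(entries + " entries -> capacity "
                    + capacityAfter(entries, 16, 0.75f));
        }
    }
}
```

With the defaults, 12 entries still fit in 16 buckets, while one more entry pushes the model to 32 buckets, and the next threshold is 32 * 0.75 = 24.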