The hash of infinity in Python has digits matching pi:

>>> import math
>>> inf = float('inf')
>>> hash(inf)
314159
>>> int(math.pi*1e5)
314159

Is that just a coincidence, or is it intentional?


_PyHASH_INF is defined as a constant equal to 314159.

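I can't find any discussion of this, or any comment giving a reason. I think it was chosen more or less arbitrarily. As long as the same meaningful value isn't used for other hashes, it shouldn't matter.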


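Summary: it's not a coincidence. _PyHASH_INF is hardcoded as 314159 in CPython, the default implementation of Python, and was picked as an arbitrary value (evidently from the digits of π) by Tim Peters in 2000.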


The value of hash(float('inf')) is one of the system-dependent parameters of the built-in hash function for numeric types, and is also available as sys.hash_info.inf in Python 3:

>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>> sys.hash_info.inf
314159

(Same results with PyPy too.)


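In terms of code, hash is a built-in function. Calling it on a Python float object invokes the function whose pointer is given by the tp_hash attribute of the built-in float type (PyTypeObject PyFloat_Type). That function is float_hash, defined as return _Py_HashDouble(v->ob_fval), which in turn has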

    if (Py_IS_INFINITY(v))
        return v > 0 ? _PyHASH_INF : -_PyHASH_INF;

where _PyHASH_INF is defined as 314159:

#define _PyHASH_INF 314159
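
The infinity branch shown above can be mirrored in pure Python as a rough sketch. The function name hash_infinity_sketch is hypothetical, and the sketch assumes Python 3; it models only the special case for infinities, not the rest of _Py_HashDouble.

```python
import math
import sys


def hash_infinity_sketch(v: float) -> int:
    """Model only the infinity branch of the float hash (Python 3)."""
    if not math.isinf(v):
        raise ValueError("this sketch only handles infinities")
    # sys.hash_info.inf exposes the value hardcoded as _PyHASH_INF.
    return sys.hash_info.inf if v > 0 else -sys.hash_info.inf


# Compare the sketch against the built-in hash for both infinities.
print(hash_infinity_sketch(float("inf")) == hash(float("inf")))
print(hash_infinity_sketch(float("-inf")) == hash(float("-inf")))
```

Both comparisons should print True on a Python 3 interpreter.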

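In terms of history, the first mention of 314159 in this context in the Python code (you can find it with git bisect or git log -S 314159 -p) was added by Tim Peters in August 2000, in what is now commit 39dce293 in the cpython git repository.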

The commit message says:

Fix for http://sourceforge.net/bugs/?func=detailbug&bug_id=111866&group_id=5470. This was a misleading bug -- the true "bug" was that hash(x) gave an error return when x is an infinity. Fixed that. Added new Py_IS_INFINITY macro to pyport.h. Rearranged code to reduce growing duplication in hashing of float and complex numbers, pushing Trent's earlier stab at that to a logical conclusion. Fixed exceedingly rare bug where hashing of floats could return -1 even if there wasn't an error (didn't waste time trying to construct a test case, it was simply obvious from the code that it could happen). Improved complex hash so that hash(complex(x, y)) doesn't systematically equal hash(complex(y, x)) anymore.

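In particular, in this commit he ripped out the code of static long float_hash(PyFloatObject *v) in Objects/floatobject.c and made it just return _Py_HashDouble(v->ob_fval);, and in the definition of long _Py_HashDouble(double v) in Objects/object.c he added the lines: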

        if (Py_IS_INFINITY(intpart))
            /* can't convert to long int -- arbitrary */
            v = v < 0 ? -271828.0 : 314159.0;

So, as mentioned, it was an arbitrary choice. Note that 271828 is formed from the first few decimal digits of e.
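
As a quick check of where the two constants come from, the following sketch extracts the leading decimal digits of pi and e. The helper name leading_digits is hypothetical, and it assumes its argument lies between 1 and 10.

```python
import math


def leading_digits(x: float, n: int = 6) -> int:
    """Return the first n decimal digits of x as an int (assumes 1 <= x < 10)."""
    return int(x * 10 ** (n - 1))


print(leading_digits(math.pi))  # 314159
print(leading_digits(math.e))   # 271828
```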

Related later commits:

- By Mark Dickinson in Apr 2010 (also), making the Decimal type behave similarly.
- By Mark Dickinson in Apr 2010 (also), moving this check to the top and adding test cases.
- By Mark Dickinson in May 2010 as issue 8188, completely rewriting the hash function to its current implementation, but retaining this special case and giving the constant the name _PyHASH_INF (also removing the 271828, which is why in Python 3 hash(float('-inf')) returns -314159 rather than -271828 as it does in Python 2).
- By Raymond Hettinger in Jan 2011, adding an explicit example of sys.hash_info, showing the above value, to the "What's new" for Python 3.2.
- By Stefan Krah in Mar 2012, modifying the Decimal module but keeping this hash.
- By Christian Heimes in Nov 2013, moving the definition of _PyHASH_INF from Include/pyport.h to Include/pyhash.h, where it now lives.
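
The Decimal behaviour mentioned in these commits can be checked directly. This is a small sketch assuming Python 3 with the standard decimal module; it only compares hashes and does not show how the hash is computed.

```python
import sys
from decimal import Decimal

pos = Decimal("Infinity")
neg = Decimal("-Infinity")

# Decimal infinities are expected to hash like the float infinities.
print(hash(pos) == hash(float("inf")) == sys.hash_info.inf)
print(hash(neg) == hash(float("-inf")) == -sys.hash_info.inf)
```

Both lines should print True, which keeps equal numeric values hashing equally across float and Decimal.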


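Indeed,

sys.hash_info.inf

returns 314159. The value is not generated; it is built into the source code. In fact,

hash(float('-inf'))

returns -271828 in Python 2, that is, the leading digits of e with a minus sign (it is -314159 now).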

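The fact that the two most famous irrational numbers of all time were used as the hash values makes it very unlikely to be a coincidence.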