Hashing a dictionary?

For caching purposes, I need to generate a cache key from the GET arguments, which are present in a dict.

Currently I'm using sha1(repr(sorted(my_dict.items()))) (sha1() is a convenience method that uses hashlib internally), but I'm curious whether there's a better way.

Featured answer

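Using sorted(d.items()) isn't enough to get a stable repr. Some of the values in d could be dictionaries too, and their keys will still come out in an arbitrary order. As long as all the keys are strings, I prefer to use: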

json.dumps(d, sort_keys=True)

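That said, if the hashes need to be stable across different machines or Python versions, I'm not certain this is bulletproof. You might want to pass the separators and ensure_ascii arguments explicitly to protect yourself from any changes to their defaults. I'd appreciate comments.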
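A minimal sketch of turning this into a cache key, assuming the GET parameters arrive as a dict with string keys and JSON-serialisable values. `cache_key` is a hypothetical helper name, and pinning `separators` and `ensure_ascii` follows the suggestion above:

```python
import hashlib
import json


def cache_key(params):
    """Hypothetical helper: stable SHA-1 hex digest of a (possibly nested) dict.

    Assumes string keys and JSON-serialisable values. sort_keys orders keys at
    every nesting level; separators and ensure_ascii are pinned explicitly so a
    change in json's defaults cannot silently change the serialised text.
    """
    text = json.dumps(params, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


a = cache_key({"q": "python", "filters": {"lang": "en", "page": 2}})
b = cache_key({"filters": {"page": 2, "lang": "en"}, "q": "python"})
print(a == b)  # True: key order, including nested key order, does not matter
```

Because the digest is computed from text rather than from the built-in hash(), it does not depend on the interpreter process that produced it.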

2014-02-25 02:29:57

Other answers

If your dictionary is not nested, you could make a frozenset from the dict's items and use hash():

hash(frozenset(my_dict.items()))

This is much less computationally intensive than generating the JSON string or the repr of the dictionary.

UPDATE: See the comments on this answer for why this approach might not produce a stable result.
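One well-known source of instability: since Python 3.3, hashes of str and bytes are salted per interpreter process unless the PYTHONHASHSEED environment variable is fixed, so hash(frozenset(...)) over string items can change between runs. The sketch below (hash_in_fresh_interpreter is a hypothetical helper name) computes the same frozenset hash in fresh interpreters with chosen seeds:

```python
import os
import subprocess
import sys

# Child program: the flat-dict frozenset hash from the answer above.
CHILD = "print(hash(frozenset({'page': '2', 'q': 'python'}.items())))"


def hash_in_fresh_interpreter(seed):
    """Run CHILD in a new interpreter with PYTHONHASHSEED=seed and return the printed hash."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    same_seed = hash_in_fresh_interpreter(1) == hash_in_fresh_interpreter(1)
    other_seed = hash_in_fresh_interpreter(1) == hash_in_fresh_interpreter(2)
    # Same seed always agrees; different seeds almost certainly disagree.
    print(same_seed, other_seed)
```

So the frozenset hash is fine as an in-memory key, but not as a cache key that must survive a process restart.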

2011-05-04 13:24:33

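EDIT: If all your keys are strings, then before reading the rest of this answer, please see Jack O'Connor's significantly simpler (and faster) solution, which also works for hashing nested dictionaries.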

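Although an answer has been accepted, the title of the question is "Hashing a python dictionary", and the answer is incomplete as regards that title. (As regards the body of the question, the answer is complete.)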

Nested dictionaries

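If you search Stack Overflow for how to hash a dictionary, you might stumble upon this aptly titled question and leave unsatisfied if you are trying to hash multiply nested dictionaries. The answer above won't work in that case, and you'll have to implement some sort of recursive mechanism to compute the hash.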
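To see why the flat frozenset approach breaks on nested input: a dict value is itself unhashable, so building the frozenset of items fails. try_flat_hash is a hypothetical helper used only for this illustration:

```python
def try_flat_hash(d):
    """Apply the flat frozenset approach; return the hash, or the error text if it fails."""
    try:
        return hash(frozenset(d.items()))
    except TypeError as exc:
        return str(exc)


print(try_flat_hash({"a": 1, "b": 2}))         # an int
print(try_flat_hash({"a": 1, "b": {"c": 2}}))  # error text reporting 'dict' as unhashable
```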

Here is one such mechanism:

import copy

def make_hash(o):
  """
  Makes a hash from a dictionary, list, tuple or set to any level, that contains
  only other hashable types (including any lists, tuples, sets, and
  dictionaries).
  """
  if isinstance(o, (set, tuple, list)):
    return tuple([make_hash(e) for e in o])
  elif not isinstance(o, dict):
    return hash(o)

  new_o = copy.deepcopy(o)
  for k, v in new_o.items():
    new_o[k] = make_hash(v)

  return hash(tuple(frozenset(sorted(new_o.items()))))
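A usage sketch of make_hash on nested input. The function is repeated so the snippet runs on its own, and the sample dicts are illustrative. Within a single interpreter process, dict key order is ignored at every level, while list order still counts:

```python
import copy


def make_hash(o):
    """Same logic as make_hash above, repeated so this snippet is self-contained."""
    if isinstance(o, (set, tuple, list)):
        return tuple([make_hash(e) for e in o])
    elif not isinstance(o, dict):
        return hash(o)
    new_o = copy.deepcopy(o)
    for k, v in new_o.items():
        new_o[k] = make_hash(v)
    return hash(tuple(frozenset(sorted(new_o.items()))))


d1 = {"a": 1, "b": {"x": 1, "y": [1, 2]}}
d2 = {"b": {"y": [1, 2], "x": 1}, "a": 1}
print(make_hash(d1) == make_hash(d2))                        # True: key order ignored
print(make_hash({"a": [1, 2]}) == make_hash({"a": [2, 1]}))  # False: list order matters
```

Note that sorted() requires mutually comparable keys on Python 3, so a dict mixing, say, int and str keys raises TypeError here.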

Bonus: hashing objects and classes

The hash() function works great when you hash classes or instances. However, here is one issue I found with hash as regards objects:

class Foo(object): pass
foo = Foo()
print (hash(foo)) # 1209812346789
foo.a = 1
print (hash(foo)) # 1209812346789

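The hash is the same even after I've altered foo. This is because the identity of foo hasn't changed, so the hash is the same. If you want foo to hash differently depending on its current state, the solution is to hash whatever is actually changing, in this case the __dict__ attribute: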

class Foo(object): pass
foo = Foo()
print (make_hash(foo.__dict__)) # 1209812346789
foo.a = 1
print (make_hash(foo.__dict__)) # -78956430974785

Alas, when you attempt to do the same thing with the class itself:

print (make_hash(Foo.__dict__)) # TypeError: unhashable type: 'dict_proxy'

A class's __dict__ attribute is not a normal dictionary:

print (type(Foo.__dict__)) # type <'dict_proxy'>
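For Python 3 readers: the proxy type is named mappingproxy there and is exposed as types.MappingProxyType, which is what type(object.__dict__) evaluates to. A quick check:

```python
import types


class Foo(object):
    pass


proxy = Foo.__dict__
print(type(proxy).__name__)                             # mappingproxy
print(isinstance(proxy, dict))                          # False
print(type(object.__dict__) is types.MappingProxyType)  # True
```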

Here's a mechanism similar to the previous one that will handle classes appropriately:

import copy

DictProxyType = type(object.__dict__)

def make_hash(o):
  """
  Makes a hash from a dictionary, list, tuple or set to any level, that
  contains only other hashable types (including any lists, tuples, sets, and
  dictionaries). In the case where other kinds of objects (like classes) need
  to be hashed, pass in a collection of object attributes that are pertinent.
  For example, a class can be hashed in this fashion:

    make_hash([cls.__dict__, cls.__name__])

  A function can be hashed like so:

    make_hash([fn.__dict__, fn.__code__])
  """
  if type(o) == DictProxyType:
    o2 = {}
    for k, v in o.items():
      if not k.startswith("__"):
        o2[k] = v
    o = o2

  if isinstance(o, (set, tuple, list)):
    return tuple([make_hash(e) for e in o])
  elif not isinstance(o, dict):
    return hash(o)

  new_o = copy.deepcopy(o)
  for k, v in new_o.items():
    new_o[k] = make_hash(v)

  return hash(tuple(frozenset(sorted(new_o.items()))))
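Following the docstring's suggestion, here is a usage sketch that hashes a class via make_hash([cls.__dict__, cls.__name__]). The function is equivalent to the one above (the proxy-filtering loop is written as a comprehension) and is repeated so the snippet runs on its own; the Config class is purely illustrative. Since everything bottoms out in hash(), the values are only comparable within one interpreter process:

```python
import copy

DictProxyType = type(object.__dict__)


def make_hash(o):
    """Class-aware make_hash, equivalent to the one above, repeated for self-containment."""
    if type(o) == DictProxyType:
        # Keep only the non-dunder attributes of the class namespace.
        o = {k: v for k, v in o.items() if not k.startswith("__")}
    if isinstance(o, (set, tuple, list)):
        return tuple([make_hash(e) for e in o])
    elif not isinstance(o, dict):
        return hash(o)
    new_o = copy.deepcopy(o)
    for k, v in new_o.items():
        new_o[k] = make_hash(v)
    return hash(tuple(frozenset(sorted(new_o.items()))))


class Config(object):  # illustrative class
    retries = 3


before = make_hash([Config.__dict__, Config.__name__])
Config.retries = 5  # change a class attribute
after = make_hash([Config.__dict__, Config.__name__])
print(before != after)  # True: the attribute change shows up in the hash tuple
```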

You can use this to return a tuple of hashes with however many elements you like:

# -7666086133114527897
print (make_hash(func.__code__))

# (-7666086133114527897, 3527539)
print (make_hash([func.__code__, func.__dict__]))

# (-7666086133114527897, 3527539, -509551383349783210)
print (make_hash([func.__code__, func.__dict__, func.__name__]))

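Note: all of the above code assumes Python 3.x. I did not test it in earlier versions, although I assume make_hash() will work in, say, 2.7.2. As far as making the examples work, I do know that

func.__code__

should be replaced with

func.func_code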

2012-01-03 15:05:37

UPDATED FROM 2013 REPLY...

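None of the above answers seem reliable to me. The reason is the use of items(): as far as I know, the order in which it returns items is machine-dependent.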

How about this instead?

import hashlib

def dict_hash(the_dict, *ignore):
    if ignore:  # Sometimes you don't care about some items
        interesting = the_dict.copy()
        for item in ignore:
            if item in interesting:
                interesting.pop(item)
        the_dict = interesting
    # hashlib only accepts bytes on Python 3, so encode the sorted repr first.
    result = hashlib.sha1(
        ('%s' % sorted(the_dict.items())).encode('utf-8')
    ).hexdigest()
    return result
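A usage sketch of dict_hash on Python 3, repeated here with the ignore filter written as a comprehension and the text encoded to bytes. It shows the ignore arguments at work, and also the limitation raised in the first answer: only the top-level items are sorted, so a nested dict built in a different order gives a different digest:

```python
import hashlib


def dict_hash(the_dict, *ignore):
    """dict_hash from above; the text is encoded because hashlib.sha1 needs bytes on Python 3."""
    if ignore:  # drop keys the caller does not care about
        the_dict = {k: v for k, v in the_dict.items() if k not in ignore}
    return hashlib.sha1(("%s" % sorted(the_dict.items())).encode("utf-8")).hexdigest()


print(dict_hash({"a": 1, "b": 2}) == dict_hash({"b": 2, "a": 1}))  # True
print(dict_hash({"a": 1, "ts": 99}, "ts") == dict_hash({"a": 1}))  # True: "ts" ignored
# False on Python 3.7+, where a nested dict's repr follows insertion order:
print(dict_hash({"n": {"x": 1, "y": 2}}) == dict_hash({"n": {"y": 2, "x": 1}}))
```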

2013-03-04 18:10:36

Here's a cleaner solution.

def freeze(o):
    if isinstance(o, dict):
        return frozenset({k: freeze(v) for k, v in o.items()}.items())

    if isinstance(o, list):
        return tuple([freeze(v) for v in o])

    return o


def make_hash(o):
    """
    makes a hash out of anything that contains only list, dict and hashable types, including string and numeric types
    """
    return hash(freeze(o))
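A usage sketch of freeze and make_hash (repeated so the snippet runs on its own; the sample dicts are illustrative). Dicts become frozensets of items, so key order is irrelevant, while lists become tuples, so element order still matters:

```python
def freeze(o):
    """freeze from above: dicts become frozensets of items, lists become tuples."""
    if isinstance(o, dict):
        return frozenset({k: freeze(v) for k, v in o.items()}.items())
    if isinstance(o, list):
        return tuple([freeze(v) for v in o])
    return o


def make_hash(o):
    return hash(freeze(o))


a = {"q": "python", "opts": {"page": 2, "tags": ["x", "y"]}}
b = {"opts": {"tags": ["x", "y"], "page": 2}, "q": "python"}
print(make_hash(a) == make_hash(b))                                  # True
print(freeze({"tags": ["x", "y"]}) == freeze({"tags": ["y", "x"]}))  # False
```

Because it ends in the built-in hash(), the result is only comparable within one interpreter process when str values are involved; set values are also left unfrozen, so a dict containing a set raises TypeError.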

2014-02-06 21:13:38

