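How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

It would seem that there should be some tiny set of primitives I can override to make this work, but according to all my research and attempts, this doesn't seem to be the case:

- If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?
- Am I preventing pickling from working, and do I need to implement __setstate__ etc?
- Do I need repr, update and __init__?
- Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

Here is my first go at it; get() doesn't work, and no doubt there are many other minor problems: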

class arbitrary_dict(dict):
    """A dictionary that applies an arbitrary key-altering function
       before accessing the keys."""

    def __keytransform__(self, key):
        return key

    # Overridden methods. List from 
    # https://stackoverflow.com/questions/2390827/how-to-properly-subclass-dict

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    # Note: I'm using dict directly, since super(dict, self) doesn't work.
    # I'm not sure why, perhaps dict is not a new-style class.

    def __getitem__(self, key):
        return dict.__getitem__(self, self.__keytransform__(key))

    def __setitem__(self, key, value):
        return dict.__setitem__(self, self.__keytransform__(key), value)

    def __delitem__(self, key):
        return dict.__delitem__(self, self.__keytransform__(key))

    def __contains__(self, key):
        return dict.__contains__(self, self.__keytransform__(key))


class lcdict(arbitrary_dict):
    def __keytransform__(self, key):
        return str(key).lower()
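
To see concretely why get() fails in the attempt above, here is a minimal sketch (the class name LowerKeyDict is made up for this illustration). It assumes CPython, where the built-in dict.get and dict.update do not go through an overridden __getitem__ or __setitem__:

```python
class LowerKeyDict(dict):
    """Hypothetical, deliberately incomplete dict subclass."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())

    def __setitem__(self, key, value):
        dict.__setitem__(self, key.lower(), value)


d = LowerKeyDict()
d["Foo"] = 1                   # goes through __setitem__, stored as 'foo'
assert d["FOO"] == 1           # goes through __getitem__
assert d.get("FOO") is None    # dict.get does not call __getitem__
d.update({"Bar": 2})           # dict.update does not call __setitem__
assert list(d) == ["foo", "Bar"]
```

This is why every key-taking method has to be overridden when inheriting from dict directly.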

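Current answer

You can write an object that behaves like a dict quite easily with the ABCs (Abstract Base Classes) from the collections.abc module. It even tells you if you missed a method, so below is the minimal version that shuts the ABC up.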

from collections.abc import MutableMapping


class TransformedDict(MutableMapping):
    """A dictionary that applies an arbitrary key-altering
       function before accessing the keys"""

    def __init__(self, *args, **kwargs):
        self.store = dict()
        self.update(dict(*args, **kwargs))  # use the free update to set keys

    def __getitem__(self, key):
        return self.store[self._keytransform(key)]

    def __setitem__(self, key, value):
        self.store[self._keytransform(key)] = value

    def __delitem__(self, key):
        del self.store[self._keytransform(key)]

    def __iter__(self):
        return iter(self.store)
    
    def __len__(self):
        return len(self.store)

    def _keytransform(self, key):
        return key
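
As a small illustration of the ABC reporting a missing method, the sketch below defines a hypothetical class (Incomplete, not part of the answer) that leaves out three of the five required methods. Instantiating it raises TypeError:

```python
from collections.abc import MutableMapping


class Incomplete(MutableMapping):
    """Hypothetical subclass that forgets __delitem__, __iter__ and __len__."""

    def __init__(self):
        self.store = {}

    def __getitem__(self, key):
        return self.store[key]

    def __setitem__(self, key, value):
        self.store[key] = value


try:
    Incomplete()
except TypeError as exc:
    message = str(exc)  # names the abstract methods that are still missing
else:
    message = None

assert message is not None
```

The exact wording of the message differs between Python versions, but it lists the methods that still need to be implemented.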

You get a few free methods from the ABC:

class MyTransformedDict(TransformedDict):

    def _keytransform(self, key):
        return key.lower()


s = MyTransformedDict([('Test', 'test')])

assert s.get('TEST') is s['test']   # free get
assert 'TeSt' in s                  # free __contains__
                                    # free setdefault, __eq__, and so on

import pickle
# works too since we just use a normal dict
assert pickle.loads(pickle.dumps(s)) == s

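I wouldn't subclass dict (or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict. And that is exactly what ABCs are for.

Other answers

collections.UserDict is often the simplest option when you need a custom dict.

As shown in the other answer, it's very tricky to override dict correctly, while UserDict makes it easy. To answer the original question, you can get a dict with lowercase keys: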

import collections

class LowercaseDict(collections.UserDict):

  def __getitem__(self, key):
    return super().__getitem__(key.lower())

  def __setitem__(self, key, value):
    return super().__setitem__(key.lower(), value)

  def __delitem__(self, key):
    return super().__delitem__(key.lower())

  # Unfortunately, __contains__ is required currently due to
  # https://github.com/python/cpython/issues/91784
  def __contains__(self, key):
    return key.lower() in self.data


d = LowercaseDict(MY_KEY=0)  # Keys normalized in .__init__
d.update({'OTHER_KEY': 1})  # Keys normalized in .update
d['Hello'] = d['other_KEY']
assert 'HELLO' in d
print(d)  # All keys normalized {'my_key': 0, 'other_key': 1, 'hello': 1}

In contrast with collections.abc.MutableMapping, you don't need __iter__, __len__, __init__, ... Subclassing UserDict is much easier.

However, UserDict is a MutableMapping, not a dict, so:

assert not isinstance(collections.UserDict(), dict)
assert isinstance(collections.UserDict(), collections.abc.MutableMapping)
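
One place where the missing dict inheritance shows up is code that checks for a real dict. The sketch below uses the standard json module as an example of such code (the choice of json is mine, not the answer's). The underlying dict that UserDict keeps in its data attribute can be handed over instead:

```python
import collections
import json


class LowercaseDict(collections.UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


d = LowercaseDict(MY_KEY=0)

try:
    json.dumps(d)              # the default encoder only accepts real dicts
    rejected = False
except TypeError:
    rejected = True

assert rejected
assert isinstance(d.data, dict)
encoded = json.dumps(d.data)   # pass the underlying dict instead
assert encoded == '{"my_key": 0}'
```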

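My requirements were a bit stricter:

- I had to retain case info (the strings are paths to files displayed to the user, but it's a Windows app, so internally all operations must be case insensitive).
- I needed keys to be as small as possible (it did make a difference in memory performance, chopping 110 MB off 370). This meant that caching a lowercase version of the keys was not an option.
- I needed creation of the data structures to be as fast as possible (again this made a difference in performance, speed this time). I had to go with a builtin.

My initial thought was to substitute our clunky Path class with a case insensitive unicode subclass, but:

- it proved hard to get that right - see: A case insensitive string class in python
- it turns out that explicit dict key handling makes code verbose and messy, and error prone (structures are passed hither and thither, it is not clear whether they have CIstr instances as keys/elements, it is easy to forget, and some_dict[CIstr(path)] is ugly)

So I finally had to write down that case insensitive dict. Thanks to the code by @AaronHall, that was made 10 times easier.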

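from itertools import chain  # needed by LowerDict._process_args below

# Note: this is Python 2 code (unicode, basestring, iteritems).
class CIstr(unicode):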
    """See https://stackoverflow.com/a/43122305/281545, especially for inlines"""
    __slots__ = () # does make a difference in memory performance

    #--Hash/Compare
    def __hash__(self):
        return hash(self.lower())
    def __eq__(self, other):
        if isinstance(other, CIstr):
            return self.lower() == other.lower()
        return NotImplemented
    def __ne__(self, other):
        if isinstance(other, CIstr):
            return self.lower() != other.lower()
        return NotImplemented
    def __lt__(self, other):
        if isinstance(other, CIstr):
            return self.lower() < other.lower()
        return NotImplemented
    def __ge__(self, other):
        if isinstance(other, CIstr):
            return self.lower() >= other.lower()
        return NotImplemented
    def __gt__(self, other):
        if isinstance(other, CIstr):
            return self.lower() > other.lower()
        return NotImplemented
    def __le__(self, other):
        if isinstance(other, CIstr):
            return self.lower() <= other.lower()
        return NotImplemented
    #--repr
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__,
                                 super(CIstr, self).__repr__())

def _ci_str(maybe_str):
    """dict keys can be any hashable object - only call CIstr if str"""
    return CIstr(maybe_str) if isinstance(maybe_str, basestring) else maybe_str

class LowerDict(dict):
    """Dictionary that transforms its keys to CIstr instances.
    Adapted from: https://stackoverflow.com/a/39375731/281545
    """
    __slots__ = () # no __dict__ - that would be redundant

    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, 'iteritems'):
            mapping = getattr(mapping, 'iteritems')()
        return ((_ci_str(k), v) for k, v in
                chain(mapping, getattr(kwargs, 'iteritems')()))
    def __init__(self, mapping=(), **kwargs):
        # dicts take a mapping or iterable as their optional first argument
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(_ci_str(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(_ci_str(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(_ci_str(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    def get(self, k, default=None):
        return super(LowerDict, self).get(_ci_str(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(_ci_str(k), default)
    __no_default = object()
    def pop(self, k, v=__no_default):
        if v is LowerDict.__no_default:
            # super will raise KeyError if no default and key does not exist
            return super(LowerDict, self).pop(_ci_str(k))
        return super(LowerDict, self).pop(_ci_str(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(_ci_str(k))
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((_ci_str(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__,
                                 super(LowerDict, self).__repr__())
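
The code above targets Python 2. As a rough sketch of the same case-preserving key idea on Python 3, assuming str in place of unicode, the hypothetical class below (CaseInsensitiveStr is an illustrative name, and only equality and hashing are shown) lets a plain dict look keys up case-insensitively while keeping the original spelling:

```python
class CaseInsensitiveStr(str):
    """Hypothetical Python 3 analogue of CIstr: hashes and compares lowercased."""
    __slots__ = ()

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, CaseInsensitiveStr):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        # str defines its own __ne__, so it has to be overridden as well
        if isinstance(other, CaseInsensitiveStr):
            return self.lower() != other.lower()
        return NotImplemented


paths = {CaseInsensitiveStr("C:/Users/Readme.TXT"): "docs"}
assert paths[CaseInsensitiveStr("c:/users/readme.txt")] == "docs"

stored_key = next(iter(paths))
assert str(stored_key) == "C:/Users/Readme.TXT"   # original case is retained
```

No lowercase copy of the key is stored; the lowercase form is only computed while hashing and comparing.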

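Implicit vs explicit is still a problem, but once the dust settles, renaming attributes/variables to start with ci (plus a big fat doc comment explaining that ci stands for case insensitive) is, I think, a perfect solution, as readers of the code must be fully aware that we are dealing with case insensitive underlying data structures. This will hopefully fix some hard to reproduce bugs, which I suspect boil down to case sensitivity.

Comments/corrections welcome :)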

How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase. If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually? Am I preventing pickling from working, and do I need to implement __setstate__ etc? Do I need repr, update and __init__? Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

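The accepted answer would be my first approach, but since it has some issues, and since no one has addressed the alternative, actually subclassing a dict, I'm going to do that here.

What's wrong with the accepted answer?

This seems like a rather simple request to me:

How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

The accepted answer doesn't actually subclass dict, and a test for this fails: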

>>> isinstance(MyTransformedDict([('Test', 'test')]), dict)
False

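Ideally, any type-checking code would be testing for the interface we expect, or an abstract base class, but if our data objects are being passed into functions that are testing for dict, and we can't "fix" those functions, this code will fail.

Other quibbles one might make:

- The accepted answer is also missing the classmethod fromkeys.
- The accepted answer also has a redundant __dict__, therefore taking up more space in memory:

>>> s.foo = 'bar'
>>> s.__dict__
{'foo': 'bar', 'store': {'test': 'test'}}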
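
The redundant __dict__ can be avoided on a dict subclass with an empty __slots__. A minimal sketch with two made-up classes (PlainSubclass and SlottedSubclass are illustrative names only):

```python
class PlainSubclass(dict):
    pass


class SlottedSubclass(dict):
    __slots__ = ()  # no per-instance __dict__ is created


plain = PlainSubclass()
plain.foo = "bar"                       # stored in the instance __dict__
assert plain.__dict__ == {"foo": "bar"}

slotted = SlottedSubclass()
assert not hasattr(slotted, "__dict__")
try:
    slotted.foo = "bar"
    attribute_allowed = True
except AttributeError:
    attribute_allowed = False
assert not attribute_allowed
```

The trade-off is that arbitrary attributes can no longer be attached to instances.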

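Actually subclassing dict

We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form if they are strings.

If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?

Well, implementing them each individually is the downside to this approach and the upside to using MutableMapping (see the accepted answer), but it's really not that much more work.

First, let's factor out the difference between Python 2 and 3, create a singleton (_RaiseKeyError) to make sure we know if we actually get an argument to dict.pop, and create a function to ensure our string keys are lowercase: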

from itertools import chain
try:              # Python 2
    str_base = basestring
    items = 'iteritems'
except NameError: # Python 3
    str_base = str, bytes, bytearray
    items = 'items'

_RaiseKeyError = object() # singleton for no-default behavior

def ensure_lower(maybe_str):
    """dict keys can be any hashable object - only call lower if str"""
    return maybe_str.lower() if isinstance(maybe_str, str_base) else maybe_str
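
The reason for a dedicated singleton rather than None as the "no default" marker is that None is itself a legitimate default for pop. A small sketch on a plain dict, using a hypothetical helper pop_lower and sentinel _MISSING (neither is part of the answer's code):

```python
_MISSING = object()  # hypothetical sentinel, plays the role of _RaiseKeyError


def pop_lower(mapping, key, default=_MISSING):
    """Pop key.lower() from a plain dict, mirroring dict.pop semantics."""
    if default is _MISSING:
        return mapping.pop(key.lower())        # raises KeyError if absent
    return mapping.pop(key.lower(), default)


data = {"foo": 1}
assert pop_lower(data, "FOO") == 1
assert pop_lower(data, "FOO", None) is None    # an explicit None is honored

try:
    pop_lower(data, "FOO")
    raised = False
except KeyError:
    raised = True
assert raised
```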

Now we implement - I'm using super with the full arguments so that this code works for Python 2 and 3:

class LowerDict(dict):  # dicts take a mapping or iterable as their optional first argument
    __slots__ = () # no __dict__ - that would be redundant
    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, items):
            mapping = getattr(mapping, items)()
        return ((ensure_lower(k), v) for k, v in chain(mapping, getattr(kwargs, items)()))
    def __init__(self, mapping=(), **kwargs):
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(ensure_lower(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(ensure_lower(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(ensure_lower(k))
    def get(self, k, default=None):
        return super(LowerDict, self).get(ensure_lower(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(ensure_lower(k), default)
    def pop(self, k, v=_RaiseKeyError):
        if v is _RaiseKeyError:
            return super(LowerDict, self).pop(ensure_lower(k))
        return super(LowerDict, self).pop(ensure_lower(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(ensure_lower(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((ensure_lower(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__, super(LowerDict, self).__repr__())

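We use an almost boilerplate approach for any method or special method that references a key, but otherwise, by inheritance, we get these methods for free: len, clear, items, keys, popitem, and values. While this required some careful thought to get right, it is trivial to see that this works.

(Note that has_key was deprecated in Python 2 and removed in Python 3.)

Here's some usage: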

>>> ld = LowerDict(dict(foo='bar'))
>>> ld['FOO']
'bar'
>>> ld['foo']
'bar'
>>> ld.pop('FoO')
'bar'
>>> ld.setdefault('Foo')
>>> ld
{'foo': None}
>>> ld.get('Bar')
>>> ld.setdefault('Bar')
>>> ld
{'bar': None, 'foo': None}
>>> ld.popitem()
('bar', None)

Am I preventing pickling from working, and do I need to implement __setstate__ etc?

Pickling

And the dict subclass pickles just fine:

>>> import pickle
>>> pickle.dumps(ld)
b'\x80\x03c__main__\nLowerDict\nq\x00)\x81q\x01X\x03\x00\x00\x00fooq\x02Ns.'
>>> pickle.loads(pickle.dumps(ld))
{'foo': None}
>>> type(pickle.loads(pickle.dumps(ld)))
<class '__main__.LowerDict'>

__repr__

Do I need repr, update and __init__?

We defined update and __init__, but you have a beautiful __repr__ by default:

>>> ld # without __repr__ defined for the class, we get this
{'foo': None}

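However, it's good to write a __repr__ to improve the debuggability of your code. The ideal test is eval(repr(obj)) == obj. If that's easy to do for your code, I strongly recommend it: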

>>> ld = LowerDict({})
>>> eval(repr(ld)) == ld
True
>>> ld = LowerDict(dict(a=1, b=2, c=3))
>>> eval(repr(ld)) == ld
True

You see, it's exactly what we need to recreate an equivalent object - this is something that might show up in our logs or in backtraces:

>>> ld
LowerDict({'a': 1, 'c': 3, 'b': 2})

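Conclusion

Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

Yeah, these are a few more lines of code, but they're intended to be comprehensive. My first inclination would be to use the accepted answer, and if there were issues with it, I'd then look at my answer, as it's a little more complicated and there's no ABC to help me get my interface right.

Premature optimization is going for greater complexity in search of performance. MutableMapping is simpler, so it gets an immediate edge, all else being equal. Nevertheless, to lay out all the differences, let's compare and contrast.

I should add that there was a push to put a similar dictionary into the collections module, but it was rejected. You should probably just do this instead: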

my_dict[transform(key)]

It should be far more easily debuggable.
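
Spelled out, the explicit alternative might look like the sketch below. The body of transform is an assumption made for illustration (lowercasing string keys only):

```python
def transform(key):
    """Hypothetical key normalizer: lowercase strings, leave other keys alone."""
    return key.lower() if isinstance(key, str) else key


my_dict = {}
my_dict[transform("Foo")] = 1
my_dict[transform(42)] = "answer"

assert my_dict[transform("FOO")] == 1
assert my_dict == {"foo": 1, 42: "answer"}   # a plain dict, nothing hidden
```

Every normalization is visible at the call site, at the cost of having to remember to call transform everywhere.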

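Compare and contrast

There are 6 interface functions implemented with the MutableMapping (which is missing fromkeys) and 11 with the dict subclass. I don't need to implement __iter__ or __len__, but instead I have to implement get, setdefault, pop, update, copy, __contains__, and fromkeys - but these are fairly trivial, since I can use inheritance for most of those implementations.

The MutableMapping implements some things in Python that dict implements in C, so I would expect a dict subclass to be more performant in some cases.

We get a free __eq__ in both approaches, both of which assume equality only if the other dict is all lowercase, but again, I think the dict subclass will compare more quickly.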
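
To illustrate the equality behavior, here is a sketch with two stripped-down stand-ins for the two approaches (LowerMapping and this reduced LowerDict are simplified for the example and are not the full classes from the answers). In both, the comparison is made against the stored lowercase keys:

```python
from collections.abc import MutableMapping


class LowerMapping(MutableMapping):
    """Stripped-down MutableMapping variant, for illustration only."""

    def __init__(self, *args, **kwargs):
        self.store = {}
        self.update(dict(*args, **kwargs))

    def __getitem__(self, key):
        return self.store[key.lower()]

    def __setitem__(self, key, value):
        self.store[key.lower()] = value

    def __delitem__(self, key):
        del self.store[key.lower()]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


class LowerDict(dict):
    """Stripped-down dict subclass: only the constructor lowercases keys."""

    def __init__(self, mapping=()):
        super().__init__((k.lower(), v) for k, v in dict(mapping).items())


for lowered in (LowerMapping({"Foo": 1}), LowerDict({"Foo": 1})):
    assert lowered == {"foo": 1}    # equal to an all-lowercase dict
    assert lowered != {"Foo": 1}    # not equal when the other dict has 'Foo'
```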

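Summary:

- subclassing MutableMapping is simpler with fewer opportunities for bugs, but slower, takes more memory (see redundant dict), and fails isinstance(x, dict)
- subclassing dict is faster, uses less memory, and passes isinstance(x, dict), but it has greater complexity to implement.

Which is more perfect? That depends on your definition of perfect.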

All you will have to do is

class BatchCollection(dict):
    def __init__(self, *args, **kwargs):
        # self must be passed explicitly when calling dict.__init__ unbound
        dict.__init__(self, *args, **kwargs)

OR

class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

A sample usage from my own code

### EXAMPLE
import collections.abc


class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

    def __setitem__(self, key, item):
        # Accept only (database_name, table_name) tuples mapped to iterables.
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, collections.abc.Iterable)):
            # self.__dict__[key] = item
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")

Note: tested only in python3
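
A short usage sketch of the validation idea, written for Python 3 with a condensed copy of the class (the key names "sales_db" and "orders" are invented for the example). It also shows a caveat: the inherited constructor does not go through __setitem__, so items passed at construction time are not validated:

```python
from collections.abc import Iterable


class BatchCollection(dict):
    def __setitem__(self, key, item):
        # Accept only (database_name, table_name) tuples mapped to iterables.
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, Iterable)):
            super().__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")


batches = BatchCollection()
batches[("sales_db", "orders")] = [1, 2, 3]      # accepted

try:
    batches["orders"] = [4, 5]                   # key is not a 2-tuple
    rejected = False
except Exception:
    rejected = True
assert rejected

# Caveat: dict.__init__ bypasses __setitem__, so this is not checked.
unchecked = BatchCollection({"orders": 5})
assert "orders" in unchecked
```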