我目前正在尝试Python 3.7中引入的新数据类结构。我目前被困在试图做一些继承的父类。看起来参数的顺序被我当前的方法搞砸了,比如子类中的bool形参在其他形参之前传递。这将导致一个类型错误。

from dataclasses import dataclass

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f'The Name is {self.name} and {self.name} is {self.age} year old')

@dataclass
class Child(Parent):
    school: str
    ugly: bool = True


jack = Parent('jack snr', 32, ugly=True)
jack_son = Child('jack jnr', 12, school = 'havard', ugly=True)

jack.print_id()
jack_son.print_id()

当我运行这段代码时,我得到这个TypeError:

TypeError: non-default argument 'school' follows default argument

我怎么解决这个问题?


当前回答

在发现数据类可能会获得一个允许字段重新排序的装饰器参数后,我回到了这个问题。这无疑是一个有希望的发展,尽管这一功能的进展似乎有些停滞。

现在,您可以通过使用dataclassy(我对数据类的重新实现,克服了这种挫折)来获得这种行为,以及其他一些细节。在原始示例中使用from dataclassy来代替from dataclassy意味着它运行时没有错误。

使用inspect打印Child的签名使正在发生的事情变得清晰;结果是(name: str, age: int, school: str, ugly: bool = True)。字段总是重新排序,以便在初始化式的参数中,具有默认值的字段位于不具有默认值的字段之后。两个列表(没有默认值的字段和有默认值的字段)仍然按照定义顺序排序。

面对这个问题是促使我编写数据类替代品的因素之一。这里详细介绍的变通方法虽然很有用,但要求将代码扭曲到完全否定数据类的简单方法(即字段顺序可以简单地预测)所提供的可读性优势的程度。

其他回答

数据类组合属性的方式阻止您在基类中使用带有默认值的属性,然后在子类中使用没有默认值的属性(位置属性)。

That's because the attributes are combined by starting from the bottom of the MRO, and building up an ordered list of the attributes in first-seen order; overrides are kept in their original location. So Parent starts out with ['name', 'age', 'ugly'], where ugly has a default, and then Child adds ['school'] to the end of that list (with ugly already in the list). This means you end up with ['name', 'age', 'ugly', 'school'] and because school doesn't have a default, this results in an invalid argument listing for __init__.

这被记录在PEP-557数据类中,在继承下:

When the Data Class is being created by the @dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each Data Class that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes.

规格项下:

如果一个没有默认值的字段紧跟在一个有默认值的字段之后,将引发TypeError。无论是在单个类中发生这种情况,还是作为类继承的结果,都是如此。

您确实有一些选择来避免这个问题。

第一个选项是使用单独的基类,将具有默认值的字段强制放到MRO顺序的后面位置。无论如何,避免直接在要用作基类的类上设置字段,例如Parent。

下面的类层次结构可以工作:

# base classes with fields; fields without defaults separate from fields with.
@dataclass
class _ParentBase:
    name: str
    age: int
    
@dataclass
class _ParentDefaultsBase:
    ugly: bool = False

@dataclass
class _ChildBase(_ParentBase):
    school: str

@dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
    ugly: bool = True

# public classes, deriving from base-with, base-without field classes
# subclasses of public classes should put the public base class up front.

@dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f"The Name is {self.name} and {self.name} is {self.age} year old")

@dataclass
class Child(_ChildDefaultsBase, Parent, _ChildBase):
    pass

通过将字段提取到具有无默认字段和具有默认字段的独立基类中,并仔细选择继承顺序,您可以生成一个MRO,将所有无默认字段放在具有默认字段之前。Child的反向MRO(忽略对象)是:

_ParentBase
_ChildBase
_ParentDefaultsBase
Parent
_ChildDefaultsBase

注意,虽然Parent没有设置任何新字段,但它确实从_ParentDefaultsBase继承了字段,并且不应该在字段列表顺序中以“最后”结束;上面的顺序把_ChildDefaultsBase放在最后,所以它的字段“win”。数据类规则也得到了满足;带默认字段的类(_ParentBase和_ChildBase)位于带默认字段的类(_ParentDefaultsBase和_ChildDefaultsBase)前面。

结果是父类和子类的字段都是旧的,而Child仍然是Parent的子类:

>>> from inspect import signature
>>> signature(Parent)
<Signature (name: str, age: int, ugly: bool = False) -> None>
>>> signature(Child)
<Signature (name: str, age: int, school: str, ugly: bool = True) -> None>
>>> issubclass(Child, Parent)
True

所以你可以创建这两个类的实例:

>>> jack = Parent('jack snr', 32, ugly=True)
>>> jack_son = Child('jack jnr', 12, school='havard', ugly=True)
>>> jack
Parent(name='jack snr', age=32, ugly=True)
>>> jack_son
Child(name='jack jnr', age=12, school='havard', ugly=True)

另一种选择是只使用默认字段;你仍然可以通过在__post_init__中引发一个错误来不提供学校值:

_no_default = object()

@dataclass
class Child(Parent):
    school: str = _no_default
    ugly: bool = True

    def __post_init__(self):
        if self.school is _no_default:
            raise TypeError("__init__ missing 1 required argument: 'school'")

但这确实改变了场的顺序;学校结束后丑陋:

<Signature (name: str, age: int, ugly: bool = True, school: str = <object object at 0x1101d1210>) -> None>

类型提示检查器会提示_no_default不是字符串。

您还可以使用attrs项目,该项目激发了数据类的灵感。它使用了不同的继承合并策略;它将子类中被覆盖的字段拉到字段列表的末尾,因此父类中的['name', 'age', 'ugly']在子类中变成了['name', 'age', 'school', 'ugly'];通过使用默认值重写字段,attrs允许重写而不需要执行MRO舞蹈。

attrs支持定义没有类型提示的字段,但是让我们通过设置auto_attribs=True坚持支持的类型提示模式:

import attr

@attr.s(auto_attribs=True)
class Parent:
    name: str
    age: int
    ugly: bool = False

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f"The Name is {self.name} and {self.name} is {self.age} year old")

@attr.s(auto_attribs=True)
class Child(Parent):
    school: str
    ugly: bool = True

您看到此错误是因为在具有默认值的实参之后添加了没有默认值的实参。继承字段到数据类中的插入顺序与方法解析顺序相反,这意味着父字段放在前面,即使它们稍后被它们的子字段覆盖。

来自PEP-557 -数据类的示例:

@dataclass 阶级基础: x: Any = 15.0 Y: int = 0 @dataclass C类(基础): Z: int = 10 X: int = 15 最终的字段列表是,按顺序,x, y, z。x的最终类型是int,在类C中指定。

不幸的是,我认为没有其他办法。我的理解是,如果父类有默认实参,那么子类就不能有非默认实参。

当您使用Python继承创建数据类时,不能保证所有具有默认值的字段将出现在所有没有默认值的字段之后。

一个简单的解决方案是避免使用多重继承来构造“合并”数据类。相反,我们可以通过对父数据类的字段进行过滤和排序来构建合并的数据类。

试试这个merge_dataclasses()函数:

import dataclasses
import functools
from typing import Iterable, Type


def merge_dataclasses(
    cls_name: str,
    *,
    merge_from: Iterable[Type],
    **kwargs,
):
    """
    Construct a dataclass by merging the fields
    from an arbitrary number of dataclasses.

    Args:
        cls_name: The name of the constructed dataclass.

        merge_from: An iterable of dataclasses
            whose fields should be merged.

        **kwargs: Keyword arguments are passed to
            :py:func:`dataclasses.make_dataclass`.

    Returns:
        Returns a new dataclass
    """
    # Merge the fields from the dataclasses,
    # with field names from later dataclasses overwriting
    # any conflicting predecessor field names.
    each_base_fields = [d.__dataclass_fields__ for d in merge_from]
    merged_fields = functools.reduce(
        lambda x, y: {**x, **y}, each_base_fields
    )

    # We have to reorder all of the fields from all of the dataclasses
    # so that *all* of the fields without defaults appear
    # in the merged dataclass *before* all of the fields with defaults.
    fields_without_defaults = [
        (f.name, f.type, f)
        for f in merged_fields.values()
        if isinstance(f.default, dataclasses._MISSING_TYPE)
    ]
    fields_with_defaults = [
        (f.name, f.type, f)
        for f in merged_fields.values()
        if not isinstance(f.default, dataclasses._MISSING_TYPE)
    ]
    fields = [*fields_without_defaults, *fields_with_defaults]

    return dataclasses.make_dataclass(
        cls_name=cls_name,
        fields=fields,
        **kwargs,
    )

然后,您可以按照如下方式合并数据类。注意,我们可以合并A和B,默认字段B和d被移动到合并的数据类的末尾。

@dataclasses.dataclass
class A:
    a: int
    b: int = 0


@dataclasses.dataclass
class B:
    c: int
    d: int = 0


C = merge_dataclasses(
    "C",
    merge_from=[A, B],
)

# Note that 
print(C(a=1, d=1).__dict__)
# {'a': 1, 'd': 1, 'b': 0, 'c': 0}

当然,这种解决方案的缺陷是C实际上不继承A和B,这意味着您不能使用isinstance()或其他类型断言来验证C的亲本。

如何像这样定义丑陋的字段,而不是默认的方式?

ugly: bool = field(metadata=dict(required=False, missing=False))

一个实验性但有趣的解决方案是使用元类。下面的解决方案允许使用带有简单继承的Python数据类,而完全不使用数据类装饰器。此外,它可以继承父基类的字段,而不必抱怨位置参数的顺序(非默认字段)。

from collections import OrderedDict
import typing as ty
import dataclasses
from itertools import takewhile

class DataClassTerm:
    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)

class DataClassMeta(type):
    def __new__(cls, clsname, bases, clsdict):
        fields = {}

        # Get list of base classes including the class to be produced(initialized without its original base classes as those have already become dataclasses)
        bases_and_self = [dataclasses.dataclass(super().__new__(cls, clsname, (DataClassTerm,), clsdict))] + list(bases)

        # Whatever is a subclass of DataClassTerm will become a DataClassTerm. 
        # Following block will iterate and create individual dataclasses and collect their fields
        for base in bases_and_self[::-1]: # Ensure that last fields in last base is prioritized
            if issubclass(base, DataClassTerm):
                to_dc_bases = list(takewhile(lambda c: c is not DataClassTerm, base.__mro__))
                for dc_base in to_dc_bases[::-1]: # Ensure that last fields in last base in MRO is prioritized(same as in dataclasses)
                    if dataclasses.is_dataclass(dc_base):
                        valid_dc = dc_base
                    else:
                        valid_dc = dataclasses.dataclass(dc_base)
                    for field in dataclasses.fields(valid_dc):
                        fields[field.name] = (field.name, field.type, field)
        
        # Following block will reorder the fields so that fields without default values are first in order
        reordered_fields = OrderedDict()
        for n, t, f  in fields.values():
            if f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING:
                reordered_fields[n] = (n, t, f)
        for n, t, f  in fields.values():
            if n not in reordered_fields.keys():
                reordered_fields[n] = (n, t, f)
        
        # Create a new dataclass using `dataclasses.make_dataclass`, which ultimately calls type.__new__, which is the same as super().__new__ in our case
        fields = list(reordered_fields.values())
        full_dc = dataclasses.make_dataclass(cls_name=clsname, fields=fields, init=True, bases=(DataClassTerm,))
        
        # Discard the created dataclass class and create new one using super but preserve the dataclass specific namespace.
        return super().__new__(cls, clsname, bases, {**full_dc.__dict__,**clsdict})
    
class DataClassCustom(DataClassTerm, metaclass=DataClassMeta):
    def __new__(cls, *args, **kwargs):
        if len(args)>0:
            raise RuntimeError("Do not use positional arguments for initialization.")
        return super().__new__(cls, *args, **kwargs)

现在让我们创建一个带有父数据类和混合类的样本数据类:

class DataClassCustomA(DataClassCustom):
    field_A_1: int = dataclasses.field()
    field_A_2: ty.AnyStr = dataclasses.field(default=None)

class SomeOtherClass:
    def methodA(self):
        print('print from SomeOtherClass().methodA')

class DataClassCustomB(DataClassCustomA,SomeOtherClass):
    field_B_1: int = dataclasses.field()
    field_B_2: ty.Dict = dataclasses.field(default_factory=dict)

结果是

result_b = DataClassCustomB(field_A_1=1, field_B_1=2)

result_b
# DataClassCustomB(field_A_1=1, field_B_1=2, field_A_2=None, field_B_2={})

result_b.methodA()
# print from SomeOtherClass().methodA

尝试在每个父类上使用@dataclass装饰器做同样的事情会在接下来的子类中引发一个异常,如TypeError(f'non-default argument <field-name)跟随默认参数')。上面的解决方案防止了这种情况的发生,因为字段首先被重新排序。然而,由于字段的顺序被修改了,在DataClassCustom中防止*args的使用。__new__是强制的,因为原来的顺序不再有效。

虽然在Python >=3.10中引入了kw_only特性,本质上使数据类中的继承更加可靠,但上面的示例仍然可以用作一种使数据类可继承的方法,而不需要使用@dataclass装饰器。