如何使一个Python类序列化?
class FileItem:
def __init__(self, fname):
self.fname = fname
尝试序列化为JSON:
>>> import json
>>> x = FileItem('/foo/bar')
>>> json.dumps(x)
TypeError: Object of type 'FileItem' is not JSON serializable
如何使一个Python类序列化?
class FileItem:
def __init__(self, fname):
self.fname = fname
尝试序列化为JSON:
>>> import json
>>> x = FileItem('/foo/bar')
>>> json.dumps(x)
TypeError: Object of type 'FileItem' is not JSON serializable
当前回答
解决这个问题有很多方法。'ObjDict' (pip install object)是另一个。重点是提供像javascript一样的对象,它也可以像字典一样最好地处理从JSON加载的数据,但还有其他功能也很有用。这为原始问题提供了另一种解决方案。
其他回答
我们经常在日志文件中转储JSON格式的复杂字典。虽然大多数字段携带重要信息,但我们不太关心内置的类对象(例如子进程)。Popen对象)。由于存在这些不可序列化的对象,对json.dumps()的调用会失败。
为了解决这个问题,我构建了一个小函数来转储对象的字符串表示形式,而不是转储对象本身。如果您正在处理的数据结构嵌套太多,您可以指定嵌套的最大级别/深度。
from time import time
def safe_serialize(obj , max_depth = 2):
max_level = max_depth
def _safe_serialize(obj , current_level = 0):
nonlocal max_level
# If it is a list
if isinstance(obj , list):
if current_level >= max_level:
return "[...]"
result = list()
for element in obj:
result.append(_safe_serialize(element , current_level + 1))
return result
# If it is a dict
elif isinstance(obj , dict):
if current_level >= max_level:
return "{...}"
result = dict()
for key , value in obj.items():
result[f"{_safe_serialize(key , current_level + 1)}"] = _safe_serialize(value , current_level + 1)
return result
# If it is an object of builtin class
elif hasattr(obj , "__dict__"):
if hasattr(obj , "__repr__"):
result = f"{obj.__repr__()}_{int(time())}"
else:
try:
result = f"{obj.__class__.__name__}_object_{int(time())}"
except:
result = f"object_{int(time())}"
return result
# If it is anything else
else:
return obj
return _safe_serialize(obj)
由于字典也可以有不可序列化的键,转储它们的类名或对象表示将导致所有键都具有相同的名称,这将抛出错误,因为所有键都需要有唯一的名称,这就是为什么当前时间Since epoch被int(time())附加到对象名称。
可以使用以下具有不同级别/深度的嵌套字典来测试该函数
d = {
"a" : {
"a1" : {
"a11" : {
"a111" : "some_value" ,
"a112" : "some_value" ,
} ,
"a12" : {
"a121" : "some_value" ,
"a122" : "some_value" ,
} ,
} ,
"a2" : {
"a21" : {
"a211" : "some_value" ,
"a212" : "some_value" ,
} ,
"a22" : {
"a221" : "some_value" ,
"a222" : "some_value" ,
} ,
} ,
} ,
"b" : {
"b1" : {
"b11" : {
"b111" : "some_value" ,
"b112" : "some_value" ,
} ,
"b12" : {
"b121" : "some_value" ,
"b122" : "some_value" ,
} ,
} ,
"b2" : {
"b21" : {
"b211" : "some_value" ,
"b212" : "some_value" ,
} ,
"b22" : {
"b221" : "some_value" ,
"b222" : "some_value" ,
} ,
} ,
} ,
"c" : subprocess.Popen("ls -l".split() , stdout = subprocess.PIPE , stderr = subprocess.PIPE) ,
}
执行以下命令将会得到-
print("LEVEL 3")
print(json.dumps(safe_serialize(d , 3) , indent = 4))
print("\n\n\nLEVEL 2")
print(json.dumps(safe_serialize(d , 2) , indent = 4))
print("\n\n\nLEVEL 1")
print(json.dumps(safe_serialize(d , 1) , indent = 4))
结果:
LEVEL 3
{
"a": {
"a1": {
"a11": "{...}",
"a12": "{...}"
},
"a2": {
"a21": "{...}",
"a22": "{...}"
}
},
"b": {
"b1": {
"b11": "{...}",
"b12": "{...}"
},
"b2": {
"b21": "{...}",
"b22": "{...}"
}
},
"c": "<Popen: returncode: None args: ['ls', '-l']>"
}
LEVEL 2
{
"a": {
"a1": "{...}",
"a2": "{...}"
},
"b": {
"b1": "{...}",
"b2": "{...}"
},
"c": "<Popen: returncode: None args: ['ls', '-l']>"
}
LEVEL 1
{
"a": "{...}",
"b": "{...}",
"c": "<Popen: returncode: None args: ['ls', '-l']>"
}
[注意]:仅在不关心内置类对象的序列化时使用此选项。
TLDR:复制-粘贴下面的选项1或选项2
真正的/完整的答案:让Pythons json模块与你的类一起工作
AKA,求解:json。dump ({"thing": YOUR_CLASS()})
解释:
Yes, a good reliable solution exists No, there is no python "official" solution By official solution, I mean there is no way (as of 2023) to add a method to your class (like toJSON in JavaScript) and/or no way to register your class with the built-in json module. When something like json.dumps([1,2, your_obj]) is executed, python doesn't check a lookup table or object method. I'm not sure why other answers don't explain this The closest official approach is probably andyhasit's answer which is to inherit from a dictionary. However, inheriting from a dictionary doesn't work very well for many custom classes like AdvancedDateTime, or pytorch tensors. The ideal workaround is this: Mutate json.dumps (affects everywhere, even pip modules that import json) Add def __json__(self) method to your class
选项1:让一个模块来做补丁
PIP安装json-fix (扩展+包装版FancyJohn的回答,谢谢@FancyJohn)
your_class_definition.py
import json_fix
class YOUR_CLASS:
def __json__(self):
# YOUR CUSTOM CODE HERE
# you probably just want to do:
# return self.__dict__
return "a built-in object that is naturally json-able"
这是它。
使用示例:
from your_class_definition import YOUR_CLASS
import json
json.dumps([1,2, YOUR_CLASS()], indent=0)
# '[\n1,\n2,\n"a built-in object that is naturally json-able"\n]'
生成json。dump适用于Numpy数组,Pandas DataFrames和其他第三方对象,请参阅模块(只有大约2行代码,但需要解释)。
它是如何工作的?嗯…
选项2:补丁json。把你自己
注意:这种方法是简化的,它在已知的edgcase上失败(例如:如果你的自定义类继承了dict或其他内置类),并且它错过了控制外部类的json行为(numpy数组,datetime, dataframes,张量等)。
some_file_thats_imported_before_your_class_definitions.py
# Step: 1
# create the patch
from json import JSONEncoder
def wrapped_default(self, obj):
return getattr(obj.__class__, "__json__", wrapped_default.default)(obj)
wrapped_default.default = JSONEncoder().default
# apply the patch
JSONEncoder.original_default = JSONEncoder.default
JSONEncoder.default = wrapped_default
your_class_definition.py
# Step 2
class YOUR_CLASS:
def __json__(self, **options):
# YOUR CUSTOM CODE HERE
# you probably just want to do:
# return self.__dict__
return "a built-in object that is natually json-able"
_
其他答案似乎都是“序列化自定义对象的最佳实践/方法”
在这里的文档中已经介绍过了(搜索“complex”可以找到编码复数的例子)
解决这个问题有很多方法。'ObjDict' (pip install object)是另一个。重点是提供像javascript一样的对象,它也可以像字典一样最好地处理从JSON加载的数据,但还有其他功能也很有用。这为原始问题提供了另一种解决方案。
要添加另一个选项:您可以使用attrs包和asdict方法。
class ObjectEncoder(JSONEncoder):
def default(self, o):
return attr.asdict(o)
json.dumps(objects, cls=ObjectEncoder)
然后再转换回去
def from_json(o):
if '_obj_name' in o:
type_ = o['_obj_name']
del o['_obj_name']
return globals()[type_](**o)
else:
return o
data = JSONDecoder(object_hook=from_json).decode(data)
类看起来像这样
@attr.s
class Foo(object):
x = attr.ib()
_obj_name = attr.ib(init=False, default='Foo')
首先,我们需要使我们的对象符合JSON,这样我们就可以使用标准JSON模块转储它。我是这样做的:
def serialize(o):
if isinstance(o, dict):
return {k:serialize(v) for k,v in o.items()}
if isinstance(o, list):
return [serialize(e) for e in o]
if isinstance(o, bytes):
return o.decode("utf-8")
return o