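How to JSON serialize sets?

I have a Python set that contains objects with __hash__ and __eq__ methods, to make certain that no duplicates are included in the collection.

I need to JSON-encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.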

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

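I know I can create an extension of the json.JSONEncoder class that has a custom default method, but I'm not even sure where to begin in converting the set. Should I create a dictionary out of the set values inside the default method, and then return the encoding of that? Ideally, I'd like the default method to handle all the datatypes that the original encoder chokes on (I'm using Mongo as a data source, so dates seem to raise this error too).

Any hint in the right direction would be appreciated.

EDIT:

Thanks for the answers! Perhaps I should have been more precise.

I used the answers here to get around the limitation on sets being translated, but internal keys are an issue as well.

The objects in the set are complex objects that translate to __dict__, but they can themselves contain attribute values that are not among the basic types the json encoder accepts.

Many different types end up in this set. The hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there is no telling exactly what a child object contains.

One object might contain a start date value, whereas another may have some other schema with no keys that hold "non-primitive" objects.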

That is why the only solution I could think of was to extend the JSONEncoder to replace the default method to turn on different cases - but I'm not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I've looked through previous questions and can't seem to find the best approach to case-specific encoding (which unfortunately seems like what I'm going to need to do here).

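Current answer

I adapted Raymond Hettinger's solution to Python 3.

Here is what changed:

- unicode is gone
- the call to the parent's default now goes through super()
- base64 is used to serialize the bytes type into str (because it seems that bytes in Python 3 cannot be converted to JSON)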

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints (set element order may vary): [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

2016-03-27 20:26:24

Other answers

If you just need a quick dump and don't want to implement a custom encoder, you can use the following:

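import simplejson as json  # iterable_as_array is a simplejson option; the standard-library json module does not accept it
json_string = json.dumps(data, iterable_as_array=True)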

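This converts all sets (and other iterables) into arrays. Just beware that those fields stay arrays when you parse the JSON back. If you want to preserve the types, you need to write a custom encoder.

Also make sure simplejson is installed and imported. You can find it on PyPI.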
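
To make the round-trip caveat concrete, here is a minimal sketch. It assumes the standard-library json module with default=list as a stand-in for simplejson's iterable_as_array (so it runs without installing anything); the sample data is hypothetical.

```python
import json

# Hypothetical sample data containing a set.
data = {"members": {"alice"}}

# default=list is called for objects json cannot encode and turns
# the set into a list (a stand-in for iterable_as_array=True).
json_string = json.dumps(data, default=list)
print(json_string)  # {"members": ["alice"]}

# Parsing the string back yields a list: the set type is not restored.
restored = json.loads(json_string)
print(type(restored["members"]).__name__)  # list
```

If the receiving side needs a set again, it has to rebuild it explicitly, for example with set(restored["members"]).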

2018-12-06 14:08:34

You can create a custom encoder that returns a list when it encounters a set. Here's an example:

import json
class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return json.JSONEncoder.default(self, obj)

data_str = json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
print(data_str)
# Output: '[1, 2, 3, 4, 5]'

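You can detect other types this way too. If you need to retain the fact that the list was actually a set, you could use a custom encoding. Something like return {'type':'set', 'list':list(obj)} might work.

To illustrate nested types, consider serializing this: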
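
The tagged representation suggested above could be paired with an object_hook on the decoding side. The following is only a sketch under assumptions: the names TaggedSetEncoder and restore_sets are hypothetical, and it assumes no ordinary dictionary in your data uses the keys 'type' and 'list' in the same way.

```python
import json

class TaggedSetEncoder(json.JSONEncoder):
    """Hypothetical encoder that marks sets so they can be restored."""
    def default(self, obj):
        if isinstance(obj, set):
            return {"type": "set", "list": list(obj)}
        # Let the base class raise TypeError for anything else.
        return super().default(obj)

def restore_sets(dct):
    """object_hook: turn tagged dictionaries back into sets."""
    if dct.get("type") == "set" and "list" in dct:
        return set(dct["list"])
    return dct

encoded = json.dumps({"ids": {1, 2, 3}}, cls=TaggedSetEncoder)
decoded = json.loads(encoded, object_hook=restore_sets)
print(decoded)
```

json.loads calls the hook for every decoded JSON object, innermost first, so tagged sets nested inside other dictionaries are restored as well.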

class Something(object):
    pass
json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

This raises the following error:

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

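This indicates that the encoder takes the list result returned and recursively calls the serializer on its children. To add a custom serializer for multiple types, you can do this: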

class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        if isinstance(obj, Something):
            return 'CustomSomethingRepresentation'
        return json.JSONEncoder.default(self, obj)
 
data_str = json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
print(data_str)
# Output: '[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'
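
The same recursion covers the case raised in the question: objects that translate to __dict__ and hold dates or further sets. Below is a minimal sketch, not a definitive implementation; Event and FallbackEncoder are hypothetical names, and encoding dates with isoformat() is an assumed choice.

```python
import json
from datetime import date, datetime

class Event:
    """Hypothetical entity standing in for the asker's complex objects."""
    def __init__(self, name, starts):
        self.name = name
        self.starts = starts
        self.tags = {"demo"}

class FallbackEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return list(obj)
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if hasattr(obj, "__dict__"):
            # The returned dict is encoded again, so default() is called
            # for each value inside it that json cannot handle.
            return vars(obj)
        return super().default(obj)

event = Event("launch", datetime(2011, 11, 22, 16, 49))
print(json.dumps({event}, cls=FallbackEncoder))
# [{"name": "launch", "starts": "2011-11-22T16:49:00", "tags": ["demo"]}]
```

default() is invoked once per object the encoder cannot handle, not once per key, so each nested value is dispatched on its own type.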

2011-11-22 16:49:28

A shortened version of @AnttiHaapala's answer:

json.dumps(dict_with_sets, default=lambda x: list(x) if isinstance(x, set) else x)
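
One caveat: for values that are not sets, the one-liner returns the object unchanged, whereas the json documentation expects default to return a serializable object or raise TypeError. A stricter variant might look like the sketch below; the name set_default and the sample data are hypothetical.

```python
import json

def set_default(obj):
    """Convert sets to lists; reject anything else json cannot encode."""
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Hypothetical sample data.
dict_with_sets = {"tags": {"a"}, "count": 1}
print(json.dumps(dict_with_sets, default=set_default))
# {"tags": ["a"], "count": 1}
```

With this variant, an unsupported value fails with a TypeError that names the offending type.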

2021-01-09 05:24:38

Only dictionaries, lists, and primitive object types (int, string, bool) are available in JSON.

2011-11-22 16:42:30

You don't need to create a custom encoder class to supply the default method; it can be passed in as a keyword argument:

import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

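    # default must return something serializable or raise TypeError for types it does not handle
    raise TypeError(repr(obj) + " is not JSON serializable")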

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

This results in [1, 2, 3] in all supported Python versions.

2020-03-05 11:40:18
