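How to JSON serialize sets?

I have a Python set that contains objects with __hash__ and __eq__ methods, to make sure no duplicates are included in the collection.

I need to JSON-encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.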

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

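I know I can create an extension to the json.JSONEncoder class that has a custom default method, but I'm not even sure where to begin in converting the set. Should I create a dictionary out of the set values inside the default method and then return the encoding of that? Ideally, I'd like the default method to handle all the data types that the original encoder chokes on (I'm using Mongo as a data source, so dates seem to raise this error too).

Any hint in the right direction would be appreciated.

EDIT:

Thanks for the answer! Perhaps I should have been more precise.

I used the answers here to get around the limitation of the set being translated, but internal keys are an issue as well.

The objects in the set are complex objects that translate to __dict__, but they can themselves contain property values that may not be eligible as basic types for the json encoder.

There are a lot of different types coming into this set, and the hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there is no telling exactly what the child object contains.

One object might contain a start date value, whereas another may have some other schema that includes no keys containing "non-primitive" objects.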

That is why the only solution I could think of was to extend the JSONEncoder to replace the default method to turn on different cases - but I'm not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I've looked through previous questions and can't seem to find the best approach to case-specific encoding (which unfortunately seems like what I'm going to need to do here).

Current answer

A shorthand version of @AnttiHaapala's answer:

json.dumps(dict_with_sets, default=lambda x: list(x) if isinstance(x, set) else x)
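
One caveat with the one-liner: for anything that is not a set, the lambda hands the same object back to the encoder. In CPython this tends to surface as "ValueError: Circular reference detected" rather than a clear TypeError. A minimal sketch of a more defensive fallback, assuming you only want sets converted (the name encode_set is made up for this example):

```python
import json

def encode_set(obj):
    """Fallback for json.dumps: turn sets into lists, reject everything else."""
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    # Raising TypeError tells the encoder the object really is unsupported.
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

dict_with_sets = {"tags": {"a"}, "count": 1}
encoded = json.dumps(dict_with_sets, default=encode_set)
print(encoded)  # {"tags": ["a"], "count": 1}
```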

2021-01-09 05:24:38

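Other answers

One drawback of the accepted solution is that its output is very Python-specific. That is, its raw JSON output cannot be inspected by a human or loaded by another language (e.g. JavaScript). Example: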

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

Will get you:

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

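I can propose a solution that downgrades the set to a dict containing a list on the way out, and turns it back into a set when loaded into Python with the matching object hook, which preserves both readability and language independence: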

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j, object_hook=as_python_object)
print(ob["a"])

Which gets you:

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {4, 5, 6}]

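Note that serializing a dictionary that has an element with the key "__set__" will break this mechanism, so __set__ has now become a reserved dict key. Obviously, feel free to use another, more obscure key.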
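
To make the collision concrete, here is a small sketch. as_python_object repeats the set branch of the hook above; as_python_object_strict is a hypothetical variant, not part of the original answer, that only narrows the problem:

```python
import json

def as_python_object(dct):
    # Same decoding rule as above, reduced to the set case.
    if "__set__" in dct:
        return set(dct["__set__"])
    return dct

def as_python_object_strict(dct):
    # Hypothetical stricter variant: only a dict whose sole key is "__set__"
    # is treated as an encoded set. This narrows the collision, it does not remove it.
    if set(dct) == {"__set__"}:
        return set(dct["__set__"])
    return dct

# A plain dict that happens to use the reserved key next to other data.
original = {"__set__": [1, 2, 2], "note": "not a set"}
text = json.dumps(original)

print(json.loads(text, object_hook=as_python_object))         # {1, 2}
print(json.loads(text, object_hook=as_python_object_strict))  # {'__set__': [1, 2, 2], 'note': 'not a set'}
```

With the first hook the "note" entry is silently dropped; a dict whose only key is "__set__" would still be misread by either hook.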

2020-01-16 13:33:19

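JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.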

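As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to a list, you lose the ability to recover regular lists; if you convert sets to a dictionary using dict.fromkeys(s), you lose the ability to recover dictionaries).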
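
A short sketch of that information loss, assuming a fallback that simply turns sets into lists (sets_to_lists is a name invented for this example):

```python
import json

def sets_to_lists(obj):
    # Lossy fallback: a set is written out as a plain JSON array.
    if isinstance(obj, set):
        return list(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

data = {"as_set": {1, 2, 3}, "as_list": [1, 2, 3]}
decoded = json.loads(json.dumps(data, default=sets_to_lists))

# Both values come back as lists, so the original set can no longer be told apart.
print(type(decoded["as_set"]).__name__, type(decoded["as_list"]).__name__)  # list list

# The dict.fromkeys route has the same problem with dictionaries.
as_dict = json.loads(json.dumps(dict.fromkeys({"a", "b"})))
print(as_dict == {"a": None, "b": None})  # True
```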

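A more sophisticated solution is to build out a custom type that can coexist with the other native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.: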

from decimal import Decimal  # used in the sample session below
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        try:
            return {'_python_object': pickle.dumps(obj).decode('latin-1')}
        except pickle.PickleError:
            return super().default(obj)

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(dct['_python_object'].encode('latin-1'))
    return dct

Here is a sample session showing that it can handle lists, dicts, and sets:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, {'knights', 'say', 'who', 'ni'}, {'key': 'value'}, Decimal('3.14')]

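Alternatively, it may be useful to use a more general-purpose serialization technique such as YAML, Twisted Jelly, or Python's pickle module. These each support a much greater range of datatypes.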
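
As a rough illustration of the pickle route, the sketch below round-trips a mixed structure without any custom encoder. The sample data is invented for the example. Keep in mind that pickle output is binary and Python-specific, and that unpickling data from an untrusted source is unsafe.

```python
import pickle
from datetime import datetime
from decimal import Decimal

# Sample data mixing types that the json module rejects by default.
data = [1, {"knights", "ni"}, Decimal("3.14"), datetime(2011, 11, 22, 16, 41, 32)]

blob = pickle.dumps(data)      # bytes, not human-readable text
restored = pickle.loads(blob)  # only do this with data you trust

print(restored == data)                  # True
print(type(restored[1]).__name__)        # set
```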

2011-11-22 16:41:32

Only dictionaries, lists and primitive object types (int, string, bool) are available in JSON.
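
Everything else therefore has to be reduced to those types before encoding. A minimal sketch of how a default function can do that for nested values: the encoder calls it once for each value it cannot handle, at any depth, and then encodes whatever comes back. Entity and to_basic are hypothetical names standing in for the question's objects, not part of any library.

```python
import json
from datetime import datetime

class Entity:
    """Hypothetical stand-in for the question's hashable objects."""
    def __init__(self, uid, **attrs):
        self.uid = uid
        self.__dict__.update(attrs)

    def __hash__(self):
        return hash(self.uid)

    def __eq__(self, other):
        return isinstance(other, Entity) and self.uid == other.uid

def to_basic(obj):
    # Called for each unsupported value; the returned value is encoded again,
    # so sets or dates nested inside an Entity are handled on later calls.
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Entity):
        return vars(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

entities = {Entity(1, start=datetime(2011, 11, 22), tags={"x"})}
encoded = json.dumps(entities, default=to_basic)
print(encoded)  # [{"uid": 1, "start": "2011-11-22T00:00:00", "tags": ["x"]}]
```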

2011-11-22 16:42:30

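If you only need to encode sets, not general Python objects, and want to keep the output easily human-readable, a simplified version of Raymond Hettinger's answer can be used: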

import json
import collections.abc

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.abc.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        # The collections.Set alias was removed in Python 3.10;
        # the abstract base class lives in collections.abc.
        if isinstance(obj, collections.abc.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

2015-02-05 08:37:20
