How to get string objects instead of Unicode from JSON

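I'm using Python 2 to parse JSON from ASCII-encoded text files.

When loading these files with either json or simplejson, all my string values are cast to Unicode objects instead of string objects. The problem is that I have to use the data with some libraries that only accept string objects. I can't change the libraries or update them.

Is it possible to get string objects instead of Unicode ones?

Example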

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

(One easy and clean solution for 2017 is to use a recent version of Python, i.e. Python 3 and later.)

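Current answer

Check out this answer to a similar question, which states:

The u prefix just means that you have a Unicode string. When you really use the string, it won't appear in your data. Don't be thrown by the printed output.

For example, try this: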

print mail_accounts[0]["i"]

You won't see a u.
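To make the point above concrete, here is a small sketch (the variable names are made up for illustration). It contrasts the echoed representation of a decoded list with the text of one element. The u prefix appears only in the representation, and only on Python 2; the element is still a unicode object either way.

```python
import json

values = json.loads('["a", "b"]')

# repr() of the container is what the interactive prompt echoes; on Python 2
# it marks each unicode string with a leading u, e.g. [u'a', u'b'].
as_echoed = repr(values)

# Formatting a single element yields only its characters, with no prefix.
as_text = u"%s" % values[0]
print(as_text)
```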

2017-07-04 13:32:45

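Other answers

There's no built-in option to make the json module functions return byte strings instead of Unicode strings. However, this short and simple recursive function will convert any decoded JSON object from using Unicode strings to UTF-8-encoded byte strings: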

def byteify(input):
    if isinstance(input, dict):
        return {byteify(key): byteify(value)
                for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [byteify(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

Just call this on the output you get from a json.load or json.loads call.

A couple of notes:

To support Python 2.6 or earlier, replace return {byteify(key): byteify(value) for key, value in input.iteritems()} with return dict([(byteify(key), byteify(value)) for key, value in input.iteritems()]), since dictionary comprehensions weren't supported until Python 2.7.

Since this answer recurses through the entire decoded object, it has a couple of undesirable performance characteristics that can be avoided with very careful use of the object_hook or object_pairs_hook parameters. Mirec Miskuf's answer is so far the only one that manages to pull this off correctly, although as a consequence, it's significantly more complicated than my approach.
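As a rough illustration of the object_hook route mentioned above, here is a minimal sketch. It is not the code from the answer referred to; loads_encoded, _encode and TEXT_TYPE are names invented for this example. TEXT_TYPE exists only so the snippet also runs on Python 3, where the encoded values come back as bytes objects.

```python
import json

TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def _encode(data, skip_dicts=False):
    """Encode text to UTF-8 bytes, descending into lists and (optionally) dicts."""
    if isinstance(data, TEXT_TYPE):
        return data.encode("utf-8")
    if isinstance(data, list):
        # Dicts inside the list were already converted by the hook.
        return [_encode(item, skip_dicts=True) for item in data]
    if isinstance(data, dict) and not skip_dicts:
        return dict((_encode(key, skip_dicts=True), _encode(value, skip_dicts=True))
                    for key, value in data.items())
    return data


def loads_encoded(json_text):
    # The hook converts every JSON object as soon as it is decoded; the outer
    # call covers a top-level string or list, which never reaches the hook.
    return _encode(json.loads(json_text, object_hook=_encode), skip_dicts=True)


result = loads_encoded('{"a": ["b", {"c": "d"}], "n": 1}')
print(result)
```

The skip_dicts flag is the design choice that matters here: every dict has already been handled by the hook on its way out of the decoder, so later passes only need to look at strings and lists.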

2012-10-28 00:27:17

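While there are some good answers here, I ended up using PyYAML to parse my JSON files, since it gives the keys and values as str type strings instead of the unicode type. Because JSON is a subset of YAML, it works nicely: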

>>> import json
>>> import yaml
>>> list_org = ['a', 'b']
>>> list_dump = json.dumps(list_org)
>>> list_dump
'["a", "b"]'
>>> json.loads(list_dump)
[u'a', u'b']
>>> yaml.safe_load(list_dump)
['a', 'b']

Notes

Some things to note, though:

I get string objects because all my entries are ASCII encoded. If I used entries containing non-ASCII characters, I would get them back as unicode objects; there is no conversion!

You should (probably always) use PyYAML's safe_load function; if you use it to load JSON files, you don't need the "additional power" of the load function anyway.

If you want a YAML parser that has more support for the 1.2 version of the spec (and correctly parses very low numbers), try Ruamel YAML: pip install ruamel.yaml and import ruamel.yaml as yaml was all I needed in my tests.

Conversion

As stated, there isn't any conversion! If you can't be sure you will only deal with ASCII values (and most of the time you can't), better use a conversion function:

I have used the one from Mark Amery a few times now; it works great and is very easy to use. You can also use a similar function as an object_hook instead, as it might give you a performance boost on big files. See the slightly more involved answer from Mirec Miskuf for that.

2013-05-04 10:37:24

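With Python 3.6, I sometimes still run into this problem. For example, when getting a response from a REST API and loading the response text as JSON, I still get Unicode strings. I found a simple solution using json.dumps().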

response_message = json.loads(json.dumps(response.text))
print(response_message)

2018-04-25 17:17:55

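That's because JSON makes no distinction between string objects and Unicode objects. They're all strings in JavaScript.

I think JSON is right to return Unicode objects. In fact, I wouldn't accept anything less, since JavaScript strings are in fact unicode objects (i.e. JSON (JavaScript) strings can store any kind of Unicode character), so it makes sense to create unicode objects when translating strings from JSON. Plain strings just wouldn't fit, since the library would have to guess the encoding you want.

It's better to use unicode string objects everywhere. So your best option is to update your libraries so they can deal with Unicode objects.

But if you really want bytestrings, just encode the results to the encoding of your choice: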

>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
>>> nl = [s.encode('utf-8') for s in nl]
>>> nl
['a', 'b']
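The choice of encoding only starts to matter once the data contains non-ASCII characters. The following sketch (the sample string is an assumption, picked for illustration) encodes the same text two ways and gets two different byte strings, which is why a library cannot pick one for you.

```python
# "cafe" with an acute accent on the final e, written with an escape so the
# source file itself stays pure ASCII.
text = u"caf\u00e9"

utf8_bytes = text.encode("utf-8")      # the accented letter becomes two bytes
latin1_bytes = text.encode("latin-1")  # the accented letter becomes one byte

print(len(utf8_bytes))
print(len(latin1_bytes))
print(utf8_bytes == latin1_bytes)
```

For pure ASCII input, as in the question, both encodings produce identical bytes, so either choice works there.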

2009-06-05 16:44:45

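Mike Brennan's answer is close, but there isn't any reason to re-traverse the entire structure. If you use the object_pairs_hook (Python 2.7+) parameter:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order in which the key and value pairs are decoded (for example, collections.OrderedDict will remember the order of insertion). If object_hook is also defined, object_pairs_hook takes priority.

With it, you get each JSON object handed to you, so you can do the decoding with no need for recursion: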

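def deunicodify_hook(pairs):
    def encode(value):
        # Strings nested in lists never reach the hook on their own,
        # so descend into lists here; dicts were already converted.
        if isinstance(value, unicode):
            return value.encode('utf-8')
        if isinstance(value, list):
            return [encode(item) for item in value]
        return value

    new_pairs = []
    for key, value in pairs:
        new_pairs.append((encode(key), encode(value)))
    return dict(new_pairs)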

In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'

In [53]: json.load(open('test.json'))
Out[53]:
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]:
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

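Notice that I never have to call the hook recursively, since every object gets handed to the hook when you use object_pairs_hook. You do have to care about lists, but as you can see, an object within a list is converted properly, and you don't have to recurse to make it happen.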
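To see why no recursive call to the hook is needed, this small sketch records what the decoder passes to the hook. The names recording_hook and calls and the sample document are made up for illustration. Each JSON object triggers one call, innermost objects first, including an object that sits inside a list.

```python
import json

calls = []


def recording_hook(pairs):
    # `pairs` is a list of (key, value) tuples for one JSON object;
    # nested objects in the values have already been through the hook.
    calls.append([key for key, _ in pairs])
    return dict(pairs)


decoded = json.loads('{"outer": {"inner": 1}, "items": [{"x": 2}]}',
                     object_pairs_hook=recording_hook)

for keys in calls:
    print(keys)
```

The list itself is never passed to the hook, only the object inside it, which is why plain strings stored directly in a list need separate handling.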

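A coworker pointed out that Python 2.6 doesn't have object_pairs_hook. You can still use this with Python 2.6 by making a very small change. In the hook above, change:

for key, value in pairs:

to:

for key, value in pairs.iteritems():

Then use object_hook instead of object_pairs_hook: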

In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]:
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

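Using object_pairs_hook results in one fewer dictionary being instantiated for each object in the JSON document, which might be worthwhile if you are parsing a huge document.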

2016-01-14 17:34:06
