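How do I check whether a string is Unicode or ASCII?

What do I have to do in Python to find out which encoding a string has?

Current answer

In Python 3, I needed to tell whether a value looks like b='\x7f\x00\x00\x01' or b='127.0.0.1'. My solution is this: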

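def get_str(value):
    # str(b'...') would give the repr "b'...'", which is always printable,
    # so decode bytes explicitly; Latin-1 maps every byte to one character.
    if isinstance(value, bytes):
        str_value = value.decode('latin-1')
    else:
        str_value = str(value)

    if str_value.isprintable():
        return str_value

    # Non-printable input is treated as packed bytes: one decimal number per byte.
    return '.'.join('%d' % x for x in value)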

It works for me, and I hope it is useful to anyone who needs it.

2021-04-07 16:05:45

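Other answers

In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.

In Python 2, a string may be of type str or of type unicode. You can tell which with code like this: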
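
As a minimal sketch of that distinction in Python 3 (the helper name `describe` is made up for illustration), an `isinstance` check is enough to tell text from raw bytes:

```python
def describe(value):
    """Return a short label for the Python 3 type of value."""
    if isinstance(value, str):
        return "text (str)"
    if isinstance(value, (bytes, bytearray)):
        return "raw bytes"
    return "not a string"


print(describe("snowman ☃"))                  # a str holding Unicode text
print(describe("snowman ☃".encode("utf-8")))  # the same text encoded to bytes
print(describe(42))
```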

def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

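This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist purely of characters in the ASCII range, and a byte string may contain ASCII, encoded Unicode, or even non-textual data.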
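
To illustrate that last point, here is a rough Python 3 sketch that looks at the content of a byte string rather than its type. The function name `classify_bytes` and its labels are assumptions made for this example. A successful UTF-8 decode only shows that the bytes are valid UTF-8, not that they were meant as UTF-8.

```python
def classify_bytes(data):
    """Guess what a byte string holds by attempting strict decodes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"          # valid UTF-8, which is still only a guess
    except UnicodeDecodeError:
        return "unknown"        # another encoding, or not text at all


print(classify_bytes(b"hello"))
print(classify_bytes("☃".encode("utf-8")))
print(classify_bytes(b"\xff\xfe\x00"))
```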

2011-02-13 22:40:50

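A simple approach is to check whether unicode is a builtin. If it is, you are on Python 2 and a plain string literal will be a byte string (str). To make sure everything is unicode, you can do the following: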

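try:
    import builtins                  # Python 3
except ImportError:
    import __builtin__ as builtins   # Python 2 has no module named builtins

i = 'cats'
if 'unicode' in dir(builtins):     # True in Python 2, False in 3
    i = unicode(i)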

2019-09-18 14:24:38

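This may help someone else. I started out testing the string type of the variable s, but for my application it made more sense to simply return s as UTF-8. The process calling return_utf knows what it is dealing with and can handle the string appropriately. The code is not pristine, but I intend it to be Python-version agnostic, with no version test and no import of six. Please suggest improvements to the sample code below to help other people.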

def return_utf(s):
    if isinstance(s, str):
        return s.encode('utf-8')
    if isinstance(s, (int, float, complex)):
        return str(s).encode('utf-8')
    try:
        return s.encode('utf-8')
    except TypeError:
        try:
            return str(s).encode('utf-8')
        except AttributeError:
            return s
    except AttributeError:
        return s
    return s # assume it was already utf-8

2015-12-23 22:16:43

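You can use the Universal Encoding Detector, but be aware that it only gives you a best guess, not the actual encoding, because it is impossible to know the encoding of a string such as "abc". You will need to get the encoding information from somewhere else; HTTP, for example, uses the Content-Type header for that.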
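
A small standard-library sketch of why a detector can only guess (the helper `decode_with` and the list of candidate encodings are chosen here purely for illustration): the bytes of "abc" decode identically under many encodings, while a single non-ASCII byte decodes to different characters depending on which encoding is assumed.

```python
def decode_with(data, encodings):
    """Decode the same bytes under several encodings and return the results."""
    return {name: data.decode(name) for name in encodings}


# Every candidate yields the same text, so the bytes alone cannot tell us
# which encoding was actually used.
same = decode_with(b"abc", ["ascii", "utf-8", "latin-1", "cp1252"])
print(same)

# One byte above 0x7F is valid in both encodings but means different things.
ambiguous = decode_with(b"caf\xe9", ["latin-1", "cp1251"])
print(ambiguous)
```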

2011-02-13 22:34:55

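Note that in Python 3 it is not really fair to say any of the following:

- strs are UTFx for any x (e.g. UTF8)
- strs are Unicode
- strs are ordered collections of Unicode characters

Python's str type is (normally) a sequence of Unicode code points, some of which map to characters.

Even on Python 3, answering this question is not as simple as you might imagine.

An obvious way to test for ASCII-compatible strings is to attempt an encode: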

"Hello there!".encode("ascii")
#>>> b'Hello there!'

"Hello there... ☃!".encode("ascii")
#>>> Traceback (most recent call last):
#>>>   File "", line 4, in <module>
#>>> UnicodeEncodeError: 'ascii' codec can't encode character '\u2603' in position 15: ordinal not in range(128)

The error distinguishes the two cases.

In Python 3, there are even some strings that contain invalid Unicode code points:
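
Wrapped in a function, the encode attempt could look like the sketch below (the name `is_ascii` is an assumption for this example). On Python 3.7 and later, the built-in `str.isascii()` method answers the same question without the exception handling.

```python
def is_ascii(text):
    """Return True if text can be encoded as ASCII, False otherwise."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii("Hello there!"))
print(is_ascii("Hello there... ☃!"))
```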

"Hello there!".encode("utf8")
#>>> b'Hello there!'

"\udcc3".encode("utf8")
#>>> Traceback (most recent call last):
#>>>   File "", line 19, in <module>
#>>> UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 0: surrogates not allowed

The same method is used to distinguish them.
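
As a sketch of that check (the function name `is_encodable_utf8` is made up for illustration), the same try/except pattern works with the UTF-8 codec. The last lines show one way such lone surrogates arise in practice: decoding undecodable bytes with the `surrogateescape` error handler.

```python
def is_encodable_utf8(text):
    """Return True if text contains no code points that UTF-8 rejects."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(is_encodable_utf8("Hello there!"))
print(is_encodable_utf8("\udcc3"))

# A truncated UTF-8 sequence decoded with surrogateescape becomes a lone
# surrogate, and encoding with the same handler restores the original byte.
smuggled = b"\xc3".decode("utf-8", "surrogateescape")
restored = smuggled.encode("utf-8", "surrogateescape")
```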

2014-07-09 02:35:59
