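Convert a Unicode string to a string in Python (containing extra symbols)

How do you convert a Unicode string (containing extra characters such as £, $, etc.) into a Python string?

Current answer

Here is some sample code: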

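import unicodedata

raw_text = u"here $%6757 dfgdfg"
# NFKD splits accented characters into a base letter plus combining marks;
# encoding to ASCII with 'ignore' then drops everything that is not ASCII.
# In Python 3 the result is a bytes object; call .decode('ascii') to get a str.
convert_text = unicodedata.normalize('NFKD', raw_text).encode('ascii', 'ignore')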
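
As a minimal sketch of what this approach does to the symbols mentioned in the question, assuming Python 3: the helper name `to_ascii` and the sample text are made up for illustration. Note that a character with no ASCII decomposition, such as £, is silently dropped rather than converted.

```python
import unicodedata


def to_ascii(text):
    """Hypothetical helper: NFKD-decompose, then drop all non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    # encode() returns bytes in Python 3, so decode again to get a str back.
    return decomposed.encode("ascii", "ignore").decode("ascii")


sample = "caf\u00e9 costs \u00a35 or $6"  # "café costs £5 or $6"
cleaned = to_ascii(sample)
print(cleaned)  # the accent and the pound sign are gone, "$" survives
```

If losing £ is not acceptable, this technique is probably the wrong one, and encoding to UTF-8 instead keeps every character.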

2016-12-19 07:59:44

Other answers

Here is an example:

>>> u = u'€€€'
>>> s = u.encode('utf8')
>>> s
'\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'
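
The session above is Python 2. A rough Python 3 equivalent, as a sketch, looks like this: `encode` returns a `bytes` object (shown with a `b` prefix) and `decode` reverses it.

```python
text = "\u20ac\u20ac\u20ac"  # three euro signs: "€€€"

# Encoding turns the 3-character str into 9 bytes (3 bytes per euro sign).
data = text.encode("utf-8")
print(data)                  # b'\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'
print(len(text), len(data))  # 3 9

# Decoding with the same codec gives the original text back.
restored = data.decode("utf-8")
print(restored == text)      # True
```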

2009-07-30 15:46:26

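Well, if you are willing and able to switch to Python 3 (which you may not be, because it is backward incompatible with some Python 2 code), you don't need to do any conversion: all text in Python 3 is represented as Unicode strings, which also means the u'<text>' prefix is no longer needed (Python 3.3+ still accepts it for compatibility). Alongside text strings you also have byte strings, which represent data (possibly an encoded string).

http://docs.python.org/3.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

(Of course, if you are already using Python 3, the problem most likely has to do with how you are trying to save the text to a file.)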
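
To illustrate the text-versus-data split when saving to a file, here is a small sketch assuming Python 3. The file name `prices.txt` and the sample text are invented for the example; the point is only that passing an explicit `encoding` to `open` avoids depending on the platform default.

```python
import os
import tempfile

text = "price: \u00a35 or $6"  # "price: £5 or $6"

# Write the str as text; open() encodes it to bytes using the given codec.
path = os.path.join(tempfile.mkdtemp(), "prices.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading in binary mode shows the raw bytes that ended up on disk.
with open(path, "rb") as f:
    raw = f.read()

# Reading in text mode with the same encoding gives the str back.
with open(path, encoding="utf-8") as f:
    round_tripped = f.read()

print(type(raw).__name__, type(round_tripped).__name__)  # bytes str
```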

2009-07-30 16:09:31

If you don't need to preserve the non-ASCII characters, you can encode to ASCII and either drop them or replace them:

>>> a=u"aaaàçççñññ"
>>> type(a)
<type 'unicode'>
>>> a.encode('ascii','ignore')
'aaa'
>>> a.encode('ascii','replace')
'aaa???????'
>>>
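
For comparison, here is a sketch of the same idea in Python 3, where the results are `bytes`. Besides `'ignore'` and `'replace'`, the standard error handlers `'xmlcharrefreplace'` and `'backslashreplace'` keep the information about which characters were there instead of discarding it.

```python
# Same string as in the session above, written with escapes so the
# code points are unambiguous: "aaaàçççñññ".
a = "aaa\u00e0\u00e7\u00e7\u00e7\u00f1\u00f1\u00f1"

handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")

# Encode once per error handler and collect the results for comparison.
results = {name: a.encode("ascii", name) for name in handlers}

for name, encoded in results.items():
    print(name, encoded)
```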

2009-07-31 07:13:09

I wrote the following function, which lets you control what to keep based on Unicode's General_Category_Values (https://www.unicode.org/reports/tr44/#General_Category_Values):

def FormatToNameList(name_str):
    import unicodedata
    clean_str = ''
    for c in name_str:
        if unicodedata.category(c) in ['Lu','Ll']:
            clean_str += c.lower()
            print('normal letter: ',c)
        elif unicodedata.category(c) in ['Lt','Lm','Lo']:
            clean_str += c
            print('special letter: ',c)
        elif unicodedata.category(c) in ['Nd']:
            clean_str += c
            print('normal number: ',c)
        elif unicodedata.category(c) in ['Nl','No']:
            clean_str += c
            print('special number: ',c)
        elif unicodedata.category(c) in ['Cc','Sm','Zs','Zl','Zp','Pc','Pd','Ps','Pe','Pi','Pf','Po']:
            clean_str += ' '
            print('space or symbol: ',c)
        else:
            print('other: ',' : ',c,' unicodedata.category: ',unicodedata.category(c))    
    name_list = clean_str.split(' ')
    return clean_str, name_list
if __name__ == '__main__':
    u = 'some3^?"Weirdstr '+ chr(231) + chr(0x0af4)
    [clean_str, name_list] = FormatToNameList(u)
    print(clean_str)
    print(name_list)

See also https://docs.python.org/3/howto/unicode.html

2022-06-30 12:38:29
