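Is there a Python convention for when you should implement __str__() rather than __unicode__()? I have seen classes override __unicode__() more often than __str__(), but it does not appear to be consistent. Are there specific rules for when it is better to implement one rather than the other? Is it necessary, or good practice, to implement both?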


Current answer

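If I didn't especially care about micro-optimizing stringification for a given class, I'd always implement __unicode__ only, as it's more general. When I do care about such minute performance issues (which is the exception, not the rule), having __str__ only (when I can prove there will never be non-ASCII characters in the stringified output) or both (when both are possible) might help.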

These I think are solid principles, but in practice it's very common to KNOW there will be nothing but ASCII characters without doing effort to prove it (e.g. the stringified form only has digits, punctuation, and maybe a short ASCII name;-) in which case it's quite typical to move on directly to the "just __str__" approach (but if a programming team I worked with proposed a local guideline to avoid that, I'd be +1 on the proposal, as it's easy to err in these matters AND "premature optimization is the root of all evil in programming";-).
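
A minimal sketch of the "just __str__" case described above, assuming a hypothetical Version class whose stringified form can only contain digits and dots. In Python 2, calling unicode() on an object that defines only __str__ falls back to __str__ and decodes the result with the default ASCII codec, which is why the ASCII-only guarantee matters; in Python 3 only __str__ is consulted.

```python
class Version(object):
    """Hypothetical example class; its string form is digits and dots only."""

    def __init__(self, major, minor, patch):
        self.major = major
        self.minor = minor
        self.patch = patch

    def __str__(self):
        # %d formatting of integers yields only ASCII digits (and a minus
        # sign), so the output can never contain a non-ASCII character.
        return "%d.%d.%d" % (self.major, self.minor, self.patch)


v = Version(1, 2, 3)
print(str(v))  # 1.2.3
```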

Other answers

If you are working with both Python 2 and Python 3 in Django, I recommend the python_2_unicode_compatible decorator:

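Django provides a simple way to define __str__() and __unicode__() methods that work on Python 2 and 3: you must define a __str__() method returning text and apply the python_2_unicode_compatible() decorator.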
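
To make the idea concrete, here is a rough sketch of how a decorator of this kind could work. It is not Django's source code: text_str_compat and Greeting are hypothetical names used only for illustration, and UTF-8 is an assumed encoding. On Python 3 the decorator does nothing; on Python 2 it moves the text-returning method to __unicode__ and installs a __str__ that returns encoded bytes.

```python
import sys

PY2 = sys.version_info[0] == 2


def text_str_compat(cls):
    """Hypothetical stand-in for a python_2_unicode_compatible-style decorator."""
    if PY2:
        if "__str__" not in cls.__dict__:
            raise ValueError("%s must define __str__()" % cls.__name__)
        # The text-returning method becomes __unicode__ ...
        cls.__unicode__ = cls.__str__
        # ... and __str__ is replaced by a wrapper returning UTF-8 bytes.
        cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
    return cls


@text_str_compat
class Greeting(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Always return text; the decorator takes care of Python 2.
        return u"Hello, %s" % self.name


g = Greeting(u"Zo\u00eb")
rendered = str(g)  # text on Python 3, UTF-8 encoded bytes on Python 2
```

The class author writes a single __str__ that returns text, and the version-specific plumbing stays in one place.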

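As noted in an earlier comment on another answer, some versions of future.utils also support this decorator. On my system, I needed to install a newer future module for Python 2 and install the future module for Python 3. After that, here is an example program: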

#! /usr/bin/env python

from future.utils import python_2_unicode_compatible
from sys import version_info

@python_2_unicode_compatible
class SomeClass():
    def __str__(self):
        return "Called __str__"


if __name__ == "__main__":
    some_inst = SomeClass()
    print(some_inst)
    if (version_info > (3,0)):
        print("Python 3 does not support unicode()")
    else:
        print(unicode(some_inst))

Here is example output (where venv2/venv3 are virtualenv instances):

~/tmp$ ./venv3/bin/python3 demo_python_2_unicode_compatible.py 
Called __str__
Python 3 does not support unicode()

~/tmp$ ./venv2/bin/python2 demo_python_2_unicode_compatible.py 
Called __str__
Called __str__

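For those unfamiliar with the __unicode__ function, it is worth pointing out some of the default behaviours surrounding it in Python 2.x, especially when it is defined side by side with __str__.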

class A :
    def __init__(self) :
        self.x = 123
        self.y = 23.3

    #def __str__(self) :
    #    return "STR      {}      {}".format( self.x , self.y)
    def __unicode__(self) :
        return u"UNICODE  {}      {}".format( self.x , self.y)

a1 = A()
a2 = A()

print( "__repr__ checks")
print( a1 )
print( a2 )

print( "\n__str__ vs __unicode__ checks")
print( str( a1 ))
print( unicode(a1))
print( "{}".format( a1 ))
print( u"{}".format( a1 ))

This yields the following console output:

__repr__ checks
<__main__.A instance at 0x103f063f8>
<__main__.A instance at 0x103f06440>

__str__ vs __unicode__ checks
<__main__.A instance at 0x103f063f8>
UNICODE  123      23.3
<__main__.A instance at 0x103f063f8>
UNICODE  123      23.3

Now, when I uncomment the __str__ method:

__repr__ checks
STR      123      23.3
STR      123      23.3

__str__ vs __unicode__ checks
STR      123      23.3
UNICODE  123      23.3
STR      123      23.3
UNICODE  123      23.3

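__str__() is the old method; it returns bytes. __unicode__() is the new, preferred method; it returns characters. The names are a bit confusing, but in 2.x we are stuck with them for compatibility reasons. Generally, you should put all your string formatting in __unicode__() and create a stub __str__() method: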

def __str__(self):
    return unicode(self).encode('utf-8')

In 3.0, str contains characters, so the same methods are named __bytes__() and __str__(). These behave as expected.
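
As a small illustration of the Python 3 naming, the sketch below defines both methods on a hypothetical Label class. The choice of UTF-8 in __bytes__ is an assumption made for the example; Python does not pick an encoding for you.

```python
class Label:
    """Hypothetical example class for Python 3."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # str() returns text, i.e. Unicode characters.
        return self.text

    def __bytes__(self):
        # bytes() calls __bytes__; UTF-8 is an assumed encoding choice.
        return self.text.encode("utf-8")


label = Label("caf\u00e9")
as_text = str(label)
as_bytes = bytes(label)
print(len(as_text), len(as_bytes))  # 4 5
```

The text form has four characters, while the UTF-8 form has five bytes because the accented letter is encoded as two bytes.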

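With the world getting smaller, chances are that any string you encounter will eventually contain Unicode. So for any new application, you should at least provide __unicode__(). Whether you also override __str__() is then just a matter of taste.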