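I wonder why most modern solutions built with Perl do not enable UTF-8 by default.
I understand that core Perl scripts carry many legacy problems, and that a change could break things. But from my point of view, in the 21st century, big new projects (or projects with a big perspective) should make their software UTF-8 proof from scratch. Still, I don't see it happening. For example, Moose enables strict and warnings, but not Unicode. Modern::Perl reduces boilerplate too, but does no UTF-8 handling.
Why? Are there reasons to avoid UTF-8 in modern Perl projects in the year 2011?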
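My comment to @tchrist got too long, so I'm adding it here.
It seems that I did not make myself clear. Let me try to add some things.
tchrist and I see the situation pretty similarly, but our conclusions are completely opposite. I agree that the situation with Unicode is complicated, but this is exactly why we (Perl users and coders) need some layer (or pragma) that makes UTF-8 handling as easy as it ought to be nowadays.
tchrist pointed out many aspects to cover, and I will read and think about them for days or even weeks. Still, this is not my point. tchrist tries to show that there is no single way "to enable UTF-8". I don't have enough knowledge to argue with that, so I'll stick to live examples.
I played around with Rakudo, and UTF-8 was just there as I needed it. I didn't have any problems; it just worked. Maybe there are limitations somewhere deeper, but at the start, everything I tested worked as I expected.
Shouldn't that be a goal in modern Perl 5 too? Let me stress this: I'm not suggesting UTF-8 as the default character set for core Perl. I'm suggesting that people who develop new projects should be able to switch it on with a snap.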
Another example, but with a more negative tone. Frameworks should make development easier. Some years ago, I tried web frameworks, but just threw them away because "enabling UTF-8" was so obscure. I could not find how and where to hook in Unicode support. It was so time-consuming that I found it easier to go the old way. Now I see there was a bounty here to deal with the same problem in Mason 2: "How to make Mason2 UTF-8 clean?". So it is a pretty new framework, but using it with UTF-8 requires deep knowledge of its internals. It is like a big red sign: STOP, don't use me!
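I really like Perl. But dealing with Unicode is painful. I still find myself running into walls. In a way tchrist is right, and that answers my question: new projects don't adopt UTF-8 because it is too complicated in Perl 5.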
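I think you misunderstand Unicode and its relationship to Perl. No matter which way you store data, whether Unicode, ISO-8859-1, or many other things, your program has to know how to interpret the bytes it gets as input (decoding) and how to represent the information it wants to output (encoding). Get that interpretation wrong and you garble the data. There isn't some magic default setup inside your program that's going to tell the stuff outside your program how to act.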
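To make the decoding and encoding point concrete, here is a minimal sketch using the core Encode module. The sample octets and variable names are made up for illustration; the idea is only that the same bytes mean different things depending on the encoding you tell Perl to assume.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two octets as they might arrive from a file or a socket (illustrative data).
my $octets = "\xC3\xA9";

# Interpreted as UTF-8, they form one character: U+00E9.
my $as_utf8 = decode('UTF-8', $octets);

# Interpreted as ISO-8859-1, they form two characters: U+00C3 and U+00A9.
my $as_latin1 = decode('ISO-8859-1', $octets);

printf "UTF-8:      %d character(s)\n", length $as_utf8;
printf "ISO-8859-1: %d character(s)\n", length $as_latin1;

# Encoding is the reverse step, and the choice is again the programmer's.
my $out_utf8   = encode('UTF-8',      $as_utf8);   # two octets: C3 A9
my $out_latin1 = encode('ISO-8859-1', $as_utf8);   # one octet:  E9

printf "encoded lengths: %d and %d octet(s)\n",
    length $out_utf8, length $out_latin1;
```

Nothing in the octets themselves says which interpretation is right; that knowledge has to come from outside the data.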
You think it's hard, most likely, because you are used to everything being ASCII. Everything you should have been thinking about was simply ignored by the programming language and all of the things it had to interact with. If everything used nothing but UTF-8 and you had no choice, then UTF-8 would be just as easy. But not everything does use UTF-8. For instance, you don't want your input handle to think that it's getting UTF-8 octets unless it actually is, and you don't want your output handles to be UTF-8 if the thing reading from them can't handle UTF-8. Perl has no way to know those things. That's why you are the programmer.
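As a rough sketch of what that looks like per handle, the following uses in-memory filehandles to stand in for an external source and an external consumer. The data and variable names are assumptions for illustration; with real files you would pass a filename instead of a scalar reference.

```perl
use strict;
use warnings;

# Illustrative octets standing in for an input file known to be UTF-8.
my $utf8_octets = "caf\xC3\xA9\n";

# The layer on the input handle says how to decode what we read.
open my $in, '<:encoding(UTF-8)', \$utf8_octets
    or die "cannot open input: $!";
my $line = <$in>;
close $in;
chomp $line;

printf "read %d characters\n", length $line;

# Suppose the consumer of our output only understands ISO-8859-1.
# The layer on the output handle says how to encode what we write.
my $latin1_octets = '';
open my $out, '>:encoding(ISO-8859-1)', \$latin1_octets
    or die "cannot open output: $!";
print {$out} $line;
close $out;

printf "wrote %d octets\n", length $latin1_octets;
```

Each handle gets the layer that matches what is actually on the other end, which is exactly the information Perl cannot guess.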
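I don't think Unicode in Perl 5 is too complicated. I think it's scary and people avoid it. There's a difference. To that end, I've put Unicode in Learning Perl, 6th Edition, and there's a lot of Unicode material in Effective Perl Programming. You have to spend the time to learn and understand Unicode and how it works. You're not going to be able to use it effectively otherwise.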