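We have a project in Team Foundation Server (TFS) that has a non-English character (š) in it. While trying to script a few build-related things, we stumbled upon a problem: we cannot pass this letter to the command-line tools. The command prompt or something else messes it up, and the tf.exe utility cannot find the specified project.
I have tried different formats for the .bat file (ANSI, UTF-8 with and without BOM) as well as scripting it in JavaScript (which is inherently Unicode), but no luck. How do I execute a program and pass it a Unicode command line?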
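Current answer
My background: I have used Unicode input/output in a console for years (and I do it a lot daily; moreover, I develop support tools for exactly this task). There are very few problems, as long as you understand the following facts/limitations: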
CMD and the "console" are unrelated factors. CMD.exe is just one of the programs that are ready to "work inside" a console ("console applications").
AFAIK, CMD has perfect support for Unicode; you can enter/output all Unicode chars when any codepage is active.
The Windows console has A LOT of support for Unicode, but it is not perfect (just "good enough"; see below).
chcp 65001 is very dangerous. Unless a program was specially designed to work around defects in the Windows API (or uses a C runtime library which has these workarounds), it will not work reliably. Win8 fixes ½ of these problems with cp65001, but the rest still applies to Win10.
I work in cp1252. As I already said: to input/output Unicode in a console, one does not need to set the codepage.
Details
To read/write Unicode to a console, an application (or its C runtime library) must be smart enough to use the Console-I/O API rather than the File-I/O API. (For an example, see how Python does it.)
Likewise, to read Unicode command-line arguments, an application (or its C runtime library) must be smart enough to use the corresponding API.
Console font rendering supports only Unicode characters in the BMP (in other words: below U+10000).
Only simple text rendering is supported, so European (and some East Asian) languages should work fine as long as one uses precomposed forms. [There is some minor fine print here for East Asian text and for the characters U+0000, U+0001, U+30FB.]
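The two rendering limits above (BMP only, precomposed forms) can be checked before text is sent to a console. The following is a minimal sketch, not part of the original answer: the function name console_friendly is hypothetical, and it only tests those two conditions, not whether the active font actually has the glyph.

```python
import unicodedata


def console_friendly(text):
    """Return (NFC-composed text, list of characters outside the BMP)."""
    # NFC turns sequences such as 's' + U+030C (combining caron) into the
    # single precomposed character U+0161, which simple rendering can handle.
    composed = unicodedata.normalize("NFC", text)
    # Anything at or above U+10000 lies outside the Basic Multilingual Plane.
    outside_bmp = [ch for ch in composed if ord(ch) > 0xFFFF]
    return composed, outside_bmp


composed, problems = console_friendly("s\u030c")
print(len(composed), hex(ord(composed)), problems)
```

An empty second element means every character passes both checks; a non-empty one lists the characters the console font renderer is expected to mishandle.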
Practical considerations
The defaults on Windows are not very helpful. For the best experience, one should tune up 3 pieces of configuration:
For output: a comprehensive console font. For best results, I recommend my builds. (The installation instructions are present there, and are also listed in other answers on this page.)
For input: a capable keyboard layout. For best results, I recommend my layouts.
For input: allow HEX input of Unicode.
One more gotcha, with "Pasting" into a console application (very technical):
HEX input delivers a character on KeyUp of Alt; all the other ways to deliver a character happen on KeyDown, so many applications are not ready to see a character on KeyUp. (Only applicable to applications using the Console-I/O API.) Conclusion: many applications will not react to HEX input events.
Moreover, what happens with a "Pasted" character depends on the current keyboard layout: if the character can be typed without using prefix keys (but with an arbitrarily complicated combination of modifiers, as in Ctrl-Alt-AltGr-Kana-Shift-Gray*), then it is delivered on an emulated keypress. This is what any application expects, so pasting anything which contains only such characters is fine.
However, the "other" characters are delivered by emulating HEX input.
Conclusion: unless your keyboard layout supports input of A LOT of characters without prefix keys, some buggy applications may skip characters when you Paste via the console's UI: Alt-Space E P. (This is why I recommend using my keyboard layouts!)
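One should also keep in mind that the "alternative, more capable" consoles for Windows are not consoles at all. They do not support the Console-I/O APIs, so programs which rely on these APIs will not work. (Programs which use only "File-I/O APIs to the console filehandles" will work fine, though.)
One example of such a non-console is a part of Microsoft's PowerShell. I do not use it; to experiment, press and release WinKey, then type powershell.
(On the other hand, there are programs such as ConEmu or ANSICON which try to do more: they "attempt" to intercept the Console-I/O APIs to make "true console applications" work too. This definitely works for toy example programs; in real life, it may or may not solve your particular problems. Experiment.)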
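Summary
Set the font and the keyboard layout (and, optionally, allow HEX input). Use only programs which go through the Console-I/O APIs and accept Unicode command-line arguments. For example, any cygwin-compiled program should be fine. As I already said, CMD is fine too.
UPD: Initially, for a bug in cp65001, I was mixing up the Kernel and CRTL layers (UPD²: and the Windows user-mode API!). Also: Win8 fixes one half of this bug; I clarified the section about "better console" applications, and added a reference to how Python does it.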
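The requirement that a program "accept Unicode command-line arguments" can be tested end to end without a .bat file in between. The sketch below is an illustration rather than the answerer's method: it launches a child Python process (standing in for a tool such as tf.exe) with an argument containing š and reports what the child received. The helper name run_with_unicode_arg is hypothetical, and the sketch assumes Python 3.7 or later; on Windows, Python 3 starts the child through the wide-character process-creation API.

```python
import subprocess
import sys


def run_with_unicode_arg(arg):
    """Start a child Python process with `arg`; return the argument as the child saw it."""
    # The child prints an ASCII-only repr, so no console or pipe encoding
    # can distort the report on its way back to the parent.
    child_code = "import sys; print(ascii(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, arg],  # list form: no shell involved
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(run_with_unicode_arg("Projekt\u0161"))
```

If the printed repr contains \u0161, the character survived the trip. Whether a particular real tool reads its arguments through the Unicode API still has to be checked for that tool.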
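Other answers
On a Windows 10 x64 machine, I made the command prompt display non-English characters as follows:
Open an elevated command prompt (run CMD.EXE as administrator). Query the registry for the TrueType fonts available to the console: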
REG query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"
You will see output like this:
0 REG_SZ Lucida Console
00 REG_SZ Consolas
936 REG_SZ *新宋体
932 REG_SZ *MS ゴシック
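Now we need to add a TrueType font that supports the characters you need, such as Courier New. We do this by appending a 0 to the value name, so in this case the next value name is "000":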
REG ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont" /v 000 /t REG_SZ /d "Courier New"
Now we enable UTF-8 support:
REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 65001 /f
Set the default font to Courier New:
REG ADD HKCU\Console /v FaceName /t REG_SZ /d "Courier New" /f
Set the font size to 20:
REG ADD HKCU\Console /v FontSize /t REG_DWORD /d 20 /f
Enable QuickEdit, if you like:
REG ADD HKCU\Console /v QuickEdit /t REG_DWORD /d 1 /f
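The naming rule used in the font-registration step above (keep appending "0" until the value name is free) can be sketched as a small helper that builds the same REG ADD command. This is only an illustration: the function names are hypothetical, the list of existing names is assumed to come from the REG query output, and the code only formats a command string; it does not touch the registry.

```python
KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"


def next_font_value_name(existing_names):
    """Return the shortest all-zero name ("0", "00", "000", ...) not yet in use."""
    taken = set(existing_names)
    name = "0"
    while name in taken:
        name += "0"
    return name


def reg_add_command(existing_names, font):
    """Format the REG ADD command that registers `font` under the next free name."""
    name = next_font_value_name(existing_names)
    return f'REG ADD "{KEY}" /v {name} /t REG_SZ /d "{font}"'


# Value names taken from the sample query output shown earlier.
print(reg_add_command(["0", "00", "936", "932"], "Courier New"))
```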
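A better and cleaner approach: install the freely available Microsoft Japanese language pack. (Other East Asian language packs will do as well, but I have tested the Japanese one.)
This gives you fonts with larger glyph sets, makes them the default, and changes the behavior of various Windows tools such as cmd, WordPad, etc.
For those who use WSL but do not want the extra packages of Cygwin or Git, wsltty is available; it provides just a terminal with UTF-8 support.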
Try:
chcp 65001
This changes the code page to UTF-8. You also need to use the Lucida Console font.
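To see why the active code page matters for a character like š, it helps to look at the bytes involved. The sketch below is an added illustration under stated assumptions: cp437 is used as an example OEM console code page (yours may differ), cp1252 stands for an "ANSI" .bat file, and utf-8 corresponds to code page 65001. The helper name encode_or_none is hypothetical.

```python
def encode_or_none(ch, codec):
    """Return the hex bytes of `ch` in `codec`, or None if it cannot be encoded."""
    try:
        return ch.encode(codec).hex()
    except UnicodeEncodeError:
        return None


for codec in ("cp1252", "cp437", "utf-8"):
    print(codec, encode_or_none("\u0161", codec))

# The cp1252 byte for š, reinterpreted under cp437, becomes a different letter.
print(b"\x9a".decode("cp437"))
```

The character has a single byte in cp1252, no representation at all in cp437, and two bytes in UTF-8, so a script file saved in one encoding and read under another cannot deliver š intact.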
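Actually, the trick is that the command prompt does understand these non-English characters; it just cannot display them correctly.
When I enter a path that contains some non-English characters in the command prompt, it is displayed as a run of question marks, such as "?? ??????? ?????". When you submit the command (cd "???????? ?????" in my case), everything works as expected.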