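We have a project in Team Foundation Server (TFS) whose name contains a non-English character (š). When trying to script a few build-related things, we stumbled on a problem: we can't pass this letter to the command-line tools. The command prompt, or something else along the way, mangles it, and the tf.exe utility can't find the specified project.
I have tried different formats for the .bat file (ANSI, UTF-8 with and without BOM) as well as scripting it in JavaScript (which is inherently Unicode), but no luck. How do I execute a program and pass it a Unicode command line?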
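Current answer
Changing the default code page of the Windows console is rather tricky. If you search the web you will find various proposals; however, some of them may break your Windows installation completely, i.e. your PC will not boot anymore.
The most secure solution is this one: go to the registry key HKEY_CURRENT_USER\Software\Microsoft\Command Processor and add a string value Autorun = chcp 65001.
Alternatively, you can use this small batch script for the most common code pages.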
@ECHO off
REM Registry root for the setting; HKEY_CURRENT_USER applies to the current user only.
SET "ROOT_KEY=HKEY_CURRENT_USER"
REM Read the system default OEM code page from the registry.
FOR /f "skip=2 tokens=3" %%i in ('reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP') do set OEMCP=%%i
ECHO System default OEM code page: CP%OEMCP%
ECHO.
ECHO ...............................................
ECHO Select Codepage
ECHO ...............................................
ECHO.
ECHO 1 - CP1252
ECHO 2 - UTF-8
ECHO 3 - CP850
ECHO 4 - ISO-8859-1
ECHO 5 - ISO-8859-15
ECHO 6 - US-ASCII
ECHO.
ECHO 9 - Reset to System Default (CP%OEMCP%)
ECHO 0 - EXIT
ECHO.
REM Clear CP first so that pressing Enter without input reaches "Invalid choice".
SET "CP="
SET /P CP="Select a Codepage: "
if "%CP%"=="1" (
    echo Set default Codepage to CP1252
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 1252>nul" /f
) else if "%CP%"=="2" (
    echo Set default Codepage to UTF-8
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 65001>nul" /f
) else if "%CP%"=="3" (
    echo Set default Codepage to CP850
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 850>nul" /f
) else if "%CP%"=="4" (
    echo Set default Codepage to ISO-8859-1
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28591>nul" /f
) else if "%CP%"=="5" (
    echo Set default Codepage to ISO-8859-15
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28605>nul" /f
) else if "%CP%"=="6" (
    echo Set default Codepage to ASCII
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 20127>nul" /f
) else if "%CP%"=="9" (
    echo Reset Codepage to System Default
    reg delete "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /f
) else if "%CP%"=="0" (
    echo Bye
) else (
    echo Invalid choice
    pause
)
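Using @chcp 65001>nul instead of chcp 65001 suppresses the output "Active code page: 65001" that you would otherwise get every time you start a new command-line window.
A full list of all available numbers can be found under Code Page Identifiers.
Note that the setting applies only to the current user. To set it for all users, change HKEY_CURRENT_USER to HKEY_LOCAL_MACHINE in the SET ROOT_KEY line.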
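To see why the active code page matters for a character like š, here is a minimal Python sketch (not part of the original answer). It encodes š with the Python codecs that I am assuming correspond to the code pages offered by the script above; the CODECS table and the encode_in_codepage helper are names made up for this illustration.

```python
# Illustrative sketch: how one character maps to bytes under different code pages.
# The mapping from Windows code page numbers to Python codec names below is an
# assumption made for this example.
CODECS = {
    1252: "cp1252",
    65001: "utf-8",
    850: "cp850",
    28591: "latin-1",
    28605: "iso8859-15",
    20127: "ascii",
}


def encode_in_codepage(text, codepage):
    """Return the bytes for `text` in the given code page, or None if it cannot be represented."""
    try:
        return text.encode(CODECS[codepage])
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for cp in CODECS:
        encoded = encode_in_codepage("\u0161", cp)  # U+0161 is the letter š
        shown = encoded.hex() if encoded is not None else "not representable"
        print(f"code page {cp}: {shown}")
```

The same letter ends up as a different byte sequence in each code page that contains it, and some code pages do not contain it at all, so a script saved in one encoding and interpreted under another code page hands a different character to the tool.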
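Other answers
One really simple option is to install a Windows bash shell such as MinGW and use it.
There is a bit of a learning curve, since you will need to use Unix command-line functionality, but you will love the power of it, and you can set the console character set to UTF-8.
Of course, you also get all the usual *nix goodies such as grep, find, less, etc.
I got around a similar issue, deleting Unicode-named files, by referring to them in the batch file by their short (8.3) names.
The short names can be viewed by running dir /x. Obviously, this only works with Unicode file names that are already known.
Check the "Language for non-Unicode programs" setting. If you have problems with Russian in the Windows console, that setting should be set to Russian.
My background: I have used Unicode input/output in a console for years (and do it daily; moreover, I develop support tools for exactly this task). There are very few problems, as long as you understand the following facts and limitations: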
- CMD and “console” are unrelated factors. CMD.exe is just one of the programs which are ready to “work inside” a console (“console applications”).
- AFAIK, CMD has perfect support for Unicode; you can enter/output all Unicode chars when any codepage is active.
- Windows’ console has A LOT of support for Unicode, but it is not perfect (just “good enough”; see below).
- chcp 65001 is very dangerous. Unless a program was specially designed to work around defects in the Windows API (or uses a C runtime library which has these workarounds), it would not work reliably. Win8 fixes half of these problems with cp65001, but the rest still applies to Win10.
- I work in cp1252. As I already said: to input/output Unicode in a console, one does not need to set the codepage.
Details
- To read/write Unicode to a console, an application (or its C runtime library) should be smart enough to use the Console-I/O API rather than the File-I/O API. (For an example, see how Python does it.)
- Likewise, to read Unicode command-line arguments, an application (or its C runtime library) should be smart enough to use the corresponding API.
- Console font rendering supports only Unicode characters in the BMP (in other words: below U+10000). Only simple text rendering is supported, so European (and some East Asian) languages should work fine as long as one uses precomposed forms. [There is some minor fine print here for East Asian and for the characters U+0000, U+0001, U+30FB.]
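As a rough illustration of the point about command-line arguments (this is not taken from the original answer): on Windows, Python's subprocess module starts the child through the wide-character CreateProcessW API, so an argument passed as a list element does not go through a .bat file or the console code page. The sketch below uses a second Python interpreter as a stand-in for a real tool; CHILD_CODE and roundtrip_argument are names invented for this example.

```python
import subprocess
import sys

# Child program used as a stand-in for a real command-line tool: it prints the
# Unicode code points of its first argument, so the result does not depend on
# the console's output encoding.
CHILD_CODE = "import sys; print(' '.join(f'U+{ord(c):04X}' for c in sys.argv[1]))"


def roundtrip_argument(arg):
    """Start a child process with `arg` and return the code points the child received."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, arg],  # list form: no shell, no batch file
        capture_output=True,
        check=True,
    )
    return completed.stdout.decode("ascii").split()


if __name__ == "__main__":
    # "Projekt-\u0161" is a hypothetical project name containing the letter š.
    print(roundtrip_argument("Projekt-\u0161"))
```

If the child reports U+0161 for š, the argument survived the trip intact. Whether a particular tool such as tf.exe reads its arguments through the wide-character API is something this sketch cannot show; that has to be tested against the tool itself.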
Practical considerations
The defaults on Windows are not very helpful. For the best experience, one should tune up 3 pieces of configuration:
- For output: a comprehensive console font. For best results, I recommend my builds. (The installation instructions are present there, and are also listed in other answers on this page.)
- For input: a capable keyboard layout. For best results, I recommend my layouts.
- For input: allow HEX input of Unicode.
One more gotcha with “Pasting” into a console application (very technical):
- HEX input delivers a character on KeyUp of Alt; all the other ways to deliver a character happen on KeyDown, so many applications are not ready to see a character on KeyUp. (Only applicable to applications using the Console-I/O API.) Conclusion: many applications would not react to HEX input events.
- Moreover, what happens with a “Pasted” character depends on the current keyboard layout: if the character can be typed without using prefix keys (but with an arbitrarily complicated combination of modifiers, as in Ctrl-Alt-AltGr-Kana-Shift-Gray*), then it is delivered on an emulated keypress. This is what any application expects, so pasting anything which contains only such characters is fine.
- However, the “other” characters are delivered by emulating HEX input.
Conclusion: unless your keyboard layout supports input of A LOT of characters without prefix keys, some buggy applications may skip characters when you Paste via the Console’s UI: Alt-Space E P. (This is why I recommend using my keyboard layouts!)
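One should also keep in mind that the "alternative, better consoles" on Windows are not consoles at all. They do not support the Console-I/O APIs, so programs that rely on these APIs will not work. (Programs that use only the "File-I/O API on the console file handles" work fine, though.)
Microsoft's PowerShell is one example of such a non-console. I do not use it; to experiment, press and release WinKey, then type powershell.
(On the other hand, there are programs such as ConEmu or ANSICON which try to do more: they attempt to intercept the Console-I/O APIs so that "true console applications" work too. This definitely works for toy example programs; in real life it may or may not solve your particular problem. Experiment.)
Summary
- Set up the font and the keyboard layout (and, optionally, allow HEX input).
- Use only programs that go through the Console-I/O API and accept Unicode command-line arguments. For example, any program compiled with cygwin should be fine. As I already said, CMD is fine too.
UPD: Initially, for a bug in cp65001, I mixed up the kernel layer and the CRTL layer (UPD²: and the Windows user-mode API!). Also: Win8 fixes one half of this bug; I clarified the section about "better console" applications and added a reference to how Python does it.
Starting in June 2019, with Windows 10, you no longer have to change the code page.
See "Introducing Windows Terminal" (by Kayla Cinnamon) and Microsoft/Terminal. With the Consolas font, partial Unicode support is provided.
As documented in Microsoft/Terminal issue 387: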
There are 87,887 ideographs currently in Unicode. You need all of them too?
We need a boundary, and characters beyond that boundary should be handled by font fallback / font linking / whatever.
What Consolas should cover:
- Characters that used as symbols that used by modern OSS programs in CLI.
- These characters should follow Consolas' design and metrics, and properly aligned with existing Consolas characters.
What Consolas should NOT cover:
- Characters and punctuation of scripts that beyond Latin, Greek and Cyrillic, especially characters need complex shaping (like Arabic).
- These characters should be handled with font fallback.