What are the undocumented features and limitations of the Windows FINDSTR command?

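Answer continued from part 1 - I've run into the 30,000 character answer limit :-(

Limited regular expression (regex) support

FINDSTR support for regular expressions is extremely limited. If it is not in the HELP documentation, it is not supported.

Beyond that, the regex expressions that are supported are implemented in a completely non-standard manner, so results can differ from what would be expected from something like grep or perl.

Regex line position anchors ^ and $

^ matches the beginning of the input stream as well as any position immediately following a <LF>. Since FINDSTR also breaks lines after <LF>, a simple regex of "^" will always match all lines within a file, even a binary file.

$ matches any position immediately preceding a <CR>. This means that a regex search string containing $ will never match any line within a Unix style text file, nor will it match the last line of a Windows text file if that line is missing the EOL marker of <CR><LF>.

Note - As previously discussed, piped and redirected input to FINDSTR may have <CR><LF> appended that is not in the source. Obviously this can impact a regex search that uses $.

Any search string with characters before ^ or after $ will always fail to find a match.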
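The anchor rules above can be modelled in a few lines of Python. This is only a sketch of the behavior as described in this answer, not FINDSTR's actual implementation; the function names `split_after_lf` and `dollar_literal_matches` are made up for illustration, and the search is restricted to a literal string followed by `$`.

```python
def split_after_lf(data):
    """Break input into lines after each <LF>, as the answer describes."""
    pieces = data.split(b"\n")
    lines = [piece + b"\n" for piece in pieces[:-1]]
    if pieces[-1]:
        # A final line with no <LF> terminator is still a line.
        lines.append(pieces[-1])
    return lines


def dollar_literal_matches(literal, data):
    """Model a 'literal$' search: $ only exists immediately before a <CR>."""
    return [line for line in split_after_lf(data) if literal + b"\r" in line]


windows_text = b"foo\r\nfoo"   # last line lacks <CR><LF>
unix_text = b"foo\nfoo\n"      # no <CR> anywhere

print(dollar_literal_matches(b"foo", windows_text))  # only the first line
print(dollar_literal_matches(b"foo", unix_text))     # nothing
```

In this model the Unix style file never matches and the unterminated last line of the Windows file is skipped, which is the outcome the text predicts.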

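Positional options /B /E /X

The positional options work the same as ^ and $, except they also work for literal search strings.

/B functions the same as ^ at the start of a regex search string.

/E functions the same as $ at the end of a regex search string.

/X functions the same as having both ^ at the beginning and $ at the end of a regex search string.

Regex word boundary

\< must be the very first term in the regex. The regex will not match anything if any other characters precede it. \< corresponds to either the very beginning of the input, the beginning of a line (the position immediately following a <LF>), or the position immediately following any "non-word" character. The next character need not be a "word" character.

\> must be the very last term in the regex. The regex will not match anything if any other characters follow it. \> corresponds to either the end of input, the position immediately prior to a <CR>, or the position immediately preceding any "non-word" character. The preceding character need not be a "word" character.

Here is a complete list of "non-word" characters, represented as decimal byte codes. Note: this list was compiled on a U.S. machine. I do not know what impact other languages may have on this list.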

001   028   063   179   204   230
002   029   064   180   205   231
003   030   091   181   206   232
004   031   092   182   207   233
005   032   093   183   208   234
006   033   094   184   209   235
007   034   096   185   210   236
008   035   123   186   211   237
009   036   124   187   212   238
011   037   125   188   213   239
012   038   126   189   214   240
014   039   127   190   215   241
015   040   155   191   216   242
016   041   156   192   217   243
017   042   157   193   218   244
018   043   158   194   219   245
019   044   168   195   220   246
020   045   169   196   221   247
021   046   170   197   222   248
022   047   173   198   223   249
023   058   174   199   224   250
024   059   175   200   226   251
025   060   176   201   227   254
026   061   177   202   228   255
027   062   178   203   229
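For anyone who wants to experiment with the word-boundary rules outside of FINDSTR, the table above can be encoded as a set of byte values. The following Python sketch simply restates the rules from this answer; `NON_WORD`, `is_word_start` and `is_word_end` are illustrative names, and the set is only as accurate as the table it was copied from (U.S. machine).

```python
LF, CR = 10, 13


def _expand(*ranges):
    """Build a frozenset of byte values from inclusive (low, high) pairs."""
    values = set()
    for low, high in ranges:
        values.update(range(low, high + 1))
    return frozenset(values)


# Transcribed from the table above as inclusive ranges of decimal byte codes.
NON_WORD = _expand(
    (1, 9), (11, 12), (14, 47), (58, 64), (91, 94), (96, 96),
    (123, 127), (155, 158), (168, 170), (173, 224), (226, 251), (254, 255),
)


def is_word_start(data, i):
    # Rule for \<: start of input, right after <LF>, or right after a non-word byte.
    if i == 0:
        return True
    return data[i - 1] == LF or data[i - 1] in NON_WORD


def is_word_end(data, i):
    # Rule for \>: end of input, right before <CR>, or right before a non-word byte.
    if i == len(data):
        return True
    return data[i] == CR or data[i] in NON_WORD


print(len(NON_WORD))                  # number of entries in the table
print(is_word_start(b"one two", 4))   # position after the space
```

Note that the underscore (095) and the digits are absent from the table, so this model treats them as "word" characters.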

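Regex character class ranges [x-y]

Character class ranges do not work as expected. See this question: Why does findstr not handle case properly (in some circumstances)?, along with this answer: https://stackoverflow.com/a/8767815/1012053.

The problem is that FINDSTR does not collate characters by their byte code value (commonly thought of as the ASCII code, but ASCII is only defined from 0x00 - 0x7F). Most regex implementations would treat [A-Z] as all upper case English letters. But FINDSTR uses a collation sequence that roughly corresponds to how SORT works. So [A-Z] includes the complete English alphabet, both upper and lower case (except for "a"), as well as non-English alpha characters with diacriticals.

Below is a complete list of all characters supported by FINDSTR, sorted in the collation sequence used by FINDSTR to establish regex character class ranges. The characters are represented as their decimal byte code value. I believe the collation sequence makes the most sense if the characters are viewed using code page 437. Note: this list was compiled on a U.S. machine. I do not know what impact other languages may have on this list.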
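To see why [A-Z] picks up lower case letters, it helps to play with a toy collation. The Python sketch below assumes an ordering of a A b B ... z Z for the English letters only; that ordering is an assumption inferred from the statement that [A-Z] covers everything except "a", and it ignores the accented characters and all non-letters that the real sequence contains. `COLLATION` and `range_members` are illustrative names.

```python
import string

# Assumed toy ordering for English letters only: a A b B ... z Z
COLLATION = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def range_members(first, last):
    """Return the letters that fall between first and last in the toy collation."""
    start = COLLATION.index(first)
    end = COLLATION.index(last)
    return COLLATION[start:end + 1]


upper_range = range_members("A", "Z")
print("a" in upper_range)   # False: "a" sorts before "A"
print("b" in upper_range)   # True: "b" sorts between "A" and "Z"
```

Under the same toy ordering, a range of [a-z] would contain every letter except "Z", which is the mirror image of the [A-Z] case.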

Regex character class term limit and BUG

Not only is FINDSTR limited to a maximum of 15 character class terms within a regex, it fails to properly handle an attempt to exceed the limit. Using 16 or more character class terms results in an interactive Windows pop up stating "Find String (QGREP) Utility has encountered a problem and needs to close. We are sorry for the inconvenience." The message text varies slightly depending on the Windows version. Here is one example of a FINDSTR that will fail:

echo 01234567890123456|findstr [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]

This bug was reported by DosTips user Judago. It has been confirmed on XP, Vista, and Windows 7.
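If search strings are generated by a script, it may be worth counting the character class terms before handing the pattern to FINDSTR. The Python sketch below is a deliberately simplified scanner: it skips backslash-escaped characters and treats everything from an unescaped [ to the next ] as one term. It does not claim to reproduce FINDSTR's own parsing, and `count_class_terms` and `MAX_CLASS_TERMS` are names invented for this example.

```python
MAX_CLASS_TERMS = 15  # limit reported in this answer


def count_class_terms(regex):
    """Count [...] terms in a pattern using a simplified left-to-right scan."""
    count = 0
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\":
            i += 2          # skip the escaped character
            continue
        if ch == "[":
            close = regex.find("]", i + 1)
            if close == -1:
                break       # unterminated class: stop counting
            count += 1
            i = close + 1
            continue
        i += 1
    return count


pattern = "[0-9]" * 16      # the failing example shown above
terms = count_class_terms(pattern)
if terms > MAX_CLASS_TERMS:
    print("too many character class terms:", terms)
```

A caller could split an over-long pattern into several searches, or replace some of the classes with literals, before invoking FINDSTR.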

Regex searches fail (and may hang indefinitely) if they include byte code 0xFF (decimal 255)

Any regex search that includes byte code 0xFF (decimal 255) will fail. It fails if byte code 0xFF is included directly, or if it is implicitly included within a character class range. Remember that FINDSTR character class ranges do not collate characters based on the byte code value. Character <0xFF> appears relatively early in the collation sequence between the <space> and <tab> characters. So any character class range that includes both <space> and <tab> will fail.

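The exact behavior changes slightly depending on the Windows version. Windows 7 hangs indefinitely if 0xFF is included. XP doesn't hang, but it always fails to find a match, and occasionally prints the following error message - "The process tried to write to a nonexistent pipe."

I no longer have access to a Vista machine, so I haven't been able to test on Vista.

Regex bug: . and [^anySet] can match End-Of-File

The regex . meta-character should only match any character other than <CR> or <LF>. There is a bug that allows it to match End-Of-File if the last line in the file is not terminated by <CR> or <LF>. However, the . will not match an empty file.

For example, a file named "test.txt" containing a single line of x, without a terminating <CR> or <LF>, will match the following: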

findstr /r x......... test.txt

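This bug has been confirmed on XP and Win7.

The same seems to be true for negative character sets. Something like [^abc] will match End-Of-File. Positive character sets like [abc] seem to work fine. I have only tested this on Win7.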

2013-11-23 06:03:30

When several commands are enclosed in parentheses and files are redirected to the whole block:

< input.txt (
   command1
   command2
   . . .
) > output.txt

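... then the files remain open as long as the commands in the block are active, so the commands may move the file pointer of the redirected files. Both the MORE and FIND commands move the Stdin file pointer to the beginning of the file before processing it, so the same file may be processed several times inside the block. For example, this code: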

more < input.txt >  output.txt
more < input.txt >> output.txt

... produces the same result as this one:

< input.txt (
   more
   more
) > output.txt

This code:

find    "search string" < input.txt > matchedLines.txt
find /V "search string" < input.txt > unmatchedLines.txt

... produces the same result as this one:

< input.txt (
   find    "search string" > matchedLines.txt
   find /V "search string" > unmatchedLines.txt
)

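FINDSTR is different; it does not move the Stdin file pointer from its current position. For example, this code inserts a new line after a search line: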

call :ProcessFile < input.txt
goto :EOF

:ProcessFile
   rem Read the next line from Stdin and copy it
   set /P line=
   echo %line%
   rem Test if it is the search line
   if "%line%" neq "search line" goto ProcessFile
rem Insert the new line at this point
echo New line
rem And copy the rest of lines
findstr "^"
exit /B
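The idea behind the batch code is that two readers share one file pointer: the first consumes lines up to the search line, and the second carries on from wherever the pointer was left. The same pattern can be sketched in Python with a single stream object. This is an analogy only, not a translation of how cmd.exe or FINDSTR handle Stdin, and `insert_after` is a name made up for the example.

```python
import io


def insert_after(stream, search_line, new_line):
    """Copy stream, adding new_line right after the first line equal to search_line."""
    out = []
    # Phase 1: read one line at a time, like SET /P in the batch loop.
    for line in iter(stream.readline, ""):
        out.append(line)
        if line.rstrip("\r\n") == search_line:
            out.append(new_line + "\n")
            break
    # Phase 2: copy the rest from the current position, like FINDSTR "^",
    # without rewinding to the start of the stream.
    out.append(stream.read())
    return "".join(out)


source = io.StringIO("first\nsearch line\nlast\n")
print(insert_after(source, "search line", "New line"), end="")
```

If phase 2 rewound the stream first, as MORE and FIND are described as doing, the lines already copied in phase 1 would appear twice.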

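We may make good use of this feature with the aid of an auxiliary program that allows us to move the file pointer of a redirected file, as shown in this example.

This behavior was first reported by jeb in this post.

EDIT 2018-08-18: New FINDSTR bug reported

The FINDSTR command has a strange bug that happens when the command is used to show characters in color AND the output of such a command is redirected to the CON device. For details on how to use the FINDSTR command to show text in color, see this topic.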

When the output of this form of FINDSTR command is redirected to CON, something strange happens after the text is output in the desired color: all the text after it is output as "invisible" characters, or more precisely as black text on a black background. The original text reappears if you use the COLOR command to reset the foreground and background colors of the entire screen. However, while the text is "invisible" we can execute a SET /P command, and none of the characters entered will appear on the screen. This behavior may be used to enter passwords.

@echo off
setlocal

set /P "=_" < NUL > "Enter password"
findstr /A:1E /V "^$" "Enter password" NUL > CON
del "Enter password"
set /P "password="
cls
color 07
echo The password read is: "%password%"

2015-02-02 13:17:38

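Findstr sometimes hangs unexpectedly when searching large files.

I haven't confirmed the exact conditions or boundary sizes. I suspect any file larger than 2GB may be at risk.

I have had mixed experiences with this, so it is more than just file size. This looks like it may be a variation on "FINDSTR hangs on XP and Windows 7 if redirected input does not end with LF", but as demonstrated, this particular problem manifests when input is not redirected.

The following command line session (Windows 7) demonstrates how findstr can hang when searching a 3GB file.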

C:\Data\Temp\2014-04>echo 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890> T100B.txt

C:\Data\Temp\2014-04>for /L %i in (1,1,10) do @type T100B.txt >> T1KB.txt

C:\Data\Temp\2014-04>for /L %i in (1,1,1000) do @type T1KB.txt >> T1MB.txt

C:\Data\Temp\2014-04>for /L %i in (1,1,1000) do @type T1MB.txt >> T1GB.txt

C:\Data\Temp\2014-04>echo find this line>> T1GB.txt

C:\Data\Temp\2014-04>copy T1GB.txt + T1GB.txt + T1GB.txt T3GB.txt
T1GB.txt
T1GB.txt
T1GB.txt
        1 file(s) copied.

C:\Data\Temp\2014-04>dir
 Volume in drive C has no label.
 Volume Serial Number is D2B2-FFDF

 Directory of C:\Data\Temp\2014-04

2014/04/08  04:28 PM    <DIR>          .
2014/04/08  04:28 PM    <DIR>          ..
2014/04/08  04:22 PM               102 T100B.txt
2014/04/08  04:28 PM     1 020 000 016 T1GB.txt
2014/04/08  04:23 PM             1 020 T1KB.txt
2014/04/08  04:23 PM         1 020 000 T1MB.txt
2014/04/08  04:29 PM     3 060 000 049 T3GB.txt
               5 File(s)  4 081 021 187 bytes
               2 Dir(s)  51 881 050 112 bytes free
C:\Data\Temp\2014-04>rem Findstr on the 1GB file does not hang

C:\Data\Temp\2014-04>findstr "this" T1GB.txt
find this line

C:\Data\Temp\2014-04>rem On the 3GB file, findstr hangs and must be aborted... even though it clearly reaches end of file

C:\Data\Temp\2014-04>findstr "this" T3GB.txt
find this line
find this line
find this line
^C
C:\Data\Temp\2014-04>

Note, I've verified in a hex editor that all lines are terminated with CRLF. The only anomaly is that the file is terminated with 0x1A due to the way copy works. Note however, that this anomaly doesn't cause a problem on "small" files.

With additional testing I have confirmed the following:

- Using copy with the /b option for binary files prevents the addition of the 0x1A character, and findstr doesn't hang on the 3GB file.
- Terminating the 3GB file with a different character also causes findstr to hang.
- The 0x1A character doesn't cause any problems on a "small" file. (Similarly for other terminating characters.)
- Adding CRLF after 0x1A resolves the problem. (LF by itself would probably suffice.)
- Using type to pipe the file into findstr works without hanging. (This might be due to a side effect of either type or | that inserts an additional End Of Line.)
- Using redirected input < also causes findstr to hang. But this is expected; as explained in dbenham's post: "redirected input must end in LF".
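Based on these observations, a quick pre-flight check of a file's size and final byte might flag the combination that hung here. The Python sketch below is a heuristic built only from the findings in this answer (large file, last byte not LF, possibly 0x1A); the 2GB threshold is the author's suspicion rather than a confirmed boundary, and `tail_report` is a hypothetical helper name.

```python
import os

TWO_GB = 2 * 1024 ** 3  # suspected, not confirmed, risk threshold


def tail_report(path):
    """Report the size and final-byte properties discussed in this answer."""
    size = os.path.getsize(path)
    last = b""
    if size:
        with open(path, "rb") as handle:
            handle.seek(-1, os.SEEK_END)   # jump to the final byte
            last = handle.read(1)
    return {
        "size_over_2gb": size > TWO_GB,
        "ends_with_lf": last == b"\n",
        "ends_with_0x1a": last == b"\x1a",
    }


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, tail_report(name))
```

A file that reports size_over_2gb as true and ends_with_lf as false matches the failing case in the session above; appending CRLF, or rebuilding the file with copy /b, are the remedies that worked there.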

2014-04-08 16:34:56

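FINDSTR has a color bug that I described and solved at https://superuser.com/questions/1535810/is-there-a-better-way-to-mitigate-this-obscure-color-bug-when-piping-to-findstr/1538802?noredirect=1#comment2339443_1538802

To summarize that thread, the bug is that if input is piped to FINDSTR within a parenthesized block of code, inline ANSI escape colorcodes stop working in commands executed later. An example of inline colorcodes is: echo %magenta%Alert: Something bad happened%yellow% (where magenta and yellow are variables defined earlier in the .bat file as the corresponding ANSI escape colorcodes).

My initial solution was to call a do-nothing subroutine after the FINDSTR. Somehow the call or the return "resets" whatever needs to be reset.

Later I discovered another solution that is presumably more efficient: place the FINDSTR phrase within parentheses, as in the following example:

echo success | (FINDSTR /R success)

Placing the FINDSTR phrase within a nested block of code appears to isolate FINDSTR's colorcode bug so that it does not affect anything outside the nested block. Perhaps this technique will also solve some other undesired FINDSTR side effects.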

2020-04-09 13:53:12

Tip for /D with multiple directories: put your directory list before the search string. These all work:

findstr /D:dir1;dir2 "searchString" *.*
findstr /D:"dir1;dir2" "searchString" *.*
findstr /D:"\path\dir1\;\path\dir2\" "searchString" *.*

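As expected, the path is relative to the current location if you don't start the directories with \. Surrounding the path with " is optional if there are no spaces in the directory names. The ending \ is optional. The location shown in the output will include whatever path you give it. It works with or without " around the directory list.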
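When the directory list comes from a script, the command line can be assembled in the quoted form shown above. The Python sketch below only builds the string; it does not run FINDSTR, it does not escape quotes embedded in the arguments, and `build_findstr_d_command` is a name invented for this example.

```python
def build_findstr_d_command(directories, search, file_mask="*.*"):
    """Build a findstr command line with the /D list placed before the search string."""
    # Directories are joined with ";" and quoted as a whole, so names with
    # spaces are covered by the same form as the second example above.
    dir_list = ";".join(directories)
    return f'findstr /D:"{dir_list}" "{search}" {file_mask}'


print(build_findstr_d_command(["dir1", "dir2"], "searchString"))
```

Always quoting the list is a design choice for simplicity: the quotes are optional when no directory name contains a space, but they are harmless in that case.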

2015-01-22 20:11:26
