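Which special characters must be escaped in regular expressions?

I am tired of always having to guess whether I should escape special characters such as '()[]{}|' when using the many implementations of regexps.

It is different with, for example, Python, sed, grep, awk, Perl, rename, Apache, find and so on. Is there any rule set that tells me when I should, and when I should not, escape special characters? Does it depend on the regexp type, like PCRE, POSIX or extended regexps?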

Current answer

Modern RegEx Flavors (PCRE)

Includes C, C++, Delphi, EditPad, Java, JavaScript, Perl, PHP (preg), PostgreSQL, PowerGREP, PowerShell, Python, REALbasic, Real Studio, Ruby, TCL, VB.Net, VBScript, wxWidgets, XML Schema, Xojo, XRegExp. PCRE compatibility may vary.

Anywhere: . ^ $ * + - ? ( ) [ ] { } \ |

Legacy RegEx Flavors (BRE/ERE)

Includes awk, ed, egrep, emacs, GNUlib, grep, PHP (ereg), MySQL, Oracle, R, sed. PCRE support may be enabled in later versions or by using extensions.

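ERE / awk / egrep / emacs

Outside a character class: . ^ $ * + ? ( ) [ { } \ |
Inside a character class: ^ - [ ]

BRE / ed / grep / sed

Outside a character class: . ^ $ * [ \
Inside a character class: ^ - [ ]
For literals, don't escape: + ? ( ) { } |
For standard regex behavior, escape: \+ \? \( \) \{ \} \|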

Notes

If unsure about a specific character, it can be escaped like \xFF.

Alphanumeric characters cannot be escaped with a backslash.

Arbitrary symbols can be escaped with a backslash in PCRE, but not BRE/ERE (they must only be escaped when required).

For PCRE, ] and - only need escaping within a character class, but I kept them in a single list for simplicity.

Quoted expression strings must also have the surrounding quote characters escaped, and often with backslashes doubled-up (like "(\")(/)(\\.)" versus /(")(\/)(\.)/ in JavaScript).

Aside from escapes, different regex implementations may support different modifiers, character classes, anchors, quantifiers, and other features. For more details, check out regular-expressions.info, or use regex101.com to test your expressions live.
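The note about quoted expression strings can be tried out in Python, one of the flavors listed above. This is only a small sketch: the variable names are made up for illustration, and the pattern (a literal dot followed by a double quote) is an arbitrary example, not something from the answer.

```python
import re

# The same three-character pattern written two ways.
plain = "\\.\""   # ordinary string literal: backslash doubled, quote escaped
raw = r'\."'      # raw string literal: the regex engine sees exactly what is typed

# Both literals produce the characters: backslash, dot, double quote.
assert plain == raw
assert len(plain) == 3

# The escaped dot matches only a real dot, and the quote matches itself.
match = re.search(raw, 'say ."hi"')
print(match.group())   # ."
```

The point is that the string-literal layer and the regex layer each consume one level of escaping, which is why the doubled backslashes show up in quoted patterns.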

2015-08-25 19:12:56

Other answers

For Ionic (TypeScript), you have to double the backslash to escape the characters. For example (this one matches some special characters):

"^(?=.*[\\]\\[!¡\'=ªº\\-\\_ç@#$%^&*(),;\\.?\":{}|<>\+\\/])"

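Pay attention to the ] - _ . / characters. They have to be escaped with a doubled backslash. If you don't do that, you will get a type error in your code.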

2019-09-12 19:32:40

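Which characters you must and which you must not escape indeed depends on the regex flavor you are working with.

For PCRE, and most other so-called Perl-compatible flavors, escape these outside character classes: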

.^$*+?()[{\|

And these inside character classes:

^-]\
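As a quick check of the two lists above, here is a sketch using Python's re module, which is Perl-like but not PCRE itself, so treat it as an approximation. The constant names are mine, not part of any library.

```python
import re

OUTSIDE_CLASS = ".^$*+?()[{\\|"   # characters listed above for outside a class
INSIDE_CLASS = "^-]\\"            # characters listed above for inside a class

# Outside a class: a backslash in front of each character makes it match itself.
for ch in OUTSIDE_CLASS:
    assert re.fullmatch("\\" + ch, ch)

# Inside a class: escape each of the four characters, then the class
# matches every one of them literally.
char_class = "[" + "".join("\\" + ch for ch in INSIDE_CLASS) + "]"
print(char_class)   # [\^\-\]\\]
assert all(re.fullmatch(char_class, ch) for ch in INSIDE_CLASS)
```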

For POSIX extended regexes (ERE), escape these outside character classes (same as PCRE):

.^$*+?()[{\|

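Escaping any other character is an error with POSIX ERE.

Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally, e.g.: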

[]^-]
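The placement trick can be tried without a POSIX tool at hand. Python's re is not a POSIX engine, but as far as I know it accepts the same placement conventions, so the following sketch (variable name is made up) should behave the same way for this particular class:

```python
import re

# ] first, ^ not first, - last: no backslashes needed inside the class.
posix_style_class = r"[]^-]"

# Each of the three characters is matched literally...
for ch in "]^-":
    assert re.fullmatch(posix_style_class, ch)

# ...and nothing else is.
assert re.fullmatch(posix_style_class, "a") is None
assert re.fullmatch(posix_style_class, "\\") is None
```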

In POSIX basic regular expressions (BRE), these are the metacharacters that you need to escape to suppress their meaning:

.^$*[\

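Escaping parentheses and curly brackets in BREs gives them the special meaning their unescaped versions have in EREs. Some implementations (e.g. GNU) also give special meaning to other characters when escaped, such as \? and \+. Escaping a character other than .^$*(){} is normally an error with BREs.

Inside character classes, BREs follow the same rules as EREs.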
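The per-flavor lists in this answer can be turned into a small helper. This is a rough sketch, not a library function: METACHARS and escape_literal are hypothetical names, the sets are copied from the lists above, and it only covers text used outside character classes.

```python
# Metacharacter sets as listed in this answer (outside character classes only).
METACHARS = {
    "pcre": ".^$*+?()[{\\|",
    "ere": ".^$*+?()[{\\|",
    "bre": ".^$*[\\",
}

def escape_literal(text: str, flavor: str) -> str:
    """Backslash-escape only the characters that are special in the given flavor."""
    special = METACHARS[flavor]
    return "".join("\\" + ch if ch in special else ch for ch in text)

print(escape_literal("a+b (c)", "ere"))   # a\+b \(c\)
print(escape_literal("a+b (c)", "bre"))   # a+b (c)
print(escape_literal("1.5*x", "bre"))     # 1\.5\*x
```

Escaping only what the flavor treats as special matters for BRE in particular, because there a backslash in front of + or ( can switch the special meaning on rather than off.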

If all this makes your head spin, grab a copy of RegexBuddy. On the Create tab, click Insert Token, and then Literal. RegexBuddy will add escapes as needed.

2008-12-30 14:01:58

Using Raku (formerly known as Perl_6)

Works (backslash or quote all non-alphanumeric characters except underscore):

~$ raku -e 'say $/ if "#.*?" ~~ m/  \# \. \* \?  /; #works fine'
｢#.*?｣

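According to Damian Conway's talk "Everything You Know About Regexes Is Wrong", there are six flavors of regular expression languages. Raku represents a major (roughly 15-year) rework of standard Perl(5)/PCRE regular expressions.

In those 15 years the Perl_6 / Raku language experts decided that all non-alphanumeric characters (except underscore) shall be reserved as regex metacharacters, even if no present usage exists. To denote a non-alphanumeric character (except underscore) as a literal, backslash it or quote it.

The example above therefore prints the $/ match variable if a match to the literal #.*? character sequence is found. Below is what happens if you don't escape: # is interpreted as the start of a comment, the . dot is interpreted as any character (including whitespace), the * asterisk is interpreted as a zero-or-more quantifier, and the ? question mark is interpreted as either a zero-or-one quantifier or a frugal (i.e. non-greedy) quantifier modifier, depending on context: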

Error:

~$ raku -e 'say $/ if "#.*?" ~~ m/  # . * ?  /; #ERROR!'
===SORRY!===
Regex not terminated.
at -e:1
------> y $/ if "#.*?" ~~ m/ # . * ?  /; #ERROR!⏏<EOL>
Regex not terminated.
at -e:1
------> y $/ if "#.*?" ~~ m/ # . * ?  /; #ERROR!⏏<EOL>
Couldn't find terminator / (corresponding / was at line 1)
at -e:1
------> y $/ if "#.*?" ~~ m/ # . * ?  /; #ERROR!⏏<EOL>
    expecting any of:
        /

https://docs.raku.org/language/regexes https://raku.org/

2023-02-04 07:47:09

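To know when and what to escape without trial and error, you need to understand precisely the chain of contexts the string passes through. Trace the string from the farthest end to its final destination, which is the memory handled by the regexp parsing code.

Be aware of how the string in memory gets there: it can be a plain string in the code or a string entered on the command line; that command line may be interactive or written in a shell script file; the string may be held in a variable in memory referenced by the code, or be a (string) argument that goes through further evaluation, or a string containing dynamically generated code with any sort of encapsulation...

Each of these contexts assigns special functionality to some characters.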

When you want to pass a character literally, without its special function (local to the context), you have to escape it for the next context, which might need other escape characters, which in turn might need to be escaped in the preceding context(s). Furthermore, there can be complications such as character encoding: the most insidious is UTF-8, because it looks like ASCII for common characters but may be interpreted differently even by the terminal, depending on its settings; then there is the encoding attribute of HTML/XML. It is necessary to understand the process precisely to get it right.

E.g. a regexp on a command line starting with perl -npe has to be transferred to a set of exec system calls whose file handles are connected by pipes. Each of these exec calls just receives a list of arguments that were separated by (unescaped) spaces, and possibly pipes (|), redirections (> N> N>&M), parentheses, interactive expansion of * and ?, $(()) and so on. All of these are special characters used by the *sh, and they may appear to interfere with the characters of the regular expression in the next context, but they are evaluated in order: the command line comes first.

The command line is read by a program such as bash/sh/csh/tcsh/zsh. Inside double or single quotes escaping is simpler, but quoting a string on the command line is not required: mostly it is the space that has to be prefixed with a backslash, and leaving the quotes off keeps the expansion of * and ? available. Unquoted text is, however, parsed as a different context from text within quotes. Once the command line has been evaluated, the regexp obtained in memory (not as written on the command line) receives the same treatment as it would in a source file.

Within the regexp itself there is the character-set context inside square brackets [ ], and a Perl regular expression can be delimited by a large set of non-alphanumeric characters (e.g. m// or m:/better/for/path: ...).

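The other answers give more details about the characters, which are very specific to the final regexp context. You mention that you work out regexp escapes by trial and error; that is probably because different contexts have different sets of special characters, which muddles your memory of earlier attempts (the backslash is often the character used in each of these contexts to make a character literal instead of functional).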
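To make the idea of stacked contexts concrete, here is a sketch in Python that escapes a string once for the regex layer and then once more for the shell layer. The grep -P command and the file name notes.txt are only illustrative assumptions, nothing is executed, and re.escape follows Python's escaping rules, which I am using as a stand-in for a Perl-compatible engine.

```python
import re
import shlex

literal = "price is $5.00 (approx)"

# Layer 1: regex context. Every regex metacharacter gets a backslash.
pattern = re.escape(literal)

# Layer 2: shell context. shlex.quote wraps the argument so that a POSIX
# shell passes it through unchanged ($, *, spaces and so on are protected).
command = "grep -P -- " + shlex.quote(pattern) + " notes.txt"

print(pattern)
print(command)

# Undoing the shell layer gives back exactly the regex-layer string,
# and the regex-layer string matches exactly the original text.
assert shlex.split(command)[3] == pattern
assert re.fullmatch(pattern, literal)
```

Each layer is handled by the tool that knows that layer's rules, applied from the innermost context (the regex) outwards (the shell), which is the reverse of the order in which they are later evaluated.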

2019-05-05 14:45:45

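For PHP, "it is always safe to precede a non-alphanumeric with "\" to specify that it stands for itself." (http://php.net/manual/en/regexp.reference.escape.php)

Except if it's a " or '. :/

To escape regex pattern variables (or parts of them) in PHP, use preg_quote().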
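For comparison, Python offers re.escape for the same job of escaping a variable before it goes into a pattern. A minimal sketch, where user_input is a made-up example value:

```python
import re

user_input = "file(1).txt"   # hypothetical text that should be matched literally

# Escape the variable part, then add the real regex syntax (an end anchor) around it.
pattern = re.compile(re.escape(user_input) + r"$")

assert pattern.search("backup/file(1).txt")
assert pattern.search("backup/file1.txt") is None     # parentheses are literal
assert pattern.search("backup/file(1)xtxt") is None   # the dot is literal too
```

Note that re.escape knows nothing about regex delimiters, since Python patterns have none, whereas preg_quote() takes an optional delimiter argument for that purpose.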

2013-10-01 11:22:23
