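What special characters must be escaped in regular expressions?

I am tired of always trying to guess whether I should escape special characters like '()[]{}|' etc. when using the many implementations of regexps.

It is different with, for example, Python, sed, grep, awk, Perl, rename, Apache, find and so on. Is there any rule set that tells me when I should, and when I should not, escape special characters? Does it depend on the regexp type, like PCRE, POSIX or extended regexps?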

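Current answer

Sometimes simple escaping is not possible with the characters you've listed. For example, using a backslash to escape a parenthesis isn't going to work in the left-hand side of a substitution in sed, because in sed's basic regular expressions \( opens a group: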

sed -e 's/foo\(bar/something_else/'

I tend to just use a simple character class definition instead, so the above expression becomes

sed -e 's/foo[(]bar/something_else/'

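which I find works for most regexp implementations.

BTW, character classes are pretty vanilla regexp components, so they tend to work in most situations where you need escaped characters in regexps.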
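As a minimal sketch of the same idea outside sed, assuming Python's `re` module as the engine (the answer itself only shows sed): a one-character class such as `[(]` matches the literal character, while a bare `(` is rejected as an unterminated group.

```python
import re

text = "foo(bar"

# A one-character class matches the literal parenthesis.
class_match = re.search(r"foo[(]bar", text)

# In Python's re module the backslash form works as well.
backslash_match = re.search(r"foo\(bar", text)

# Left unescaped, "(" opens a group that is never closed.
try:
    re.compile(r"foo(bar")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

print(class_match.group(0), backslash_match.group(0), unescaped_ok)
```

The class trick does not cover every character. In Python, `^` at the start of a class and `\` inside one keep their special meaning, so those two still need a backslash.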

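Edit: After the comment below, I just thought I'd mention that you also have to consider the difference between finite state automata and non-finite state automata when looking at the behaviour of regexp evaluation.

You might like to look at "the shiny ball book", aka Effective Perl, specifically the chapter on regular expressions, to get a feel for the difference in regexp engine evaluation types.

Not all the world's a PCRE!

Anyway, regexps are so clunky compared to SNOBOL! Now that was an interesting programming course! Along with the one on Simula.

Ah, the joys of studying at UNSW in the late '70s! (-: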

2008-12-30 00:09:19

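Other answers

Unfortunately there really isn't a fixed set of escape codes, since it varies based on the language you are using.

However, keeping a page like the Regular Expression Tools Page or this Regular Expression Cheatsheet at hand can go a long way toward helping you quickly filter things out.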

2008-12-29 23:42:45

Modern RegEx Flavors (PCRE)

Includes C, C++, Delphi, EditPad, Java, JavaScript, Perl, PHP (preg), PostgreSQL, PowerGREP, PowerShell, Python, REALbasic, Real Studio, Ruby, TCL, VB.Net, VBScript, wxWidgets, XML Schema, Xojo, XRegExp. PCRE compatibility may vary

Anywhere: . ^ $ * + - ? ( ) [ ] { } \ |

Legacy RegEx Flavors (BRE/ERE)

Includes awk, ed, egrep, emacs, GNUlib, grep, PHP (ereg), MySQL, Oracle, R, sed. PCRE support may be enabled in later versions or by using extensions

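ERE / awk / egrep / emacs

Outside a character class: . ^ $ * + ? ( ) [ { } \ |
Inside a character class: ^ - [ ]

BRE / ed / grep / sed

Outside a character class: . ^ $ * [ \
Inside a character class: ^ - [ ]
For literals, don't escape: + ? ( ) { } |
For standard regex behavior, escape: \+ \? \( \) \{ \} \|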
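The three "outside a character class" lists above can be turned into a small lookup table. The following Python sketch is only an illustration: the flavor keys and the helper name `escape_literal` are hypothetical, and it only produces the escaped string without invoking grep or sed itself.

```python
# Characters to backslash-escape outside a character class, copied from
# the lists above. The flavor keys are hypothetical names for this sketch.
SPECIAL = {
    "pcre": set(".^$*+-?()[]{}\\|"),
    "ere": set(".^$*+?()[{}\\|"),
    "bre": set(".^$*[\\"),
}


def escape_literal(text: str, flavor: str) -> str:
    """Return text with the flavor's special characters backslash-escaped."""
    special = SPECIAL[flavor]
    return "".join("\\" + ch if ch in special else ch for ch in text)


print(escape_literal("a+b(c)", "bre"))  # left unchanged for BRE
print(escape_literal("a+b(c)", "ere"))  # + ( ) gain backslashes for ERE
```

The same literal therefore needs different treatment per flavor: escaping `+` or `(` for a BRE tool would turn them into operators instead of literals.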

Notes

If unsure about a specific character, it can be escaped like \xFF

Alphanumeric characters cannot be escaped with a backslash

Arbitrary symbols can be escaped with a backslash in PCRE, but not BRE/ERE (they must only be escaped when required).

For PCRE, ] and - only need escaping within a character class, but I kept them in a single list for simplicity

Quoted expression strings must also have the surrounding quote characters escaped, and often with backslashes doubled-up (like "(\")(/)(\\.)" versus /(")(\/)(\.)/ in JavaScript)

Aside from escapes, different regex implementations may support different modifiers, character classes, anchors, quantifiers, and other features. For more details, check out regular-expressions.info, or use regex101.com to test your expressions live
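The note on quoted strings can be illustrated in Python, which is an assumption here since the note itself uses JavaScript. A regular string literal needs the doubled backslash and an escaped quote, while a raw string lets the pattern read as written.

```python
import re

# In a regular string literal every regex backslash must be doubled,
# and the surrounding quote character must be escaped.
doubled = "(\")(/)(\\.)"

# A raw string leaves backslashes alone; single quotes avoid escaping ".
raw = r'(")(/)(\.)'

print(doubled == raw)
print(re.fullmatch(raw, '"/.').groups())
```

Both literals produce the same pattern text, so the choice only affects how readable the source code is.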

2015-08-25 19:12:56

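To save yourself from worrying about which regex variant you have and all its bespoke peculiarities, just use this generic function, which covers every regex variant other than BRE (unless they have Unicode multi-byte characters that are metacharacters):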

jot -s '' -c - 32 126 |

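mawk '
# Wrap punctuation in [...]; backslash-escape \ and ^ instead
function ___(__, _) {
    return substr(_ = "",
        gsub("[][!-/_\140:-@{-~]", "[&]", __),
        gsub("[" (_ = "\\\\") "^]", _ "&", __)) __
}
# assumed driver rule (the original is garbled): echo the line, then its escaped form
{ print; print ___($0) }'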

 !"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~

    [!]["][#][$][%][&]['][(][)][*][+][,][-][.][/]
  0  1  2  3  4  5  6  7  8  9 [:][;][<][=][>][?]
 [@] ABCDEFGHIJKLMNOPQRSTUVWXYZ   [[]\\ []]\^ [_]
 [`] abcdefghijklmnopqrstuvwxyz   [{][|][}][~]

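Square brackets are much easier to deal with, since there is no risk of triggering warning messages about "escaping too much", as happens with, for example: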

function ____(_) {
    return substr("", gsub("[[:punct:]]","\\\\&",_))_ 
} 

                     \!\"\#\$\%\&\'\(\)\*\+\,\-\.\/ 0123456789\:\;\<\=\>\?
\@ABCDEFGHIJKLMNOPQRSTUVWXYZ\[\\\]\^\_\`abcdefghijklmnopqrstuvwxyz \{\|\}\~

gawk: cmd. line:1: warning: regexp escape sequence `\!' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\"' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\%' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\&' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\,' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\:' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\;' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\=' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\@' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\_' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\~' is not a known regexp operator
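For comparison, here is a rough Python port of the bracket-wrapping idea. It is a sketch under stated assumptions, not the answer's own code: the name `bracket_escape` is hypothetical, and `[` is backslash-escaped rather than wrapped because Python's `re` warns about `[[` as a possible nested set.

```python
import re
import string


def bracket_escape(text: str) -> str:
    """Wrap ASCII punctuation in [...]; use a backslash where a class fails."""
    out = []
    for ch in text:
        if ch in "\\^[":
            # \ and ^ stay special inside a class; [ is a Python-specific
            # deviation from the awk version (see the lead-in).
            out.append("\\" + ch)
        elif ch in string.punctuation:
            out.append("[" + ch + "]")
        else:
            out.append(ch)
    return "".join(out)


sample = "a.b*c[1]^2\\n"
pattern = bracket_escape(sample)
print(pattern)
print(re.fullmatch(pattern, sample) is not None)
```

Because every wrapped character sits in its own class, the result never contains a backslash in front of an ordinary character, which is what triggers the gawk warnings shown above.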

2022-06-01 19:04:57

https://perldoc.perl.org/perlre.html#Quoting-metacharacters and https://perldoc.perl.org/functions/quotemeta.html

In the official documentation, such characters are called metacharacters. Example of quoting:

my $regex = quotemeta($string);
s/$regex/something/;
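Python offers a comparable helper, `re.escape`. The sketch below mirrors the Perl snippet; the input strings are made up for illustration.

```python
import re

# Hypothetical literal text containing several metacharacters.
literal = "price (USD): $5.00"

# re.escape backslash-escapes the characters that are special in a pattern.
regex = re.escape(literal)

result = re.sub(regex, "something", "total price (USD): $5.00 today")
print(result)
```

As with quotemeta, the escaped string matches the original text literally, so the parentheses, dollar sign and dot are not treated as operators.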

2019-04-19 11:03:59

Using Raku (formerly known as Perl_6)

Works (backslash or quote all non-alphanumeric characters except underscore):

~$ raku -e 'say $/ if "#.*?" ~~ m/  \# \. \* \?  /; #works fine'
｢#.*?｣

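According to Damian Conway's talk "Everything You Know About Regexes Is Wrong", there are six flavors of regular expression languages. Raku represents a major (roughly 15-year) reworking of standard Perl(5)/PCRE regular expressions.

In those 15 years, the Perl_6 / Raku language experts decided that all non-alphanumeric characters (except underscore) shall be reserved as regex metacharacters, even if no present usage exists. To denote a non-alphanumeric character (other than underscore) as a literal, backslash it or quote it.

So the example above prints the $/ match variable if a match to the literal #.*? character sequence is found. Below is what happens if you don't escape: # is interpreted as the start of a comment, the . dot is interpreted as any character (including whitespace), the * asterisk is interpreted as a zero-or-more quantifier, and the ? question mark is interpreted as either a zero-or-one quantifier or a frugal (i.e. non-greedy) quantifier modifier (depending on context):

Errors:

~$ raku -e 'say $/ if "#.*?" ~~ m/  # . * ?  /; #ERROR!'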
===SORRY!===
Regex not terminated.
at -e:1
------> y $/ if "#.*?" ~~ m/ # . * ?  /; #ERROR!⏏<EOL>
Regex not terminated.
at -e:1
------> y $/ if "#.*?" ~~ m/ # . * ?  /; #ERROR!⏏<EOL>
Couldn't find terminator / (corresponding / was at line 1)
at -e:1
------> y $/ if "#.*?" ~~ m/ # . * ?  /; #ERROR!⏏<EOL>
    expecting any of:
        /

https://docs.raku.org/language/regexes https://raku.org/

2023-02-04 07:47:09
