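I know it is possible to match a word and then invert the match with other tools (e.g. grep -v). However, is it possible to match lines that do not contain a specific word, e.g. hede, using a regular expression?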

Input:

hoho
hihi
haha
hede

Code:

grep "<Regex for 'doesn't contain hede'>" input

Desired output:

hoho
hihi
haha

Current answer

# A simple way: swap the word for a sentinel character that never occurs in the input
import re
skip_word = 'hede'
stranger_char = '虩'
content = '''hoho
hihi
haha
hede'''
print(
    # Keep only whole lines that contain no sentinel, i.e. no skip_word
    '\n'.join(re.findall(
        r'^[^{}\n]*$'.format(stranger_char),
        content.replace(skip_word, stranger_char),
        flags=re.MULTILINE
    ))
)

# hoho
# hihi
# haha

Other answers

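With plain POSIX grep, this request cannot be satisfied literally:

grep "<Regex for 'doesn't contain hede'>" input

The reason is that, with no flags, POSIX grep is only required to support Basic Regular Expressions (BREs), which are simply not powerful enough for this task because they lack alternation in subexpressions.

GNU grep, however, accepts \| as an alternation operator in BREs as an extension, so with GNU grep the answer looks like this: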

grep "^\([^h]\|h\(h\|eh\|edh\)*\([^eh]\|e[^dh]\|ed[^eh]\)\)*\(\|h\(h\|eh\|edh\)*\(\|e\|ed\)\)$" input

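(Found with Grail, plus some further optimizations made by hand.)

A tool that implements Extended Regular Expressions, such as egrep, lets you drop the backslashes: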

egrep "^([^h]|h(h|eh|edh)*([^eh]|e[^dh]|ed[^eh]))*(|h(h|eh|edh)*(|e|ed))$" input

A script to test it (it writes a file named testinput.txt in the current directory):

#!/bin/bash
REGEX="^\([^h]\|h\(h\|eh\|edh\)*\([^eh]\|e[^dh]\|ed[^eh]\)\)*\(\|h\(h\|eh\|edh\)*\(\|e\|ed\)\)$"

# First four lines as in OP's testcase.
cat > testinput.txt <<EOF
hoho
hihi
haha
hede

h
he
ah
head
ahead
ahed
aheda
ahede
hhede
hehede
hedhede
hehehehehehedehehe
hedecidedthat
EOF
diff -s -u <(grep -v hede testinput.txt) <(grep "$REGEX" testinput.txt)

Files /dev/fd/63 and /dev/fd/62 are identical

The two outputs are identical, as expected.

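For those interested in the details, the technique used is to convert the regular expression that matches the word into a finite automaton, then invert the automaton by turning every accepting state into a non-accepting one and vice versa, and finally convert the resulting FA back into a regular expression.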
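
The first two steps of that technique can be sketched in Python. This is only an illustration under stated assumptions: the automaton is hand-rolled rather than produced by Grail, the helper names make_dfa and run are hypothetical, and the last step (converting the automaton back into a regular expression) is left out.

```python
def make_dfa(word):
    """Complete DFA whose state is the length of the longest prefix of
    `word` that is a suffix of the input read so far."""
    n = len(word)

    def step(state, ch):
        if state == n:                      # the word was already seen: stay here
            return n
        candidate = word[:state] + ch
        # Shrink from the left until the candidate is a prefix of the word.
        while not word.startswith(candidate):
            candidate = candidate[1:]
        return len(candidate)

    states = set(range(n + 1))
    accepting = {n}                         # accepts "contains the word"
    return states, step, accepting


def run(step, accepting, text):
    """Feed `text` through the DFA and report whether it ends in `accepting`."""
    state = 0
    for ch in text:
        state = step(state, ch)
    return state in accepting


states, step, accepting = make_dfa("hede")
# Inverting the automaton: every non-accepting state becomes accepting.
complement = states - accepting

lines = ["hoho", "hihi", "haha", "hede", "ahed",
         "ahede", "hhede", "hehede", "hedhede"]
kept = [s for s in lines if run(step, complement, s)]
print(kept)
```

The flip is only valid because the automaton is complete: step returns a state for every possible character, so no input can fall off the automaton.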

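If the regular expression engine supports negative lookahead, the expression is much simpler. For example, with GNU grep:

grep -P '^((?!hede).)*$' input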

The hand-built expression can also be simplified a little further:

^([^h]|h(h|e(h|dh))*([^eh]|e([^dh]|d[^eh])))*(h(h|e(h|dh))*(ed?)?)?$
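
As a sanity check, the expressions in this answer can be compared against a plain substring test. The sketch below assumes Python's re module as the engine (the answer itself uses grep) and tries every string of up to 6 characters over a small alphabet; the helper name disagreements is hypothetical.

```python
import re
from itertools import product

WORD = "hede"
PATTERNS = {
    "lookahead": r"^((?!hede).)*$",
    "hand-built": r"^([^h]|h(h|eh|edh)*([^eh]|e[^dh]|ed[^eh]))*(|h(h|eh|edh)*(|e|ed))$",
    "simplified": r"^([^h]|h(h|e(h|dh))*([^eh]|e([^dh]|d[^eh])))*(h(h|e(h|dh))*(ed?)?)?$",
}


def disagreements(pattern, max_len=6, alphabet="hedx"):
    """Return every test string where the regex and `WORD not in s` differ."""
    rx = re.compile(pattern)
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (rx.search(s) is not None) != (WORD not in s):
                bad.append(s)
    return bad


results = {name: disagreements(p) for name, p in PATTERNS.items()}
for name, bad in results.items():
    print(name, "disagreements:", bad)
```

An empty list for a pattern means it agreed with the substring test on every string tried. That is evidence, not proof, for longer inputs or other alphabets.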

Not a regex, but I have found it logical and useful to chain greps with a pipe to eliminate noise.

For example, to search an Apache config file while ignoring all the comments:

grep -v '\#' /opt/lampp/etc/httpd.conf      # this gives all the non-comment lines

grep -v '\#' /opt/lampp/etc/httpd.conf |  grep -i dir

The logic of the chained greps is (not a comment) AND (matches dir).
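
The same "filter, then filter again" logic can be sketched in Python with chained generators. The sample lines are made up for illustration and are not taken from a real httpd.conf; non_comment and matching are hypothetical helper names.

```python
def non_comment(lines):
    """Like grep -v '#': drop every line that contains a '#'."""
    return (line for line in lines if "#" not in line)


def matching(lines, needle):
    """Like grep -i needle for a literal needle: case-insensitive substring test."""
    needle = needle.lower()
    return (line for line in lines if needle in line.lower())


# Hypothetical config lines, used only to exercise the two filters.
sample = [
    "# ServerRoot: the directory tree under which the server lives",
    'ServerRoot "/opt/lampp"',
    "Listen 80",
    "DirectoryIndex index.html",
    "<Directory />",
    "Listen 8080  # dir in a trailing comment",
]

# (not a comment) AND (matches dir)
result = list(matching(non_comment(sample), "dir"))
print(result)
```

As with grep -v '\#', a line with a trailing comment is dropped entirely, which may or may not be what you want.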

Since the introduction of Ruby 2.4.1, we can use the new absent operator in Ruby's regular expressions.

From the official documentation:

(?~abc) matches: "", "ab", "aab", "cccc", etc.
It doesn't match: "abc", "aabc", "ccccabc", etc.

So in your case, ^(?~hede)$ does the job:

2.4.1 :016 > ["hoho", "hihi", "haha", "hede"].select{|s| /^(?~hede)$/.match(s)}
 => ["hoho", "hihi", "haha"]

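Benchmarks

I decided to evaluate some of the presented options and compare their performance, as well as try out some newer features.

Benchmark text:

The first 7 lines should not match, because they contain the searched expression, while the last 7 lines should match.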

Regex Hero is a real-time online Silverlight Regular Expression Tester.
XRegex Hero is a real-time online Silverlight Regular Expression Tester.
Regex HeroRegex HeroRegex HeroRegex HeroRegex Hero is a real-time online Silverlight Regular Expression Tester.
Regex Her Regex Her Regex Her Regex Her Regex Her Regex Her Regex Hero is a real-time online Silverlight Regular Expression Tester.
Regex Her is a real-time online Silverlight Regular Expression Tester.Regex Hero
egex Hero egex Hero egex Hero egex Hero egex Hero egex Hero Regex Hero is a real-time online Silverlight Regular Expression Tester.
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRegex Hero is a real-time online Silverlight Regular Expression Tester.

Regex Her
egex Hero
egex Hero is a real-time online Silverlight Regular Expression Tester.
Regex Her is a real-time online Silverlight Regular Expression Tester.
Regex Her Regex Her Regex Her Regex Her Regex Her Regex Her is a real-time online Silverlight Regular Expression Tester.
Nobody is a real-time online Silverlight Regular Expression Tester.
Regex Her o egex Hero Regex  Hero Reg ex Hero is a real-time online Silverlight Regular Expression Tester.

Results:

Results are in iterations per second, as the average of 3 runs. Bigger number = better.

01: ^((?!Regex Hero).)*$                    3.914   // Accepted Answer
02: ^(?:(?!Regex Hero).)*$                  5.034   // With Non-Capturing group
03: ^(?!.*?Regex Hero).*                    7.356   // Lookahead at the beginning, if not found match everything
04: ^(?>[^R]+|R(?!egex Hero))*$             6.137   // Lookahead only on the right first letter
05: ^(?>(?:.*?Regex Hero)?)^.*$             7.426   // Match the word and check if you're still at linestart
06: ^(?(?=.*?Regex Hero)(?#fail)|.*)$       7.371   // Logic Branch: Find Regex Hero? match nothing, else anything

P1: ^(?(?=.*?Regex Hero)(*FAIL)|(*ACCEPT))  ?????   // Logic Branch in Perl - Quick FAIL
P2: .*?Regex Hero(*COMMIT)(*FAIL)|(*ACCEPT) ?????   // Direct COMMIT & FAIL in Perl

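Since .NET does not support backtracking control verbs ((*FAIL), etc.), I could not test solutions P1 and P2.

Summary:

Overall, the most readable and fastest solution seems to be 03, with a simple negative lookahead. It is also the fastest solution for JavaScript, since JS does not support the more advanced regex features used by the other solutions.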
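
The correctness condition of this benchmark (the first group must not match, the second must) can be re-checked with a short Python sketch. It makes several assumptions: it uses Python's re engine rather than .NET, only patterns 01 to 03, and a subset of the benchmark text. The timings it prints are therefore not comparable with the numbers above.

```python
import re
import timeit

TAIL = " is a real-time online Silverlight Regular Expression Tester."

# A subset of the benchmark text: these contain "Regex Hero" ...
should_not_match = [
    "Regex Hero" + TAIL,
    "XRegex Hero" + TAIL,
    "Regex Her" + TAIL + "Regex Hero",
]
# ... and these do not.
should_match = [
    "Regex Her",
    "egex Hero",
    "Nobody" + TAIL,
    "Regex Her o egex Hero Regex  Hero Reg ex Hero" + TAIL,
]

PATTERNS = {
    "01": r"^((?!Regex Hero).)*$",
    "02": r"^(?:(?!Regex Hero).)*$",
    "03": r"^(?!.*?Regex Hero).*",
}
compiled = {name: re.compile(p) for name, p in PATTERNS.items()}


def is_correct(rx):
    """True if rx rejects every line with the phrase and accepts the rest."""
    return (all(rx.match(s) is None for s in should_not_match)
            and all(rx.match(s) is not None for s in should_match))


def run_all(rx):
    for s in should_not_match + should_match:
        rx.match(s)


verdicts = {name: is_correct(rx) for name, rx in compiled.items()}
timings = {name: timeit.timeit(lambda rx=rx: run_all(rx), number=2000)
           for name, rx in compiled.items()}

for name in PATTERNS:
    print(name, "correct:", verdicts[name], "time: %.4fs" % timings[name])
```

Patterns 04 to 06 are left out because they rely on atomic groups or lookaround conditionals, which Python's re does not support in all versions.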

With this, you avoid testing a lookahead at every position:

/^(?:[^h]+|h++(?!ede))*+$/

Equivalent to (for .NET):

^(?>(?:[^h]+|h+(?!ede))*)$

Old answer:

/^(?>[^h]+|h+(?!ede))*$/
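
A rough Python check of these expressions, offered as a sketch rather than a port of the answer: Python's re only accepts possessive quantifiers and atomic groups from version 3.11, so those variants sit behind a version guard, and a plain non-capturing variant (my own substitution, not from the answer) is always tested. The helper name mismatches is hypothetical.

```python
import re
import sys
from itertools import product

WORD = "hede"

# Plain variant without possessive or atomic constructs (an assumption added
# here so the check also runs on older Python versions).
PATTERNS = {"plain group": r"^(?:[^h]+|h+(?!ede))*$"}

if sys.version_info >= (3, 11):
    # The expressions from the answer, with the /.../ delimiters removed.
    PATTERNS["possessive"] = r"^(?:[^h]+|h++(?!ede))*+$"
    PATTERNS["atomic (.NET form)"] = r"^(?>(?:[^h]+|h+(?!ede))*)$"
    PATTERNS["old answer"] = r"^(?>[^h]+|h+(?!ede))*$"


def mismatches(pattern, max_len=6, alphabet="hedx"):
    """Strings on which the regex disagrees with `WORD not in s`."""
    rx = re.compile(pattern)
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (rx.search(s) is not None) != (WORD not in s):
                bad.append(s)
    return bad


report = {name: mismatches(p) for name, p in PATTERNS.items()}
for name, bad in report.items():
    print(name, "mismatches:", bad)
```

The plain variant accepts the same lines but can backtrack heavily on long lines that contain the word, which is the cost the possessive and atomic forms are meant to avoid.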