I'd like to use a variable inside a regular expression. How can I do this in Python?
TEXTO = sys.argv[1]
if re.search(r"\b(?=\w)TEXTO\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
    pass
else:
    # Match attempt failed
    pass
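Current answer
I agree with all of the above, unless:
sys.argv[1] was something like Chicken\d{2}-\d{2}An\s*important\s*anchor
sys.argv[1] = r"Chicken\d{2}-\d{2}An\s*important\s*anchor"
In that case you would not want to use re.escape, because you want the argument to behave like a regular expression.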
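TEXTO = sys.argv[1]
# Both literals must be raw strings; in a plain string "\b" is a backspace character.
# The lookahead (?=\w) matches the pattern used in the question.
if re.search(r"\b(?=\w)" + TEXTO + r"\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
    pass
else:
    # Match attempt failed
    pass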
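To make the point above concrete, here is a minimal sketch that treats the argument as a regex fragment rather than as literal text. The helper name `search_fragment`, the non-capturing group around the fragment, and the choice to return `None` for an invalid fragment are assumptions made for illustration; they are not part of the original answer.

```python
import re

def search_fragment(fragment, subject):
    """Search `subject` for `fragment`, where `fragment` is itself regex syntax."""
    # The (?:...) group keeps alternations such as "cat|dog" inside the
    # word-boundary checks. No re.escape here: the fragment stays a regex.
    pattern = r"\b(?=\w)(?:" + fragment + r")\b(?!\w)"
    try:
        return re.search(pattern, subject, re.IGNORECASE)
    except re.error:
        # The fragment was not valid regex syntax (assumed policy: no match).
        return None

m = search_fragment(r"Chicken\d{2}-\d{2}An\s*important\s*anchor",
                    "see Chicken12-34An important anchor here")
print(m.group(0))  # Chicken12-34An important anchor
```

Because the fragment is compiled as is, this only makes sense when you trust whoever supplies it.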
Other answers
You have to build the regex as a string:
TEXTO = sys.argv[1]
my_regex = r"\b(?=\w)" + re.escape(TEXTO) + r"\b(?!\w)"
if re.search(my_regex, subject, re.IGNORECASE):
    ...  # etc.
Note the use of re.escape, so that any special characters in your text are matched literally instead of being interpreted as regex syntax.
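As a rough illustration of what re.escape changes, the sketch below wraps the pattern in a small helper. The function name `contains_word` and the sample strings are made up for this example.

```python
import re

def contains_word(text, subject):
    """Return True if `text` occurs in `subject` as a whole word, ignoring case."""
    # re.escape turns metacharacters such as "." into literals ("\.").
    pattern = r"\b(?=\w)" + re.escape(text) + r"\b(?!\w)"
    return re.search(pattern, subject, re.IGNORECASE) is not None

print(contains_word("a.b", "value of A.B here"))  # True
print(contains_word("a.b", "value of axb here"))  # False
```

Without re.escape the "." would match any character, so the second call would also report a match.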
if re.search(r"\b(?=\w)%s\b(?!\w)" % TEXTO, subject, re.IGNORECASE):
This inserts the contents of TEXTO into the regex as a string.
from re import search, IGNORECASE

def is_string_match(word1, word2):
    # Case-insensitive check of whether word1 occurs as a whole word in word2
    # word1: string
    # word2: string | list
    # If word2 is a list, check word1 against each entry
    if isinstance(word2, list):
        for word in word2:
            if search(rf'\b{word1}\b', word, IGNORECASE):
                return True
        return False
    # Otherwise check word1 against the single string word2
    if search(rf'\b{word1}\b', word2, IGNORECASE):
        return True
    return False

is_match_word = is_string_match("Hello", "hELLO")
# True
is_match_word = is_string_match("Hello", ["Bye", "hELLO", "@vagavela"])
# True
is_match_word = is_string_match("Hello", "Bye")
# False
You can also use the format method. It replaces each {} placeholder with the variable you pass to it as an argument.
if re.search(r"\b(?=\w){}\b(?!\w)".format(TEXTO), subject, re.IGNORECASE):
    # Successful match
    pass
else:
    # Match attempt failed
    pass
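One thing to watch with format (and with f-strings): braces that belong to the regex itself, such as the quantifier {2}, have to be doubled so they are not read as placeholders. A small sketch, with a made-up variable `word`:

```python
import re

word = "ab"

# {{2}} becomes the literal quantifier {2}; the single {} is the placeholder.
pattern_format = r"\b(?:{}){{2}}\b".format(re.escape(word))

# The same pattern written as a raw f-string.
pattern_fstring = rf"\b(?:{re.escape(word)}){{2}}\b"

print(pattern_format)                                    # \b(?:ab){2}\b
print(pattern_format == pattern_fstring)                 # True
print(re.search(pattern_format, "say abab now").group(0))  # abab
```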
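I needed to search for usernames that are similar to each other, and what Ned Batchelder said was very helpful. However, I found I got cleaner output when I used re.compile to create my search pattern: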
pattern = re.compile(r"(" + username + ".*):(.*?):(.*?):(.*?):(.*)")
matches = re.findall(pattern, lines)
The output can be printed with:
print(matches[0])     # prints the tuple of captured groups for the first matching line
print(matches[0][3])  # prints the fourth captured group (set by the parentheses in the regex) of the first matching line
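For reference, here is a self-contained version of the same idea run against made-up data. The colon-separated sample lines, the `^`/`$` anchors with re.MULTILINE, and the added re.escape are assumptions for this sketch; the original answer does not show its input.

```python
import re

# Hypothetical colon-separated records, one per line.
lines = (
    "ned:x:1000:100:Ned Example\n"
    "nedra:x:1001:101:Nedra Example\n"
    "bob:x:1002:102:Bob Example\n"
)
username = "ned"

# Anchoring with ^ and $ keeps each match on one line and stops
# the username from matching in the middle of another field.
pattern = re.compile(
    r"^(" + re.escape(username) + r".*):(.*?):(.*?):(.*?):(.*)$",
    re.MULTILINE,
)
matches = pattern.findall(lines)

print(matches[0])     # ('ned', 'x', '1000', '100', 'Ned Example')
print(matches[0][3])  # 100
print(len(matches))   # 2
```

With several groups in the pattern, findall returns one tuple of groups per match, which is why the result is indexed twice.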