I would like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

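although that does not feel like good syntax.

What is the proper way to do this? Something like how, in grep/regex, you can use \1 and \2 to substitute fields matched by the search pattern.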


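Current answer

Here is a version that supports basic regular-expression replacement. The main restriction is that the expressions must not contain subgroups, and there may be some edge cases:

Code based on @bgusach's and others' answers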

import re

class StringReplacer:

    def __init__(self, replacements, ignore_case=False):
        patterns = sorted(replacements, key=len, reverse=True)
        self.replacements = [replacements[k] for k in patterns]
        re_mode = re.IGNORECASE if ignore_case else 0
        self.pattern = re.compile('|'.join(("({})".format(p) for p in patterns)), re_mode)
        def tr(matcher):
            index = next((index for index,value in enumerate(matcher.groups()) if value), None)
            return self.replacements[index]
        self.tr = tr

    def __call__(self, string):
        return self.pattern.sub(self.tr, string)

Test

table = {
    "aaa"    : "[This is three a]",
    "b+"     : "[This is one or more b]",
    r"<\w+>" : "[This is a tag]"
}

replacer = StringReplacer(table, True)

sample1 = "whatever bb, aaa, <star> BBB <end>"

print(replacer(sample1))

# output: 
# whatever [This is one or more b], [This is three a], [This is a tag] [This is one or more b] [This is a tag]

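The trick is to identify the matched group by its position. It is not super efficient (O(n) in the number of patterns for each match), but it works.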

index = next((index for index,value in enumerate(matcher.groups()) if value), None)

The replacement is done in a single pass.
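
As a possible variation on the group lookup: because the patterns contain no subgroups, only one top-level group takes part in each match, so the match object's lastindex attribute should give its position directly, without scanning groups(). A minimal sketch under that assumption (replace_by_lastindex is a hypothetical name, not part of the answer):

```python
import re

def replace_by_lastindex(replacements, string, ignore_case=False):
    # Longest patterns first, mirroring the ordering used in StringReplacer.
    patterns = sorted(replacements, key=len, reverse=True)
    values = [replacements[p] for p in patterns]
    flags = re.IGNORECASE if ignore_case else 0
    # One capturing group per pattern; assumes patterns have no groups of their own.
    combined = re.compile("|".join("({})".format(p) for p in patterns), flags)
    # lastindex is the 1-based index of the group that matched.
    return combined.sub(lambda m: values[m.lastindex - 1], string)

table = {"aaa": "[This is three a]", "b+": "[This is one or more b]"}
result = replace_by_lastindex(table, "whatever bb, aaa, BBB", ignore_case=True)
print(result)
```

If a pattern did contain its own groups, lastindex would no longer line up with the replacement list, so the same restriction as in the class above applies.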

Other answers

You could write a nice little looping function.

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

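where text is the complete string and dic is a dictionary: each key is a term to search for, and its value is the string that replaces it.

Note: in Python 3, iteritems() has been replaced with items().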


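Careful: Python dictionaries do not have a reliable iteration order. This solution only solves your problem if:

the order of replacements is irrelevant, and it is acceptable for a later replacement to change the result of an earlier one.

Update: the statement above about ordering does not apply to Python 3.6 and later, because standard dicts were changed to iterate in insertion order (an implementation detail of CPython 3.6, guaranteed by the language from 3.7).

For example: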

d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
# str.replace returns a new string, so keep the result
my_sentence = replace_all(my_sentence, d)
print(my_sentence)

Possible output #1:

"This is my pig and this is my pig."

Possible output #2:

"This is my dog and this is my pig."

One possible fix is to use an OrderedDict.

from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
# str.replace returns a new string, so keep the result
my_sentence = replace_all(my_sentence, od)
print(my_sentence)

Output:

"This is my pig and this is my pig."

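Caveat #2: this is inefficient if your text string is very large or the dictionary contains many pairs, since every pair triggers another pass over the whole text.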

sentence='its some sentence with a something text'

def replaceAll(f,Array1,Array2):
    if len(Array1)==len(Array2):
        for x in range(len(Array1)):
            # keep the intermediate result; returning here would stop after the first pair
            f = f.replace(Array1[x],Array2[x])
    return f

newSentence=replaceAll(sentence,['a','sentence','something'],['another','sentence','something something'])

print(newSentence)

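This is just a more concise recap of the great answers by F.J and MiniQuark, plus the last but decisive improvement by bgusach. All you need to achieve multiple simultaneous string replacements is the following function: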

import re

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

Usage:

>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you wish, you can build your own dedicated replacement functions starting from this simpler one.
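
One way such a dedicated function might look is a small factory that compiles the pattern once and returns a reusable callable. This is only a sketch; make_replacer and swap_drinks are hypothetical names, not part of any of the answers:

```python
import re

def make_replacer(rep_dict):
    # Longest keys first, so a longer key wins over a shorter one that is its prefix.
    keys = sorted(rep_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The returned function reuses the compiled pattern on every call.
    return lambda string: pattern.sub(lambda m: rep_dict[m.group(0)], string)

swap_drinks = make_replacer({"cafe": "tea", "tea": "cafe", "like": "prefer"})
print(swap_drinks("Do you like cafe? No, I prefer tea."))
```

The sketch assumes a non-empty dictionary; an empty one would produce an empty pattern that matches everywhere.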

For replacing single characters only, using translate and str.maketrans is my favorite method.

Tl;dr > result_string = your_string.translate(str.maketrans(dict_mapping))


demo

my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)   # ThSS SS a teSt StrSng.
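
For reference, str.maketrans also accepts two equal-length strings, plus an optional third string of characters to delete. The sketch below repeats the demo with that form; the variable names are illustrative only:

```python
my_string = 'This is a test string.'

# Two equal-length strings: 'i' -> 's' and 's' -> 'S', applied in one pass.
table = str.maketrans('is', 'sS')
swapped = my_string.translate(table)

# Optional third argument: characters to delete from the result.
no_dots = my_string.translate(str.maketrans('', '', '.'))

print(swapped)
print(no_dots)
```

Dictionary keys passed to str.maketrans must be single characters (or integer code points), so this approach does not extend to multi-character search strings.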

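I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing runs of whitespace characters with a single space. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with: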

import re

def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        # items() is a non-indexable view in Python 3, so turn it into a list
        reps = list(reps.items())
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

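The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize whitespace: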

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as plain strings, you can escape them before calling multiple_replace, for example with this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

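The following function can help find malformed regular expressions among your dictionary keys (since the error message from multiple_replace is not very telling):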

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())

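To illustrate what the check catches, here is a small sketch that collects the bad patterns instead of printing them. find_invalid_patterns is a hypothetical helper, and the sample patterns are made up for the example:

```python
import re

def find_invalid_patterns(re_list):
    """Return (position, pattern) pairs for entries that fail to compile."""
    invalid = []
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            invalid.append((i, e))
    return invalid

# "(unclosed" has an unbalanced parenthesis; None is not a string at all.
bad = find_invalid_patterns([r"\bI\b", "(unclosed", None, r"[\n\t ]+"])
print(bad)
```
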
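Note that it does not chain the replacements but performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and make sure the pairs are in the expected order: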

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'