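I would like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

although that does not feel like good syntax.

What is the proper way to do this? Something like how, in grep/regex, you can use \1 and \2 to refer to fields of the search pattern in the replacement.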


Current answer

I did a similar exercise in one of my school homework assignments. This was my solution:

dictionary = {1: ['hate', 'love'],
              2: ['salad', 'burger'],
              3: ['vegetables', 'pizza']}

def normalize(text):
    for i in dictionary:
        text = text.replace(dictionary[i][0], dictionary[i][1])
    return text

See the result for yourself on a test string:

string_to_change = 'I hate salad and vegetables'
print(normalize(string_to_change))

Other answers

I built this upon F.J.'s excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)

One-shot usage:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.

Note that since the replacement is done in a single pass, "café" changes to "tea", but it does not change back to "café".
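To make the single-pass behaviour concrete, here is a minimal sketch contrasting it with chained str.replace calls. The helper names chained_replace and single_pass_replace are made up for this illustration and are not part of the answer's code.

```python
import re

def chained_replace(text, pairs):
    # Apply each replacement in turn; later pairs see the output of earlier ones.
    for old, new in pairs:
        text = text.replace(old, new)
    return text

def single_pass_replace(text, pairs):
    # Build one alternation pattern and substitute every match in a single scan,
    # so replaced text is never examined again.
    mapping = dict(pairs)
    pattern = re.compile("|".join(re.escape(old) for old, _ in pairs))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

pairs = [("café", "tea"), ("tea", "café")]
text = "café or tea?"

chained = chained_replace(text, pairs)      # 'café or café?'
single = single_pass_replace(text, pairs)   # 'tea or café?'
print(chained)
print(single)
```

With chaining, the first replacement's output ("tea") is picked up again by the second pair, which is why the swap only works in the single-pass version.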

If you need to perform the same replacement many times, you can easily create a replacement function:

>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
                       u'Does this work?\tYes it does',
                       u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
... 
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

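Improvements:

- Turned the code into a function
- Added multiline support
- Fixed a bug in escaping
- Made it easy to create a function for a specific set of replacements

Enjoy! :-)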

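Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all the matching files in the current folder to do the replacements. The script loads the mappings from an external file, in which you can choose the separator. I am a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.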

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {}  # create an empty dictionary

with open(mapfile) as temprep:  # load the definitions into the dictionary from the map file, using the prompted separator
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask):  # iterate over all the files matching the prompted mask

    with open(filename, "r") as textfile:  # load each file into the variable text
        text = textfile.read()

    # start replacement
    # rep = dict((re.escape(k), v) for k, v in rep.items())  # left commented out so that the mapping keys can use re special characters
    # note: the lookup below uses the matched text, so a key that is a real pattern rather than a literal will not be found in rep
    pattern = re.compile("|".join(rep.keys()))
    text = pattern.sub(lambda m: rep[m.group(0)], text)

    # write the output file, using the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)
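The core of the script above (parse a mapping file, then replace in one pass) can be sketched as two small functions that are easier to test. This is only a sketch under assumptions: load_mapping and apply_mapping are hypothetical names, blank lines are skipped, each line is split on the first separator only, and keys are escaped so they are treated as literal text (unlike the script, which deliberately leaves them unescaped).

```python
import re

def load_mapping(lines, sep):
    """Parse 'key<sep>value' lines into a dict (hypothetical helper)."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines instead of failing on unpacking
        key, val = line.split(sep, 1)  # split on the first separator only
        mapping[key] = val
    return mapping

def apply_mapping(text, mapping):
    """Replace every key with its value in a single pass (keys taken literally)."""
    pattern = re.compile("|".join(re.escape(k) for k in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# Any iterable of lines works here, including an open file object.
mapping = load_mapping(["A01|Alpha\n", "\n", "B02|Beta|Gamma\n"], "|")
result = apply_mapping("A01 and B02", mapping)
print(result)
```

Splitting on the first separator only lets a value contain the separator character, which the original unpacking would reject.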

sentence = 'its some sentence with a something text'

def replaceAll(f, Array1, Array2):
    # Apply every pair in order; returning inside the loop would stop after the first pair
    if len(Array1) == len(Array2):
        for x in range(len(Array1)):
            f = f.replace(Array1[x], Array2[x])
    return f

newSentence = replaceAll(sentence, ['a', 'sentence', 'something'], ['another', 'sentence', 'something something'])

print(newSentence)

I felt this question needed a one-line recursive lambda answer for completeness, just because. So here it is:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

Usage:

>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'

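Notes:

- This consumes the input dictionary.
- Python dicts preserve key order as of 3.6; the corresponding caveats in other answers are no longer relevant. Note that in Python 3.7+ popitem() removes the most recently inserted pair first, so the replacements are applied in reverse insertion order.

For backward compatibility, one could resort to a version based on a list of tuples: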

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])

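Note: as with all recursive functions in Python, too large a recursion depth (i.e. too large a replacement dictionary) will result in an error (a RecursionError once the interpreter's recursion limit is exceeded).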
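If the recursion limit or the consumed dictionary is a concern, the same chained replacement can be written without recursion. The following is a minimal sketch, not the answer's own code; mrep_iter is a hypothetical name, and it applies the pairs in the order given rather than in the reversed order produced by popitem().

```python
from functools import reduce

def mrep_iter(s, pairs):
    # Fold str.replace over the pairs, left to right.
    # No recursion is involved, and the input is not mutated.
    return reduce(lambda acc, kv: acc.replace(*kv), pairs, s)

d = {'a': '1', 'c': '2'}
result = mrep_iter('abcabc', d.items())
print(result)   # '1b21b2'
print(d)        # the dictionary is left intact
```

Because this is a plain loop under the hood, the number of pairs is limited only by memory and time, not by the recursion limit.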

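I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing runs of whitespace characters with a single space. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with: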

import re

def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = reps.items()
    reps = list(reps)  # make the pairs indexable (dict views are not in Python 3)
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

The main thing for me is that you can also use regular expressions, for example to replace whole words only, or to normalize whitespace:

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as plain strings, you can escape them before calling multiple_replace, for example with the following function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

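The following function can help find erroneous regular expressions among your dictionary keys (since the error message from multiple_replace is not very telling):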

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())
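If you would rather collect the problems than print them, a variant along the following lines may be handy. This is only a sketch; find_bad_patterns is a hypothetical name and not part of the answer above.

```python
import re

def find_bad_patterns(patterns):
    """Return (index, pattern, message) for each pattern that fails to compile."""
    bad = []
    for i, p in enumerate(patterns):
        try:
            re.compile(p)
        except (TypeError, re.error) as exc:
            # TypeError covers non-string keys; re.error covers malformed patterns
            bad.append((i, p, str(exc)))
    return bad

bad = find_bad_patterns([r'\bI\b', r'[\n\t ]+', r'(unclosed'])
for index, pattern, message in bad:
    print(index, pattern, message)
```

Returning the list lets the caller decide whether to log, raise, or drop the offending keys.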

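Note that it does not chain the replacements, but performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and make sure the pairs are in the expected order: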

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'
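When the keys are plain strings and some of them are prefixes of others, one way to get a predictable ordering is to sort the keys longest first before building the pattern. The sketch below assumes literal keys (they are escaped, so it does not cover the regular-expression use case above), and replace_longest_first is a hypothetical name.

```python
import re

def replace_longest_first(text, mapping):
    # Sort keys by length, longest first, so that "button" is tried before "but"
    # at any position where both could match.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

result = replace_longest_first("button but", {"but": "mut", "button": "lamb"})
print(result)  # 'lamb mut'
```

This works because a regex alternation tries its alternatives from left to right and takes the first one that matches at the current position.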