我想使用.replace函数替换多个字符串。

我目前有

string.replace("condition1", "")

但想要一些像

string.replace("condition1", "").replace("condition2", "text")

尽管这样的语法感觉不太好

正确的做法是什么?有点像在grep/regex中,你可以用\1和\2来替换某些搜索字符串的字段


当前回答

对于只替换一个字符,使用翻译和str.maketrans是我最喜欢的方法。

Tl;dr > result_string = your_string.translate(str.maketrans(dict_mapping))


demo

my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)   # ThSS SS a teSt StrSng.

其他回答

另一个例子: 输入列表

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

期望的输出将是

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

代码:

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]] 

您可以使用pandas库和replace函数,它既支持精确匹配,也支持正则表达式替换。例如:

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', '\d{2}:\d{2}', '\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))

修改后的文本为:

0    name is going to visit city in month
1                      I was born in date
2                 I will be there at time

你可以在这里找到一个例子。请注意,文本上的替换是按照它们在列表中出现的顺序进行的

下面是一个支持基本正则表达式替换的版本。主要的限制是表达式不能包含子组,并且可能存在一些边缘情况:

基于@bgusach和其他的代码

import re

class StringReplacer:

    def __init__(self, replacements, ignore_case=False):
        patterns = sorted(replacements, key=len, reverse=True)
        self.replacements = [replacements[k] for k in patterns]
        re_mode = re.IGNORECASE if ignore_case else 0
        self.pattern = re.compile('|'.join(("({})".format(p) for p in patterns)), re_mode)
        def tr(matcher):
            index = next((index for index,value in enumerate(matcher.groups()) if value), None)
            return self.replacements[index]
        self.tr = tr

    def __call__(self, string):
        return self.pattern.sub(self.tr, string)

测试

table = {
    "aaa"    : "[This is three a]",
    "b+"     : "[This is one or more b]",
    r"<\w+>" : "[This is a tag]"
}

replacer = StringReplacer(table, True)

sample1 = "whatever bb, aaa, <star> BBB <end>"

print(replacer(sample1))

# output: 
# whatever [This is one or more b], [This is three a], [This is a tag] [This is one or more b] [This is a tag]

诀窍是通过位置来识别匹配的组。它不是超级高效(O(n)),但它是有效的。

index = next((index for index,value in enumerate(matcher.groups()) if value), None)

替换是一次完成的。

我的方法是首先将字符串标记化,然后决定每个标记是否包含它。

潜在地,如果我们可以假设一个hashmap/set的O(1)查找,可能会更好:

remove_words = {"we", "this"}
target_sent = "we should modify this string"
target_sent_words = target_sent.split()
filtered_sent = " ".join(list(filter(lambda word: word not in remove_words, target_sent_words)))

Filtered_sent现在是'应该修改字符串'

下面是另一种使用字典的方法:

listA="The cat jumped over the house".split()
modify = {word:word for number,word in enumerate(listA)}
modify["cat"],modify["jumped"]="dog","walked"
print " ".join(modify[x] for x in listA)