我想使用.replace函数替换多个字符串。
我目前有
string.replace("condition1", "")
但想要一些像
string.replace("condition1", "").replace("condition2", "text")
尽管这样的语法感觉不太好
正确的做法是什么?有点像在grep/regex中,你可以用\1和\2来替换某些搜索字符串的字段
我想使用.replace函数替换多个字符串。
我目前有
string.replace("condition1", "")
但想要一些像
string.replace("condition1", "").replace("condition2", "text")
尽管这样的语法感觉不太好
正确的做法是什么?有点像在grep/regex中,你可以用\1和\2来替换某些搜索字符串的字段
当前回答
这只是F.J和mini夸克的一个更简洁的概述,bgusach的伟大回答和最后但决定性的改进。所有你需要实现多个同步字符串替换是以下函数:
def multiple_replace(string, rep_dict):
pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
return pattern.sub(lambda x: rep_dict[x.group(0)], string)
用法:
>>>multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'
如果您愿意,您可以从这个更简单的函数开始创建自己的专用替换函数。
其他回答
另一个例子: 输入列表
error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
期望的输出将是
words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
代码:
[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]]
我把这句话建立在fj的精彩回答上:
import re
def multiple_replacer(*key_values):
replace_dict = dict(key_values)
replacement_function = lambda match: replace_dict[match.group(0)]
pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
return lambda string: pattern.sub(replacement_function, string)
def multiple_replace(string, *key_values):
return multiple_replacer(*key_values)(string)
一针用法:
>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print multiple_replace(u"Do you like café? No, I prefer tea.", *replacements)
Do you love tea? No, I prefer café.
注意,由于替换只在一次传递中完成,“café”会变成“tea”,但不会变回“café”。
如果你需要做相同的替换多次,你可以很容易地创建一个替换函数:
>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
u'Does this work?\tYes it does',
u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
... print my_escaper(line)
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"
改进:
将代码转换为函数 增加了多线支持 修正了逃跑的错误 容易创建一个函数,用于特定的多个替换
享受吧!: -)
这里有一个使用reduce的第一个解决方案的变体,如果你喜欢功能性的。:)
repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.iteritems(), s)
马蒂诺的版本更好:
repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)
从安德鲁的宝贵答案开始,我开发了一个脚本,从一个文件加载字典,并详细说明所有文件上打开的文件夹做替换。脚本从一个外部文件加载映射,您可以在该文件中设置分隔符。我是一个初学者,但我发现这个脚本在多个文件中做多个替换时非常有用。它在几秒钟内加载了一个包含1000多个条目的字典。这并不优雅,但对我来说很管用
import glob
import re
mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")
rep = {} # creation of empy dictionary
with open(mapfile) as temprep: # loading of definitions in the dictionary using input file, separator is prompted
for line in temprep:
(key, val) = line.strip('\n').split(sep)
rep[key] = val
for filename in glob.iglob(mask): # recursion on all the files with the mask prompted
with open (filename, "r") as textfile: # load each file in the variable text
text = textfile.read()
# start replacement
#rep = dict((re.escape(k), v) for k, v in rep.items()) commented to enable the use in the mapping of re reserved characters
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[m.group(0)], text)
#write of te output files with the prompted suffice
target = open(filename[:-4]+"_NEW.txt", "w")
target.write(text)
target.close()
您可以使用pandas库和replace函数,它既支持精确匹配,也支持正则表达式替换。例如:
df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})
to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', '\d{2}:\d{2}', '\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']
print(df.text.replace(to_replace, replace_with, regex=True))
修改后的文本为:
0 name is going to visit city in month
1 I was born in date
2 I will be there at time
你可以在这里找到一个例子。请注意,文本上的替换是按照它们在列表中出现的顺序进行的