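How do I find all occurrences of a substring?

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() that returns all found indexes (not only the first from the beginning or the first from the end).

For example: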

string = "test test test test"

print(string.find('test'))  # 0
print(string.rfind('test'))  # 15

# this is the goal
print(string.find_all('test'))  # [0, 5, 10, 15]

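To count occurrences, see "Count number of occurrences of a substring in a string".

Current answer

If you're only looking for a single character, this works: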

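from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
# Walk the string once, incrementing the count on every matching character
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7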

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

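My hunch is that neither of these (especially #2) performs very well. Note that both count occurrences rather than return their positions.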
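If a count is all that is needed, the built-in str.count is probably a simpler alternative to both snippets above. The sketch below assumes that only non-overlapping occurrences matter, which is how str.count behaves; the variable names are illustrative.

```python
string = "dooobiedoobiedoobie"

# str.count returns the number of non-overlapping occurrences
single_char_count = string.count("o")

words = "test test test test"
substring_count = words.count("test")

print(single_char_count, substring_count)  # 7 4
```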

2014-09-24 21:12:28

Other answers

When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

On large lists of search words, flashtext runs faster than regular expressions.

2018-09-28 17:29:11

There is no simple built-in string function that does what you're looking for, but you can use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, a lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine a positive and a negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

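re.finditer returns an iterator, so you can change the [] above to () to get a generator expression instead of a list, which is more efficient if you only iterate over the results once.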
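The regex approach can be wrapped in a small generator. This is only a sketch: find_all_regex and its overlapping parameter are hypothetical names, not part of the re module. It assumes the search term should be matched literally, so it passes the term through re.escape first; without that, a term such as '.' would be treated as a regex metacharacter.

```python
import re

def find_all_regex(text, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in text (hypothetical helper)."""
    if not sub:
        return  # an empty pattern would match at every position
    pattern = re.escape(sub)  # treat the search term as literal text
    if overlapping:
        # A lookahead matches without consuming, so overlaps are reported too
        pattern = "(?=%s)" % pattern
    for m in re.finditer(pattern, text):
        yield m.start()

print(list(find_all_regex("test test test test", "test")))  # [0, 5, 10, 15]
print(list(find_all_regex("ttt", "tt", overlapping=True)))  # [0, 1]
print(list(find_all_regex("a.b.c", ".")))                   # [1, 3]
```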

2011-01-12 02:43:23

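This isn't exactly what the OP asked for, but you can also use the split function to get a list of the pieces where the substring does not occur. The OP didn't specify the end goal of the code, but if your goal is to remove the substring anyway, this can be a simple one-liner. There are probably more efficient ways to do this for larger strings; regular expressions would be preferable in that case.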

# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']

# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'

I only skimmed the other answers, so apologies if this is already up there.

2021-05-19 13:43:55

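This function does not check every position in the string from Python code; it lets str.find jump to the next match, so it does not waste computing resources. My attempt: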

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')
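The same str.find loop can also be written as a generator, which avoids building the whole list when the caller only iterates once. This is a sketch rather than the answer's own code: iter_find is a hypothetical name, and the guard against an empty search word is an added assumption.

```python
def iter_find(string, word):
    """Lazily yield each index where word starts in string, overlaps included."""
    if not word:
        return  # guard: an empty word would match at every position
    pos = string.find(word)
    while pos != -1:
        yield pos
        # Resume one character after the last hit
        pos = string.find(word, pos + 1)

sentence = 'this word is a big word man how many words are there?'
positions = list(iter_find(sentence, 'word'))
print(positions)  # [5, 19, 37]
```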

2020-01-13 12:39:43

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of the positions where the substring was found. Please comment if you see an error or room for improvement.
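The overlap choice mentioned in the code comment can be exposed as a parameter instead of an edit to the source. A minimal sketch, assuming a hypothetical find_all_positions function and an overlapping flag that are not part of the original answer; rejecting an empty sub is likewise an assumption made here.

```python
def find_all_positions(a_string, sub, overlapping=True):
    """Return start indices of sub in a_string (hypothetical variant)."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    # Advance by one character to allow overlaps, or skip the whole match
    step = 1 if overlapping else len(sub)
    result = []
    k = a_string.find(sub)
    while k != -1:
        result.append(k)
        k = a_string.find(sub, k + step)
    return result

print(find_all_positions("aaaa", "aa"))                     # [0, 1, 2]
print(find_all_positions("aaaa", "aa", overlapping=False))  # [0, 2]
```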

2015-04-01 09:23:24
