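How to extract the substring between two markers?

Suppose I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know the few characters directly before the part I am interested in (AAA) and directly after it (ZZZ).

With sed, this can be done on a string as follows: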

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

This gives 1234 as the result.

How can I do the same thing in Python?

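Current answer

Here is a solution without regex that also handles the case where the first marker contains the second marker. The function only finds a substring if the second marker occurs after the first marker.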

def find_substring(string, start, end):
    start_index = string.find(start)
    if start_index == -1:
        return ''  # start marker not found
    start_index += len(start)  # position just past the start marker
    # Look for the end marker only after the start marker
    end_index = string.find(end, start_index)
    return string[start_index:end_index] if end_index != -1 else ''
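For comparison, the same idea can be sketched with str.partition instead of str.find. This is only an illustration, not the answer's original code: the helper name between_markers and the choice to return None when a marker is missing are assumptions made here.

```python
def between_markers(text, start, end):
    """Hypothetical helper: text between the first `start` and the next `end`."""
    # partition() splits around the first occurrence of the separator and
    # returns a 3-tuple; the middle element is '' when it is not found.
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None  # assumption: signal a missing start marker with None
    middle, found_end, _ = rest.partition(end)
    if not found_end:
        return None  # assumption: signal a missing end marker with None
    return middle


print(between_markers('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
```

Because the end marker is only searched for in the text after the start marker, this also copes with an end marker that appears inside or before the start marker. Note that partition() raises ValueError for an empty separator.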

2019-02-23 18:26:39

Other answers

You can use the re module:

>>> import re
>>> re.compile(".*AAA(.*)ZZZ.*").match("gfgfdAAA1234ZZZuijjk").groups()
('1234',)
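A slightly more defensive variant, as a rough sketch: re.search with a non-greedy group stops at the first end marker, and re.escape keeps any special characters in the markers literal. The helper name extract_between and the None return value for "no match" are assumptions for illustration.

```python
import re


def extract_between(text, start, end):
    """Hypothetical helper: shortest match between two literal markers."""
    # re.escape() treats characters such as '.' or '*' in the markers literally.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    match = re.search(pattern, text)
    # re.search() returns None when the pattern does not occur in the text.
    return match.group(1) if match else None


print(extract_between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
```

By default '.' does not match a newline, so for multi-line input you would probably want to pass re.DOTALL to re.search.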

2011-01-12 09:19:21

If you want to find multiple occurrences:

content = "Prefix_helloworld_Suffix_stuff_Prefix_42_Suffix_andsoon"
strings = []
# Every chunk that follows a 'Prefix_' marker may contain the closing '_Suffix'
for c in content.split('Prefix_'):
    spos = c.find('_Suffix')
    if spos != -1:
        strings.append(c[:spos])
print(strings)

Or, more compactly, as a list comprehension:

strings = [ c[:c.find('_Suffix')] for c in content.split('Prefix_') if c.find('_Suffix')!=-1 ]
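The same multiple-occurrence extraction can also be sketched with re.findall and a non-greedy group. This swaps in a regex for the split-based approach above and assumes the markers are the literal strings 'Prefix_' and '_Suffix' from the example.

```python
import re

content = "Prefix_helloworld_Suffix_stuff_Prefix_42_Suffix_andsoon"
# With one capturing group, findall() returns a list of that group's matches.
strings = re.findall(r'Prefix_(.*?)_Suffix', content)
print(strings)  # ['helloworld', '42']
```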

2022-08-02 13:28:35

text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'

print(text[text.index(left)+len(left):text.index(right)])

gives

string
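Two things are worth keeping in mind with the snippet above: str.index raises ValueError when a marker is missing, and text.index(right) searches from the beginning of the text, so it can find an occurrence of the right marker that sits before the left one. A minimal sketch that handles both cases, where the helper name between and the None return value are assumptions:

```python
def between(text, left, right):
    """Hypothetical helper: slice between `left` and the first `right` after it."""
    try:
        begin = text.index(left) + len(left)
        # Starting the search at `begin` skips any `right` that precedes `left`.
        stop = text.index(right, begin)
    except ValueError:
        # str.index() raises ValueError when the substring is not found.
        return None
    return text[begin:stop]


text = 'I want to find a string between two substrings'
print(repr(between(text, 'find a ', 'between two')))  # 'string '
```

The result keeps the trailing space because the right marker in this example does not include it.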

2019-03-04 01:31:31

>>> s = '/tmp/10508.constantstring'
>>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')
'10508'

2014-02-08 00:12:43

You can also find all combinations with the functions below:

s = 'Part 1. Part 2. Part 3 then more text'

def find_all_places(text, word):
    # Collect the start index of every non-overlapping occurrence of word
    word_places = []
    i = 0
    while True:
        word_place = text.find(word, i)
        if word_place < 0:
            break
        word_places.append(word_place)
        i = word_place + len(word)  # resume the search after this match
    return word_places

def find_all_combination(text, start, end):
    start_places = find_all_places(text, start)
    end_places = find_all_places(text, end)
    combination_list = []
    for start_place in start_places:
        for end_place in end_places:
            # Keep only pairs where the start marker comes before the end marker
            if start_place >= end_place:
                continue
            combination_list.append(text[start_place:end_place])
    return combination_list

print(find_all_combination(s, "Part", "Part"))

Result:

['Part 1. ', 'Part 1. Part 2. ', 'Part 2. ']

2021-10-05 19:02:30
