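Suppose I have the string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know the few characters directly before and after the part I am interested in: AAA comes right before 1234, and ZZZ comes right after it.

With sed, you can do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

This gives 1234 as the result.

How can I do the same thing in Python?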


Current answer

You can also find every combination of start and end markers with the functions below:

s = 'Part 1. Part 2. Part 3 then more text'

def find_all_places(text, word):
    # Collect the start index of every non-overlapping occurrence of word.
    word_places = []
    i = 0
    while True:
        word_place = text.find(word, i)
        if word_place < 0:
            break
        word_places.append(word_place)
        # Resume searching just past the current match.
        i = word_place + len(word)
    return word_places

def find_all_combination(text, start, end):
    # Return every slice that begins at a start marker and stops at a later end marker.
    start_places = find_all_places(text, start)
    end_places = find_all_places(text, end)
    combination_list = []
    for start_place in start_places:
        for end_place in end_places:
            if start_place >= end_place:
                continue
            combination_list.append(text[start_place:end_place])
    return combination_list

print(find_all_combination(s, "Part", "Part"))

Result:

['Part 1. ', 'Part 1. Part 2. ', 'Part 2. ']
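
The position lookup in the answer above could also be sketched with re.finditer. This is only an illustration: find_all_places_re is a hypothetical name, not part of the original answer, and it assumes non-overlapping matches are what you want.

```python
import re

def find_all_places_re(text, word):
    # Hypothetical helper: start index of every non-overlapping occurrence of word.
    # re.escape makes word match literally, even if it contains regex metacharacters.
    return [m.start() for m in re.finditer(re.escape(word), text)]

s = 'Part 1. Part 2. Part 3 then more text'
print(find_all_places_re(s, 'Part'))  # [0, 8, 16]
```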

Other answers

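With sed, you can do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

This gives 1234 as the result.

You can do the same thing with the re.sub function, using the same regular expression.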

>>> import re
>>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
'1234'

In basic sed, a capture group is written \(..\), but in Python it is written (..).
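
One caveat with the re.sub approach: when the pattern does not match, re.sub returns the input unchanged, so a failed extraction is hard to tell apart from a successful one. A minimal sketch of an alternative using re.search follows. extract_between is a hypothetical helper name, and the non-greedy group is an assumption: it takes the first start marker and the nearest end marker, whereas the sed pattern above anchors on the last AAA.

```python
import re

def extract_between(text, start, end):
    # Hypothetical helper: text between start and end, or None if either is missing.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(extract_between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(extract_between('no markers here', 'AAA', 'ZZZ'))       # None
# re.sub hands back the original string when nothing matches:
print(re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'no markers here'))    # no markers here
```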

You can use the re module:

>>> import re
>>> re.compile(".*AAA(.*)ZZZ.*").match("gfgfdAAA1234ZZZuijjk").groups()
('1234',)
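
The leading .* in that pattern matters because match() only tries to match at the start of the string. A short sketch of the difference between match() and search(), reusing the string from the question:

```python
import re

s = 'gfgfdAAA1234ZZZuijjk'

# match() only succeeds at position 0, so without the leading .* it finds nothing.
no_match = re.match(r'AAA(.*)ZZZ', s)
print(no_match)  # None

# search() scans the whole string, so the pattern can start at the marker itself.
found = re.search(r'AAA(.*)ZZZ', s)
print(found.group(1))  # 1234
```

Because match() and search() both return None on failure, calling .groups() or .group() directly on the result raises AttributeError when the markers are absent, so it is worth checking the result first.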

In Python, you can extract a substring from a string with the findall function of the regular expression (re) module.

>>> import re
>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> ss = re.findall('AAA(.+)ZZZ', s)
>>> print(ss)
['1234']
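
If the markers can appear more than once, the greedy (.+) spans from the first AAA to the last ZZZ. A small sketch of the difference a non-greedy (.+?) makes; the sample string here is made up for illustration:

```python
import re

s2 = 'xAAA12ZZZyAAA34ZZZz'  # illustrative input with two marker pairs

greedy = re.findall(r'AAA(.+)ZZZ', s2)
lazy = re.findall(r'AAA(.+?)ZZZ', s2)

print(greedy)  # ['12ZZZyAAA34']
print(lazy)    # ['12', '34']
```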

>>> s = '/tmp/10508.constantstring'
>>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')
'10508'
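
The same split-based idea can be sketched with str.partition, which avoids an IndexError when a marker is missing. between is a hypothetical helper name, and returning None for a missing marker is a design assumption rather than anything the answers above prescribe.

```python
def between(text, start, end):
    # Hypothetical helper: text between the first start marker and the next end marker.
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None  # start marker not present
    middle, found_end, _ = rest.partition(end)
    return middle if found_end else None  # None if end marker not present

print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))                     # 1234
print(between('/tmp/10508.constantstring', '/tmp/', '.constantstring'))  # 10508
print(between('gfgfdAAA1234', 'AAA', 'ZZZ'))                             # None
```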