我如何找到两个子字符串('123STRINGabc' -> '字符串')之间的字符串?

我现在的方法是这样的:

>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis

然而,这似乎非常低效且不符合python规则。有什么更好的方法来做这样的事情吗?

忘了说: 字符串可能不是以start和end开始和结束的。他们可能会有更多的字符之前和之后。


当前回答

这里有一种方法

_,_,rest = s.partition(start)
result,_,_ = rest.partition(end)
print result

另一种方法是使用regexp

import re
print re.findall(re.escape(start)+"(.*)"+re.escape(end),s)[0]

or

print re.search(re.escape(start)+"(.*)"+re.escape(end),s).group(1)

其他回答

s = "123123STRINGabcabc"

def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""

def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""


print find_between( s, "123", "abc" )
print find_between_r( s, "123", "abc" )

给:

123STRING
STRINGabc

我认为应该注意的是-根据需要的行为,您可以混合使用index和rindex调用,或者使用上述版本之一(它相当于regex(.*)和(.*?)组)。

我的方法是,

find index of start string in s => i
find index of end string in s => j

substring = substring(i+len(start) to j-1)
import re

s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))
from timeit import timeit
from re import search, DOTALL


def partition_find(string, start, end):
    return string.partition(start)[2].rpartition(end)[0]


def re_find(string, start, end):
    # applying re.escape to start and end would be safer
    return search(start + '(.*)' + end, string, DOTALL).group(1)


def index_find(string, start, end):
    return string[string.find(start) + len(start):string.rfind(end)]


# The wikitext of "Alan Turing law" article form English Wikipeida
# https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
string = """..."""
start = '==Proposals=='
end = '==Rival bills=='

assert index_find(string, start, end) \
       == partition_find(string, start, end) \
       == re_find(string, start, end)

print('index_find', timeit(
    'index_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

print('partition_find', timeit(
    'partition_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

print('re_find', timeit(
    're_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

结果:

index_find 0.35047444528454114
partition_find 0.5327825636197754
re_find 7.552149639286381

在这个例子中,Re_find几乎比index_find慢20倍。

如果你不想导入任何东西,试试字符串方法.index():

text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'

# Output: 'string'
print(text[text.index(left)+len(left):text.index(right)])