How do I remove a substring from the end of a string?

I have the following code:

url = 'abcdc.com'
print(url.strip('.com'))

I expected: abcdc

I got: abcd

Now I do

url.rsplit('.com', 1)

Is there a better way?

See "How do the .strip/.rstrip/.lstrip string methods work in Python?" for a specific explanation of what the first attempt is doing.
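For readers wondering why the first attempt drops the final c: strip treats its argument as a set of characters to remove from both ends, not as a substring. A minimal sketch of that behavior, plus what rsplit actually returns (the variable names are only illustrative):

```python
url = 'abcdc.com'

# strip removes any of the characters '.', 'c', 'o', 'm' from both ends,
# so the trailing 'c' of 'abcdc' is eaten as well.
stripped = url.strip('.com')
print(stripped)  # abcd

# rsplit returns a list, so the name is its first element.
parts = url.rsplit('.com', 1)
print(parts)     # ['abcdc', '']
print(parts[0])  # abcdc
```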

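Current answer

Assuming you want to remove the domain suffix, whatever it is (.com, .net, etc.), I recommend finding the last . and removing everything from that point on.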

url = 'abcdc.com'
dot_index = url.rfind('.')
url = url[:dot_index]

Here I'm using rfind to handle URLs like abcdc.com.net, which should be reduced to abcdc.com.

If you're also concerned about a leading www., you should check for it explicitly:

if url.startswith("www."):
    url = url.replace("www.", "", 1)

The 1 in replace is for odd edge cases such as www.net.www.com

If your URL gets any messier than that, look at the regex answers people have posted.
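One thing to watch for: rfind returns -1 when there is no dot, and url[:-1] would then silently drop the last character. A small guarded sketch (the function name strip_last_label is made up for illustration):

```python
def strip_last_label(url):
    """Drop everything from the last '.' onward, if there is one."""
    dot_index = url.rfind('.')
    # rfind returns -1 when no dot is found; leave the string alone then.
    return url if dot_index == -1 else url[:dot_index]

print(strip_last_label('abcdc.com'))      # abcdc
print(strip_last_label('abcdc.com.net'))  # abcdc.com
print(strip_last_label('localhost'))      # localhost
```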

2020-04-10 18:31:26

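Other answers

Using replace and count

This might look a bit hacky, but it lets you do the replacement without startswith and an if statement: the count argument of replace limits the replacement to a single occurrence: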

mystring = "www.comwww.com"

Prefix:

print(mystring.replace("www.","",1))

Suffix (reverse the string and write the suffix backwards, so .com becomes moc.):

print(mystring[::-1].replace("moc.","",1)[::-1])
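A caveat worth illustrating: replace with a count of 1 removes the first occurrence wherever it is, not only at the start (or, with the reversal trick, at the end). A short sketch with made-up sample strings, next to a guarded alternative (remove_prefix_once is a hypothetical helper name):

```python
host = "my.www.example.com"
# "www." is not a prefix here, yet it is still removed.
first_hit = host.replace("www.", "", 1)
print(first_hit)  # my.example.com

domain = "abc.com.au"
# ".com" is not the suffix here, yet the reversal trick still removes it.
reversed_trick = domain[::-1].replace("moc.", "", 1)[::-1]
print(reversed_trick)  # abc.au

def remove_prefix_once(text, prefix):
    # Only strip when the text really starts with the prefix.
    return text[len(prefix):] if text.startswith(prefix) else text

print(remove_prefix_once(host, "www."))               # my.www.example.com
print(remove_prefix_once("www.example.com", "www."))  # example.com
```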

2021-06-22 08:27:01

In my case I needed to raise an exception, so I did:

class UnableToStripEnd(Exception):
    """An exception type indicating that the suffix cannot be removed from the text."""

    @staticmethod
    def get_exception(text, suffix):
        return UnableToStripEnd("Could not find suffix ({0}) on text: {1}."
                                .format(suffix, text))


def strip_end(text, suffix):
    """Removes the end of a string. Otherwise fails."""
    if not text.endswith(suffix):
        raise UnableToStripEnd.get_exception(text, suffix)
    return text[:len(text)-len(suffix)]

2016-09-28 15:59:55

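Since this is a very popular question, I'm adding another solution that is now available. Python 3.9 (https://docs.python.org/3.9/whatsnew/3.9.html) added the str method removesuffix() (along with removeprefix()), which does exactly what is asked here.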

url = 'abcdc.com'
print(url.removesuffix('.com'))

Output:

abcdc

PEP 616 (https://www.python.org/dev/peps/pep-0616/) shows how it behaves (this is not the real implementation):

def removeprefix(self: str, prefix: str, /) -> str:
    if self.startswith(prefix):
        return self[len(prefix):]
    else:
        return self[:]

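What benefits does it have over self-implemented solutions?

Less fragile: the code does not depend on the user counting the length of a literal.

More performant: the code needs neither a call to the Python built-in len function nor the more expensive str.replace() method.

More descriptive: the methods give a higher-level API for code readability, as opposed to traditional string slicing.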
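If the code also has to run on interpreters older than 3.9, one possible approach is a small wrapper that uses the built-in method when it exists and otherwise falls back to slicing. This is only a sketch; the name remove_suffix is my own, and the fallback follows the removesuffix pseudocode in PEP 616:

```python
def remove_suffix(text, suffix):
    """Return text without suffix if text ends with it, else text unchanged."""
    if hasattr(str, "removesuffix"):
        # Python 3.9+: use the built-in method.
        return text.removesuffix(suffix)
    # Older versions: the "suffix and" check keeps an empty suffix from
    # turning text[:-0] into an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

print(remove_suffix('abcdc.com', '.com'))  # abcdc
print(remove_suffix('abcdc.org', '.com'))  # abcdc.org
print(remove_suffix('abcdc.com', ''))      # abcdc.com
```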

2020-10-06 14:38:33

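These are my best solutions for stripping a suffix from a string if it is present and doing nothing otherwise. You will probably want one of the first two implementations, but I have included the third for completeness.

For a constant suffix: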

def remove_suffix(v, s):
    # len(v) - len(s) also handles an empty suffix, where v[:-0] would return ''
    return v[:len(v) - len(s)] if v.endswith(s) else v
assert remove_suffix("abc.com", ".com") == 'abc'
assert remove_suffix("abc", ".com") == 'abc'

For a regex:

import re

def remove_suffix_compile(suffix_pattern):
    # Lazy prefix group, then the optional suffix anchored at the end.
    r = re.compile(f"(.*?)({suffix_pattern})?$")
    return lambda v: r.match(v)[1]
remove_domain = remove_suffix_compile(r"\.[a-zA-Z0-9]{3,}")
assert remove_domain("abc.com") == "abc"
assert remove_domain("sub.abc.net") == "sub.abc"
assert remove_domain("abc.") == "abc."
assert remove_domain("abc") == "abc"

For a collection of constant suffixes, the asymptotically fastest way for a large number of calls:

def remove_suffix_preprocess(*suffixes):
    suffixes = set(suffixes)
    try:
        suffixes.remove('')
    except KeyError:
        pass

    def helper(suffixes, pos):
        if len(suffixes) == 1:
            suf = suffixes[0]
            l = -len(suf)
            ls = slice(0, l)
            return lambda v: v[ls] if v.endswith(suf) else v
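        # Check every suffix, including the first, for an exact match at pos;
        # the shortest remaining suffix sets how far back the next key reaches.
        exact = any(-len(suf) == pos for suf in suffixes)
        ml = -min(len(suf) for suf in suffixes if -len(suf) != pos)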
        suffix_dict = {}
        for suf in suffixes:
            sub = suf[ml:pos]
            if sub in suffix_dict:
                suffix_dict[sub].append(suf)
            else:
                suffix_dict[sub] = [suf]
        if exact:
            del suffix_dict['']
            for key in suffix_dict:
                suffix_dict[key] = helper([s[:pos] for s in suffix_dict[key]], None)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v[:pos])
        else:
            for key in suffix_dict:
                suffix_dict[key] = helper(suffix_dict[key], ml)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v)
    return helper(tuple(suffixes), None)
domain_remove = remove_suffix_preprocess(".com", ".net", ".edu", ".uk", '.tv', '.co.uk', '.org.uk')

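The last one is probably significantly faster in PyPy than in CPython. The regex variant is likely faster than this for virtually all cases that do not involve huge dictionaries of potential suffixes that cannot easily be represented as a regex, at least in CPython.

In PyPy the regex variant is almost certainly slower for a large number of calls or for long strings, even if the re module uses a DFA-compiling regex engine, because the vast majority of the lambdas' overhead will be optimized away by the JIT.

In CPython, however, the fact that the regex runs as C code almost certainly outweighs the algorithmic advantages of the suffix-collection version in almost all cases.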
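Since these are performance guesses rather than measurements, it may be worth timing the variants on your own interpreter. A rough sketch using timeit; the helper names by_slice and by_regex and the sample input are assumptions for illustration, and the numbers will vary by machine and Python implementation:

```python
import re
import timeit

def by_slice(v, s='.com'):
    # Constant-suffix variant: endswith check plus slicing.
    return v[:len(v) - len(s)] if v.endswith(s) else v

_pattern = re.compile(r'\.com$')

def by_regex(v):
    # Regex variant with a precompiled pattern.
    return _pattern.sub('', v)

for fn in (by_slice, by_regex):
    seconds = timeit.timeit(lambda: fn('abcdc.com'), number=100_000)
    print(f'{fn.__name__}: {seconds:.3f}s for 100,000 calls')
```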

Edit: https://m.xkcd.com/859/

2020-04-22 17:09:00

import re

def rm_suffix(url='abcdc.com', suffix=r'\.com'):
    # suffix is a regex pattern; '$' anchors it to the end of the string
    return re.sub(suffix + '$', '', url)

I'd like to repeat this answer as the most expressive way to do it. Of course, the following would take less CPU time:

def rm_dotcom(url = 'abcdc.com'):
    return(url[:-4] if url.endswith('.com') else url)

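But if CPU is the bottleneck, why write in Python?

When is CPU a bottleneck anyway? In drivers, maybe.

The advantage of using a regular expression is code reusability. What if you next want to remove '.me', which has only three characters?

The same code would do the trick: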

>>> rm_suffix('abcdc.me', r'\.me')
'abcdc'
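Because the suffix is interpreted as a regex, an unescaped '.' matches any character. If the suffix is meant literally, one option is to pass it through re.escape first. A minimal sketch (rm_literal_suffix is a hypothetical name, not part of the answer above):

```python
import re

def rm_literal_suffix(url, suffix):
    # re.escape makes every character in suffix match literally.
    return re.sub(re.escape(suffix) + '$', '', url)

unescaped = re.sub('.me$', '', 'abcdcame')
print(unescaped)                              # abcdc (the '.' matched 'a')
print(rm_literal_suffix('abcdcame', '.me'))   # abcdcame
print(rm_literal_suffix('abcdc.me', '.me'))   # abcdc
```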

2017-03-27 18:58:25
