我有一个字符串列表,我想执行一个自然的字母排序。

例如,下面的列表是自然排序(我想要的):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

下面是上面列表的“排序”版本(我使用sorted()得到的):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

我在寻找一个排序函数它的行为和第一个一样。


当前回答

a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):#bubble sort method
        length = len(bad_list) - 1
        sorted = False

        while not sorted:
                sorted = True
                for i in range(length):
                        if bad_list[i] > bad_list[i+1]:
                                sorted = False
                                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i] #sort the integer list 
                                a[i], a[i+1] = a[i+1], a[i] #sort the main list based on the integer list index value

for a_string in a: #extract the number in the string character by character
        for letter in a_string:
                if letter.isdigit():
                        #print letter
                        b += letter
        c.append(b)
        b = ''

print 'Before sorting....'
print a
c = map(int, c) #converting string list into number list
print c
bubble(c)

print 'After sorting....'
print c
print a

应答:

气泡排序作业

如何在python中一次读一个字母的字符串

其他回答

在@Mark Byers的回答之后,这里有一个接受关键参数的适应,并且更符合pep8。

def natsorted(seq, key=None):
    def convert(text):
        return int(text) if text.isdigit() else text

    def alphanum(obj):
        if key is not None:
            return [convert(c) for c in re.split(r'([0-9]+)', key(obj))]
        return [convert(c) for c in re.split(r'([0-9]+)', obj)]

    return sorted(seq, key=alphanum)

我还做了一个Gist

一个紧凑的解决方案,基于将字符串转换为List[Tuple(str, int)]。

Code

def string_to_pairs(s, pairs=re.compile(r"(\D*)(\d*)").findall):
    return [(text.lower(), int(digits or 0)) for (text, digits) in pairs(s)[:-1]]

示范

sorted(['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9'], key=string_to_pairs)

输出:

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

测试

转换

assert string_to_pairs("") == []
assert string_to_pairs("123") == [("", 123)]
assert string_to_pairs("abc") == [("abc", 0)]
assert string_to_pairs("123abc") == [("", 123), ("abc", 0)]
assert string_to_pairs("abc123") == [("abc", 123)]
assert string_to_pairs("123abc456") == [("", 123), ("abc", 456)]
assert string_to_pairs("abc123efg") == [("abc", 123), ("efg", 0)]

排序

# Some extracts from the test suite of the natsort library. Permalink:
# https://github.com/SethMMorton/natsort/blob/e3c32f5638bf3a0e9a23633495269bea0e75d379/tests/test_natsorted.py

sort_data = [
    (  # same as test_natsorted_can_sort_as_unsigned_ints_which_is_default()
        ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"],
        ["a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51.", "a-50"],
    ),
    (  # same as test_natsorted_numbers_in_ascending_order()
        ["a2", "a5", "a9", "a1", "a4", "a10", "a6"],
        ["a1", "a2", "a4", "a5", "a6", "a9", "a10"],
    ),
    (  # same as test_natsorted_can_sort_as_version_numbers()
        ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"],
        ["1.9.9a", "1.9.9b", "1.10.1", "1.11", "1.11.4"],
    ),
    (  # different from test_natsorted_handles_filesystem_paths()
        [
            "/p/Folder (10)/file.tar.gz",
            "/p/Folder (1)/file (1).tar.gz",
            "/p/Folder/file.x1.9.tar.gz",
            "/p/Folder (1)/file.tar.gz",
            "/p/Folder/file.x1.10.tar.gz",
        ],
        [
            "/p/Folder (1)/file (1).tar.gz",
            "/p/Folder (1)/file.tar.gz",
            "/p/Folder (10)/file.tar.gz",
            "/p/Folder/file.x1.9.tar.gz",
            "/p/Folder/file.x1.10.tar.gz",
        ],
    ),
    (  # same as test_natsorted_path_extensions_heuristic()
        [
            "Try.Me.Bug - 09 - One.Two.Three.[text].mkv",
            "Try.Me.Bug - 07 - One.Two.5.[text].mkv",
            "Try.Me.Bug - 08 - One.Two.Three[text].mkv",
        ],
        [
            "Try.Me.Bug - 07 - One.Two.5.[text].mkv",
            "Try.Me.Bug - 08 - One.Two.Three[text].mkv",
            "Try.Me.Bug - 09 - One.Two.Three.[text].mkv",
        ],
    ),
    (  # same as ns.IGNORECASE for test_natsorted_supports_case_handling()
        ["Apple", "corn", "Corn", "Banana", "apple", "banana"],
        ["Apple", "apple", "Banana", "banana", "corn", "Corn"],
    ),

]

for (given, expected) in sort_data:
    assert sorted(given, key=string_to_pairs) == expected

奖金

如果字符串混合了非ascii文本和数字,您可能会对将string_to_pairs()与我在其他地方给出的函数remove_diacritics()组合感兴趣。

下面是马克·拜尔斯的另一个版本的回答。这个版本演示了如何传入一个属性名,该属性名将用于计算列表中的对象。

def natural_sort(l, attrib):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key.__dict__[attrib])]
    return sorted(l, key=alphanum_key)

results = natural_sort(albums, 'albumid')

其中albums是一个Album实例列表,albumid是一个字符串属性,名义上包含数字。

很可能functools.cmp_to_key()与python的sort的底层实现密切相关。此外,cmp参数是遗留的。现代的方法是将输入项转换为支持所需的丰富比较操作的对象。

在CPython 2下。X,即使没有实现各自的富比较操作符,也可以对不同类型的对象排序。在CPython 3下。X,不同类型的对象必须显式地支持比较。参见Python如何比较字符串和int?链接到官方文件。大多数答案都依赖于这种隐含的顺序。切换到Python 3。X将需要一个新的类型来实现和统一数字和字符串之间的比较。

Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
>>> (0,"foo") < ("foo",0)
True  
Python 3.5.2 (default, Oct 14 2016, 12:54:53) 
>>> (0,"foo") < ("foo",0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  TypeError: unorderable types: int() < str()

有三种不同的方法。第一种使用嵌套类来利用Python的Iterable比较算法。第二个函数将这个嵌套展开到单个类中。第三种方法放弃了str的子类化,专注于性能。所有都是有时间的;第二辆快了一倍,第三辆快了近六倍。对str进行子类化并不是必需的,而且首先可能是一个坏主意,但它确实带来了某些便利。

排序字符被复制以强制按大小写排序,并交换大小写以强制小写字母优先排序;这就是“自然排序”的典型定义。我无法决定分组的类型;有些人可能更喜欢以下选项,这也会带来显著的性能优势:

d = lambda s: s.lower()+s.swapcase()

在使用的地方,比较运算符被设置为object的运算符,这样它们就不会被functools. total_orders忽略。

import functools
import itertools


@functools.total_ordering
class NaturalStringA(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda c, s: [ c.NaturalStringPart("".join(v))
                        for k,v in
                       itertools.groupby(s, c.isdigit)
                     ]
    d = classmethod(d)
    @functools.total_ordering
    class NaturalStringPart(str):
        d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
        d = staticmethod(d)
        def __lt__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) < int(other)
            except ValueError:
                if self.isdigit():
                    return True
                elif other.isdigit():
                    return False
                else:
                    return self.d(self) < self.d(other)
        def __eq__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) == int(other)
            except ValueError:
                if self.isdigit() or other.isdigit():
                    return False
                else:
                    return self.d(self) == self.d(other)
        __le__ = object.__le__
        __ne__ = object.__ne__
        __gt__ = object.__gt__
        __ge__ = object.__ge__
    def __lt__(self, other):
        return self.d(self) < self.d(other)
    def __eq__(self, other):
        return self.d(self) == self.d(other)
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools


@functools.total_ordering
class NaturalStringB(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
    d = staticmethod(d)
    def __lt__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
        return False
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None or o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return False
        return True
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools
import enum


class OrderingType(enum.Enum):
    PerWordSwapCase         = lambda s: s.lower()+s.swapcase()
    PerCharacterSwapCase    = lambda s: "".join(c.lower()+c.swapcase() for c in s)


class NaturalOrdering:
    @classmethod
    def by(cls, ordering):
        def wrapper(string):
            return cls(string, ordering)
        return wrapper
    def __init__(self, string, ordering=OrderingType.PerCharacterSwapCase):
        self.string = string
        self.groups = [ (k,int("".join(v)))
                            if k else
                        (k,ordering("".join(v)))
                            for k,v in
                        itertools.groupby(string, str.isdigit)
                      ]
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , self.string
            )
    def __lesser(self, other, default):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return s_v < o_v
        return default
    def __lt__(self, other):
        return self.__lesser(other, default=False)
    def __le__(self, other):
        return self.__lesser(other, default=True)
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None or o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return False
        return True
    # functools.total_ordering doesn't create single-call wrappers if both
    # __le__ and __lt__ exist, so do it manually.
    def __gt__(self, other):
        op_result = self.__le__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    def __ge__(self, other):
        op_result = self.__lt__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    # __ne__ is the only implied ordering relationship, it automatically
    # delegates to __eq__
>>> import natsort
>>> import timeit
>>> l1 = ['Apple', 'corn', 'apPlE', 'arbour', 'Corn', 'Banana', 'apple', 'banana']
>>> l2 = list(map(str, range(30)))
>>> l3 = ["{} {}".format(x,y) for x in l1 for y in l2]
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringA)', number=10000, globals=globals()))
362.4729259099986
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringB)', number=10000, globals=globals()))
189.7340817489967
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalOrdering.by(OrderingType.PerCharacterSwapCase))', number=10000, globals=globals()))
69.34636392899847
>>> print(timeit.timeit('natsort.natsorted(l3+["0"], alg=natsort.ns.GROUPLETTERS | natsort.ns.LOWERCASEFIRST)', number=10000, globals=globals()))
98.2531585780016

自然排序既相当复杂,又定义模糊。不要忘记事先运行unicodedata.normalize(…),并考虑使用str.casefold()而不是str.lower()。可能有一些微妙的编码问题我还没有考虑到。因此,我尝试性地推荐natsort库。我快速浏览了一下github存储库;代码维护非常出色。

All the algorithms I've seen depend on tricks such as duplicating and lowering characters, and swapping case. While this doubles the running time, an alternative would require a total natural ordering on the input character set. I don't think this is part of the unicode specification, and since there are many more unicode digits than [0-9], creating such a sorting would be equally daunting. If you want locale-aware comparisons, prepare your strings with locale.strxfrm per Python's Sorting HOW TO.

这是一个更高级的解决方案,由Claudiu和Mark Byers改进:

它使用casefold()而不是lower()来匹配字符串 您可以传递另一个键lambda来选择一个内部元素(就像您习惯使用普通排序函数一样) 它当然适用于列表。Sort, sorted, max,等等。

def natural_sort(key=None, _nsre=re.compile('([0-9]+)')):
    return lambda x: [int(text) if text.isdigit() else text.casefold()
            for text in _nsre.split(key(x) if key else x)]

使用示例:

# Original solution
data.sort(key=natural_sort)

# Select an additional key
image_files.sort(key=natural_sort(lambda x: x.original_filename))