我有一个字符串列表,我想执行一个自然的字母排序。
例如,下面的列表是自然排序(我想要的):
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
下面是上面列表的“排序”版本(我使用sorted()得到的):
['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
我在寻找一个排序函数它的行为和第一个一样。
我有一个字符串列表,我想执行一个自然的字母排序。
例如,下面的列表是自然排序(我想要的):
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
下面是上面列表的“排序”版本(我使用sorted()得到的):
['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
我在寻找一个排序函数它的行为和第一个一样。
当前回答
一个紧凑的解决方案,基于将字符串转换为List[Tuple(str, int)]。
Code
def string_to_pairs(s, pairs=re.compile(r"(\D*)(\d*)").findall):
return [(text.lower(), int(digits or 0)) for (text, digits) in pairs(s)[:-1]]
示范
sorted(['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9'], key=string_to_pairs)
输出:
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
测试
转换
assert string_to_pairs("") == []
assert string_to_pairs("123") == [("", 123)]
assert string_to_pairs("abc") == [("abc", 0)]
assert string_to_pairs("123abc") == [("", 123), ("abc", 0)]
assert string_to_pairs("abc123") == [("abc", 123)]
assert string_to_pairs("123abc456") == [("", 123), ("abc", 456)]
assert string_to_pairs("abc123efg") == [("abc", 123), ("efg", 0)]
排序
# Some extracts from the test suite of the natsort library. Permalink:
# https://github.com/SethMMorton/natsort/blob/e3c32f5638bf3a0e9a23633495269bea0e75d379/tests/test_natsorted.py
sort_data = [
( # same as test_natsorted_can_sort_as_unsigned_ints_which_is_default()
["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"],
["a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51.", "a-50"],
),
( # same as test_natsorted_numbers_in_ascending_order()
["a2", "a5", "a9", "a1", "a4", "a10", "a6"],
["a1", "a2", "a4", "a5", "a6", "a9", "a10"],
),
( # same as test_natsorted_can_sort_as_version_numbers()
["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"],
["1.9.9a", "1.9.9b", "1.10.1", "1.11", "1.11.4"],
),
( # different from test_natsorted_handles_filesystem_paths()
[
"/p/Folder (10)/file.tar.gz",
"/p/Folder (1)/file (1).tar.gz",
"/p/Folder/file.x1.9.tar.gz",
"/p/Folder (1)/file.tar.gz",
"/p/Folder/file.x1.10.tar.gz",
],
[
"/p/Folder (1)/file (1).tar.gz",
"/p/Folder (1)/file.tar.gz",
"/p/Folder (10)/file.tar.gz",
"/p/Folder/file.x1.9.tar.gz",
"/p/Folder/file.x1.10.tar.gz",
],
),
( # same as test_natsorted_path_extensions_heuristic()
[
"Try.Me.Bug - 09 - One.Two.Three.[text].mkv",
"Try.Me.Bug - 07 - One.Two.5.[text].mkv",
"Try.Me.Bug - 08 - One.Two.Three[text].mkv",
],
[
"Try.Me.Bug - 07 - One.Two.5.[text].mkv",
"Try.Me.Bug - 08 - One.Two.Three[text].mkv",
"Try.Me.Bug - 09 - One.Two.Three.[text].mkv",
],
),
( # same as ns.IGNORECASE for test_natsorted_supports_case_handling()
["Apple", "corn", "Corn", "Banana", "apple", "banana"],
["Apple", "apple", "Banana", "banana", "corn", "Corn"],
),
]
for (given, expected) in sort_data:
assert sorted(given, key=string_to_pairs) == expected
奖金
如果字符串混合了非ascii文本和数字,您可能会对将string_to_pairs()与我在其他地方给出的函数remove_diacritics()组合感兴趣。
其他回答
考虑到:
data = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
类似于SergO的解决方案,没有外部库的1-liner将是:
data.sort(key=lambda x: int(x[3:]))
or
sorted_data = sorted(data, key=lambda x: int(x[3:]))
解释:
该解决方案使用sort的关键特性来定义将用于排序的函数。因为我们知道每个数据条目前面都有'elm',排序函数将字符串中第三个字符之后的部分(即int(x[3:]))转换为整数。如果数据的数值部分在不同的位置,那么函数的这部分将不得不改变。
一个紧凑的解决方案,基于将字符串转换为List[Tuple(str, int)]。
Code
def string_to_pairs(s, pairs=re.compile(r"(\D*)(\d*)").findall):
return [(text.lower(), int(digits or 0)) for (text, digits) in pairs(s)[:-1]]
示范
sorted(['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9'], key=string_to_pairs)
输出:
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
测试
转换
assert string_to_pairs("") == []
assert string_to_pairs("123") == [("", 123)]
assert string_to_pairs("abc") == [("abc", 0)]
assert string_to_pairs("123abc") == [("", 123), ("abc", 0)]
assert string_to_pairs("abc123") == [("abc", 123)]
assert string_to_pairs("123abc456") == [("", 123), ("abc", 456)]
assert string_to_pairs("abc123efg") == [("abc", 123), ("efg", 0)]
排序
# Some extracts from the test suite of the natsort library. Permalink:
# https://github.com/SethMMorton/natsort/blob/e3c32f5638bf3a0e9a23633495269bea0e75d379/tests/test_natsorted.py
sort_data = [
( # same as test_natsorted_can_sort_as_unsigned_ints_which_is_default()
["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"],
["a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51.", "a-50"],
),
( # same as test_natsorted_numbers_in_ascending_order()
["a2", "a5", "a9", "a1", "a4", "a10", "a6"],
["a1", "a2", "a4", "a5", "a6", "a9", "a10"],
),
( # same as test_natsorted_can_sort_as_version_numbers()
["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"],
["1.9.9a", "1.9.9b", "1.10.1", "1.11", "1.11.4"],
),
( # different from test_natsorted_handles_filesystem_paths()
[
"/p/Folder (10)/file.tar.gz",
"/p/Folder (1)/file (1).tar.gz",
"/p/Folder/file.x1.9.tar.gz",
"/p/Folder (1)/file.tar.gz",
"/p/Folder/file.x1.10.tar.gz",
],
[
"/p/Folder (1)/file (1).tar.gz",
"/p/Folder (1)/file.tar.gz",
"/p/Folder (10)/file.tar.gz",
"/p/Folder/file.x1.9.tar.gz",
"/p/Folder/file.x1.10.tar.gz",
],
),
( # same as test_natsorted_path_extensions_heuristic()
[
"Try.Me.Bug - 09 - One.Two.Three.[text].mkv",
"Try.Me.Bug - 07 - One.Two.5.[text].mkv",
"Try.Me.Bug - 08 - One.Two.Three[text].mkv",
],
[
"Try.Me.Bug - 07 - One.Two.5.[text].mkv",
"Try.Me.Bug - 08 - One.Two.Three[text].mkv",
"Try.Me.Bug - 09 - One.Two.Three.[text].mkv",
],
),
( # same as ns.IGNORECASE for test_natsorted_supports_case_handling()
["Apple", "corn", "Corn", "Banana", "apple", "banana"],
["Apple", "apple", "Banana", "banana", "corn", "Corn"],
),
]
for (given, expected) in sort_data:
assert sorted(given, key=string_to_pairs) == expected
奖金
如果字符串混合了非ascii文本和数字,您可能会对将string_to_pairs()与我在其他地方给出的函数remove_diacritics()组合感兴趣。
试试这个:
import re
def natural_sort(l):
convert = lambda text: int(text) if text.isdigit() else text.lower()
alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
return sorted(l, key=alphanum_key)
输出:
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
代码改编自这里:排序人类:自然排序顺序。
我建议您简单地使用关键字参数sorted来实现所需的列表 例如:
to_order= [e2,E1,e5,E4,e3]
ordered= sorted(to_order, key= lambda x: x.lower())
# ordered should be [E1,e2,e3,E4,e5]
这是一个更高级的解决方案,由Claudiu和Mark Byers改进:
它使用casefold()而不是lower()来匹配字符串 您可以传递另一个键lambda来选择一个内部元素(就像您习惯使用普通排序函数一样) 它当然适用于列表。Sort, sorted, max,等等。
def natural_sort(key=None, _nsre=re.compile('([0-9]+)')):
return lambda x: [int(text) if text.isdigit() else text.casefold()
for text in _nsre.split(key(x) if key else x)]
使用示例:
# Original solution
data.sort(key=natural_sort)
# Select an additional key
image_files.sort(key=natural_sort(lambda x: x.original_filename))