Python sets vs lists

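In Python, which data structure is more efficient/faster? Assuming that order is not important to me and that I would be checking for duplicates anyway, is a Python set slower than a Python list?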

Current answer

Sets win thanks to near-instant "contains" checks: https://en.wikipedia.org/wiki/Hash_table

List implementation: usually an array, low level and close to the metal, good for iteration and for random access by element index.

Set implementation: a hash table (https://en.wikipedia.org/wiki/Hash_table). It does not iterate over a list; it finds the element by computing a hash from the key, so performance depends on the nature of the key elements and on the hash function. This is similar to what dict uses. I suspect a list could be faster if you have very few elements (< 5); the larger the element count, the better the set will perform for a contains check. A set is also fast for adding and removing elements. Always keep in mind that building a set has a cost!
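A rough way to see the contains-check difference on your own machine is sketched below. The size N and the names items_list, items_set and missing are made up for illustration, and the timings depend on your interpreter and hardware.

```python
from timeit import timeit

# Hypothetical size, chosen only for illustration.
N = 100_000
items_list = list(range(N))
items_set = set(items_list)
missing = -1  # not present, so the list has to compare against every element

# Time the same membership test against both containers.
list_time = timeit(lambda: missing in items_list, number=50)
set_time = timeit(lambda: missing in items_set, number=50)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Searching for a missing value is the worst case for the list; a value near the front of the list would be found much sooner.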

Note: if the list is already sorted, searching it can be quite fast on small lists, but with more data a set is faster for contains checks.
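The in operator on a list always scans linearly, even when the list is sorted, so benefiting from the ordering takes an explicit binary search. A minimal sketch, assuming the standard-library bisect module; the helper name sorted_contains is hypothetical:

```python
from bisect import bisect_left

def sorted_contains(sorted_items, value):
    """Return True if value is in sorted_items, which must be sorted ascending."""
    # bisect_left returns the leftmost position where value could be inserted.
    i = bisect_left(sorted_items, value)
    return i < len(sorted_items) and sorted_items[i] == value

data = [2, 3, 5, 7, 11, 13]
print(sorted_contains(data, 7))  # True
print(sorted_contains(data, 8))  # False
```

This lookup takes O(log n) comparisons, whereas a set lookup is O(1) on average.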

2016-08-02 14:35:22

Other answers

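I would recommend a set where the use case is limited to referencing or checking for existence, and a tuple where the use case requires iteration. A list is a low-level implementation and requires significant memory overhead.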
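The memory claim is easy to check on your own interpreter. The sketch below assumes CPython's sys.getsizeof, which reports the size of the container object only and not of the elements it references; the sample of 1000 integers is arbitrary.

```python
import sys

values = list(range(1000))  # arbitrary sample data
containers = {
    "list": values,
    "tuple": tuple(values),
    "set": set(values),
}

# Size in bytes of each container object itself (elements not included).
sizes = {name: sys.getsizeof(c) for name, c in containers.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```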

2018-05-07 08:35:40

It depends on what you intend to do with it.

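Sets are significantly faster when it comes to determining whether an object is present (as in x in s), but their elements are not ordered, so you cannot access items by index as you would in a list. Sets are also somewhat slower to iterate over in practice.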

You can use the timeit module to see which is faster for your situation.
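As one possible way to use timeit for the iteration case, the sketch below sums the same values from a list and from a set. The helper total and the sample size are invented for illustration, and no particular result is implied.

```python
from timeit import timeit

values = list(range(10_000))  # arbitrary sample data
as_list = values
as_set = set(values)

def total(iterable):
    """Iterate over every element once and return the sum."""
    s = 0
    for x in iterable:
        s += x
    return s

list_time = timeit(lambda: total(as_list), number=200)
set_time = timeit(lambda: total(as_set), number=200)
print(f"iterate list: {list_time:.4f}s  iterate set: {set_time:.4f}s")
```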

2010-05-14 01:04:04

Along the same lines as @Ellis Percival's tests, I'd like to add that lists perform similarly to sets when it comes to adding elements.

Adding elements

>>> from timeit import timeit
>>> def add_test_set(iterable):
...     for i in range(10000):
...         iterable.add(i)
...
>>> def add_test_list(iterable):
...     for i in range(10000):
...         iterable.append(i)
...
>>> timeit("add_test_set(iterable)",
...     setup="from __main__ import add_test_set; iterable = set()",
...     number=10000)
7.073143866999999
>>> timeit("add_test_list(iterable)",
...     setup="from __main__ import add_test_list; iterable = list()",
...     number=10000)
6.80650725000001

(I would have edited his post to include this, but the edit queue was already full.)

2021-05-31 08:55:47

Sets are faster, and you also get more operations with sets. Say you have two sets:

set1 = {"Harry Potter", "James Bond", "Iron Man"}
set2 = {"Captain America", "Black Widow", "Hulk", "Harry Potter", "James Bond"}

We can easily combine the two sets:

set3 = set1.union(set2)

Find what the two have in common:

set3 = set1.intersection(set2)

Find what is in the first set but not in the second:

set3 = set1.difference(set2)

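And there is much more! Try them out, they are fun! Moreover, if you have to work with the values that differ between two lists, or the values common to two lists, I prefer to convert the lists to sets, and many programmers do it that way. Hope it helps you :-)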
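A small sketch of the list-to-set conversion described above. The lists list_a and list_b are invented sample data. Converting to a set drops duplicates and ordering, so sorted() is used to get a predictable list back.

```python
list_a = [1, 2, 2, 3, 4]
list_b = [3, 4, 4, 5]

set_a, set_b = set(list_a), set(list_b)

common = sorted(set_a & set_b)          # values present in both lists
only_in_a = sorted(set_a - set_b)       # values in list_a but not in list_b
in_exactly_one = sorted(set_a ^ set_b)  # values in one list but not both

print(common)          # [3, 4]
print(only_in_a)       # [1, 2]
print(in_exactly_one)  # [1, 2, 5]
```

The operators &, - and ^ correspond to the intersection, difference and symmetric_difference methods.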

2020-05-22 07:24:08

I was interested in the results when checking, with CPython, whether a value is one of a small number of literals. In Python 3, set beats tuple, list and or:

from timeit import timeit

def in_test1():
  for i in range(1000):
    if i in (314, 628):
      pass

def in_test2():
  for i in range(1000):
    if i in [314, 628]:
      pass

def in_test3():
  for i in range(1000):
    if i in {314, 628}:
      pass

def in_test4():
  for i in range(1000):
    if i == 314 or i == 628:
      pass

print("tuple")
print(timeit("in_test1()", setup="from __main__ import in_test1", number=100000))
print("list")
print(timeit("in_test2()", setup="from __main__ import in_test2", number=100000))
print("set")
print(timeit("in_test3()", setup="from __main__ import in_test3", number=100000))
print("or")
print(timeit("in_test4()", setup="from __main__ import in_test4", number=100000))

Output:

tuple
4.735646052286029
list
4.7308746771886945
set
3.5755991376936436
or
4.687681658193469

For 3 to 5 literals, set still wins by a wide margin, and or becomes the slowest.

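In Python 2, set is always the slowest. or is the fastest for 2 to 3 literals, and tuple and list are faster with 4 or more literals. I could not distinguish the speed of tuple from that of list.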

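When the values to test were cached in a global variable outside the function, rather than creating the literal within the loop, set won every time, even in Python 2.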
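The cached variant might look roughly like the sketch below. The name TARGETS and both function names are hypothetical, and a hit counter replaces pass so the two versions can be checked against each other. No timing result is implied.

```python
from timeit import timeit

# Hypothetical module-level cache: the set is built once, not inside the loop.
TARGETS = frozenset({314, 628})

def in_test_cached():
    hits = 0
    for i in range(1000):
        if i in TARGETS:  # global lookup of a prebuilt set
            hits += 1
    return hits

def in_test_literal():
    hits = 0
    for i in range(1000):
        if i in {314, 628}:  # set literal written inside the loop
            hits += 1
    return hits

print(in_test_cached(), in_test_literal())  # 2 2
print("cached ", timeit(in_test_cached, number=1000))
print("literal", timeit(in_test_literal, number=1000))
```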

These results apply to 64-bit CPython on a Core i7.

2019-10-25 20:20:58
