How do I remove duplicates from a list while preserving order?

Using a set to remove duplicates destroys the original order. Is there a built-in or a Pythonic idiom for this?

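Current answer

This preserves order and runs in O(n) time. The idea is to open a hole wherever a duplicate is found and sink it to the bottom, using a read pointer and a write pointer. Whenever a duplicate is found, only the read pointer advances; the write pointer stays where it is, so the next unique element overwrites the duplicate.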

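def deduplicate(l):
    seen = set()                 # elements encountered so far
    read = write = 0
    while read < len(l):
        if l[read] in seen:
            # Duplicate: advance only the read pointer.
            read += 1
            continue
        seen.add(l[read])
        l[write] = l[read]       # move the unique element into the hole
        read += 1
        write += 1
    return l[0:write]            # note: l itself has been overwritten in place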
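The function above overwrites the front of the input list but returns a new slice, leaving stale elements at the tail of the original. A minimal sketch of a fully in-place variant follows; the name deduplicate_in_place is hypothetical and not part of the original answer, and it assumes the elements are hashable.

```python
def deduplicate_in_place(items):
    """Remove duplicates from items in place, keeping first occurrences."""
    seen = set()
    write = 0
    for value in items:      # the for loop plays the role of the read pointer
        if value in seen:
            continue         # duplicate: the write pointer stays put
        seen.add(value)
        items[write] = value  # write never runs ahead of the read position
        write += 1
    del items[write:]        # drop the leftover tail


data = [3, 1, 3, 2, 1]
deduplicate_in_place(data)
print(data)  # [3, 1, 2]
```

Truncating with del keeps the caller's list object, which matters if other references to it exist.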

2016-01-12 17:16:19

Other answers

sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
unique = []
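# Append each item the first time it appears. The `not in` test on a
# list is a linear scan, so this is O(n^2) overall.
for item in sequence:
    if item not in unique:
        unique.append(item)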

unique → ['1', '2', '3', '6', '4', '5']

2013-04-13 17:32:19

Here are some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

The fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

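Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add on every iteration costs more than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.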
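One way to check this claim on your own interpreter is a small timeit comparison. This is only a sketch: the function names and the test data are made up for illustration, and the gap may be small on recent CPython releases, so measure rather than assume.

```python
import timeit


def f7_attribute_lookup(seq):
    seen = set()
    # seen.add is resolved again on every iteration
    return [x for x in seq if not (x in seen or seen.add(x))]


def f7_prebound(seq):
    seen = set()
    seen_add = seen.add  # resolve the bound method once
    return [x for x in seq if not (x in seen or seen_add(x))]


data = list(range(1000)) * 5  # arbitrary sample input with duplicates

for func in (f7_attribute_lookup, f7_prebound):
    elapsed = timeit.timeit(lambda: func(data), number=200)
    print(f"{func.__name__}: {elapsed:.4f} s")
```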

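If you plan on using this function a lot on the same dataset, you may be better off with an ordered set: http://code.activestate.com/recipes/528878/

It offers O(1) insertion, deletion and membership checks per operation.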
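To give a rough idea of what an ordered set provides, here is a minimal sketch backed by a plain dict. It is not the linked ActiveState recipe; the class name and its small method set are assumptions for illustration, and it relies on dicts preserving insertion order (Python 3.7+).

```python
class OrderedSet:
    """Minimal insertion-ordered set backed by a dict (assumes Python 3.7+)."""

    def __init__(self, iterable=()):
        # dict.fromkeys keeps the first occurrence of each key, in order
        self._items = dict.fromkeys(iterable)

    def add(self, item):
        self._items[item] = None      # average O(1); no effect on order if present

    def discard(self, item):
        self._items.pop(item, None)   # average O(1); silent if the item is absent

    def __contains__(self, item):
        return item in self._items    # average O(1)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


s = OrderedSet(['1', '2', '3', '3', '6', '4', '5', '6'])
s.add('7')
s.discard('2')
print(list(s))  # ['1', '3', '6', '4', '5', '7']
```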

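(Small additional note: seen.add() always returns None, so the or in f7 above is there only as a way to attempt a set update, not as an integral part of the logical test.)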

2009-01-26 15:47:01

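A very late answer to another very old question:

The itertools recipes have a function that does this, using the seen-set technique, but it:

- Handles a standard key function.
- Uses no unseemly hacks.
- Optimizes the loop by pre-binding seen.add instead of looking it up N times. (f7 also does this, but some versions don't.)
- Optimizes the loop by using ifilterfalse (filterfalse in Python 3), so you only have to loop over the unique elements in Python, instead of all of them. (You still iterate over all of them inside ifilterfalse, of course, but that's in C, and much faster.)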

Is it actually faster than f7? It depends on your data, so you'll have to test it and see. If you want a list in the end, f7 uses a listcomp, and there's no way to do that here. (You can directly append instead of yielding, or you can feed the generator into the list function, but neither one can be as fast as the LIST_APPEND inside a listcomp.) At any rate, usually, squeezing out a few microseconds is not going to be as important as having an easily-understandable, reusable, already-written function that doesn't require DSU when you want to decorate.

As with all of the recipes, it is also available in more-itertools.
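For reference, the recipe under discussion looks roughly like the sketch below. It is adapted from the unique_everseen recipe in the itertools documentation and written for Python 3, where ifilterfalse is named filterfalse; check the documentation for your version for the exact current text.

```python
from itertools import filterfalse


def unique_everseen(iterable, key=None):
    """Yield unique elements in order, remembering all elements ever seen."""
    seen = set()
    seen_add = seen.add  # pre-bind to avoid a lookup per element
    if key is None:
        # The membership test runs inside filterfalse (in C);
        # only unseen elements reach the Python-level loop.
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element


print(list(unique_everseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D']
print(list(unique_everseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'D']
```

With a key function, uniqueness is decided on key(element) while the original element is what gets yielded, so no decorate-sort-undecorate step is needed.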

If you only want the no-key case, you can simplify it to:

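from itertools import filterfalse  # Python 3 name; Python 2 calls it ifilterfalse

def unique(iterable):
    seen = set()
    seen_add = seen.add  # pre-bind the method to avoid a lookup per element
    # filterfalse runs the membership test in C and passes on only unseen elements
    for element in filterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element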

2013-10-09 18:27:09

1. These solutions are fine… For removing duplicates while preserving order, the excellent solution(s) proposed elsewhere on this page:

seen = set()
[x for x in seq if not (x in seen or seen.add(x))]

and variation(s), e.g.:

seen = set()
[x for x in seq if x not in seen and not seen.add(x)]

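are indeed popular because they are simple, minimalistic, and deploy the correct hashing for optimal efficiency. The main complaint about them seems to be that using the invariant None "returned" by seen.add(x) as a constant (and therefore excess/unnecessary) value in a logical expression, just for its side effect, is hacky and/or confusing.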

2. …but they waste one hash lookup per iteration. Surprisingly, given the amount of discussion and debate on this topic, there is actually a significant improvement to the code that seems to have been overlooked. As shown, each "test-and-set" iteration requires two hash lookups: the first to test membership x not in seen and then again to actually add the value seen.add(x). Since the first operation guarantees that the second will always be successful, there is a wasteful duplication of effort here. And because the overall technique here is so efficient, the excess hash lookups will likely end up being the most expensive proportion of what little work remains.

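3. Instead, let the set do its job! Notice that the examples above only call set.add with the foreknowledge that doing so will always result in an increase in set membership. The set itself never gets a chance to reject a duplicate; our code snippet has essentially usurped that role for itself. Using explicit two-step test-and-set code robs set of its core ability to exclude those duplicates itself.

4. The single-hash-lookup code: The following version cuts the number of hash lookups per iteration in half, from two down to just one.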

seen = set()
[x for x in seq if len(seen) < len(seen.add(x) or seen)]
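The lookup counts can be checked with a small experiment. The sketch below wraps both variants in functions and uses a hypothetical CountingKey class, not part of the original answer, whose __hash__ increments a counter. It assumes CPython, where each membership test and each add hashes the element once.

```python
class CountingKey:
    """Hashable wrapper that counts how often it is hashed (illustration only)."""
    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


def dedupe_two_lookups(seq):
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]


def dedupe_one_lookup(seq):
    seen = set()
    # len(seen) is evaluated first; seen.add(x) returns None, so
    # `None or seen` is seen itself. The set grew only if x was new.
    return [x for x in seq if len(seen) < len(seen.add(x) or seen)]


def count_hashes(func, values):
    """Return (deduplicated values, number of __hash__ calls made by func)."""
    items = [CountingKey(v) for v in values]
    CountingKey.hash_calls = 0
    result = [item.value for item in func(items)]
    return result, CountingKey.hash_calls


values = [1, 2, 2, 3, 1]
print(count_hashes(dedupe_two_lookups, values))  # ([1, 2, 3], 8)
print(count_hashes(dedupe_one_lookup, values))   # ([1, 2, 3], 5)
```

The two-step version hashes every element once for the membership test and each new element a second time for the add, while the single-step version hashes every element exactly once.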

2021-07-08 20:31:29

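In CPython 3.6+ (and in all other Python implementations starting with Python 3.7+), dictionaries are ordered, so the way to remove duplicates from an iterable while keeping the original order is: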

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
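The same idiom works for any iterable of hashable elements, not just strings. A minimal sketch, with a hypothetical helper name dedupe and assuming Python 3.7+:

```python
def dedupe(iterable):
    """Return a list of the unique elements of iterable, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(iterable))


print(dedupe([3, 1, 3, 2, 1]))   # [3, 1, 2]
print(dedupe(['b', 'a', 'b']))   # ['b', 'a']
```

Unhashable elements such as lists raise TypeError, because they cannot be used as dict keys.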

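In Python 3.5 and below (including Python 2.7), use OrderedDict. My timings show that this is both the fastest and the shortest of the various approaches on Python 3.5, when OrderedDict gained a C implementation. Before 3.5 it is still the clearest solution, though not the fastest.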

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

2016-10-03 15:47:33
