如何从列表中删除重复项,同时保持顺序?使用集合删除重复项会破坏原始顺序。 是否有内置的或python的习语?
当前回答
如果你经常使用pandas,并且美学优先于性能,那么考虑内置函数pandas. series .drop_duplicate:
import pandas as pd
import numpy as np
uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()
# from the chosen answer
def f7(seq):
seen = set()
seen_add = seen.add
return [ x for x in seq if not (x in seen or seen_add(x))]
alist = np.random.randint(low=0, high=1000, size=10000).tolist()
print uniquifier(alist) == f7(alist) # True
时间:
In [104]: %timeit f7(alist)
1000 loops, best of 3: 1.3 ms per loop
In [110]: %timeit uniquifier(alist)
100 loops, best of 3: 4.39 ms per loop
其他回答
你可以引用一个列表推导式,因为它是由符号'_[1]'构建的。例如,下面的函数通过引用列表推导式对元素列表进行惟一化,而不改变它们的顺序。
def unique(my_list):
return [x for x in my_list if x not in locals()['_[1]']]
演示:
l1 = [1, 2, 3, 4, 1, 2, 3, 4, 5]
l2 = [x for x in l1 if x not in locals()['_[1]']]
print l2
输出:
[1, 2, 3, 4, 5]
x = [1, 2, 1, 3, 1, 4]
# brute force method
arr = []
for i in x:
if not i in arr:
arr.insert(x[i],i)
# recursive method
tmp = []
def remove_duplicates(j=0):
if j < len(x):
if not x[j] in tmp:
tmp.append(x[j])
i = j+1
remove_duplicates(i)
remove_duplicates()
5倍更快减少变种,但更复杂
>>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
>>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
[5, 6, 1, 2, 3, 4]
解释:
default = (list(), set())
# use list to keep order
# use set to make lookup faster
def reducer(result, item):
if item not in result[1]:
result[0].append(item)
result[1].add(item)
return result
>>> reduce(reducer, l, default)[0]
[5, 6, 1, 2, 3, 4]
对于不可哈希类型(例如列表的列表),基于MizardX的:
def f7_noHash(seq)
seen = set()
return [ x for x in seq if str( x ) not in seen and not seen.add( str( x ) )]
l = [1,2,2,3,3,...]
n = []
n.extend(ele for ele in l if ele not in set(n))
一个生成器表达式,它使用集合的O(1)查找来确定是否在新列表中包含元素。