如何从列表中删除重复项,同时保持顺序?使用集合删除重复项会破坏原始顺序。 是否有内置的或python的习语?


当前回答

对于不可哈希类型(例如列表的列表),基于MizardX的:

def f7_noHash(seq)
    seen = set()
    return [ x for x in seq if str( x ) not in seen and not seen.add( str( x ) )]

其他回答

MizardX的答案提供了多种方法的良好集合。

这是我自言自语时想到的:

mylist = [x for i,x in enumerate(mylist) if x not in mylist[i+1:]]

我觉得如果你想维持秩序,

你可以试试这个:

list1 = ['b','c','d','b','c','a','a']    
list2 = list(set(list1))    
list2.sort(key=list1.index)    
print list2

或者类似地,你可以这样做:

list1 = ['b','c','d','b','c','a','a']  
list2 = sorted(set(list1),key=list1.index)  
print list2 

你还可以这样做:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
for i in list1:    
    if not i in list2:  
        list2.append(i)`    
print list2

它也可以写成这样:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
[list2.append(i) for i in list1 if not i in list2]    
print list2 

使用_sorted_ a numpy数组的相对有效方法:

b = np.array([1,3,3, 8, 12, 12,12])    
numpy.hstack([b[0], [x[0] for x in zip(b[1:], b[:-1]) if x[0]!=x[1]]])

输出:

array([ 1,  3,  8, 12])

这里有一个简单的方法:

list1 = ["hello", " ", "w", "o", "r", "l", "d"]
sorted(set(list1 ), key=list1.index)

输出如下:

["hello", " ", "w", "o", "r", "l", "d"]

如果你经常使用pandas,并且美学优先于性能,那么考虑内置函数pandas. series .drop_duplicate:

    import pandas as pd
    import numpy as np

    uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()

    # from the chosen answer 
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [ x for x in seq if not (x in seen or seen_add(x))]

    alist = np.random.randint(low=0, high=1000, size=10000).tolist()

    print uniquifier(alist) == f7(alist)  # True

时间:

    In [104]: %timeit f7(alist)
    1000 loops, best of 3: 1.3 ms per loop
    In [110]: %timeit uniquifier(alist)
    100 loops, best of 3: 4.39 ms per loop