我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
当前回答
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
检查这个
其他回答
例如,按偶数和奇数拆分列表
arr = range(20)
even, odd = reduce(lambda res, next: res[next % 2].append(next) or res, arr, ([], []))
或者概括地说:
def split(predicate, iterable):
return reduce(lambda res, e: res[predicate(e)].append(e) or res, iterable, ([], []))
优点:
最短路径 Predicate对每个元素只应用一次
缺点
需要函数式编程范例的知识
所有提出的解决方案的问题是,它将扫描和应用过滤功能两次。我会做一个简单的小函数,像这样:
def split_into_two_lists(lst, f):
a = []
b = []
for elem in lst:
if f(elem):
a.append(elem)
else:
b.append(elem)
return a, b
这样你就不会重复处理任何东西,也不会重复代码。
itertools。Groupby几乎可以满足您的要求,除了它要求对条目进行排序以确保您获得一个连续的范围之外,因此您需要首先根据键进行排序(否则您将为每种类型获得多个交错的组)。如。
def is_good(f):
return f[2].lower() in IMAGE_TYPES
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file3.gif', 123L, '.gif')]
for key, group in itertools.groupby(sorted(files, key=is_good), key=is_good):
print key, list(group)
给:
False [('file2.avi', 999L, '.avi')]
True [('file1.jpg', 33L, '.jpg'), ('file3.gif', 123L, '.gif')]
与其他解决方案类似,可以将键func定义为任意数量的组。
之前的答案似乎并不能满足我所有的四种强迫症:
尽可能的懒惰, 只对原始Iterable求值一次 每个项只计算谓词一次 提供良好的类型注释(适用于python 3.7)
我的解决方案并不漂亮,我不认为我可以推荐使用它,但它是:
def iter_split_on_predicate(predicate: Callable[[T], bool], iterable: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]:
deque_predicate_true = deque()
deque_predicate_false = deque()
# define a generator function to consume the input iterable
# the Predicate is evaluated once per item, added to the appropriate deque, and the predicate result it yielded
def shared_generator(definitely_an_iterator):
for item in definitely_an_iterator:
print("Evaluate predicate.")
if predicate(item):
deque_predicate_true.appendleft(item)
yield True
else:
deque_predicate_false.appendleft(item)
yield False
# consume input iterable only once,
# converting to an iterator with the iter() function if necessary. Probably this conversion is unnecessary
shared_gen = shared_generator(
iterable if isinstance(iterable, collections.abc.Iterator) else iter(iterable)
)
# define a generator function for each predicate outcome and queue
def iter_for(predicate_value, hold_queue):
def consume_shared_generator_until_hold_queue_contains_something():
if not hold_queue:
try:
while next(shared_gen) != predicate_value:
pass
except:
pass
consume_shared_generator_until_hold_queue_contains_something()
while hold_queue:
print("Yield where predicate is "+str(predicate_value))
yield hold_queue.pop()
consume_shared_generator_until_hold_queue_contains_something()
# return a tuple of two generators
return iter_for(predicate_value=True, hold_queue=deque_predicate_true), iter_for(predicate_value=False, hold_queue=deque_predicate_false)
用下面的测试,我们从print语句中得到如下输出:
t,f = iter_split_on_predicate(lambda item:item>=10,[1,2,3,10,11,12,4,5,6,13,14,15])
print(list(zip(t,f)))
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# [(10, 1), (11, 2), (12, 3), (13, 4), (14, 5), (15, 6)]
第一步(pre-OP-edit):使用集合:
mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]
myset = set(mylist)
goodset = set(goodvals)
print list(myset.intersection(goodset)) # [1, 3, 7]
print list(myset.difference(goodset)) # [2, 4, 5, 6]
这对可读性(IMHO)和性能都有好处。
第二步(post-OP-edit):
创建一个好的扩展列表:
IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])
这将提高性能。否则,你现在的情况在我看来还不错。