我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
当前回答
有时候你并不需要清单上的另一半。 例如:
import sys
from itertools import ifilter
trustedPeople = sys.argv[1].split(',')
newName = sys.argv[2]
myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)
print '%s is %smy friend.' % (newName, newName not in myFriends 'not ' or '')
其他回答
这个问题已经有很多答案了,但似乎都不如我最喜欢的解决这个问题的方法,这种方法只遍历和测试每个项目一次,并使用列表理解的速度来构建两个输出列表之一,因此它只需要使用相对较慢的附加来构建一个输出列表:
bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]
In my answer to a similar question, I explain how this approach works (a combination of Python's greedy evaluation of or refraining from executing the append for "good" items, and append returning a false-like value which leaves the if condition false for "bad" items), and I show timeit results indicating that this approach outcompetes alternatives like those suggested here, especially in cases where the majority of items will go into the list built by list-comprehension (in this case, the good list).
我基本上喜欢安德斯的方法,因为它非常普遍。下面的版本将分类器放在前面(以匹配过滤器语法),并使用defaultdict(假定已导入)。
def categorize(func, seq):
"""Return mapping from categories to lists
of categorized items.
"""
d = defaultdict(list)
for item in seq:
d[func(item)].append(item)
return d
之前的答案似乎并不能满足我所有的四种强迫症:
尽可能的懒惰, 只对原始Iterable求值一次 每个项只计算谓词一次 提供良好的类型注释(适用于python 3.7)
我的解决方案并不漂亮,我不认为我可以推荐使用它,但它是:
def iter_split_on_predicate(predicate: Callable[[T], bool], iterable: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]:
deque_predicate_true = deque()
deque_predicate_false = deque()
# define a generator function to consume the input iterable
# the Predicate is evaluated once per item, added to the appropriate deque, and the predicate result it yielded
def shared_generator(definitely_an_iterator):
for item in definitely_an_iterator:
print("Evaluate predicate.")
if predicate(item):
deque_predicate_true.appendleft(item)
yield True
else:
deque_predicate_false.appendleft(item)
yield False
# consume input iterable only once,
# converting to an iterator with the iter() function if necessary. Probably this conversion is unnecessary
shared_gen = shared_generator(
iterable if isinstance(iterable, collections.abc.Iterator) else iter(iterable)
)
# define a generator function for each predicate outcome and queue
def iter_for(predicate_value, hold_queue):
def consume_shared_generator_until_hold_queue_contains_something():
if not hold_queue:
try:
while next(shared_gen) != predicate_value:
pass
except:
pass
consume_shared_generator_until_hold_queue_contains_something()
while hold_queue:
print("Yield where predicate is "+str(predicate_value))
yield hold_queue.pop()
consume_shared_generator_until_hold_queue_contains_something()
# return a tuple of two generators
return iter_for(predicate_value=True, hold_queue=deque_predicate_true), iter_for(predicate_value=False, hold_queue=deque_predicate_false)
用下面的测试,我们从print语句中得到如下输出:
t,f = iter_split_on_predicate(lambda item:item>=10,[1,2,3,10,11,12,4,5,6,13,14,15])
print(list(zip(t,f)))
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# [(10, 1), (11, 2), (12, 3), (13, 4), (14, 5), (15, 6)]
这是这个问题的另一个解决方案。我需要一个尽可能快的解决方案。这意味着只对列表进行一次迭代,并且最好是O(1)用于向结果列表之一添加数据。这与sastanin提供的解决方案非常相似,只是更短:
from collections import deque
def split(iterable, function):
dq_true = deque()
dq_false = deque()
# deque - the fastest way to consume an iterator and append items
deque((
(dq_true if function(item) else dq_false).append(item) for item in iterable
), maxlen=0)
return dq_true, dq_false
此时,可以按照如下方式使用该函数:
lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)
selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})
如果你对结果的deque对象不满意,你可以很容易地将其转换为list、set或任何你喜欢的对象(例如list(lower))。转换要快得多,直接构建列表。
该方法保持项目的顺序,以及任何副本。
一个基于生成器的版本,如果你能忍受一个或两个原始列表的反转。
设置…
random.seed(1234)
a = list(range(10))
random.shuffle(a)
a
[2, 8, 3, 5, 6, 4, 9, 0, 1, 7]
至于分裂……
(list((a.pop(j) for j, y in [(len(a)-i-1, x) for i,x in enumerate(a[::-1])] if y%2 == 0))[::-1], a)
([2, 8, 6, 4, 0], [3, 5, 9, 1, 7])
Another list of tuples of locations and each element is built in reverse order. In a generator wrapped round that each element is tested against the predicate (here test for even) and if True then the element is poped using previously computed locations. We are working backwards along the list so poping elements out does not change positions closer to the beginning of the list. A wrapping list() evaluates the generator and a final revers [::-1] puts the elements back in the right order. The original list "a" now only contains elements that for which the predicate is False.