我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
当前回答
这是这个问题的另一个解决方案。我需要一个尽可能快的解决方案。这意味着只对列表进行一次迭代,并且最好是O(1)用于向结果列表之一添加数据。这与sastanin提供的解决方案非常相似,只是更短:
from collections import deque
def split(iterable, function):
dq_true = deque()
dq_false = deque()
# deque - the fastest way to consume an iterator and append items
deque((
(dq_true if function(item) else dq_false).append(item) for item in iterable
), maxlen=0)
return dq_true, dq_false
此时,可以按照如下方式使用该函数:
lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)
selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})
如果你对结果的deque对象不满意,你可以很容易地将其转换为list、set或任何你喜欢的对象(例如list(lower))。转换要快得多,直接构建列表。
该方法保持项目的顺序,以及任何副本。
其他回答
我将采用2步方法,将谓词的求值与列表的过滤分离:
def partition(pred, iterable):
xs = list(zip(map(pred, iterable), iterable))
return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]
就性能而言(除了在iterable的每个成员上只对pred求值一次之外),这样做的好处在于它将大量逻辑从解释器中移出,转移到高度优化的迭代和映射代码中。这可以加快长迭代对象的迭代速度,就像回答中描述的那样。
在表达性方面,它利用了像理解和映射这样的表达性习语。
Good = [x for x in mylist if x in goodvals] Bad = [x for x in mylist if x not in goodvals] 我怎样才能做得更优雅呢?
代码已经非常优雅了。
使用集合可能会有轻微的性能改进,但差异是微不足道的。基于集合的方法也会丢弃重复项,并且不会保留元素的顺序。我发现列表理解也更容易阅读。
事实上,我们甚至可以更简单地使用for循环:
good, bad = [], []
for x in mylist:
if x in goodvals:
good.append(f)
else:
bad.append(f)
这种方法可以更容易地添加额外的逻辑。例如,代码很容易被修改为丢弃None值:
good, bad = [], []
for x in mylist:
if x is None:
continue
if x in goodvals:
good.append(f)
else:
bad.append(f)
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f not in images]
当条件较长时很好,例如在您的示例中。读者不需要弄清楚否定条件以及它是否适用于所有其他情况。
如果你坚持聪明,你可以采用温登的解决方案,再加上一点虚假的聪明:
def splay(l, f, d=None):
d = d or {}
for x in l: d.setdefault(f(x), []).append(x)
return d
优雅快捷
受到DanSalmo评论的启发,这里有一个简洁、优雅的解决方案,同时也是最快的解决方案之一。
good_set = set(goodvals)
good, bad = [], []
for item in my_list:
good.append(item) if item in good_set else bad.append(item)
提示:将goodvals转换为一组可以很容易地提高速度。
最快
为了获得最大速度,我们取最快的答案,并通过将good_list转换为一个集合来对其进行涡轮增压。仅这一项就为我们提供了40%以上的速度提升,我们最终得到了比最慢的解决方案快5.5倍以上的解决方案,即使它仍然可读。
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
if item in good_list_set:
good.append(item)
else:
bad.append(item)
稍微短一点
这是之前答案的一个更简洁的版本。
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
out = good if item in good_list_set else bad
out.append(item)
优雅可能有点主观,但一些鲁布·戈德堡风格的解决方案很可爱,很巧妙,不应该用于任何语言的产品代码中,更不用说本质上优雅的python了。
基准测试结果:
filter_BJHomer 80/s -- -3265% -5312% -5900% -6262% -7273% -7363% -8051% -8162% -8244%
zip_Funky 118/s 4848% -- -3040% -3913% -4450% -5951% -6085% -7106% -7271% -7393%
two_lst_tuple_JohnLaRoy 170/s 11332% 4367% -- -1254% -2026% -4182% -4375% -5842% -6079% -6254%
if_else_DBR 195/s 14392% 6428% 1434% -- -882% -3348% -3568% -5246% -5516% -5717%
two_lst_compr_Parand 213/s 16750% 8016% 2540% 967% -- -2705% -2946% -4786% -5083% -5303%
if_else_1_line_DanSalmo 292/s 26668% 14696% 7189% 5033% 3707% -- -331% -2853% -3260% -3562%
tuple_if_else 302/s 27923% 15542% 7778% 5548% 4177% 343% -- -2609% -3029% -3341%
set_1_line 409/s 41308% 24556% 14053% 11035% 9181% 3993% 3529% -- -569% -991%
set_shorter 434/s 44401% 26640% 15503% 12303% 10337% 4836% 4345% 603% -- -448%
set_if_else 454/s 46952% 28358% 16699% 13349% 11290% 5532% 5018% 1100% 469% --
Python 3.7的完整基准代码(从FunkySayu修改而来):
good_list = ['.jpg','.jpeg','.gif','.bmp','.png']
import random
import string
my_origin_list = []
for i in range(10000):
fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(list(good_list))
else:
fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
# Parand
def two_lst_compr_Parand(*_):
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def if_else_DBR(*_):
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# # Ants Aasma
# def f4():
# l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
# return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def zip_Funky(*_):
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def filter_BJHomer(*_):
return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list, my_origin_list))
# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list else bad.append(e)
return good, bad
# ChaimG's answer; as a set.
def set_1_line(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list_set else bad.append(e)
return good, bad
# ChaimG set and if else list.
def set_shorter(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
out = good if e[2] in good_list_set else bad
out.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def set_if_else(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_set:
good.append(e)
else:
bad.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def tuple_if_else(*_):
good_list_tuple = tuple(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_tuple:
good.append(e)
else:
bad.append(e)
return good, bad
def cmpthese(n=0, functions=None):
results = {}
for func_name in functions:
args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
t = Timer(*args)
results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec
functions_sorted = sorted(functions, key=results.__getitem__)
for f in functions_sorted:
diff = []
for func in functions_sorted:
if func == f:
diff.append("--")
else:
diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
diffs = " ".join(f'{x:>8s}' for x in diff)
print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")
if __name__=='__main__':
from timeit import Timer
cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))