我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
当前回答
Good = [x for x in mylist if x in goodvals] Bad = [x for x in mylist if x not in goodvals] 我怎样才能做得更优雅呢?
代码已经非常优雅了。
使用集合可能会有轻微的性能改进,但差异是微不足道的。基于集合的方法也会丢弃重复项,并且不会保留元素的顺序。我发现列表理解也更容易阅读。
事实上,我们甚至可以更简单地使用for循环:
good, bad = [], []
for x in mylist:
if x in goodvals:
good.append(f)
else:
bad.append(f)
这种方法可以更容易地添加额外的逻辑。例如,代码很容易被修改为丢弃None值:
good, bad = [], []
for x in mylist:
if x is None:
continue
if x in goodvals:
good.append(f)
else:
bad.append(f)
其他回答
如果你坚持聪明,你可以采用温登的解决方案,再加上一点虚假的聪明:
def splay(l, f, d=None):
d = d or {}
for x in l: d.setdefault(f(x), []).append(x)
return d
这个问题已经有很多答案了,但似乎都不如我最喜欢的解决这个问题的方法,这种方法只遍历和测试每个项目一次,并使用列表理解的速度来构建两个输出列表之一,因此它只需要使用相对较慢的附加来构建一个输出列表:
bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]
In my answer to a similar question, I explain how this approach works (a combination of Python's greedy evaluation of or refraining from executing the append for "good" items, and append returning a false-like value which leaves the if condition false for "bad" items), and I show timeit results indicating that this approach outcompetes alternatives like those suggested here, especially in cases where the majority of items will go into the list built by list-comprehension (in this case, the good list).
这里已经有很多解了,但另一种方法是
anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
只在列表上迭代一次,看起来更python化,因此对我来说是可读的。
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>
有时候,列表理解并不是最好的选择!
我根据人们对这个话题的回答做了一个小测试,在一个随机生成的列表上测试。以下是列表的生成(可能有更好的方法,但这不是重点):
good_list = ('.jpg','.jpeg','.gif','.bmp','.png')
import random
import string
my_origin_list = []
for i in xrange(10000):
fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(good_list)
else:
fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
好了
# Parand
def f1():
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def f2():
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def f3():
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# Ants Aasma
def f4():
l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def f5():
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def f6():
return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)
使用cmpthese函数,最好的结果是dbr答案:
f1 204/s -- -5% -14% -15% -20% -26%
f6 215/s 6% -- -9% -11% -16% -22%
f3 237/s 16% 10% -- -2% -7% -14%
f4 240/s 18% 12% 2% -- -6% -13%
f5 255/s 25% 18% 8% 6% -- -8%
f2 277/s 36% 29% 17% 15% 9% --
优雅快捷
受到DanSalmo评论的启发,这里有一个简洁、优雅的解决方案,同时也是最快的解决方案之一。
good_set = set(goodvals)
good, bad = [], []
for item in my_list:
good.append(item) if item in good_set else bad.append(item)
提示:将goodvals转换为一组可以很容易地提高速度。
最快
为了获得最大速度,我们取最快的答案,并通过将good_list转换为一个集合来对其进行涡轮增压。仅这一项就为我们提供了40%以上的速度提升,我们最终得到了比最慢的解决方案快5.5倍以上的解决方案,即使它仍然可读。
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
if item in good_list_set:
good.append(item)
else:
bad.append(item)
稍微短一点
这是之前答案的一个更简洁的版本。
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
out = good if item in good_list_set else bad
out.append(item)
优雅可能有点主观,但一些鲁布·戈德堡风格的解决方案很可爱,很巧妙,不应该用于任何语言的产品代码中,更不用说本质上优雅的python了。
基准测试结果:
filter_BJHomer 80/s -- -3265% -5312% -5900% -6262% -7273% -7363% -8051% -8162% -8244%
zip_Funky 118/s 4848% -- -3040% -3913% -4450% -5951% -6085% -7106% -7271% -7393%
two_lst_tuple_JohnLaRoy 170/s 11332% 4367% -- -1254% -2026% -4182% -4375% -5842% -6079% -6254%
if_else_DBR 195/s 14392% 6428% 1434% -- -882% -3348% -3568% -5246% -5516% -5717%
two_lst_compr_Parand 213/s 16750% 8016% 2540% 967% -- -2705% -2946% -4786% -5083% -5303%
if_else_1_line_DanSalmo 292/s 26668% 14696% 7189% 5033% 3707% -- -331% -2853% -3260% -3562%
tuple_if_else 302/s 27923% 15542% 7778% 5548% 4177% 343% -- -2609% -3029% -3341%
set_1_line 409/s 41308% 24556% 14053% 11035% 9181% 3993% 3529% -- -569% -991%
set_shorter 434/s 44401% 26640% 15503% 12303% 10337% 4836% 4345% 603% -- -448%
set_if_else 454/s 46952% 28358% 16699% 13349% 11290% 5532% 5018% 1100% 469% --
Python 3.7的完整基准代码(从FunkySayu修改而来):
good_list = ['.jpg','.jpeg','.gif','.bmp','.png']
import random
import string
my_origin_list = []
for i in range(10000):
fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(list(good_list))
else:
fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
# Parand
def two_lst_compr_Parand(*_):
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def if_else_DBR(*_):
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# # Ants Aasma
# def f4():
# l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
# return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def zip_Funky(*_):
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def filter_BJHomer(*_):
return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list, my_origin_list))
# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list else bad.append(e)
return good, bad
# ChaimG's answer; as a set.
def set_1_line(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list_set else bad.append(e)
return good, bad
# ChaimG set and if else list.
def set_shorter(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
out = good if e[2] in good_list_set else bad
out.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def set_if_else(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_set:
good.append(e)
else:
bad.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def tuple_if_else(*_):
good_list_tuple = tuple(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_tuple:
good.append(e)
else:
bad.append(e)
return good, bad
def cmpthese(n=0, functions=None):
results = {}
for func_name in functions:
args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
t = Timer(*args)
results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec
functions_sorted = sorted(functions, key=results.__getitem__)
for f in functions_sorted:
diff = []
for func in functions_sorted:
if func == f:
diff.append("--")
else:
diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
diffs = " ".join(f'{x:>8s}' for x in diff)
print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")
if __name__=='__main__':
from timeit import Timer
cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))