我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
当前回答
有时候你并不需要清单上的另一半。 例如:
import sys
from itertools import ifilter
trustedPeople = sys.argv[1].split(',')
newName = sys.argv[2]
myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)
print '%s is %smy friend.' % (newName, newName not in myFriends 'not ' or '')
其他回答
有时候,列表理解并不是最好的选择!
我根据人们对这个话题的回答做了一个小测试,在一个随机生成的列表上测试。以下是列表的生成(可能有更好的方法,但这不是重点):
good_list = ('.jpg','.jpeg','.gif','.bmp','.png')
import random
import string
my_origin_list = []
for i in xrange(10000):
fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(good_list)
else:
fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
好了
# Parand
def f1():
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def f2():
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def f3():
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# Ants Aasma
def f4():
l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def f5():
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def f6():
return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)
使用cmpthese函数,最好的结果是dbr答案:
f1 204/s -- -5% -14% -15% -20% -26%
f6 215/s 6% -- -9% -11% -16% -22%
f3 237/s 16% 10% -- -2% -7% -14%
f4 240/s 18% 12% 2% -- -6% -13%
f5 255/s 25% 18% 8% 6% -- -8%
f2 277/s 36% 29% 17% 15% 9% --
不确定这是否是一个好方法,但也可以这样做
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi')]
images, anims = reduce(lambda (i, a), f: (i + [f], a) if f[2] in IMAGE_TYPES else (i, a + [f]), files, ([], []))
清晰快速
这个列表理解是简单的阅读和快速。这正是上级要求的。
set_good_vals = set(good_vals) # Speed boost.
good = [x for x in my_list if x in set_good_vals]
bad = [x for x in my_list if x not in set_good_vals]
我更喜欢一个列表理解而不是两个,但不像张贴的许多答案(其中一些相当巧妙),它是可读的和清晰的。这也是网页上最快的答案之一。
唯一(稍微)快一点的答案是:
set_good_vals = set(good_vals)
good, bad = [], []
for item in my_list:
_ = good.append(item) if item in set_good_vals else bad.append(item)
...还有它的变体。(见我的另一个答案)。但我觉得第一种方法更优雅,而且几乎一样快。
一个基于生成器的版本,如果你能忍受一个或两个原始列表的反转。
设置…
random.seed(1234)
a = list(range(10))
random.shuffle(a)
a
[2, 8, 3, 5, 6, 4, 9, 0, 1, 7]
至于分裂……
(list((a.pop(j) for j, y in [(len(a)-i-1, x) for i,x in enumerate(a[::-1])] if y%2 == 0))[::-1], a)
([2, 8, 6, 4, 0], [3, 5, 9, 1, 7])
Another list of tuples of locations and each element is built in reverse order. In a generator wrapped round that each element is tested against the predicate (here test for even) and if True then the element is poped using previously computed locations. We are working backwards along the list so poping elements out does not change positions closer to the beginning of the list. A wrapping list() evaluates the generator and a final revers [::-1] puts the elements back in the right order. The original list "a" now only contains elements that for which the predicate is False.
优雅快捷
受到DanSalmo评论的启发,这里有一个简洁、优雅的解决方案,同时也是最快的解决方案之一。
good_set = set(goodvals)
good, bad = [], []
for item in my_list:
good.append(item) if item in good_set else bad.append(item)
提示:将goodvals转换为一组可以很容易地提高速度。
最快
为了获得最大速度,我们取最快的答案,并通过将good_list转换为一个集合来对其进行涡轮增压。仅这一项就为我们提供了40%以上的速度提升,我们最终得到了比最慢的解决方案快5.5倍以上的解决方案,即使它仍然可读。
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
if item in good_list_set:
good.append(item)
else:
bad.append(item)
稍微短一点
这是之前答案的一个更简洁的版本。
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
out = good if item in good_list_set else bad
out.append(item)
优雅可能有点主观,但一些鲁布·戈德堡风格的解决方案很可爱,很巧妙,不应该用于任何语言的产品代码中,更不用说本质上优雅的python了。
基准测试结果:
filter_BJHomer 80/s -- -3265% -5312% -5900% -6262% -7273% -7363% -8051% -8162% -8244%
zip_Funky 118/s 4848% -- -3040% -3913% -4450% -5951% -6085% -7106% -7271% -7393%
two_lst_tuple_JohnLaRoy 170/s 11332% 4367% -- -1254% -2026% -4182% -4375% -5842% -6079% -6254%
if_else_DBR 195/s 14392% 6428% 1434% -- -882% -3348% -3568% -5246% -5516% -5717%
two_lst_compr_Parand 213/s 16750% 8016% 2540% 967% -- -2705% -2946% -4786% -5083% -5303%
if_else_1_line_DanSalmo 292/s 26668% 14696% 7189% 5033% 3707% -- -331% -2853% -3260% -3562%
tuple_if_else 302/s 27923% 15542% 7778% 5548% 4177% 343% -- -2609% -3029% -3341%
set_1_line 409/s 41308% 24556% 14053% 11035% 9181% 3993% 3529% -- -569% -991%
set_shorter 434/s 44401% 26640% 15503% 12303% 10337% 4836% 4345% 603% -- -448%
set_if_else 454/s 46952% 28358% 16699% 13349% 11290% 5532% 5018% 1100% 469% --
Python 3.7的完整基准代码(从FunkySayu修改而来):
good_list = ['.jpg','.jpeg','.gif','.bmp','.png']
import random
import string
my_origin_list = []
for i in range(10000):
fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(list(good_list))
else:
fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
# Parand
def two_lst_compr_Parand(*_):
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def if_else_DBR(*_):
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# # Ants Aasma
# def f4():
# l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
# return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def zip_Funky(*_):
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def filter_BJHomer(*_):
return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list, my_origin_list))
# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list else bad.append(e)
return good, bad
# ChaimG's answer; as a set.
def set_1_line(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list_set else bad.append(e)
return good, bad
# ChaimG set and if else list.
def set_shorter(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
out = good if e[2] in good_list_set else bad
out.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def set_if_else(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_set:
good.append(e)
else:
bad.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def tuple_if_else(*_):
good_list_tuple = tuple(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_tuple:
good.append(e)
else:
bad.append(e)
return good, bad
def cmpthese(n=0, functions=None):
results = {}
for func_name in functions:
args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
t = Timer(*args)
results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec
functions_sorted = sorted(functions, key=results.__getitem__)
for f in functions_sorted:
diff = []
for func in functions_sorted:
if func == f:
diff.append("--")
else:
diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
diffs = " ".join(f'{x:>8s}' for x in diff)
print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")
if __name__=='__main__':
from timeit import Timer
cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))