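I have a scenario where a user wants to apply several filters to a Pandas DataFrame or Series object. Essentially, I want to efficiently chain together a series of filtering (comparison) operations that the user specifies at run time.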
The filters should be additive (i.e., each one applied should narrow the results). I'm currently using reindex() (as below), but this creates a new object each time and copies the underlying data (if I understand the documentation correctly). I want to avoid this unnecessary copying, as it will be really inefficient when filtering a big Series or DataFrame. I'm thinking that using apply(), map(), or something similar might be better. I'm pretty new to Pandas, though, so I'm still trying to wrap my head around everything.

Also, I would like to expand this so that the dictionary passed in can include the columns to operate on, and filter an entire DataFrame based on the input dictionary. However, I'm assuming whatever works for a Series can easily be extended to a DataFrame.
    relops = {'>=': [1], '<=': [1]}

    def apply_relops(series, relops):
        """
        Pass dictionary of relational operators to perform on given series object
        """
        for op, vals in relops.items():
            op_func = ops[op]  # ops maps '>=' / '<=' to operator.ge / operator.le
            for val in vals:
                filtered = op_func(series, val)  # boolean mask
                # reindex() builds a new object (and copies data) on every pass
                series = series.reindex(series[filtered].index)
        return series
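A minimal sketch of one way to avoid the repeated reindex() calls, assuming the same kind of operator mapping as `ops` in the question: combine every comparison into a single boolean mask with `&`, then index once. The names `apply_relops_mask` and `OPS` are hypothetical. Note that `series[mask]` still returns a new object, so the data is copied once at the end rather than once per filter.

```python
from operator import ge, le

import pandas as pd

# Hypothetical mapping from operator strings to functions (mirrors `ops` above).
OPS = {'>=': ge, '<=': le}


def apply_relops_mask(series, relops, ops=OPS):
    """Combine all comparisons into one boolean mask, then index once."""
    # Start with an all-True mask aligned to the series index.
    mask = pd.Series(True, index=series.index)
    for op, vals in relops.items():
        op_func = ops[op]
        for val in vals:
            # Each comparison yields a boolean Series; & can only narrow the mask.
            mask &= op_func(series, val)
    # A single boolean-indexing step produces the filtered result.
    return series[mask]
```

Because the comparisons are ANDed together, each extra filter can only remove rows, which matches the additive behaviour described in the question.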
    >>> import pandas
    >>> df = pandas.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
    >>> print(df)
       col1  col2
    0     0    10
    1     1    11
    2     2    12
    >>> from operator import le, ge
    >>> ops = {'>=': ge, '<=': le}
    >>> apply_relops(df['col1'], {'>=': [1]})
    1    1
    2    2
    Name: col1
    >>> apply_relops(df['col1'], relops={'>=': [1], '<=': [1]})
    1    1
    Name: col1
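For the DataFrame extension, a sketch under the assumption that the input dictionary is nested as `{column: {op: [values]}}` (a hypothetical format, not something pandas defines): the same single-mask idea applies, with each column's comparisons ANDed into one row mask. `apply_column_relops` and `OPS` are illustrative names only.

```python
from operator import ge, le

import pandas as pd

# Hypothetical operator mapping, same idea as `ops` in the question.
OPS = {'>=': ge, '<=': le}


def apply_column_relops(df, column_relops, ops=OPS):
    """Filter rows using a hypothetical {column: {op: [values]}} spec.

    All comparisons are ANDed, so each one can only narrow the rows.
    """
    # One boolean entry per row, initially keeping everything.
    mask = pd.Series(True, index=df.index)
    for column, relops in column_relops.items():
        for op, vals in relops.items():
            for val in vals:
                mask &= ops[op](df[column], val)
    # Boolean indexing on a DataFrame selects rows.
    return df[mask]


df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
print(apply_column_relops(df, {'col1': {'>=': [1]}, 'col2': {'<=': [11]}}))
```

With this spec only the row labelled 1 (col1 = 1, col2 = 11) survives. pandas also offers DataFrame.query, which takes an expression string such as 'col1 >= 1 and col2 <= 11'; building that string from user input is another option, though it needs care with quoting and column names.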