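I have a scenario where a user wants to apply several filters to a Pandas DataFrame or Series object. Essentially, I want to efficiently chain together a series of filters (comparison operations) that the user specifies at run time.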
The filters should be additive (i.e., each one applied should narrow the results). I'm currently using reindex() (see below), but this creates a new object each time and copies the underlying data (if I understand the documentation correctly). I want to avoid this unnecessary copying, because it would be very inefficient when filtering a large Series or DataFrame. I'm thinking that using apply(), map(), or something similar might be better, but I'm pretty new to Pandas, so I'm still trying to wrap my head around everything.

I would also like to extend this so that the dictionary passed in can include the columns to operate on, and then filter an entire DataFrame based on that dictionary. However, I'm assuming that whatever works for a Series can easily be extended to a DataFrame.
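On the apply()/map() idea: a comparison such as `s >= 1` is already vectorized and returns a boolean Series directly, so a per-element function is not needed to build a filter. A minimal sketch, assuming a small Series shaped like the `col1` column used in the example below (the names `vectorized_mask` and `mapped_mask` are illustrative only):

```python
import pandas as pd

s = pd.Series([0, 1, 2], name="col1")

# Vectorized comparison: returns a boolean Series aligned with s's index
vectorized_mask = s >= 1

# Same values via map(): the lambda is called once per element in Python
mapped_mask = s.map(lambda x: x >= 1)

print(vectorized_mask.tolist())
print(mapped_mask.tolist())

# Either mask can be used for boolean indexing
print(s[vectorized_mask].tolist())
```

Both masks hold the same values; the difference is that `map()` runs a Python function per element, while the comparison operator runs over the whole underlying array at once.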
TL;DR
I want to take a dictionary of the following form, apply each operation to a given Series object, and return a "filtered" Series object.
```python
relops = {'>=': [1], '<=': [1]}
```
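For reference, a dictionary like this expands into a flat list of (function, value) pairs, one per comparison, and applying all of them additively should match a hand-written chained expression. A sketch, assuming an `ops` lookup table built from Python's `operator` module (the `>`, `<`, `==` and `!=` entries are an assumption beyond the two operators used in the question):

```python
import operator

import pandas as pd

# Lookup table from operator strings to comparison functions (extra entries assumed)
ops = {'>=': operator.ge, '<=': operator.le, '>': operator.gt,
       '<': operator.lt, '==': operator.eq, '!=': operator.ne}

relops = {'>=': [1], '<=': [1]}
s = pd.Series([0, 1, 2], name='col1')

# One (function, value) pair per comparison to apply
pairs = [(ops[op], val) for op, vals in relops.items() for val in vals]

# Additive filters: the intended result keeps values v with v >= 1 and v <= 1
expected = s[(s >= 1) & (s <= 1)]
print(len(pairs))
print(expected.tolist())
```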
Long example
I'll start with an example of what I currently have, which filters only a single Series object. Below is the function I'm currently using:
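```python
import operator

# Map operator strings to the corresponding comparison functions
ops = {'>=': operator.ge, '<=': operator.le}

def apply_relops(series, relops):
    """
    Pass dictionary of relational operators to perform on given series object
    """
    for op, vals in relops.items():  # .iteritems() in Python 2
        op_func = ops[op]
        for val in vals:
            # Boolean mask of the elements that pass this comparison
            filtered = op_func(series, val)
            # Keep only the passing index labels; reindex() builds a new Series each time
            series = series.reindex(series[filtered].index)
    return series
```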
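One way to avoid building a new Series for every comparison is to evaluate each comparison against the original series, combine the resulting boolean masks with `&`, and index only once at the end. A sketch of that idea, not a definitive implementation (the name `apply_relops_masked` is hypothetical, and the same `ops` table as above is assumed):

```python
import operator
from functools import reduce

import pandas as pd

ops = {'>=': operator.ge, '<=': operator.le}

def apply_relops_masked(series, relops):
    """Hypothetical variant of apply_relops: build one combined mask, index once."""
    # Every comparison runs against the original series, so no
    # intermediate filtered Series is created between steps.
    masks = [ops[op](series, val) for op, vals in relops.items() for val in vals]
    if not masks:
        return series
    # AND the masks together: a value survives only if it passes every filter
    combined = reduce(operator.and_, masks)
    # Boolean indexing still returns a new Series, but only once
    return series[combined]

df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
result = apply_relops_masked(df['col1'], {'>=': [1], '<=': [1]})
print(result)
```

The masks are still allocated (one boolean per element per comparison), and the final selection is still a copy, so this reduces the copying to a single selection rather than eliminating it.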
The user provides a dictionary with the operations they want to perform:
```python
>>> import pandas
>>> df = pandas.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
>>> print(df)
   col1  col2
0     0    10
1     1    11
2     2    12
>>> from operator import le, ge
>>> ops = {'>=': ge, '<=': le}
>>> apply_relops(df['col1'], {'>=': [1]})
1    1
2    2
Name: col1, dtype: int64
>>> apply_relops(df['col1'], relops={'>=': [1], '<=': [1]})
1    1
Name: col1, dtype: int64
```
Again, the "problem" with my approach above is that I suspect a lot of unnecessary data copying happens in the intermediate steps.
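For the DataFrame extension mentioned at the start, the same mask-combining idea carries over if the dictionary is keyed by column first. A sketch, assuming a hypothetical nested format `{column: {op: [values]}}` and a hypothetical helper named `apply_relops_df`:

```python
import operator
from functools import reduce

import pandas as pd

ops = {'>=': operator.ge, '<=': operator.le}

def apply_relops_df(df, col_relops):
    """Filter rows of df using a hypothetical {column: {op: [values]}} spec."""
    # One boolean mask per (column, operator, value) triple, all computed
    # against the original, unfiltered DataFrame
    masks = [
        ops[op](df[col], val)
        for col, relops in col_relops.items()
        for op, vals in relops.items()
        for val in vals
    ]
    if not masks:
        return df
    # A row survives only if every condition holds (filters are additive)
    return df.loc[reduce(operator.and_, masks)]

df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
result = apply_relops_df(df, {'col1': {'>=': [1]}, 'col2': {'<=': [11]}})
print(result)
```

Because each mask is computed on the full DataFrame, conditions on different columns never need an intermediate filtered frame; only the final `.loc` selection produces a new object.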