The docs show how to apply multiple functions on a groupby object at once, using a dict with the output column names as keys:
In [563]: grouped['D'].agg({'result1' : np.sum,
   .....:                   'result2' : np.mean})
   .....:
Out[563]:
      result2   result1
A
bar -0.579846 -1.739537
foo -0.280588 -1.402938
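However, this only works on a Series groupby object. When a dict is passed in the same way to a DataFrame groupby, it expects the keys to be the names of the columns that the functions will be applied to.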
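To make that column-keyed behavior concrete, here is a minimal sketch. The toy frame and its values are assumptions made up for illustration (they are not the data from the session above), and the functions are given as string names to keep the example short.

```python
import pandas as pd

# Hypothetical toy data; column names A, C, D mirror the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1, 3, 3, 3],
    "D": [10.0, 20.0, 30.0, 40.0],
})
grouped = df.groupby("A")

# On a DataFrame groupby, the dict keys name the columns to aggregate,
# and each value lists the functions to apply to that column.
result = grouped.agg({"C": ["sum", "std"], "D": ["sum"]})
print(result)
```

The result has two-level column labels such as ('C', 'sum'), so one column can be aggregated several ways, but each function still sees only its own column.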
What I want to do is apply multiple functions to several columns (some columns will be operated on more than once). Also, some functions will depend on other columns in the groupby object (like sumif functions). My current solution is to go column by column and do something like the code above, using lambdas for functions that depend on other rows. But this is taking a long time (I think iterating through a groupby object is slow). I'll have to change it so that I iterate through the whole groupby object in a single pass, but I'm wondering if there's a built-in way in pandas to do this somewhat cleanly.
For example, I've tried something like
grouped.agg({'C_sum': lambda x: x['C'].sum(),
             'C_std': lambda x: x['C'].std(),
             'D_sum': lambda x: x['D'].sum(),
             'D_sumifC3': lambda x: x['D'][x['C'] == 3].sum(), ...})
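but as expected I get a KeyError (since the keys have to be column names when agg is called on a DataFrame groupby).
Is there any built-in way to do what I'd like to do, is there a possibility that this functionality will be added, or will I just need to iterate through the groupby manually?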
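For reference, the manual single-pass fallback mentioned above could be sketched roughly as follows. This is only one possible shape for it, not a confirmed built-in answer: the `summarize` helper and the toy data are hypothetical, and it relies on `apply` passing each group as a DataFrame so that one function can read several columns at once.

```python
import pandas as pd

# Hypothetical toy data; column names A, C, D mirror the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1, 3, 3, 3],
    "D": [10.0, 20.0, 30.0, 40.0],
})

def summarize(group):
    # `group` is the sub-DataFrame for one key, so cross-column
    # conditions (the "sumif" case) can be written directly.
    return pd.Series({
        "C_sum": group["C"].sum(),
        "C_std": group["C"].std(),
        "D_sum": group["D"].sum(),
        "D_sumifC3": group.loc[group["C"] == 3, "D"].sum(),
    })

# Select the value columns explicitly, then visit each group once.
summary = df.groupby("A")[["C", "D"]].apply(summarize)
print(summary)
```

Each returned Series becomes one row of the output, with its labels as the column names, so the output names are chosen freely rather than tied to the input columns.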