I start with input data like this:

import pandas

df1 = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})

When printed, it looks like this:

       City     Name
0   Seattle    Alice
1   Seattle      Bob
2  Portland  Mallory
3   Seattle  Mallory
4   Seattle      Bob
5  Portland  Mallory

Grouping is simple enough:

g1 = df1.groupby( [ "Name", "City"] ).count()

Printing it yields a GroupBy object:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1

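But what I ultimately want is another DataFrame object that contains all the rows of the GroupBy object. In other words, I want to get the following result: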

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
Mallory Seattle      1     1

I can't quite see how to accomplish this from the pandas documentation. Any hints would be welcome.


Current answer

I aggregated the data by summing Qty per group and stored the result in a DataFrame:

almo_grp_data = pd.DataFrame({'Qty_cnt' :
almo_slt_models_data.groupby( ['orderDate','Item','State Abv']
          )['Qty'].sum()}).reset_index()
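
To try this answer without the original data, here is a minimal sketch. The sample rows are made up for illustration; only the column names and the variable names come from the snippet above.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the answer above.
almo_slt_models_data = pd.DataFrame({
    "orderDate": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "Item": ["A", "A", "B"],
    "State Abv": ["WA", "WA", "OR"],
    "Qty": [2, 3, 5],
})

# The grouped sum is a Series indexed by the three keys; wrapping it in a
# dict names the value column, and reset_index() turns the keys into columns.
almo_grp_data = pd.DataFrame({
    "Qty_cnt": almo_slt_models_data.groupby(["orderDate", "Item", "State Abv"])["Qty"].sum()
}).reset_index()

print(almo_grp_data)
```

The result is a flat DataFrame with one row per (orderDate, Item, State Abv) combination.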

Other answers

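You can set group_keys=False in the groupby method to avoid adding the group keys to the index. Note that this setting affects the result of apply(), not of aggregations such as count().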

Example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({ 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
df1.groupby(["Name"], group_keys=False)
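
As a rough illustration of where group_keys takes effect, the sketch below compares index depth with and without it. The head(1) lambda is only an illustrative choice of function to apply, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# Aggregations ignore group_keys: the keys still end up in the index.
sizes = df1.groupby(["Name", "City"], group_keys=False).size()

# apply() is where group_keys matters; head(1) keeps the first City per Name.
with_keys = df1.groupby("Name", group_keys=True)["City"].apply(lambda s: s.head(1))
without_keys = df1.groupby("Name", group_keys=False)["City"].apply(lambda s: s.head(1))

print(sizes.index.nlevels)
print(with_keys.index.nlevels)
print(without_keys.index.nlevels)
```

For the aggregated counts in the question, as_index=False or reset_index() is therefore still needed to get the keys back as columns.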

The following solution may be simpler:

df1.reset_index().groupby( [ "Name", "City"],as_index=False ).count()
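
A short sketch of what this one-liner produces: reset_index() adds an "index" column, which gives count() something to count, and as_index=False keeps Name and City as ordinary columns. The rename to "Count" is an illustrative addition, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# The only non-key column after reset_index() is "index", so count()
# reports the number of rows per (Name, City) pair in that column.
result = df1.reset_index().groupby(["Name", "City"], as_index=False).count()

# "Count" is an illustrative name for the counted column.
result = result.rename(columns={"index": "Count"})
print(result)
```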

Simply put, this should do the job:

import pandas as pd

grouped_df = df1.groupby( [ "Name", "City"] )

pd.DataFrame(grouped_df.size().reset_index(name = "Group_Count"))

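Here, grouped_df.size() returns the number of rows in each unique group, and reset_index() turns the group keys back into columns while naming the count column with the name argument. Finally, the pandas DataFrame() constructor is called to create a DataFrame object, although reset_index() already returns one here.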
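
The intermediate types can be checked step by step. This is a small sketch using the question's df1; the variable names sizes and flat are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

grouped_df = df1.groupby(["Name", "City"])

sizes = grouped_df.size()                     # Series with a (Name, City) MultiIndex
flat = sizes.reset_index(name="Group_Count")  # keys become columns again

print(type(sizes).__name__, type(flat).__name__)
print(flat)
```

Because flat is already a DataFrame, wrapping it in pd.DataFrame() is harmless but does not change the result.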

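import numpy as np
import pandas as pd

# df is assumed to be an existing DataFrame with 'Team', 'Year' and 'W' columns.
# count() gives the number of non-null 'W' values per group;
# reset_index() turns the group keys back into columns.
grouped = df.groupby(['Team', 'Year'])['W'].count().reset_index()

team_wins_df = pd.DataFrame(grouped)
team_wins_df = team_wins_df.rename({'W': 'Wins'}, axis=1)
team_wins_df['Wins'] = team_wins_df['Wins'].astype(np.int32)
print(team_wins_df)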
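
A possibly more compact equivalent uses named aggregation together with as_index=False, assuming pandas 0.25 or later. The sample rows below are hypothetical; only the column names Team, Year and W come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "Team": ["Reds", "Reds", "Cubs", "Cubs", "Cubs"],
    "Year": [2019, 2019, 2019, 2020, 2020],
    "W": [1, 1, 1, 1, 1],
})

# Named aggregation counts 'W' per group and names the result 'Wins';
# as_index=False keeps Team and Year as ordinary columns.
team_wins_df = (
    df.groupby(["Team", "Year"], as_index=False)
      .agg(Wins=("W", "count"))
      .astype({"Wins": np.int32})
)
print(team_wins_df)
```

This avoids the separate rename step, at the cost of requiring a newer pandas version.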