我从这样的输入数据开始

df1 = pandas.DataFrame( { 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } )

印刷出来时是这样的:

   City     Name
0   Seattle    Alice
1   Seattle      Bob
2  Portland  Mallory
3   Seattle  Mallory
4   Seattle      Bob
5  Portland  Mallory

分组非常简单:

g1 = df1.groupby( [ "Name", "City"] ).count()

打印产生一个GroupBy对象:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1

但我最终想要的是另一个DataFrame对象,它包含GroupBy对象中的所有行。换句话说,我想得到以下结果:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
Mallory Seattle      1     1

我不太清楚如何在pandas文档中实现这一点。欢迎任何提示。


当前回答

我已经与Qty明智的数据聚合并存储到dataframe

almo_grp_data = pd.DataFrame({'Qty_cnt' :
almo_slt_models_data.groupby( ['orderDate','Item','State Abv']
          )['Qty'].sum()}).reset_index()

其他回答

我发现这对我很有用。

import numpy as np
import pandas as pd

df1 = pd.DataFrame({ 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})

df1['City_count'] = 1
df1['Name_count'] = 1

df1.groupby(['Name', 'City'], as_index=False).count()
 grouped=df.groupby(['Team','Year'])['W'].count().reset_index()

 team_wins_df=pd.DataFrame(grouped)
 team_wins_df=team_wins_df.rename({'W':'Wins'},axis=1)
 team_wins_df['Wins']=team_wins_df['Wins'].astype(np.int32)
 team_wins_df.reset_index()
 print(team_wins_df)

这将以与普通groupby()方法相同的顺序返回序数级/索引。它基本上与@NehalJWani在他的评论中发布的答案相同,但存储在一个变量中,并调用了reset_index()方法。

fare_class = df.groupby(['Satisfaction Rating','Fare Class']).size().to_frame(name = 'Count')
fare_class.reset_index()

这个版本不仅返回相同的百分比数据,这是有用的统计,而且还包括一个lambda函数。

fare_class_percent = df.groupby(['Satisfaction Rating', 'Fare Class']).size().to_frame(name = 'Percentage')
fare_class_percent.transform(lambda x: 100 * x/x.sum()).reset_index()

      Satisfaction Rating      Fare Class  Percentage
0            Dissatisfied        Business   14.624269
1            Dissatisfied         Economy   36.469048
2               Satisfied        Business    5.460425
3               Satisfied         Economy   33.235294

例子:

建议在group_by方法中设置group_keys=False,避免将组键添加到索引中。

例子:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({ 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
df1.groupby(["Name"], group_keys=False)

也许我误解了这个问题,但如果你想将groupby转换回数据帧,你可以使用.to_frame()。当我这样做的时候,我想重置索引,所以我也包括了这一部分。

与问题无关的示例代码

df = df['TIME'].groupby(df['Name']).min()
df = df.to_frame()
df = df.reset_index(level=['Name',"TIME"])