I'm starting with input data like this:

import pandas

df1 = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})

When printed, it looks like this:

       City     Name
0   Seattle    Alice
1   Seattle      Bob
2  Portland  Mallory
3   Seattle  Mallory
4   Seattle      Bob
5  Portland  Mallory

Grouping is simple enough:

g1 = df1.groupby( [ "Name", "City"] ).count()

Printing the result yields a GroupBy object:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1

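But what I eventually want is another DataFrame object that contains all the rows of the GroupBy object. In other words, I want to get the following result: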

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
Mallory Seattle      1     1

I can't quite see how to accomplish this from the pandas documentation. Any hints would be welcome.


Current answer

Simply put, this should do the job:

import pandas as pd

grouped_df = df1.groupby( [ "Name", "City"] )

pd.DataFrame(grouped_df.size().reset_index(name = "Group_Count"))

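Here, grouped_df.size() returns the number of rows in each (Name, City) group as a Series indexed by the group keys, and reset_index(name="Group_Count") moves those keys back into ordinary columns and names the count column Group_Count. The result of reset_index() is already a DataFrame, so the outer pd.DataFrame() call is optional; it only makes the intent explicit.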
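
As a rough illustration of the steps described above, the sketch below rebuilds the question's df1 and checks the type at each stage. The variable names sizes and counts_df are made up for this example and do not come from the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() counts the rows in each group and returns a Series
# whose index is a (Name, City) MultiIndex.
sizes = df1.groupby(["Name", "City"]).size()

# reset_index(name=...) turns the index levels into ordinary columns
# and gives the count column a name, producing a DataFrame.
counts_df = sizes.reset_index(name="Group_Count")

print(type(sizes).__name__)      # Series
print(type(counts_df).__name__)  # DataFrame
print(counts_df)
```

Each (Name, City) pair ends up on its own row, with Mallory repeated for Portland and Seattle, which is the layout the question asks for.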

Other answers

The key is to use the reset_index() method.

Use:

import pandas

df1 = pandas.DataFrame( { 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } )

g1 = df1.groupby( [ "Name", "City"] ).count().reset_index()

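Now you have your new DataFrame in g1. Note that in recent pandas versions count() leaves out the grouping columns, so when every column is a group key, as it is here, g1 contains only Name and City and no count column; size() avoids that problem.

This returns the index levels in the same order as a regular groupby() call. It is basically the same as the answer @NehalJWani posted in his comment, but stored in a variable, with reset_index() called on it.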

fare_class = df.groupby(['Satisfaction Rating','Fare Class']).size().to_frame(name = 'Count')
fare_class.reset_index()

This version returns the same data as percentages, which is useful for statistics, and uses a lambda function to do the conversion.

fare_class_percent = df.groupby(['Satisfaction Rating', 'Fare Class']).size().to_frame(name = 'Percentage')
fare_class_percent.transform(lambda x: 100 * x/x.sum()).reset_index()

      Satisfaction Rating      Fare Class  Percentage
0            Dissatisfied        Business   14.624269
1            Dissatisfied         Economy   36.469048
2               Satisfied        Business    5.460425
3               Satisfied         Economy   33.235294
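
The lambda is not strictly required for the percentage step. The following is a minimal sketch of the same idea on a small, made-up DataFrame; the column names mirror the answer, but the rows are hypothetical and are not the answerer's data.

```python
import pandas as pd

# Hypothetical survey rows, invented purely for illustration.
df = pd.DataFrame({
    "Satisfaction Rating": ["Satisfied", "Dissatisfied", "Satisfied",
                            "Dissatisfied", "Satisfied"],
    "Fare Class": ["Economy", "Economy", "Business", "Economy", "Economy"],
})

# Number of rows in each (Satisfaction Rating, Fare Class) group.
counts = df.groupby(["Satisfaction Rating", "Fare Class"]).size()

# Each group's share of all rows, as a percentage, flattened to a DataFrame.
percent_df = (100 * counts / counts.sum()).reset_index(name="Percentage")
print(percent_df)
```

Because every count is divided by the total number of rows, the Percentage column adds up to 100 across all groups.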

I found this worked for me. Example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({ 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})

df1['City_count'] = 1
df1['Name_count'] = 1

df1.groupby(['Name', 'City'], as_index=False).count()
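
A possible variation on the as_index=False approach is to call size() instead of count(), which removes the need for the helper columns. This is only a sketch, and it assumes pandas 1.1 or newer, where size() with as_index=False returns a DataFrame with a column named size; counts_df is a name chosen for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# as_index=False keeps the group keys as ordinary columns.
# Assumption: pandas >= 1.1, where size() then adds a "size" column.
counts_df = df1.groupby(["Name", "City"], as_index=False).size()

print(counts_df.columns.tolist())
print(counts_df)
```

If a different column name is wanted, counts_df.rename(columns={"size": "Count"}) would be one way to set it.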

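It is suggested to set group_keys=False in the groupby method to avoid adding the group keys to the index. Note that group_keys only affects the result of apply(); for aggregations such as count() or size(), as_index=False is the option that keeps the keys out of the index.

Example: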

import numpy as np
import pandas as pd

df1 = pd.DataFrame({ 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
df1.groupby(["Name"], group_keys=False)
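
To see what the group_keys=False suggestion does in practice, here is a small sketch that contrasts it with as_index=False. The upper-casing lambda and the names flat and agg are arbitrary choices for this illustration, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# group_keys=False: the result of apply() keeps the original row labels
# instead of gaining an extra "Name" index level.
flat = df1.groupby("Name", group_keys=False)["City"].apply(lambda s: s.str.upper())
print(flat.index.nlevels)

# For an aggregation, as_index=False is what keeps the key as a column.
agg = df1.groupby("Name", as_index=False)["City"].count()
print(agg.columns.tolist())
```

So group_keys matters when apply() returns per-row results, while as_index is the setting to reach for when the goal is a flat DataFrame of counts.
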
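
Another way, shown here for a DataFrame df with Team, Year and W columns, is to count one column per group and then rename it:

import numpy as np
import pandas as pd

# Count the non-null W entries per (Team, Year); reset_index() already returns a DataFrame.
team_wins_df = df.groupby(['Team', 'Year'])['W'].count().reset_index()
# Give the count column a descriptive name and a compact integer type.
team_wins_df = team_wins_df.rename(columns={'W': 'Wins'})
team_wins_df['Wins'] = team_wins_df['Wins'].astype(np.int32)
print(team_wins_df)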