DataFrame:

  c_os_family_ss c_os_major_is l_customer_id_i
0      Windows 7                         90418
1      Windows 7                         90418
2      Windows 7                         90418

代码:

print df
for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
    print name
    print group

我试图只是循环聚合数据,但我得到了错误:

ValueError:解包的值太多

@EdChum,这是预期的输出:

                                                    c_os_family_ss  \
l_customer_id_i
131572           Windows 7,Windows 7,Windows 7,Windows 7,Window...
135467           Windows 7,Windows 7,Windows 7,Windows 7,Window...

                                                     c_os_major_is
l_customer_id_i
131572           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
135467           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...

输出不是问题,我希望遍历每一组。


df.groupby(“l_customer_id_i”)。Agg (lambda x: ','.join(x))已经返回一个数据帧,所以你不能再循环组。

一般来说:

df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and with this, you can iterate through the groups (as explained in the docs here). You can do something like: grouped = df.groupby('A') for name, group in grouped: ... When you apply a function on the groupby, in your example df.groupby(...).agg(...) (but this can also be transform, apply, mean, ...), you combine the result of applying the function to the different groups together in one dataframe (the apply and combine step of the 'split-apply-combine' paradigm of groupby). So the result of this will always be again a DataFrame (or a Series depending on the applied function).


如果已经创建了数据帧,则可以遍历索引值。

df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
for name in df.index:
    print name
    print df.loc[name]

这是一个遍历pd的例子。按列表分组的数据帧。对于这个示例,SQL数据库的“create”语句是在For循环中生成的:

import pandas as pd

df1 = pd.DataFrame({
    'atable':     ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column':     ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type':['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null':    ['No', 'No', 'Yes', 'No', 'Yes'],
})

df1_grouped = df1.groupby('atable')

# iterate over each group
for group_name, df_group in df1_grouped:
    print('\nCREATE TABLE {}('.format(group_name))

    for row_index, row in df_group.iterrows():
        col = row['column']
        column_type = row['column_type']
        is_null = 'NOT NULL' if row['is_null'] == 'No' else ''
        print('\t{} {} {},'.format(col, column_type, is_null))

    print(");")