What is a pivot? How do I pivot? How do I go from long format to wide format?



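How to pivot a dataframe in pandas? - Good question and answer, but the answer addresses only the specific question, with little explanation.
pandas pivot table to data frame - The OP is concerned with the output of the pivot, namely how the columns look. The OP wanted it to look like R, which isn't very helpful to pandas users.
Another decent question, but the answer focuses on a single method, namely pd.DataFrame.pivot.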



import numpy as np
import pandas as pd
from numpy.core.defchararray import add

n = 20

cols = np.array(['key', 'row', 'item', 'col'])
arr1 = (np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]).astype(str)

# Build the four label columns, then join the two random value columns
df = pd.DataFrame(
    add(cols, arr1), columns=cols
).join(
    pd.DataFrame(np.random.rand(n, 2).round(2)).add_prefix('val')
)
print(df)

     key   row   item   col  val0  val1
0   key0  row3  item1  col3  0.81  0.04
1   key1  row2  item1  col2  0.44  0.07
2   key1  row0  item1  col0  0.77  0.01
3   key0  row4  item0  col2  0.15  0.59
4   key1  row0  item2  col1  0.81  0.64
5   key1  row2  item2  col4  0.13  0.88
6   key2  row4  item1  col3  0.88  0.39
7   key1  row4  item1  col1  0.10  0.07
8   key1  row0  item2  col4  0.65  0.02
9   key1  row2  item0  col2  0.35  0.61
10  key2  row0  item2  col1  0.40  0.85
11  key2  row4  item1  col2  0.64  0.25
12  key0  row2  item2  col3  0.50  0.44
13  key0  row4  item1  col4  0.24  0.46
14  key1  row3  item2  col3  0.28  0.11
15  key0  row3  item1  col1  0.31  0.23
16  key0  row0  item2  col3  0.86  0.01
17  key0  row4  item0  col3  0.64  0.21
18  key2  row2  item2  col0  0.13  0.45
19  key0  row2  item0  col4  0.37  0.70


1. Why do I get ValueError: Index contains duplicate entries, cannot reshape?

2. How do I pivot df such that the col values are columns, row values are the index, and mean of val0 are the values?

col   col0   col1   col2   col3  col4
row
row0  0.77  0.605    NaN  0.860  0.65
row2  0.13    NaN  0.395  0.500  0.25
row3   NaN  0.310    NaN  0.545   NaN
row4   NaN  0.100  0.395  0.760  0.24

3. How do I make it so that missing values are 0?

col   col0   col1   col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65
row2  0.13  0.000  0.395  0.500  0.25
row3  0.00  0.310  0.000  0.545  0.00
row4  0.00  0.100  0.395  0.760  0.24

4. Can I get something other than mean, like maybe sum?

col   col0  col1  col2  col3  col4
row
row0  0.77  1.21  0.00  0.86  0.65
row2  0.13  0.00  0.79  0.50  0.50
row3  0.00  0.31  0.00  1.09  0.00
row4  0.00  0.10  0.79  1.52  0.24

5. Can I do more than one aggregation at a time?

       sum                          mean
col   col0  col1  col2  col3  col4  col0   col1   col2   col3  col4
row
row0  0.77  1.21  0.00  0.86  0.65  0.77  0.605  0.000  0.860  0.65
row2  0.13  0.00  0.79  0.50  0.50  0.13  0.000  0.395  0.500  0.25
row3  0.00  0.31  0.00  1.09  0.00  0.00  0.310  0.000  0.545  0.00
row4  0.00  0.10  0.79  1.52  0.24  0.00  0.100  0.395  0.760  0.24

6. Can I aggregate over multiple value columns?

      val0                             val1
col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02
row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79
row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00
row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46

7. Can I subdivide by multiple columns?

item item0             item1                         item2
col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4
row
row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65
row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13
row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00
row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00

8. Or

item      item0             item1                         item2
col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4
key  row
key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00
     row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00
     row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00
     row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00
key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65
     row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13
     row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00
     row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00
key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00
     row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00
     row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00

9. Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?

col   col0  col1  col2  col3  col4
row
row0     1     2     0     1     1
row2     1     0     2     1     2
row3     0     1     0     2     0
row4     0     1     2     2     1

10. How do I convert a DataFrame from long to wide by pivoting on ONLY two columns? Given,

np.random.seed([3, 1415])
df2 = pd.DataFrame({'A': list('aaaabbbc'), 'B': np.random.choice(15, 8)})
df2

   A   B
0  a   0
1  a  11
2  a   2
3  a  11
4  b  10
5  b  10
6  b  14
7  c   7

The expected result should look something like

      a     b    c
0   0.0  10.0  7.0
1  11.0  10.0  NaN
2   2.0  14.0  NaN
3  11.0   NaN  NaN

11. How do I flatten the multiple index to single index after pivot?

From

   1  2
   1  1  2
a  2  1  1
b  2  1  0
c  1  0  0

To

   1|1  2|1  2|2
a    2    1    1
b    2    1    0
c    1    0    0







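Pivoting comes down to one of two operations:

Unstacked aggregation (i.e., making the result of groupby.agg wider)
Reshaping (similar to pivot in Excel, reshape in numpy, or pivot_wider in R)

1. Aggregation

pivot_table = groupby + unstack (see note 1 below)
crosstab = pivot_table (see note 2 below)

# usual groupby syntax
df.groupby(rows+cols)[vals].agg(aggfuncs).unstack(cols)
# equivalently,
df.pivot_table(vals, rows, cols, aggfuncs)

1.1. crosstab is a special case of pivot_table and, therefore, of groupby + unstack

The following are equivalent:

pd.crosstab(df['colA'], df['colB'])
df.pivot_table(index='colA', columns='colB', aggfunc='size', fill_value=0)
df.groupby(['colA', 'colB']).size().unstack(fill_value=0)

Note that pd.crosstab has a significantly larger overhead, so it is significantly slower than both pivot_table and groupby + unstack. In fact, pivot_table is itself slower than groupby + unstack.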
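The equivalence of the three spellings can be checked on a small frame. The sketch below uses a hypothetical toy DataFrame (the name toy and its values are made up for illustration; colA and colB follow the snippet above) and only compares the resulting counts, not speed.

```python
import pandas as pd

# Hypothetical toy data; 'v' is an unused value column, as in the main example.
toy = pd.DataFrame({
    'colA': ['x', 'x', 'y', 'y', 'y'],
    'colB': ['p', 'q', 'p', 'p', 'q'],
    'v': [1, 2, 3, 4, 5],
})

# Three ways of counting how often each (colA, colB) pair occurs
ct = pd.crosstab(toy['colA'], toy['colB'])
pt = toy.pivot_table(index='colA', columns='colB', aggfunc='size', fill_value=0)
gb = toy.groupby(['colA', 'colB']).size().unstack(fill_value=0)

print(ct)
print(pt)
print(gb)
```

All three frames should hold the same 2x2 table of counts; which one to use is mostly a trade-off between brevity and overhead.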

2. Reshaping

# usual set_index + unstack syntax
df.set_index(rows+cols)[vals].unstack(cols)
# equivalently (recent pandas versions require keyword arguments for pivot)
df.pivot(index=rows, columns=cols, values=vals)

2.1. Augment the rows/columns as in Question 10

"long-to-long": reshape by augmenting the indices

Code:

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2], 'B': [*'xxyyzz'],
                   'C': [*'CCDCDD'], 'E': [100, 200, 300, 400, 500, 600]})
rows, cols, vals = ['A', 'B'], ['C'], 'E'

# using pivot syntax
# (note: the `downcast` argument of fillna is deprecated in recent pandas releases)
df1 = (
    df.assign(ix=df.groupby(rows+cols).cumcount())
    .pivot(index=[*rows, 'ix'], columns=cols, values=vals)
    .fillna(0, downcast='infer')
    .droplevel(-1).reset_index().rename_axis(columns=None)
)

# equivalently, using set_index + unstack syntax
df1 = (
    df
    .set_index([*rows, df.groupby(rows+cols).cumcount(), *cols])[vals]
    .unstack(fill_value=0)
    .droplevel(-1).reset_index().rename_axis(columns=None)
)

"long-to-wide": reshape by augmenting the columns

Code:

df1 = (
    df.assign(ix=df.groupby(rows+cols).cumcount())
    .pivot(index=rows, columns=[*cols, 'ix'])[vals]
    .fillna(0, downcast='infer')
)
df1 = df1.set_axis([f"{c[0]}_{c[1]}" for c in df1], axis=1).reset_index()

# equivalently, using the set_index + unstack syntax
df1 = (
    df
    .set_index([*rows, df.groupby(rows+cols).cumcount(), *cols])[vals]
    .unstack([-1, *range(-2, -len(cols)-2, -1)], fill_value=0)
)
df1 = df1.set_axis([f"{c[0]}_{c[1]}" for c in df1], axis=1).reset_index()

Minimum case using the set_index + unstack syntax:

Code:

df1 = df.set_index(['A', df.groupby('A').cumcount()])['E'].unstack(fill_value=0).add_prefix('Col').reset_index()

Note 1: pivot_table() aggregates the values and unstacks the result. Specifically, it creates a single flat list out of index and columns, calls groupby() with this list as the grouper, and aggregates using the passed aggregator methods (the default is mean). Then, after aggregation, it calls unstack() by the list of columns. So internally, pivot_table = groupby + unstack. Moreover, if fill_value is passed, fillna() is called.

In other words, the method that produces pv_1 is the same as the method that produces gb_1 in the example below.

pv_1 = df.pivot_table(index=rows, columns=cols, values=vals, aggfunc=aggfuncs, fill_value=0)
# internal operation of `pivot_table()`
gb_1 = df.groupby(rows+cols)[vals].agg(aggfuncs).unstack(cols).fillna(0, downcast="infer")
pv_1.equals(gb_1) # True

Note 2: crosstab() calls pivot_table(), i.e., crosstab = pivot_table. Specifically, it builds a DataFrame out of the passed arrays of values, filters it by the common indices, and calls pivot_table(). It's more limited than pivot_table() because it only allows a one-dimensional array-like as values, unlike pivot_table(), which can have multiple columns as values.
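For a self-contained look at the "pivot_table = groupby + unstack" claim, here is a minimal sketch on hypothetical data (the frame toy and its column names r, c, v are invented for illustration). It only compares the observable results and says nothing about how pandas is implemented internally.

```python
import pandas as pd

# Hypothetical toy data with one repeated (r, c) pair so that aggregation matters
toy = pd.DataFrame({
    'r': ['a', 'a', 'b', 'b'],
    'c': ['x', 'x', 'x', 'y'],
    'v': [1.0, 3.0, 5.0, 7.0],
})

# Aggregate and widen in one call ...
pv = toy.pivot_table(index='r', columns='c', values='v', aggfunc='mean')

# ... or group, aggregate, and move the 'c' level into the columns by hand
gb = toy.groupby(['r', 'c'])['v'].mean().unstack('c')

print(pv)
print(gb)
```

Both frames should show the mean of the two (a, x) rows and a missing value for the (a, y) cell that never occurs in the data.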





d = data = {'A': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 5},
 'B': {0: 'a', 1: 'b', 2: 'c', 3: 'a', 4: 'b', 5: 'a', 6: 'c'}}
df = pd.DataFrame(d)

   A  B
0  1  a
1  1  b
2  1  c
3  2  a
4  2  b
5  3  a
6  5  c


   0     1     2
1  a     b     c
2  a     b  None
3  a  None  None
5  c  None  None


t = df.groupby('A')['B'].apply(list)
out = pd.DataFrame(t.tolist(),index=t.index)
   0     1     2
1  a     b     c
2  a     b  None
3  a  None  None
5  c  None  None

Or, a much better alternative: use pd.pivot_table together with df.squeeze.

t = df.pivot_table(index='A',values='B',aggfunc=list).squeeze()
out = pd.DataFrame(t.tolist(),index=t.index)


pd.DataFrame.pivot_table
A glorified version of groupby with a more intuitive API. For many people, this is the preferred approach, and it is the approach intended by the developers. Specify the row levels, column levels, values to be aggregated, and function(s) to perform the aggregations.

pd.DataFrame.groupby + pd.DataFrame.unstack
A good general approach for doing just about any type of pivot. You specify all columns that will constitute the pivoted row levels and column levels in one groupby. You follow that by selecting the remaining columns you want to aggregate and the function(s) you want to use for the aggregation. Finally, you unstack the levels that you want to be in the column index.

pd.DataFrame.set_index + pd.DataFrame.unstack
Convenient and intuitive for some (myself included). Cannot handle duplicate grouped keys. Similar to the groupby paradigm, we specify all columns that will eventually be either row or column levels and set those to be the index. We then unstack the levels we want in the columns. If either the remaining index levels or the column levels are not unique, this method will fail.

pd.DataFrame.pivot
Very similar to set_index in that it shares the duplicate key limitation. The API is very limited as well: older pandas versions accepted only scalar values for index, columns, and values, while recent versions also accept lists. Similar to the pivot_table method in that we select rows, columns, and values on which to pivot. However, we cannot aggregate, and if either rows or columns are not unique, this method will fail.

pd.crosstab
This is a specialized version of pivot_table and, in its purest form, the most intuitive way to perform several tasks.

pd.factorize + np.bincount
This is a highly advanced technique that is very obscure but very fast. It cannot be used in all circumstances, but when it can be used and you are comfortable using it, you will reap the performance rewards.

pd.get_dummies + pd.DataFrame.dot
I use this for cleverly performing cross tabulation.


Reshaping and pivot tables - pandas user guide




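Question 1: Why do I get ValueError: Index contains duplicate entries, cannot reshape?

# Check whether any (row, col) pair occurs more than once; here it returns True
df.duplicated(['row', 'col']).any()

# With duplicate (row, col) pairs, pivot raises the ValueError from the question
df.pivot(index='row', columns='col', values='val0')

# The same error is raised by set_index + unstack
df.set_index(['row', 'col'])['val0'].unstack()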
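The failure is easy to reproduce on a tiny frame. The following is a minimal sketch with hypothetical data (the frame toy and its values are made up): it catches the error from pivot and then shows that aggregating with pivot_table resolves the ambiguity.

```python
import pandas as pd

# Hypothetical toy frame in which the pair (r0, c0) appears twice
toy = pd.DataFrame({
    'row': ['r0', 'r0', 'r1'],
    'col': ['c0', 'c0', 'c1'],
    'val': [1.0, 3.0, 5.0],
})

has_dupes = toy.duplicated(['row', 'col']).any()

try:
    toy.pivot(index='row', columns='col', values='val')
    error_message = None
except ValueError as exc:
    # pivot cannot decide which of the two (r0, c0) values to keep
    error_message = str(exc)

# Aggregating the duplicates (here with the mean) gives one value per cell
resolved = toy.pivot_table(index='row', columns='col', values='val', aggfunc='mean')

print(has_dupes)
print(error_message)
print(resolved)
```

Whether the mean is the right way to collapse duplicates depends on the data; sum, first, or a custom function may be more appropriate.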





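Question 2: How do I pivot df such that the col values are columns, row values are the index, and mean of val0 are the values?

pd.DataFrame.pivot_table

df.pivot_table(
    values='val0', index='row', columns='col',
    aggfunc='mean')

col   col0   col1   col2   col3  col4
row
row0  0.77  0.605    NaN  0.860  0.65
row2  0.13    NaN  0.395  0.500  0.25
row3   NaN  0.310    NaN  0.545   NaN
row4   NaN  0.100  0.395  0.760  0.24

aggfunc='mean' is the default, so I did not have to set it. I included it to be explicit.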


Question 3: How do I make it so that missing values are 0?

pd.DataFrame.pivot_table

fill_value is not set by default. I tend to set it appropriately. In this case I set it to 0.

df.pivot_table(
    values='val0', index='row', columns='col',
    fill_value=0, aggfunc='mean')

col   col0   col1   col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65
row2  0.13  0.000  0.395  0.500  0.25
row3  0.00  0.310  0.000  0.545  0.00
row4  0.00  0.100  0.395  0.760  0.24

pd.DataFrame.groupby

df.groupby(['row', 'col'])['val0'].mean().unstack(fill_value=0)

pd.crosstab

pd.crosstab(
    index=df['row'], columns=df['col'],
    values=df['val0'], aggfunc='mean').fillna(0)



Question 4: Can I get something other than mean, like maybe sum?

pd.DataFrame.pivot_table

df.pivot_table(
    values='val0', index='row', columns='col',
    fill_value=0, aggfunc='sum')

col   col0  col1  col2  col3  col4
row
row0  0.77  1.21  0.00  0.86  0.65
row2  0.13  0.00  0.79  0.50  0.50
row3  0.00  0.31  0.00  1.09  0.00
row4  0.00  0.10  0.79  1.52  0.24

pd.DataFrame.groupby

df.groupby(['row', 'col'])['val0'].sum().unstack(fill_value=0)

pd.crosstab

pd.crosstab(
    index=df['row'], columns=df['col'],
    values=df['val0'], aggfunc='sum').fillna(0)




Question 5: Can I do more than one aggregation at a time?

pd.DataFrame.pivot_table

df.pivot_table(
    values='val0', index='row', columns='col',
    fill_value=0, aggfunc=[np.size, np.mean])

      size                      mean
col   col0 col1 col2 col3 col4  col0   col1   col2   col3  col4
row
row0     1    2    0    1    1  0.77  0.605  0.000  0.860  0.65
row2     1    0    2    1    2  0.13  0.000  0.395  0.500  0.25
row3     0    1    0    2    0  0.00  0.310  0.000  0.545  0.00
row4     0    1    2    2    1  0.00  0.100  0.395  0.760  0.24

pd.DataFrame.groupby

df.groupby(['row', 'col'])['val0'].agg(['size', 'mean']).unstack(fill_value=0)

pd.crosstab

pd.crosstab(
    index=df['row'], columns=df['col'],
    values=df['val0'], aggfunc=[np.size, np.mean]).fillna(0, downcast='infer')



Question 6: Can I aggregate over multiple value columns?

pd.DataFrame.pivot_table

We pass values=['val0', 'val1'], but we could have left that off completely.

df.pivot_table(
    values=['val0', 'val1'], index='row', columns='col',
    fill_value=0, aggfunc='mean')

      val0                             val1
col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02
row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79
row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00
row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46

pd.DataFrame.groupby

# Select several columns with a list (double brackets); a bare tuple of names is rejected by recent pandas
df.groupby(['row', 'col'])[['val0', 'val1']].mean().unstack(fill_value=0)



Question 7: Can I subdivide by multiple columns?

pd.DataFrame.pivot_table

df.pivot_table(
    values='val0', index='row', columns=['item', 'col'],
    fill_value=0, aggfunc='mean')

item item0             item1                         item2
col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4
row
row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65
row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13
row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00
row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00

pd.DataFrame.groupby

# axis must be passed by keyword to sort_index in recent pandas
df.groupby(
    ['row', 'item', 'col']
)['val0'].mean().unstack(['item', 'col']).fillna(0).sort_index(axis=1)



Question 8: Can I subdivide by multiple columns, in both rows and columns?

pd.DataFrame.pivot_table

df.pivot_table(
    values='val0', index=['key', 'row'], columns=['item', 'col'],
    fill_value=0, aggfunc='mean')

item      item0             item1                         item2
col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4
key  row
key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00
     row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00
     row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00
     row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00
key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65
     row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13
     row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00
     row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00
key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00
     row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00
     row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00

pd.DataFrame.groupby

# axis must be passed by keyword to sort_index in recent pandas
df.groupby(
    ['key', 'row', 'item', 'col']
)['val0'].mean().unstack(['item', 'col']).fillna(0).sort_index(axis=1)

pd.DataFrame.set_index

This works here because the set of keys is unique for both rows and columns.

df.set_index(
    ['key', 'row', 'item', 'col']
)['val0'].unstack(['item', 'col']).fillna(0).sort_index(axis=1)



Question 9: Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?

pd.DataFrame.pivot_table

df.pivot_table(index='row', columns='col', fill_value=0, aggfunc='size')

col   col0  col1  col2  col3  col4
row
row0     1     2     0     1     1
row2     1     0     2     1     2
row3     0     1     0     2     0
row4     0     1     2     2     1

pd.DataFrame.groupby

df.groupby(['row', 'col'])['val0'].size().unstack(fill_value=0)

pd.crosstab

pd.crosstab(df['row'], df['col'])

pd.factorize + np.bincount

# get integer factorization `i` and unique values `r`
# for column `'row'`
i, r = pd.factorize(df['row'].values)
# get integer factorization `j` and unique values `c`
# for column `'col'`
j, c = pd.factorize(df['col'].values)
# `n` will be the number of rows
# `m` will be the number of columns
n, m = r.size, c.size
# `i * m + j` is a clever way of counting the
# factorization bins assuming a flat array of length
# `n * m`. Which is why we subsequently reshape as `(n, m)`
b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)
# BTW, whenever I read this, I think 'Bean, Rice, and Cheese'
pd.DataFrame(b, r, c)

      col3  col2  col0  col1  col4
row3     2     0     0     1     0
row2     1     2     1     0     2
row0     1     0     1     2     1
row4     2     2     0     1     1

pd.get_dummies

# dtype=int keeps the indicators numeric; recent pandas returns booleans by default,
# and a dot product of booleans would not yield counts
pd.get_dummies(df['row'], dtype=int).T.dot(pd.get_dummies(df['col'], dtype=int))

      col0  col1  col2  col3  col4
row0     1     2     0     1     1
row2     1     0     2     1     2
row3     0     1     0     2     0
row4     0     1     2     2     1
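To see the factorize + bincount idea in isolation, here is a minimal sketch on hypothetical data (the frame toy is invented). Unlike the snippet above, it passes sort=True to pd.factorize, which is an assumption made here so that the labels come out in the same sorted order that pd.crosstab uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'row': ['r1', 'r0', 'r1', 'r0', 'r1'],
    'col': ['c0', 'c1', 'c0', 'c0', 'c1'],
})

# Integer codes and sorted unique labels for each key column
i, r = pd.factorize(toy['row'], sort=True)
j, c = pd.factorize(toy['col'], sort=True)
n, m = len(r), len(c)

# Each (row, col) pair maps to one slot of a flat array of length n * m;
# bincount tallies the slots and reshape turns the tally into a table
counts = np.bincount(i * m + j, minlength=n * m).reshape(n, m)
fast = pd.DataFrame(counts, index=r, columns=c)

# Reference result for comparison
reference = pd.crosstab(toy['row'], toy['col'])

print(fast)
print(reference)
```

The flat index i * m + j is the usual row-major position of cell (i, j) in an n-by-m array, which is why the reshape recovers the table.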


Question 10: How do I convert a DataFrame from long to wide by pivoting on ONLY two columns?

DataFrame.pivot

The first step is to assign a number to each row. This number will be the row index of that value in the pivoted result. This is done using GroupBy.cumcount:

df2.insert(0, 'count', df2.groupby('A').cumcount())
df2

   count  A   B
0      0  a   0
1      1  a  11
2      2  a   2
3      3  a  11
4      0  b  10
5      1  b  10
6      2  b  14
7      0  c   7

The second step is to use the newly created column as the index when calling DataFrame.pivot.

# keyword arguments are required by recent pandas versions
df2.pivot(index='count', columns='A', values='B')

A         a     b    c
count
0       0.0  10.0  7.0
1      11.0  10.0  NaN
2       2.0  14.0  NaN
3      11.0   NaN  NaN

DataFrame.pivot_table

Whereas DataFrame.pivot only accepts columns, DataFrame.pivot_table also accepts arrays, so the GroupBy.cumcount can be passed directly as the index without creating an explicit column.

df2.pivot_table(index=df2.groupby('A').cumcount(), columns='A', values='B')

A     a     b    c
0   0.0  10.0  7.0
1  11.0  10.0  NaN
2   2.0  14.0  NaN
3  11.0   NaN  NaN
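The two steps can be combined into one chain. The sketch below hard-codes the B values shown in the question instead of drawing them at random, and the names long_df, pos, and wide are chosen here for illustration.

```python
import pandas as pd

# Same values as df2 in the question, written out literally
long_df = pd.DataFrame({
    'A': list('aaaabbbc'),
    'B': [0, 11, 2, 11, 10, 10, 14, 7],
})

wide = (
    # position of each row within its 'A' group: 0, 1, 2, ...
    long_df.assign(pos=long_df.groupby('A').cumcount())
    # the position becomes the row index, each 'A' value a column
    .pivot(index='pos', columns='A', values='B')
    # drop the axis names so the result matches the expected layout
    .rename_axis(index=None, columns=None)
)

print(wide)
```

Groups shorter than the longest one are padded with NaN, which is also why the values come back as floats.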




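Question 11: How do I flatten the multiple index to single index after pivot?

# If the column levels are all strings, join them directly
df.columns = df.columns.map('|'.join)

# Otherwise (e.g., integer levels), format each level pair into a string
df.columns = df.columns.map('{0[0]}|{0[1]}'.format)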
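As a worked example, the sketch below rebuilds the small frame from Question 11 and flattens its columns. It uses a list comprehension with str conversion, which is a variation chosen here (not taken from the answer above) so that it works for any number of levels and for non-string labels.

```python
import pandas as pd

# The two-level columns from Question 11: (1, 1), (2, 1), (2, 2)
columns = pd.MultiIndex.from_tuples([(1, 1), (2, 1), (2, 2)])
wide = pd.DataFrame(
    [[2, 1, 1], [2, 1, 0], [1, 0, 0]],
    index=list('abc'),
    columns=columns,
)

flat = wide.copy()
# Convert every level label to str, then join the labels of each column
flat.columns = ['|'.join(map(str, labels)) for labels in wide.columns]

print(flat)
```

Converting with str first avoids the TypeError that '|'.join raises when a level contains integers.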






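To better understand how the pivot function works, you can look at the example in the pandas documentation. However, pivot fails with ValueError: Index contains duplicate entries, cannot reshape if you have duplicate index-column (foo-bar) combinations (like the df in the second example there).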






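pivot_df = pd.pivot(df, index=['Date'], columns='Country', values=['NewConfirmed'])
# rename the columns; pivot sorts the 'Country' labels, so the sorted unique values line up
pivot_df.columns = df['Country'].sort_values().unique()

# turn 'Date' back into a regular column (the variable name is lowercase, as defined above)
pivot_df = pivot_df.reset_index()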
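One possible variation, sketched here on hypothetical data (the frame cases and its values are made up; the column names mirror the snippet above): passing values as a scalar rather than a one-element list yields single-level columns, so no manual renaming is needed.

```python
import pandas as pd

# Hypothetical data with the same column names as above
cases = pd.DataFrame({
    'Date': ['2020-03-01', '2020-03-01', '2020-03-02', '2020-03-02'],
    'Country': ['Peru', 'Chile', 'Peru', 'Chile'],
    'NewConfirmed': [5, 3, 8, 4],
})

tidy = (
    # a scalar `values` gives one column per country, without an extra level
    cases.pivot(index='Date', columns='Country', values='NewConfirmed')
    # remove the 'Country' label that would otherwise sit above the columns
    .rename_axis(columns=None)
    # turn 'Date' back into a regular column
    .reset_index()
)

print(tidy)
```

This keeps the column labels tied to the data, rather than relying on a separately sorted list of country names matching the pivot's column order.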