我有以下DataFrame(df):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))

我通过分配添加更多列:

df['mean'] = df.mean(1)

如何将列的意思移到前面,即将其设置为第一列,而其他列的顺序保持不变?


当前回答

这里有一个非常简单的答案(只有一行)。

在将“n”列添加到df中之后,可以执行以下操作。

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))
df['mean'] = df.mean(1)
df
           0           1           2           3           4        mean
0   0.929616    0.316376    0.183919    0.204560    0.567725    0.440439
1   0.595545    0.964515    0.653177    0.748907    0.653570    0.723143
2   0.747715    0.961307    0.008388    0.106444    0.298704    0.424512
3   0.656411    0.809813    0.872176    0.964648    0.723685    0.805347
4   0.642475    0.717454    0.467599    0.325585    0.439645    0.518551
5   0.729689    0.994015    0.676874    0.790823    0.170914    0.672463
6   0.026849    0.800370    0.903723    0.024676    0.491747    0.449473
7   0.526255    0.596366    0.051958    0.895090    0.728266    0.559587
8   0.818350    0.500223    0.810189    0.095969    0.218950    0.488736
9   0.258719    0.468106    0.459373    0.709510    0.178053    0.414752


### here you can add below line and it should work 
# Don't forget the two (()) 'brackets' around columns names.Otherwise, it'll give you an error.

df = df[list(('mean',0, 1, 2,3,4))]
df

        mean           0           1           2           3           4
0   0.440439    0.929616    0.316376    0.183919    0.204560    0.567725
1   0.723143    0.595545    0.964515    0.653177    0.748907    0.653570
2   0.424512    0.747715    0.961307    0.008388    0.106444    0.298704
3   0.805347    0.656411    0.809813    0.872176    0.964648    0.723685
4   0.518551    0.642475    0.717454    0.467599    0.325585    0.439645
5   0.672463    0.729689    0.994015    0.676874    0.790823    0.170914
6   0.449473    0.026849    0.800370    0.903723    0.024676    0.491747
7   0.559587    0.526255    0.596366    0.051958    0.895090    0.728266
8   0.488736    0.818350    0.500223    0.810189    0.095969    0.218950
9   0.414752    0.258719    0.468106    0.459373    0.709510    0.178053

其他回答

您可以执行以下操作(从Aman的答案中借用零件):

cols = df.columns.tolist()
cols.insert(0, cols.pop(-1))

cols
>>>['mean', 0L, 1L, 2L, 3L, 4L]

df = df[cols]

我有一个在panda中重新排序列名的非常具体的用例。有时我在基于现有列的数据帧中创建一个新列。默认情况下,panda将在末尾插入我的新列,但我希望新列插入到它派生的现有列旁边。

def rearrange_list(input_list, input_item_to_move, input_item_insert_here):
    '''
    Helper function to re-arrange the order of items in a list.
    Useful for moving column in pandas dataframe.

    Inputs:
        input_list - list
        input_item_to_move - item in list to move
        input_item_insert_here - item in list, insert before 

    returns:
        output_list
    '''
    # make copy for output, make sure it's a list
    output_list = list(input_list)

    # index of item to move
    idx_move = output_list.index(input_item_to_move)

    # pop off the item to move
    itm_move = output_list.pop(idx_move)

    # index of item to insert here
    idx_insert = output_list.index(input_item_insert_here)

    # insert item to move into here
    output_list.insert(idx_insert, itm_move)

    return output_list


import pandas as pd

# step 1: create sample dataframe
df = pd.DataFrame({
    'motorcycle': ['motorcycle1', 'motorcycle2', 'motorcycle3'],
    'initial_odometer': [101, 500, 322],
    'final_odometer': [201, 515, 463],
    'other_col_1': ['blah', 'blah', 'blah'],
    'other_col_2': ['blah', 'blah', 'blah']
})
print('Step 1: create sample dataframe')
display(df)
print()

# step 2: add new column that is difference between final and initial
df['change_odometer'] = df['final_odometer']-df['initial_odometer']
print('Step 2: add new column')
display(df)
print()

# step 3: rearrange columns
ls_cols = df.columns
ls_cols = rearrange_list(ls_cols, 'change_odometer', 'final_odometer')
df=df[ls_cols]
print('Step 3: rearrange columns')
display(df)

此函数避免了您只需列出数据集中的每个变量来对其中的几个变量进行排序。

def order(frame,var):
    if type(var) is str:
        var = [var] #let the command take a string or list
    varlist =[w for w in frame.columns if w not in var]
    frame = frame[var+varlist]
    return frame 

它需要两个参数,第一个是数据集,第二个是要放到前面的数据集中的列。

所以在我的例子中,我有一个名为Frame的数据集,其中包含变量A1、A2、B1、B2、Total和Date。如果我想把道达尔带到前面,那么我所要做的就是:

frame = order(frame,['Total'])

如果我想将Total和Date带到前台,那么我会:

frame = order(frame,['Total','Date'])

编辑:

另一种有用的使用方法是,如果您有一个不熟悉的表,并且正在查找其中包含特定术语的变量,例如VAR1、VAR2,。。。您可以执行以下操作:

frame = order(frame,[v for v in frame.columns if "VAR" in v])

您可以使用可用于两个轴的重新索引:

df
#           0         1         2         3         4      mean
# 0  0.943825  0.202490  0.071908  0.452985  0.678397  0.469921
# 1  0.745569  0.103029  0.268984  0.663710  0.037813  0.363821
# 2  0.693016  0.621525  0.031589  0.956703  0.118434  0.484254
# 3  0.284922  0.527293  0.791596  0.243768  0.629102  0.495336
# 4  0.354870  0.113014  0.326395  0.656415  0.172445  0.324628
# 5  0.815584  0.532382  0.195437  0.829670  0.019001  0.478415
# 6  0.944587  0.068690  0.811771  0.006846  0.698785  0.506136
# 7  0.595077  0.437571  0.023520  0.772187  0.862554  0.538182
# 8  0.700771  0.413958  0.097996  0.355228  0.656919  0.444974
# 9  0.263138  0.906283  0.121386  0.624336  0.859904  0.555009

df.reindex(['mean', *range(5)], axis=1)

#        mean         0         1         2         3         4
# 0  0.469921  0.943825  0.202490  0.071908  0.452985  0.678397
# 1  0.363821  0.745569  0.103029  0.268984  0.663710  0.037813
# 2  0.484254  0.693016  0.621525  0.031589  0.956703  0.118434
# 3  0.495336  0.284922  0.527293  0.791596  0.243768  0.629102
# 4  0.324628  0.354870  0.113014  0.326395  0.656415  0.172445
# 5  0.478415  0.815584  0.532382  0.195437  0.829670  0.019001
# 6  0.506136  0.944587  0.068690  0.811771  0.006846  0.698785
# 7  0.538182  0.595077  0.437571  0.023520  0.772187  0.862554
# 8  0.444974  0.700771  0.413958  0.097996  0.355228  0.656919
# 9  0.555009  0.263138  0.906283  0.121386  0.624336  0.859904

在您的情况下,

df = df.reindex(columns=['mean',0,1,2,3,4])

会做你想做的事。

在我的情况下(一般形式):

df = df.reindex(columns=sorted(df.columns))
df = df.reindex(columns=(['opened'] + list([a for a in df.columns if a != 'opened']) ))