I have a DataFrame like this:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

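As you can see, the months are not in calendar order. So I created a second column that holds each month's number (1-12). Given that, how can I sort the DataFrame in calendar-month order?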


Current answer

This worked for me:

df=df.sort_values(by=[2])

whereas:

df=df.sort_values(by=['2']) 

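does not work (presumably because the column labels here are the integers 0, 1 and 2, so the string '2' does not match any of them).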
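A minimal sketch of the difference, assuming the question's frame was built with pandas' default integer column labels. The frame below is reconstructed from the printed output, so its dtypes are a guess; the flag `string_label_worked` is a hypothetical name used only for illustration.

```python
import pandas as pd

# Reconstructed from the printed frame (assumption: integer column labels 0, 1, 2)
df = pd.DataFrame({
    0: [354.7, 55.4, 176.5, 95.5, 85.6, 152.0, 238.7, 104.8, 283.5, 278.8, 249.6, 212.7],
    1: ["April", "August", "December", "February", "January", "July",
        "June", "March", "May", "November", "October", "September"],
    2: [4.0, 8.0, 12.0, 2.0, 1.0, 7.0, 6.0, 3.0, 5.0, 11.0, 10.0, 9.0],
})
print(df.columns.tolist())  # [0, 1, 2]

# The integer label 2 exists, so this sorts by the month number
df_sorted = df.sort_values(by=[2])
print(df_sorted[1].tolist()[:3])  # ['January', 'February', 'March']

# The string '2' is a different label, so pandas cannot find that column
try:
    df.sort_values(by=['2'])
    string_label_worked = True
except KeyError as err:
    string_label_worked = False
    print("KeyError:", err)
```

If the frame had been created with string column names instead, the situation would be reversed: `by=['2']` would work and `by=[2]` would fail.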

Other answers

pandas' sort_values does the job.

One can pass various parameters, such as ascending (bool or list of bool):

Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
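To illustrate the list form of ascending, here is a small sketch on a hypothetical frame (the columns `month_num` and `value` are made up for the example) where several rows share a month number, with one flag per column in by:

```python
import pandas as pd

# Hypothetical frame: rows 0/2 share month 2, rows 1/3 share month 1
df = pd.DataFrame({
    "month_num": [2, 1, 2, 1],
    "value": [10.0, 30.0, 40.0, 20.0],
})

# One flag per entry in `by`: month_num ascending, then value descending within a month
out = df.sort_values(by=["month_num", "value"], ascending=[True, False])
print(out.index.tolist())     # [1, 3, 2, 0]
print(out["value"].tolist())  # [30.0, 20.0, 40.0, 10.0]
```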

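Since the default is ascending, and the OP's goal is to sort in ascending order, one doesn't need to specify that parameter (see the last note below for how to sort in descending order), so one can use one of the following: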

1. Perform the operation in place, keeping the same variable name. This requires passing inplace=True:

df.sort_values(by=['2'], inplace=True)
# or
df.sort_values(by='2', inplace=True)
# or
df.sort_values('2', inplace=True)

2. If doing the operation in place is not a requirement, assign the result (the sorted frame) to a variable:

- with the same name as the original DataFrame, df:

df = df.sort_values(by=['2'])

- with a different name, such as df_new:

df_new = df.sort_values(by=['2'])

All of the previous operations give the following output:

        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5     152       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0
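The difference between the in-place and assignment variants above can be checked with a short sketch. It assumes, as this answer does, string column labels ('1', '2'), and uses a three-row stand-in for the question's frame; `before` and `result` are illustrative names only.

```python
import pandas as pd

# Three-row stand-in with string column labels, matching this answer's by=['2']
df = pd.DataFrame({"1": ["April", "August", "January"], "2": [4.0, 8.0, 1.0]})

# Without assignment or inplace=True, the sorted copy is discarded and df is unchanged
df.sort_values(by=["2"])
before = df["1"].tolist()
print(before)  # ['April', 'August', 'January']

# Assigning the returned copy keeps the sorted result
df_new = df.sort_values(by=["2"])
print(df_new["1"].tolist())  # ['January', 'April', 'August']

# inplace=True modifies df itself and returns None
result = df.sort_values(by=["2"], inplace=True)
print(result, df["1"].tolist())  # None ['January', 'April', 'August']
```

The first call is the pattern that silently "does nothing": sort_values returns a new DataFrame by default.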

Finally, one can reset the index with pandas.DataFrame.reset_index to get the following:

df.reset_index(drop = True, inplace = True)

# or

df = df.reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0
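Why drop=True matters can be seen in a small sketch (again assuming string column labels as in this answer): without it, reset_index keeps the old index as a new column.

```python
import pandas as pd

df = pd.DataFrame({"1": ["April", "January"], "2": [4.0, 1.0]})
sorted_df = df.sort_values(by=["2"])  # index is now [1, 0]

# drop=False (the default) moves the old index into a column named "index"
kept = sorted_df.reset_index()
print(kept.columns.tolist())   # ['index', '1', '2']
print(kept["index"].tolist())  # [1, 0]

# drop=True discards the old index and relabels rows 0..n-1
dropped = sorted_df.reset_index(drop=True)
print(dropped.columns.tolist(), dropped.index.tolist())  # ['1', '2'] [0, 1]
```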

A one-liner that sorts in ascending order and resets the index:

df = df.sort_values(by=['2']).reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0
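As an alternative to chaining reset_index, sort_values also accepts ignore_index=True (pandas 1.0 and later), which relabels the result 0..n-1 directly. A sketch comparing the two, assuming string column labels as above:

```python
import pandas as pd

df = pd.DataFrame({"1": ["April", "August", "January"], "2": [4.0, 8.0, 1.0]})

# Chained form from this answer
a = df.sort_values(by=["2"]).reset_index(drop=True)

# Single call: ignore_index=True produces a fresh 0..n-1 index
b = df.sort_values(by=["2"], ignore_index=True)

print(b.index.tolist())  # [0, 1, 2]
print(a.equals(b))       # True
```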

Notes:

- If one is not doing the operation in place, forgetting the steps mentioned above may lead one (as happened to this user) not to get the expected result.
- There are strong opinions on using inplace. For that, one might want to read this.
- One is assuming that column 2 is not a string. If it is, one will have to convert it:

  - Using pandas.to_numeric:

df['2'] = pd.to_numeric(df['2'])

  - Using pandas.Series.astype:

df['2'] = df['2'].astype(float)

- If one wants descending order, one needs to pass ascending=False:

df = df.sort_values(by=['2'], ascending=False)
# or
df.sort_values(by='2', ascending=False, inplace=True)

[Out]:

        0          1     2
2   176.5   December  12.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0
1    55.4     August   8.0
5     152       July   7.0
6   238.7       June   6.0
8   283.5        May   5.0
0   354.7      April   4.0
7   104.8      March   3.0
3    95.5   February   2.0
4    85.6    January   1.0
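The reason the string caveat matters: strings compare character by character, so "10" sorts before "2". A sketch with a hypothetical frame whose month numbers were read in as strings (`lex_order` and `num_order` are illustrative names):

```python
import pandas as pd

# Hypothetical: month numbers stored as strings
df = pd.DataFrame({"1": ["January", "February", "October"], "2": ["1", "2", "10"]})

# Lexicographic order: "1" < "10" < "2"
lex_order = df.sort_values(by="2")["1"].tolist()
print(lex_order)  # ['January', 'October', 'February']

# After conversion the comparison is numeric
df["2"] = pd.to_numeric(df["2"])
num_order = df.sort_values(by="2")["1"].tolist()
print(num_order)  # ['January', 'February', 'October']
```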

Here is the sort_values signature, according to the pandas documentation:

DataFrame.sort_values(by, axis=0,
                          ascending=True,
                          inplace=False,
                          kind='quicksort',
                          na_position='last',
                          ignore_index=False, key=None)

In this case, it would be:

df.sort_values(by=['2'])

API reference: pandas.DataFrame.sort_values
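The key parameter in that signature (pandas 1.1 and later) can make the helper month-number column unnecessary: it receives the column as a Series and must return a Series of the same length to sort by. A sketch on a subset of the question's frame, assuming integer column labels and an English/C locale for `calendar.month_name`; `month_number` is a hypothetical helper dict.

```python
import calendar
import pandas as pd

# Subset of the question's frame (assumption: integer column labels)
df = pd.DataFrame({0: [354.7, 55.4, 85.6, 95.5],
                   1: ["April", "August", "January", "February"]})

# calendar.month_name[1:] is ['January', ..., 'December'] in the default locale
month_number = {name: i for i, name in enumerate(calendar.month_name[1:], start=1)}

# key maps each month name to its number; the original names are kept in the result
out = df.sort_values(by=1, key=lambda s: s.map(month_number))
print(out[1].tolist())  # ['January', 'February', 'April', 'August']
```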


As another solution:

Instead of creating a second column, you can make your string data (the month names) categorical and sort like this:

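import pandas as pd

# Give column 1 (the month names) a readable label
df.rename(columns={1: 'month'}, inplace=True)
# Ordered categorical: categories are listed from December down to January
df['month'] = pd.Categorical(df['month'], categories=['December', 'November', 'October', 'September', 'August', 'July', 'June', 'May', 'April', 'March', 'February', 'January'], ordered=True)
# Descending over that category order puts January first
df = df.sort_values('month', ascending=False)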

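This gives you the data ordered by month name, following the category order you specified when creating the Categorical object. Because the categories are listed from December to January, ascending=False is what puts January first.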
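A variant of the same idea: if the categories are listed in calendar order instead, the default ascending sort is enough. This sketch builds the list with `calendar.month_name` (assuming an English/C locale) on a small hypothetical frame:

```python
import calendar
import pandas as pd

df = pd.DataFrame({"month": ["April", "August", "December", "January"],
                   "value": [354.7, 55.4, 176.5, 85.6]})

# Categories January..December, so ascending order equals calendar order
df["month"] = pd.Categorical(df["month"],
                             categories=list(calendar.month_name[1:]),
                             ordered=True)
df = df.sort_values("month")
print(df["month"].tolist())  # ['January', 'April', 'August', 'December']
```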

If you want to sort a column in a custom order rather than alphabetically, and don't want to rely on sort_values() alone, you can try the solution below.

Problem: sort column "col1" in the order ['A', 'C', 'D', 'B'].

import pandas as pd
import numpy as np

## Sample DataFrame ##
df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A']})

>>> df
   col1
0    A
1    B
2    D
3    C
4    A
## Solution ##

conditions = []
values = []

for i,j in enumerate(['A','C','D','B']):
    conditions.append((df['col1'] == j))
    values.append(i)

df['col1_Num'] = np.select(conditions, values)

df.sort_values(by='col1_Num',inplace = True)

>>> df

    col1  col1_Num
0    A         0
4    A         0
3    C         1
2    D         2
1    B         3
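A possible refinement of the solution above, sketched with a dict lookup instead of np.select. Note that np.select returns its default (0) for values matching no condition, so an unlisted value would silently tie with 'A'; with `map`, unlisted values become NaN, which sort_values places last by default. The sketch also passes kind='mergesort' (a stable sort) so that tied rows keep their original relative order, which the default quicksort does not guarantee. `order` and `rank` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "B", "D", "C", "A"]})
order = ["A", "C", "D", "B"]

# Position of each value in the custom order
rank = {value: i for i, value in enumerate(order)}
df["col1_Num"] = df["col1"].map(rank)

# Stable sort: the two 'A' rows keep their original order (0 before 4)
df = df.sort_values(by="col1_Num", kind="mergesort")
print(df.index.tolist())  # [0, 4, 3, 2, 1]
print(df["col1"].tolist())  # ['A', 'A', 'C', 'D', 'B']
```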