I have a dataframe like this:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

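As you can see, the months are not in calendar order. So I created a second column that holds the month number (1-12) for each month. From there, how can I sort the dataframe according to calendar month order?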


Use sort_values to sort the df by the values of a specific column:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

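If you want to sort by two columns, pass a list of column labels to sort_values, ordered by sort priority. With df.sort_values(['2', '0']), the result is sorted by column 2 and then by column 0. Of course, that makes no difference in this example, because every value in df['2'] is unique.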
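A minimal sketch of the two-column case, assuming a small hypothetical frame whose column labels are the strings '0', '1', '2' and which, unlike the question's data, has a tie in column '2' so the second key actually matters:

```python
import pandas as pd

# Hypothetical sample data (not from the question): rows 0 and 2 tie on column '2'.
df = pd.DataFrame({
    '0': [354.7, 55.4, 176.5, 95.5],
    '1': ['April', 'August', 'April', 'February'],
    '2': [4.0, 8.0, 4.0, 2.0],
})

# Sort by '2' first; rows that tie on '2' are then ordered by '0'.
result = df.sort_values(['2', '0'])
print(result)
```

Row 2 (176.5) comes before row 0 (354.7) because they tie on '2' and the second key breaks the tie.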


Just adding some more operations on the data. Suppose we have a dataframe df; we can chain several operations to get the desired output:

ID         cost      tax    label
1       216590      1600    test      
2       523213      1800    test 
3          250      1500    experiment

(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)

will give the sorted label counts as a DataFrame:

    index   label
0   test        2
1   experiment  1
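A hedged caveat: in pandas 2.0 and later, value_counts() names its result 'count' and names the index after the original column, so the snippet above produces the columns label/count, and sort_values('label') would then sort by the label text rather than by the count. One sketch that yields the same column names regardless of version, using the sample data above:

```python
import pandas as pd

# The sample data from this answer.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'cost': [216590, 523213, 250],
    'tax': [1600, 1800, 1500],
    'label': ['test', 'test', 'experiment'],
})

counts = (
    df['label']
    .value_counts()               # Series: index = label, values = counts
    .rename_axis('label')         # name the index explicitly (name differs by version)
    .reset_index(name='count')    # two-column DataFrame: label, count
    .sort_values('count', ascending=False)
)
print(counts)
```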

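I tried the solutions above and they didn't work for me, so I found a different solution that does. ascending=False sorts the dataframe in descending order; the default is True. I'm using Python 3.6.6 and pandas 0.23.4.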

final_df = df.sort_values(by=['2'], ascending=False)

You can find more details in the pandas documentation.


As another solution:

Instead of creating a second column, you can make your string data (the month names) categorical and sort it like this:

df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)

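This gives you the data ordered by the month names in the order you specified when creating the Categorical object (here the categories run from December to January and the sort is descending, so the result comes out in calendar order).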
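If calendar order is the goal, the category list can also be written January to December and sorted ascending. A sketch, assuming the month names sit in a column called 'month' (as after the rename above) and using calendar.month_name from the standard library instead of typing the list by hand; note that calendar.month_name follows the current locale, which is English unless the program has changed it:

```python
import calendar

import pandas as pd

# Hypothetical subset of the question's data.
df = pd.DataFrame({
    'value': [354.7, 55.4, 176.5, 95.5],
    'month': ['April', 'August', 'December', 'February'],
})

# calendar.month_name[0] is an empty string, so skip it: January .. December.
months = list(calendar.month_name)[1:]
df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)

# Ascending sort on an ordered categorical follows the category order.
df = df.sort_values('month')
print(df)
```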


Below is the sort_values signature according to the pandas documentation.

DataFrame.sort_values(by, axis=0,
                          ascending=True,
                          inplace=False,
                          kind='quicksort',
                          na_position='last',
                          ignore_index=False, key=None)

In this case, it would look like this:

df.sort_values(by=['2'])

API reference: pandas.DataFrame.sort_values
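Two of the other parameters in that signature are easy to illustrate. A sketch on a hypothetical frame (not the question's data) with a missing value in column '2':

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row has no value in column '2'.
df = pd.DataFrame({
    'month': ['April', 'August', 'Unknown', 'February'],
    '2': [4.0, 8.0, np.nan, 2.0],
})

# na_position='first' puts NaN rows at the top instead of the default bottom.
# kind='mergesort' is a stable sort: rows with equal keys keep their original order.
result = df.sort_values(by='2', na_position='first', kind='mergesort')
print(result)
```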


Using the column name worked for me.

sorted_df = df.sort_values(by=['Column_name'], ascending=True)

This worked for me:

df.sort_values(by='Column_name', inplace=True, ascending=False)

Pandas' sort_values does the job.

One can pass various parameters, such as ascending (bool or list of bool):

Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, it must match the length of by.
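A sketch of ascending as a list, on a hypothetical frame with 'group' and 'value' columns (names chosen for illustration): the first key is sorted ascending and, within each group, the second key descending.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    'group': [2, 1, 2, 1],
    'value': [104.8, 85.6, 354.7, 95.5],
})

# One bool per entry in `by`: group ascending, value descending within each group.
result = df.sort_values(by=['group', 'value'], ascending=[True, False])
print(result)
```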

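Since the default is ascending, and the OP's goal is an ascending sort, one doesn't need to specify that parameter (see the last note below for how to sort in descending order), so one can use one of the following: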

1. Performing the operation in-place, keeping the same variable name. This requires passing inplace=True, as follows:

df.sort_values(by=['2'], inplace=True)
# or
df.sort_values(by = '2', inplace = True)
# or
df.sort_values('2', inplace = True)

2. If doing the operation in-place is not a requirement, one can assign the result (the sorted frame) to a variable:

- With the same name as the original dataframe, df, as

df = df.sort_values(by=['2'])

- With a different name, such as df_new, as

df_new = df.sort_values(by=['2'])
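One pitfall with the in-place form: sort_values(..., inplace=True) returns None, so writing df = df.sort_values(..., inplace=True) would replace df with None. A small sketch on a hypothetical one-column frame:

```python
import pandas as pd

df = pd.DataFrame({'2': [4.0, 8.0, 2.0]})

# With inplace=True the frame itself is reordered and the method returns None,
# so assigning its return value would throw the data away.
returned = df.sort_values(by='2', inplace=True)
print(returned)          # None
print(df['2'].tolist())  # [2.0, 4.0, 8.0]

# Without inplace, the original is left untouched and a sorted copy is returned.
df2 = pd.DataFrame({'2': [4.0, 8.0, 2.0]})
sorted_copy = df2.sort_values(by='2')
```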

All of the previous operations give the following output

        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5     152       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

Finally, one can reset the index with pandas.DataFrame.reset_index to get the following

df.reset_index(drop = True, inplace = True)

# or

df = df.reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0

A one-liner that sorts in ascending order and resets the index looks like this

df = df.sort_values(by=['2']).reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0
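Since pandas 1.0, sort_values also accepts ignore_index=True (it appears in the signature quoted earlier), which relabels the result 0 to n-1 without a separate reset_index call. A sketch checking that both forms agree on a hypothetical three-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    '1': ['April', 'August', 'February'],
    '2': [4.0, 8.0, 2.0],
})

a = df.sort_values(by='2').reset_index(drop=True)
# ignore_index=True relabels the sorted rows 0..n-1 directly.
b = df.sort_values(by='2', ignore_index=True)

print(a.equals(b))  # True
```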

Notes:

1. If one is not doing the operation in-place, forgetting the steps mentioned above may lead one (as this user) to not get the expected result.

2. There are strong opinions on using inplace. For that, one might want to read this.

3. One is assuming that column 2 is not a string. If it is, one will have to convert it:

Using pandas.to_numeric

df['2'] = pd.to_numeric(df['2'])

Using pandas.Series.astype

df['2'] = df['2'].astype(float)

4. If one wants descending order, one needs to pass ascending=False, as

df = df.sort_values(by=['2'], ascending=False)

# or

df.sort_values(by = '2', ascending=False, inplace=True)

[Out]:

        0          1     2
2   176.5   December  12.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0
1    55.4     August   8.0
5     152       July   7.0
6   238.7       June   6.0
8   283.5        May   5.0
0   354.7      April   4.0
7   104.8      March   3.0
3    95.5   February   2.0
4    85.6    January   1.0
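To illustrate why the conversion note matters: if column '2' holds strings, sort_values compares them character by character, so '10' sorts before '2'. A sketch with hypothetical string data:

```python
import pandas as pd

df = pd.DataFrame({
    '1': ['January', 'October', 'February'],
    '2': ['1', '10', '2'],   # month numbers stored as strings
})

# String comparison: '10' lands between '1' and '2'.
lexical = df.sort_values(by='2')['2'].tolist()
print(lexical)  # ['1', '10', '2']

# Convert to numbers first, then the sort is numeric.
df['2'] = pd.to_numeric(df['2'])
numeric = df.sort_values(by='2')['2'].tolist()
print(numeric)  # [1, 2, 10]
```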


This one worked for me:

df=df.sort_values(by=[2])

Whereas:

df=df.sort_values(by=['2']) 

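does not work (presumably because the column labels in this DataFrame are the integers 0, 1, 2 rather than the strings '0', '1', '2').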
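A sketch of the presumed cause, assuming the DataFrame was built from a list of rows (for example pd.DataFrame(rows) or pd.read_csv(..., header=None)), which gives integer column labels:

```python
import pandas as pd

# Built from a list of rows, so pandas labels the columns 0, 1, 2 (integers).
df = pd.DataFrame([[354.7, 'April', 4.0],
                   [55.4, 'August', 8.0],
                   [95.5, 'February', 2.0]])
print(df.columns.tolist())  # [0, 1, 2]

ok = df.sort_values(by=[2])      # the integer label matches an existing column

raised = False
try:
    df.sort_values(by=['2'])     # the string '2' is a different label
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```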


Example: suppose you have a column with values 1 and 0, and you want to separate them and use only one value; then:

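import pandas as pd

# furniture is one of the columns in the csv file; data is the DataFrame read from it.

# count how many rows have each furniture value (0 or 1)
allrooms = data.groupby('furniture')['furniture'].agg('count')
allrooms

# keep only the count for value 1, then only the count for value 0
myrooms1 = pd.DataFrame(allrooms, columns = ['furniture'], index = [1])

myrooms2 = pd.DataFrame(allrooms, columns = ['furniture'], index = [0])

print(myrooms1)
print(myrooms2)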
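If the goal is to keep the rows with one value rather than only their count, a boolean mask is a common alternative, and value_counts gives the same per-value counts as the groupby above. A sketch with a hypothetical stand-in for the CSV data:

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; 'furniture' holds 1 or 0.
data = pd.DataFrame({'furniture': [1, 0, 1, 1, 0]})

# Rows where furniture == 1, and rows where furniture == 0.
furnished = data[data['furniture'] == 1]
unfurnished = data[data['furniture'] == 0]

# Per-value counts, like groupby('furniture')['furniture'].agg('count').
counts = data['furniture'].value_counts()
print(counts.loc[1], counts.loc[0])  # 3 2
```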

You may need to reset the index after sorting:

df = df.sort_values('2')
df = df.reset_index(drop=True)

Just adding some insight

df = raw_df['2'].sort_values()  # sorts only one column (i.e. column '2') and returns a Series

However,

df = raw_df.sort_values(by=["2"], ascending=False)  # sorts the whole df in descending order on the basis of column "2"
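A quick sketch of the difference, on a hypothetical raw_df:

```python
import pandas as pd

raw_df = pd.DataFrame({
    '1': ['April', 'August', 'February'],
    '2': [4.0, 8.0, 2.0],
})

col_only = raw_df['2'].sort_values()                   # Series: just column '2'
whole = raw_df.sort_values(by=['2'], ascending=False)  # DataFrame: every column reordered

print(type(col_only).__name__)  # Series
print(type(whole).__name__)     # DataFrame
print(whole['1'].tolist())      # ['August', 'April', 'February']
```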

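If you want to sort a column dynamically, in a custom order rather than alphabetically, and don't want to rely on sort_values() alone, you can try the solution below.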

Problem: sort column "col1" in the order ['A', 'C', 'D', 'B']

import pandas as pd
import numpy as np

## Sample DataFrame ##
df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A']})

>>> df
   col1
0    A
1    B
2    D
3    C
4    A
## Solution ##

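conditions = []
values = []

# one boolean condition per letter; its position in the list becomes its rank
for i, j in enumerate(['A','C','D','B']):
    conditions.append((df['col1'] == j))
    values.append(i)

# map every row to the rank of its letter, then sort on that helper column
df['col1_Num'] = np.select(conditions, values)

df.sort_values(by='col1_Num',inplace = True)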

>>> df

    col1  col1_Num
0    A         0
4    A         0
3    C         1
2    D         2
1    B         3
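Another option, assuming pandas 1.1 or later, is the key argument of sort_values, which sorts on a transformed copy of the column without adding a helper column. A sketch of the same custom order:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A']})

# Desired custom order; each letter's position in the list is its sort rank.
order = ['A', 'C', 'D', 'B']
rank = {letter: i for i, letter in enumerate(order)}

# key receives the column as a Series and returns a Series of the same length;
# here each letter is replaced by its rank before sorting.
# kind='mergesort' is stable, so the two 'A' rows keep their original order.
result = df.sort_values(by='col1', key=lambda s: s.map(rank), kind='mergesort')
print(result['col1'].tolist())  # ['A', 'A', 'C', 'D', 'B']
```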