I have a dataframe like this:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

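As you can see, the months are not in calendar order. So I created a second column holding the month number (1-12) that corresponds to each month. From there, how can I sort this dataframe according to calendar month order?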


Current answer

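pandas' sort_values does the job.

Various parameters can be passed, such as ascending (bool or list of bool):

Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, it must match the length of by.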
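To illustrate the list form of ascending, here is a minimal sketch. The frame `demo` and its columns `group` and `value` are made up for this example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data (not from the question): two columns with repeated groups
demo = pd.DataFrame({
    "group": ["b", "a", "b", "a"],
    "value": [1, 2, 3, 4],
})

# One bool per entry in `by`: "group" ascending, then "value" descending
result = demo.sort_values(by=["group", "value"], ascending=[True, False])
print(result)
```

The list passed to ascending has the same length as the list passed to by, as the documentation excerpt requires.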

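Since the default is ascending, and OP's goal is to sort in ascending order, there is no need to specify that parameter (see the last note below for descending order), so one can use one of the following approaches: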

Performing the operation in-place, and keeping the same variable name. This requires one to pass inplace=True as follows:

df.sort_values(by=['2'], inplace=True)

# or

df.sort_values(by = '2', inplace = True)

# or

df.sort_values('2', inplace = True)

If doing the operation in-place is not a requirement, one can assign the change (sort) to a variable:

With the same name as the original dataframe, df, as

df = df.sort_values(by=['2'])

With a different name, such as df_new, as

df_new = df.sort_values(by=['2'])

All of the previous operations give the following output

        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5     152       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0
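For anyone who wants to reproduce this, the sketch below rebuilds the question's frame by hand and sorts it without modifying the original. It assumes the column labels are the strings '0', '1' and '2', as the calls above imply; if the labels are integers, pass by=2 instead.

```python
import pandas as pd

# Rebuild the question's frame; string labels '0', '1', '2' are an assumption
df = pd.DataFrame({
    "0": [354.7, 55.4, 176.5, 95.5, 85.6, 152, 238.7, 104.8, 283.5, 278.8, 249.6, 212.7],
    "1": ["April", "August", "December", "February", "January", "July",
          "June", "March", "May", "November", "October", "September"],
    "2": [4.0, 8.0, 12.0, 2.0, 1.0, 7.0, 6.0, 3.0, 5.0, 11.0, 10.0, 9.0],
})

# sort_values returns a sorted copy; df itself keeps its original row order
df_new = df.sort_values(by="2")
print(df_new)
```

Note that the original index labels travel with their rows, which is why the sorted output starts at index 4.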

Finally, one can reset the index with pandas.DataFrame.reset_index, to get the following

df.reset_index(drop = True, inplace = True)

# or

df = df.reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0

A one-liner that sorts in ascending order and resets the index looks like this

df = df.sort_values(by=['2']).reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0
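As a possible shortcut, sort_values also accepts ignore_index=True (available from pandas 1.0 onward), which renumbers the rows without a separate reset_index call. The three-row frame `small` below is a made-up subset used only to keep the sketch short.

```python
import pandas as pd

# Hypothetical three-row subset of the question's data
small = pd.DataFrame({
    "1": ["March", "January", "February"],
    "2": [3.0, 1.0, 2.0],
})

# ignore_index=True relabels the sorted rows 0..n-1
result = small.sort_values(by="2", ignore_index=True)
print(result)
```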

Notes:

If one is not doing the operation in-place, forgetting the steps mentioned above may lead one (as this user) to not be able to get the expected result.

There are strong opinions on using inplace. For that, one might want to read this.

One is assuming that the column 2 is not a string. If it is, one will have to convert it:

Using pandas.to_numeric

df['2'] = pd.to_numeric(df['2'])

Using pandas.Series.astype

df['2'] = df['2'].astype(float)

If one wants descending order, one needs to pass ascending=False as

df = df.sort_values(by=['2'], ascending=False)

# or

df.sort_values(by = '2', ascending=False, inplace=True)

[Out]:

        0          1     2
2   176.5   December  12.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0
1    55.4     August   8.0
5     152       July   7.0
6   238.7       June   6.0
8   283.5        May   5.0
0   354.7      April   4.0
7   104.8      March   3.0
3    95.5   February   2.0
4    85.6    January   1.0
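To show why the string check matters, the sketch below sorts a made-up three-row frame `s` twice: once while column '2' still holds strings, and once after pandas.to_numeric. The frame is hypothetical and only illustrates the difference.

```python
import pandas as pd

# Hypothetical frame where the month numbers were read in as text
s = pd.DataFrame({
    "1": ["October", "February", "January"],
    "2": ["10.0", "2.0", "1.0"],
})

# Strings sort character by character, so "10.0" lands before "2.0"
as_text = s.sort_values(by="2")

# After conversion the same call sorts by numeric value
s["2"] = pd.to_numeric(s["2"])
as_number = s.sort_values(by="2")

print(as_text["1"].tolist())
print(as_number["1"].tolist())
```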

Other answers

Using the column name worked for me.

sorted_df = df.sort_values(by=['Column_name'], ascending=True)

Example: suppose you have a column with the values 1 and 0, and you want to separate them and use only one of the values; then:

import pandas as pan  # assumed: the answer refers to pandas under the alias "pan"

# furniture is one of the columns in the csv file.
# data is assumed to be the DataFrame loaded from that file.

allrooms = data.groupby('furniture')['furniture'].agg('count')
allrooms

myrooms1 = pan.DataFrame(allrooms, columns = ['furniture'], index = [1])

myrooms2 = pan.DataFrame(allrooms, columns = ['furniture'], index = [0])

print(myrooms1)
print(myrooms2)

Use sort_values to sort the df by the values of a specific column:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

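If you want to sort by two columns, pass a list of column labels to sort_values, with the labels ordered by sort priority. If you use df.sort_values(['2', '0']), the result is sorted by column 2 first and then by column 0. Granted, this does not make much sense for this example, because every value in df['2'] is unique.

You may need to reset the index after sorting: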

df = df.sort_values('2')
df = df.reset_index(drop=True)

As another solution:

Instead of creating the second column, you can turn your string data (the month names) into a categorical and sort like this:

df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)

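It will give you the data ordered according to the month names you specified when creating the Categorical object. Because the categories above are listed from December to January, sorting with ascending=False returns the rows in calendar order.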
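A variant of the same idea, offered as a sketch rather than the answerer's exact method: list the categories in calendar order so that the default ascending sort already gives January first. The frame `months` and the name `calendar_order` are made up for this example.

```python
import pandas as pd

# Month names in calendar order, written out to avoid any locale dependence
calendar_order = ["January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November", "December"]

# Hypothetical subset of the question's month column
months = pd.DataFrame({"month": ["April", "August", "December", "February", "January"]})

# An ordered categorical sorts by category position, not alphabetically
months["month"] = pd.Categorical(months["month"], categories=calendar_order, ordered=True)
result = months.sort_values("month", ignore_index=True)
print(result)
```

A name that is missing from the categories list becomes NaN, so it is worth checking the spelling of the month names before converting.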