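How can I find the row for which the value of a specific column is maximal?
df.max() gives me the maximal value for each column, but I don't know how to get the corresponding row.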
Use the pandas idxmax function. It's straightforward:
>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
A B C
0 1.232853 -1.979459 -0.573626
1 0.140767 0.394940 1.068890
2 0.742023 1.343977 -0.579745
3 2.125299 -0.649328 -0.211692
4 -0.187253 1.908618 -1.862934
>>> df['A'].idxmax()
3
>>> df['B'].idxmax()
4
>>> df['C'].idxmax()
1
Alternatively, you could use numpy.argmax, such as numpy.argmax(df['A']); with the default integer index shown here it gives the same result, and it appears at least as fast as idxmax in cursory observations.
idxmax() returns index labels, not integers. For example, if you have string values as your index labels, like rows 'a' through 'e', you might want to know that the max occurs in row 4 (not row 'd'). If you want the integer position of that label within the Index, you have to get it manually (which can be tricky now that duplicate row labels are allowed).
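A minimal sketch of getting that integer position by hand, assuming a small made-up Series with string labels (the data and the names s, label and position are illustrative, not from the answer above):

```python
import pandas as pd

# Hypothetical data: string labels 'a'..'e', with the maximum at label 'd'
s = pd.Series([0.1, 0.7, 0.3, 2.1, -0.2], index=list("abcde"))

label = s.idxmax()                     # index label of the max row
position = int(s.to_numpy().argmax())  # 0-based integer position of the max row

print(label, position)
```

Going through the underlying NumPy array sidesteps the index entirely, so the result is a 0-based position whatever the labels are.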
HISTORICAL NOTES:
idxmax() used to be called argmax() prior to 0.11.
argmax was deprecated prior to 1.0.0 and removed entirely in 1.0.0.
As of pandas 0.16, argmax still existed and performed the same function (though it appeared to run more slowly than idxmax).
The argmax function returned the integer position within the index of the row location of the maximum element.
pandas moved to using row labels instead of integer indices. Positional integer indices used to be very common, more common than labels, especially in applications where duplicate row labels are common.
For example, consider this toy DataFrame with a duplicate row label:
In [19]: dfrm
Out[19]:
A B C
a 0.143693 0.653810 0.586007
b 0.623582 0.312903 0.919076
c 0.165438 0.889809 0.000967
d 0.308245 0.787776 0.571195
e 0.870068 0.935626 0.606911
f 0.037602 0.855193 0.728495
g 0.605366 0.338105 0.696460
h 0.000000 0.090814 0.963927
i 0.688343 0.188468 0.352213
i 0.879000 0.105039 0.900260
In [20]: dfrm['A'].idxmax()
Out[20]: 'i'
In [21]: dfrm.loc[dfrm['A'].idxmax()] # .ix instead of .loc in older versions of pandas
Out[21]:
A B C
i 0.688343 0.188468 0.352213
i 0.879000 0.105039 0.900260
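So a naive use of idxmax is not sufficient here, whereas the old form of argmax would correctly provide the positional location of the max row (in this case, position 9).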
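One possible way to get a single row when labels repeat is to work with positions instead of labels. This is only a sketch on a smaller, made-up frame (the values and the names pos, row and by_label are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose last two rows share the label 'i'
dfrm = pd.DataFrame(
    {"A": [0.14, 0.62, 0.69, 0.88], "B": [0.65, 0.31, 0.19, 0.11]},
    index=["a", "b", "i", "i"],
)

pos = int(np.argmax(dfrm["A"].to_numpy()))  # position of the max, labels ignored
row = dfrm.iloc[pos]                         # exactly one row, returned as a Series
by_label = dfrm.loc[dfrm["A"].idxmax()]      # every row carrying the label 'i'

print(pos)
print(row)
print(by_label)
```

The label-based lookup returns both rows labelled 'i', while the positional lookup returns only the row that actually holds the maximum.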
This is exactly one of those nasty kinds of bug-prone behaviors in dynamically typed languages that makes this sort of thing so unfortunate, and worth beating a dead horse over. If you are writing systems code and your system suddenly gets used on some data sets that are not cleaned properly before being joined, it's very easy to end up with duplicate row labels, especially string labels like a CUSIP or SEDOL identifier for financial assets. You can't easily use the type system to help you out, and you may not be able to enforce uniqueness on the index without running into unexpectedly missing data.
So you're left with hoping that your unit tests covered everything (they didn't, or more likely no one wrote any tests) -- otherwise (most likely) you're just left waiting to see if you happen to smack into this error at runtime, in which case you probably have to go drop many hours worth of work from the database you were outputting results to, bang your head against the wall in IPython trying to manually reproduce the problem, finally figuring out that it's because idxmax can only report the label of the max row, and then being disappointed that no standard function automatically gets the positions of the max row for you, writing a buggy implementation yourself, editing the code, and praying you don't run into the problem again.
You might also try idxmax:
In [5]: df = pandas.DataFrame(np.random.randn(10,3),columns=['A','B','C'])
In [6]: df
Out[6]:
A B C
0 2.001289 0.482561 1.579985
1 -0.991646 -0.387835 1.320236
2 0.143826 -1.096889 1.486508
3 -0.193056 -0.499020 1.536540
4 -2.083647 -3.074591 0.175772
5 -0.186138 -1.949731 0.287432
6 -0.480790 -1.771560 -0.930234
7 0.227383 -0.278253 2.102004
8 -0.002592 1.434192 -1.624915
9 0.404911 -2.167599 -0.452900
In [7]: df.idxmax()
Out[7]:
A 0
B 8
C 7
e.g.
In [8]: df.loc[df['A'].idxmax()]
Out[8]:
A 2.001289
B 0.482561
C 1.579985
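The two answers above return only one index when several rows share the maximum value. If you want all of those rows, there does not seem to be a built-in function for it, but it is not hard to do. Below is an example for a Series; the same can be done for a DataFrame: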
In [1]: from pandas import Series, DataFrame
In [2]: s=Series([2,4,4,3],index=['a','b','c','d'])
In [3]: s.idxmax()
Out[3]: 'b'
In [4]: s[s==s.max()]
Out[4]:
b 4
c 4
dtype: int64
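For the DataFrame case, the same boolean-mask idea can be sketched as follows; the frame and the name top are made up for illustration:

```python
import pandas as pd

# Hypothetical frame: column A reaches its maximum (4) in rows 'b' and 'c'
df = pd.DataFrame({"A": [2, 4, 4, 3], "B": [1, 0, 5, 2]}, index=list("abcd"))

# Keep every row whose A equals the column maximum
top = df[df["A"] == df["A"].max()]
print(top)
```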
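The idxmax of a DataFrame returns the label index of the row with the maximal value, and the behavior of argmax depends on the version of pandas (at the time of writing it emits a warning). If you want to use the positional index, you can do the following: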
max_row = df['A'].values.argmax()
or
import numpy as np
max_row = np.argmax(df['A'].values)
Note that if you use np.argmax(df['A']), it behaves the same as df['A'].argmax().
df.iloc[df['columnX'].argmax()]
argmax() gives the positional index of the max value in columnX. iloc can then be used to fetch the row of the DataFrame df at that position.
mx.iloc[0].idxmax()
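This one line of code finds where the maximum value sits within a single row of a DataFrame: mx is the DataFrame, iloc[0] selects the row at position 0, and idxmax() returns the label of the column holding that row's largest value.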
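A short sketch of the row-wise lookup, assuming a tiny made-up frame named mx (the data and the names first_row_col and per_row are illustrative). It also shows idxmax(axis=1), which does the same thing for every row at once:

```python
import pandas as pd

# Hypothetical frame with two rows and three columns
mx = pd.DataFrame({"A": [1, 9], "B": [7, 2], "C": [3, 4]})

first_row_col = mx.iloc[0].idxmax()  # column label of the max in the first row
per_row = mx.idxmax(axis=1)          # column label of the max in each row

print(first_row_col)
print(per_row)
```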
The direct ".argmax()" solution does not work for me.
The previous example, provided by @ely:
>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
A B C
0 1.232853 -1.979459 -0.573626
1 0.140767 0.394940 1.068890
2 0.742023 1.343977 -0.579745
3 2.125299 -0.649328 -0.211692
4 -0.187253 1.908618 -1.862934
>>> df['A'].argmax()
3
>>> df['B'].argmax()
4
>>> df['C'].argmax()
1
returns the following message:
FutureWarning: 'argmax' is deprecated, use 'idxmax' instead. The behavior of 'argmax'
will be corrected to return the positional maximum in the future.
Use 'series.values.argmax' to get the position of the maximum now.
So my solution is:
df['A'].values.argmax()
Very simple: we have df as below and we want to print the row with the max value in C:
A B C
x 1 4
y 2 10
z 5 9
In:
df.loc[df['C'] == df['C'].max()] # condition check
Out:
A B C
y 2 10
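If you want the entire row instead of just the id, you can use df.nlargest and pass in how many "top" rows you want; you can also pass in the column or columns to rank by.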
df.nlargest(2,['A'])
will give the rows corresponding to the top 2 values of A.
Use df.nsmallest for min values.
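A brief sketch of both calls on made-up data. The keep="all" argument is an addition not mentioned in the answer above; it keeps every row tied at the cutoff value:

```python
import pandas as pd

# Hypothetical frame: the maximum of A (5) occurs twice
df = pd.DataFrame({"A": [1, 5, 3, 5], "B": [10, 20, 30, 40]})

top2 = df.nlargest(2, "A")                       # rows with the two largest A values
bottom1 = df.nsmallest(1, "A")                   # row with the smallest A value
top_with_ties = df.nlargest(1, "A", keep="all")  # all rows tied for the largest A

print(top2)
print(bottom1)
print(top_with_ties)
```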
A more compact and readable solution using query() looks like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
print(df)
# find row with maximum A
df.query('A == A.max()')
It also returns a DataFrame instead of a Series, which can be handy for some use cases.
Consider this DataFrame:
[In]: df = pd.DataFrame(np.random.randn(4,3),columns=['A','B','C'])
[Out]:
A B C
0 -0.253233 0.226313 1.223688
1 0.472606 1.017674 1.520032
2 1.454875 1.066637 0.381890
3 -0.054181 0.234305 -0.557915
Assuming one wants to know the rows where column "C" is max, the following will do the work:
[In]: df[df['C']==df['C'].max()]
[Out]:
A B C
1 0.472606 1.017674 1.520032
Use:
data.iloc[data['A'].idxmax()]
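data['A'].idxmax() - finds the row label of the max value; data.iloc[] - returns the row. Because idxmax returns a label, this only works when the index is the default integer range; otherwise use data.loc[data['A'].idxmax()].
If there are ties in the maximum values, then idxmax returns the index of only the first max value. For example, in the following DataFrame: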
A B C
0 1 0 1
1 0 0 1
2 0 0 0
3 0 1 1
4 1 0 0
idxmax returns
A 0
B 3
C 0
dtype: int64
Now, if we want all the indices corresponding to the max values, we can use max + eq to create a boolean DataFrame, then use it on df.index to filter out the indexes:
out = df.eq(df.max()).apply(lambda x: df.index[x].tolist())
Output:
A [0, 4]
B [3]
C [0, 1, 3]
dtype: object
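What worked for me is: df[df['colX'] == df['colX'].max()]
You then get the row in your df with the maximum value of colX.
Then, if you just want the index, you can add .index at the end of the query.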
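As a small sketch of that last step, assuming a made-up frame with a column named colX (the data and the names max_rows and max_labels are illustrative):

```python
import pandas as pd

# Hypothetical frame: colX reaches its maximum (8) at labels 'x' and 'y'
df = pd.DataFrame({"colX": [3, 8, 8, 1]}, index=["w", "x", "y", "z"])

max_rows = df[df["colX"] == df["colX"].max()]  # all rows holding the maximum
max_labels = max_rows.index.tolist()           # just their index labels

print(max_labels)
```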