How do I select rows from a DataFrame based on the values in a column in Pandas?

In SQL, I would use:

SELECT *
FROM table
WHERE column_name = some_value

Current answer

Here is a simple example:

from pandas import DataFrame

# Create data set
d = {'Revenue':[100,111,222], 
     'Cost':[333,444,555]}
df = DataFrame(d)


# mask = Return True when the value in column "Revenue" is equal to 111
mask = df['Revenue'] == 111

print(mask)

# Result:
# 0    False
# 1     True
# 2    False
# Name: Revenue, dtype: bool


# Select * FROM df WHERE Revenue = 111
df[mask]

# Result:
#    Revenue  Cost
# 1      111   444

Other answers

In newer versions of Pandas, inspired by the documentation (Viewing data):

df[df["column_name"] == some_value]  # Scalar, True/False, ...

df[df["column_name"] == "some_value"]  # String

Combine multiple conditions by wrapping each clause in parentheses () and joining them with & and | (and/or). Like this:

df[(df["column_name"] == "some_value1") | (df["column_name"] == "some_value2")]
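
As a minimal sketch of combining conditions, the snippet below builds a small made-up frame and filters it with both & and |. The column names city and sales and the data are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
    "sales": [10, 25, 40, 5],
})

# AND: a row is kept only if both conditions hold.
both = df[(df["city"] == "Oslo") & (df["sales"] > 20)]

# OR: a row is kept if either condition holds.
either = df[(df["city"] == "Lima") | (df["sales"] > 20)]

print(both)    # row 2 only
print(either)  # rows 1, 2 and 3
```

Each clause produces a boolean Series, and & / | combine those Series element by element, which is why every clause needs its own parentheses.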

Other filters

pandas.notna(df["column_name"])  # Not NaN
df["column_name"].str.contains("text")  # Search for "text"
df["column_name"].str.lower().str.contains("text")  # Search for "text" after converting to lowercase
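
The filters above return boolean Series, so they still have to be passed to df[...] to select rows. A small sketch under assumed data (the note column and its values are made up); it also uses the case=False and na=False arguments of str.contains as an alternative to lowercasing first:

```python
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"note": ["Some Text here", None, "nothing", "TEXT only"]})

# Keep rows where the column is not missing.
not_missing = df[pd.notna(df["note"])]

# Case-insensitive substring search; na=False treats missing values as "no match"
# so the mask contains only True/False and can be used for indexing.
has_text = df[df["note"].str.contains("text", case=False, na=False)]

print(not_missing)  # rows 0, 2 and 3
print(has_text)     # rows 0 and 3
```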

To select only specific columns, out of many, for the rows matching a given value in Pandas:

select col_name1, col_name2 from table where column_name = some_value.

Using loc:

df.loc[df['column_name'] == some_value, [col_name1, col_name2]]

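Or using query (the @ prefix refers to the Python variable some_value; without it, query would look for a column named some_value):

df.query('column_name == @some_value')[[col_name1, col_name2]]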
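
A runnable sketch of both forms, assuming a made-up frame in which col_name1 and col_name2 hold the column labels "x" and "y". Inside query, the @ prefix refers to a local Python variable rather than a column:

```python
import pandas as pd

# Hypothetical data; only the names column_name, some_value,
# col_name1 and col_name2 come from the text above.
df = pd.DataFrame({
    "column_name": ["a", "b", "a"],
    "x": [1, 2, 3],
    "y": [4, 5, 6],
    "z": [7, 8, 9],
})
some_value = "a"
col_name1, col_name2 = "x", "y"

# Row filter and column selection in a single .loc call.
via_loc = df.loc[df["column_name"] == some_value, [col_name1, col_name2]]

# Same selection with query; @some_value reads the Python variable.
via_query = df.query("column_name == @some_value")[[col_name1, col_name2]]

print(via_loc)
print(via_loc.equals(via_query))
```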

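If you need to look up rows in the dataframe repeatedly and speed matters, one option is to convert the dataframe to a dictionary first. The author of this answer reports that this can make lookups thousands of times faster.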

my_df = df.set_index(column_name)
my_dict = my_df.to_dict('index')

Once the my_dict dictionary is built, you can look values up:

if some_value in my_dict:
    my_result = my_dict[some_value]

If column_name contains duplicate values, the dictionary cannot be created. But you can use:

my_result = my_df.loc[some_value]
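
A minimal sketch of the dictionary approach and its duplicate-key limitation, using made-up key and value columns. It is an illustration only; no timing claims are made here.

```python
import pandas as pd

# Hypothetical frame with unique keys.
df = pd.DataFrame({"key": ["a", "b", "c"], "value": [1, 2, 3]})

# One dict entry per index label: {"a": {"value": 1}, ...}
lookup = df.set_index("key").to_dict("index")

row = lookup.get("b")        # the matching row as a dict
missing = lookup.get("zzz")  # None when the key is absent

# With duplicate keys, to_dict("index") refuses to build the dictionary.
dup = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]}).set_index("key")
try:
    dup.to_dict("index")
    dict_failed = False
except ValueError:
    dict_failed = True

# Label-based lookup still works and returns every matching row.
rows_for_a = dup.loc["a"]

print(row, missing, dict_failed)
print(rows_for_a)
```

Using dict.get avoids a separate membership check when a missing key should simply yield None.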

To select rows whose column value equals a scalar, some_value, use ==:

df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

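Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=. The parentheses in the last example are therefore necessary. Without the parentheses,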

df['column_name'] >= A & df['column_name'] <= B

is parsed as

df['column_name'] >= (A & df['column_name']) <= B

which results in a "The truth value of a Series is ambiguous" error.
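
The precedence problem can be reproduced on a tiny Series. This sketch uses made-up values for s, A and B; it captures the error from the unparenthesized form and then shows the parenthesized form alongside Series.between, which is inclusive on both ends by default:

```python
import pandas as pd

s = pd.Series([1, 5, 10])  # hypothetical values
A, B = 2, 8

# Without parentheses this is a chained comparison around (A & s),
# which forces a Series into a single True/False and raises ValueError.
try:
    s >= A & s <= B
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Parenthesized clauses give an element-wise mask.
in_range = s[(s >= A) & (s <= B)]

# Equivalent range test without any operator-precedence pitfalls.
alt = s[s.between(A, B)]

print(error_message)
print(in_range.tolist(), alt.tolist())  # [5] [5]
```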


To select rows whose column value does not equal some_value, use !=:

df.loc[df['column_name'] != some_value]

isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series with ~:

df.loc[~df['column_name'].isin(some_values)]

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

If you have multiple values you want to include, put them in a list (or more generally, any iterable) and use isin:

print(df.loc[df['B'].isin(['one','three'])])

yields

     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14

Note, however, that if you want to do this many times, it can be more efficient to set an index first and then use df.loc:

df = df.set_index(['B'])
print(df.loc['one'])

yields

       A  C   D
B              
one  foo  0   0
one  bar  1   2
one  foo  6  12

Or, to include multiple values from the index, use df.index.isin:

df.loc[df.index.isin(['one','two'])]

yields

       A  C   D
B              
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
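
The !=, ~isin and range forms described earlier have no worked example above. A short sketch on the same sample frame (rebuilt here with its default integer index so the block is self-contained):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Rows where A is anything other than 'foo'.
not_foo = df.loc[df['A'] != 'foo']

# Rows where B is neither 'one' nor 'three' (only 'two' remains).
not_one_or_three = df.loc[~df['B'].isin(['one', 'three'])]

# Rows where C lies in the closed range [2, 4].
c_between = df.loc[(df['C'] >= 2) & (df['C'] <= 4)]

print(not_foo.index.tolist())           # [1, 3, 5]
print(not_one_or_three.index.tolist())  # [2, 4, 5]
print(c_between.index.tolist())         # [2, 3, 4]
```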

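To add: you can also use df.groupby('column_name').get_group('column_desired_value').reset_index() to build a new dataframe containing only the rows where the specified column has a particular value. For example: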

import pandas as pd
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split()})
print("Original dataframe:")
print(df)

b_is_two_dataframe = df.groupby('B').get_group('two').reset_index(drop=True)
# NOTE: drop=True discards the original row labels instead of adding them as an extra 'index' column
print('Sub dataframe where B is two:')
print(b_is_two_dataframe)

Running this gives:

Original dataframe:
     A      B
0  foo    one
1  bar    one
2  foo    two
3  bar  three
4  foo    two
5  bar    two
6  foo    one
7  foo  three
Sub dataframe where B is two:
     A    B
0  foo  two
1  foo  two
2  bar  two
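
For comparison, a sketch showing that get_group and a plain boolean mask pick out the same rows on this sample frame. Reusing one groupby object (named groups here, a hypothetical name) may be convenient when several groups are needed in turn:

```python
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split()})

# Build the grouping once, then pull out individual groups by key.
groups = df.groupby('B')
via_group = groups.get_group('two').reset_index(drop=True)

# The boolean-mask selection used elsewhere in this thread.
via_mask = df.loc[df['B'] == 'two'].reset_index(drop=True)

print(via_group['A'].tolist())  # ['foo', 'foo', 'bar']
print(via_mask['A'].tolist())   # ['foo', 'foo', 'bar']
```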