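Filter pandas DataFrame by substring criteria

I have a pandas DataFrame with a column of string values. I need to select rows based on partial string matches.

Something like this idiom: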

re.search(pattern, cell_in_question)

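which returns a boolean. I am familiar with the syntax of df[df['A'] == "helloworld"], but I can't seem to find a way to do the same with a partial string match, say 'hello'.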

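Current answer

Maybe you want to search for some text in all columns of the pandas DataFrame, and not just in a subset of them. In that case, the following code will help.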

df[df.apply(lambda row: row.astype(str).str.contains('String To Find').any(), axis=1)]

Warning: this method is convenient but relatively slow, because it applies a Python function to every row.
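For anyone who wants to try it, here is a minimal runnable sketch of the row-wise search. The sample data and column names are made up for illustration, and regex=False is my addition so that the search text is treated as a literal string rather than a regular expression.

```python
import pandas as pd

# Hypothetical sample data, not from the original post.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "city": ["Paris", "String To Find here", "Rome"],
    "age": [30, 25, 41],
})

# Cast every cell of the row to str, then test each cell for the substring.
# .any() is True if at least one cell in the row contains it.
mask = df.apply(
    lambda row: row.astype(str).str.contains("String To Find", regex=False).any(),
    axis=1,
)
result = df[mask]
print(result)
```

Only the row whose city cell contains the text is kept; the integer column is handled because of the astype(str) cast.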

2020-02-20 13:06:07

Other answers

Quick note: if you want to select based on a partial string contained in the index, try the following:

df['stridx']=df.index
df[df['stridx'].str.contains("Hello|Britain")]
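As far as I know the helper column is not strictly required, since the index exposes the same .str accessor. A small sketch with made-up index labels:

```python
import pandas as pd

# Hypothetical data with a string index.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=["Hello World", "Great Britain", "France"],
)

# Index.str offers the same vectorized string methods as Series.str,
# so the mask can be built directly from the index.
mask = df.index.str.contains("Hello|Britain")
result = df[mask]
print(result)
```

This avoids adding a 'stridx' column to the DataFrame just to filter it.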

2014-04-10 15:36:14

Vectorized string methods (i.e. Series.str) let you do the following:

df[df['A'].str.contains("hello")]

This is available in pandas 0.8.1 and up.
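A few variations that may be useful, shown on made-up data. Note that str.contains interprets the pattern as a regular expression by default; the case and regex keyword arguments below change that behaviour.

```python
import pandas as pd

# Hypothetical sample column.
df = pd.DataFrame({"A": ["hello world", "Hello there", "goodbye", "a.b", "axb"]})

# Plain substring match, case-sensitive.
substring = df[df["A"].str.contains("hello")]

# Same match, ignoring case.
any_case = df[df["A"].str.contains("hello", case=False)]

# regex=False treats "." as a literal dot.
literal_dot = df[df["A"].str.contains("a.b", regex=False)]

# Default regex=True: "." matches any character, so "axb" matches too.
regex_dot = df[df["A"].str.contains("a.b")]
```

If the search text comes from user input, regex=False is probably the safer choice, since characters such as "." or "(" otherwise have a special meaning.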

2012-07-17 21:52:18

In case anyone wonders how to solve the related problem, "select columns by partial string":

Use:

df.filter(like='hello')  # select columns which contain the word hello

To select rows by partial string match on their index labels, pass axis=0 to filter:

# selects rows which contain the word hello in their index label
df.filter(like='hello', axis=0)
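A small sketch of both calls on made-up labels. Keep in mind that filter looks at the labels (column names or index values), not at the cell contents; the regex variant at the end is an extra I added for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with descriptive column and index labels.
df = pd.DataFrame(
    {"hello_count": [1, 2], "other": [3, 4], "say_hello": [5, 6]},
    index=["hello_row", "plain_row"],
)

# Columns whose label contains "hello".
cols = df.filter(like="hello")

# Rows whose index label contains "hello".
rows = df.filter(like="hello", axis=0)

# Columns whose label matches a regular expression (here: starts with "hello").
cols_regex = df.filter(regex="^hello")
```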

2016-10-12 21:04:32

Suppose you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator in a lambda expression to create your filter.

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to use the axis=1 option in apply, so that elements are passed to the lambda function row by row rather than column by column.
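To turn that boolean Series into an actual selection, it can be used as a mask. The sketch below also shows a list comprehension over zip, which I would expect to be faster than apply on large frames, though I have not benchmarked it here.

```python
import pandas as pd

df = pd.DataFrame([["hello", "hello world"], ["abcd", "defg"]], columns=["a", "b"])

# Row-wise test with apply, as in the answer above.
mask_apply = df.apply(lambda x: x["a"] in x["b"], axis=1)

# Alternative: iterate over the two columns in parallel.
# The index is passed explicitly so the mask aligns with df.
mask_zip = pd.Series([a in b for a, b in zip(df["a"], df["b"])], index=df.index)

result = df[mask_zip]
print(result)
```

Both masks keep only the first row, where the value of column a is a substring of column b.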

2014-11-10 19:26:27

I am using pandas 0.14.1 on macOS in an IPython notebook. I tried the line proposed above:

df[df["A"].str.contains("Hello|Britain")]

and got an error:

cannot index with vector containing NA / NaN values

but it worked perfectly when an "==True" condition was added, like this:

df[df['A'].str.contains("Hello|Britain")==True]
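The error shows up when column A has missing values, because str.contains then returns NaN for those rows. As an alternative to comparing with True, str.contains accepts an na argument that fills in a value for missing entries. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"A": ["Hello world", np.nan, "Great Britain", "France"]})

# na=False makes missing values count as "no match",
# so the mask is purely boolean and can be used for indexing.
mask = df["A"].str.contains("Hello|Britain", na=False)
result = df[mask]
print(result)
```

The row with the missing value is simply dropped from the result, which is the same outcome as the "==True" workaround.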

2014-11-10 17:05:17
