Filter pandas DataFrame by substring criteria

I have a pandas DataFrame with a column of string values. I need to select rows based on partial string matches.

Something like this idiom:

re.search(pattern, cell_in_question)

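which returns a boolean. I'm familiar with the syntax df[df['A'] == "hello world"], but I can't seem to find a way to do the same with a partial string match, say 'hello'.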
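A minimal sketch of how that idiom maps onto a column, assuming a hypothetical column A and the pattern "hello": re.search can be applied cell by cell with Series.map to build a boolean mask, and Series.str.contains produces the same mask in a single call, since it also uses search semantics (a match anywhere in the string counts).

```python
import re

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": ["hello world", "say hello", "goodbye", "Hello there"]})
pattern = "hello"

# The re.search idiom applied to each cell: a match object means True
mask_re = df["A"].map(lambda cell: re.search(pattern, cell) is not None)

# Vectorized equivalent: str.contains treats the pattern as a regex by default
mask_contains = df["A"].str.contains(pattern)

# Boolean indexing keeps only the matching rows
print(df[mask_contains])
```

Both masks are case-sensitive, so "Hello there" is not selected; str.contains also accepts case=False for a case-insensitive match.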

Current answer

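Using str.contains didn't work well for my strings containing special characters, because it treats the pattern as a regular expression by default. str.find, which does a plain substring search, worked though: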

df[df['A'].str.find("hello") != -1]
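A small sketch of why special characters trip up str.contains, using hypothetical data: with the default regex=True, "." matches any character and "+" is a quantifier, so the search term is not taken literally. Passing regex=False (a real str.contains parameter) gives the same literal match as the str.find approach.

```python
import pandas as pd

# Hypothetical data containing regex metacharacters
df = pd.DataFrame({"A": ["1+1 = 2", "hello.world", "helloXworld", "bye"]})

# Default regex=True: "." matches any character, so "helloXworld" matches too
regex_mask = df["A"].str.contains("hello.world")
# "1+1" as a regex means "one or more 1s, then 1", i.e. it looks for "11"
plus_mask = df["A"].str.contains("1+1")

# Literal substring search, two equivalent ways
literal_mask = df["A"].str.contains("hello.world", regex=False)
find_mask = df["A"].str.find("hello.world") != -1

print(df[literal_mask])
```

One caveat worth checking on your pandas version: str.find returns NaN for missing values, and NaN != -1 evaluates to True, so such rows would be kept; str.contains(..., regex=False, na=False) treats them as non-matches instead.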

2019-11-20 13:22:49

Other answers

A more general example, for finding parts of a word or specific words within a string:

import pandas as pd

df = pd.DataFrame([('cat andhat', 1000.0), ('hat', 2000000.0), ('the small dog', 1000.0), ('fog', 330000.0), ('pet', 330000.0)], columns=['col1', 'col2'])

Specific parts of a sentence or word, as a regular expression with | separating the alternatives:

searchfor = '.*cat.*hat.*|.*the.*dog.*'

Create a column showing which rows match (you can then filter on it as needed):

df["TrueFalse"]=df['col1'].str.contains(searchfor, regex=True)

    col1             col2           TrueFalse
0   cat andhat       1000.0         True
1   hat              2000000.0      False
2   the small dog    1000.0         True
3   fog              330000.0       False
4   pet              330000.0       False
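A possible follow-up sketch on the same data: the boolean result can be used directly as a mask instead of being stored as a helper column. Because str.contains searches anywhere in the string (re.search semantics), the leading and trailing ".*" in the pattern should be redundant; the shorter pattern below is an illustrative rewrite, not part of the original answer. The upper-cased input is hypothetical and only demonstrates case=False.

```python
import pandas as pd

df = pd.DataFrame([('cat andhat', 1000.0), ('hat', 2000000.0), ('the small dog', 1000.0),
                   ('fog', 330000.0), ('pet', 330000.0)], columns=['col1', 'col2'])

# Same alternatives without the ".*" padding: contains() already matches anywhere
searchfor = 'cat.*hat|the.*dog'
mask = df['col1'].str.contains(searchfor, regex=True)

# Keep only the matching rows rather than adding a TrueFalse column
matches = df[mask]
print(matches)

# Case-insensitive variant, shown on a hypothetical upper-cased copy of col1
upper_mask = df['col1'].str.upper().str.contains(searchfor, case=False)
```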

2021-02-16 09:41:59

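You could try treating the values as strings, which helps when the column also holds non-string values such as numbers: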

df[df['A'].astype(str).str.contains("Hello|Britain")]
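A minimal sketch of the mixed-type case this targets, with hypothetical data: converting every value with astype(str) means str.contains only ever sees strings, and "Hello|Britain" is a regex alternation, so a row matches if it contains either word.

```python
import pandas as pd

# Hypothetical object column mixing strings and numbers
df = pd.DataFrame({"A": ["Hello there", 42, "Great Britain", 3.5]})

# Numbers become "42" and "3.5", so every cell is a string before matching
mask = df["A"].astype(str).str.contains("Hello|Britain")
print(df[mask])
```

If the column only needs protection against missing values, str.contains(..., na=False) is another option: it fills the result for missing entries with False rather than converting them to text.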

2021-05-29 08:16:45

Suppose you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator inside a lambda expression to create a filter:

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to pass axis=1 to apply, so that the lambda function receives the DataFrame row by row instead of column by column.
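A short sketch of using that result as a filter on the same DataFrame. The list-comprehension variant is an alternative not in the original answer: it zips the two columns directly and avoids building a row Series on every call, which is typically faster on large frames.

```python
import pandas as pd

df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a', 'b'])

# Row-wise test: is the value in column 'a' a substring of column 'b'?
mask = df.apply(lambda x: x['a'] in x['b'], axis=1)
print(df[mask])

# Same test as a plain list of booleans built from the two columns
mask_zip = [a in b for a, b in zip(df['a'], df['b'])]
print(df[mask_zip])
```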

2014-11-10 19:26:27

This is what I ended up doing for partial string matching. If anyone has a more efficient way of doing this, please let me know.

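import re

def stringSearchColumn_DataFrame(df, colName, regex):
    # Positions of rows whose value in colName is a string matching the regex
    positions = [pos for pos, record in enumerate(df[colName])
                 if isinstance(record, str) and re.search(regex, record)]

    # Select those rows once, in their original order, with a fresh 0..n-1 index
    return df.iloc[positions].reset_index(drop=True)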
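In answer to the request for something more efficient, a sketch of a vectorized equivalent: one str.contains call builds the whole mask instead of looping over cells. The helper name filter_by_regex and the sample data are hypothetical; na=False makes missing values count as non-matches.

```python
import pandas as pd

def filter_by_regex(df, col_name, regex):
    """Hypothetical helper: rows of df whose col_name value matches regex."""
    # str.contains applies a regex search to every cell in one call;
    # na=False turns missing values into False so they are dropped
    mask = df[col_name].str.contains(regex, regex=True, na=False)
    # Match the loop version's output: filtered rows with a fresh index
    return df[mask].reset_index(drop=True)

df = pd.DataFrame({"name": ["hello world", "abc", "say hello", None]})
result = filter_by_regex(df, "name", r"hel+o")
print(result)
```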

2012-07-06 17:08:46
