How do I check whether a pandas DataFrame is empty? In my case, I want to print a message in the terminal if the DataFrame is empty.


Current answer

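To see whether a dataframe is empty, I argue that one should test the length of the dataframe's columns index:

if len(df.columns) == 0:
    print('DataFrame is empty!')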

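Reason:

According to the Pandas Reference API, there is a distinction between:

an empty dataframe with 0 rows and 0 columns
an empty dataframe with rows containing NaN, and hence at least 1 column

Arguably, they are not the same. The other answers are imprecise in that df.empty, len(df), and len(df.index) make no distinction between the two: in both cases the index length is 0 and empty is True.

Examples

Example 1: an empty dataframe with 0 rows and 0 columns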

In [1]: import pandas as pd
        df1 = pd.DataFrame()
        df1
Out[1]: Empty DataFrame
        Columns: []
        Index: []

In [2]: len(df1.index)  # or len(df1)
Out[2]: 0

In [3]: df1.empty
Out[3]: True

Example 2: a dataframe that has been emptied to 0 rows but still retains n columns

In [4]: df2 = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df2
Out[4]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

In [5]: df2 = df2[df2['AA'] == 5]
        df2
Out[5]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

In [6]: len(df2.index)  # or len(df2)
Out[6]: 0

In [7]: df2.empty
Out[7]: True

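In both examples above, the index length is 0 and empty is True. Reading the length of the columns index of the first dataframe, df1, returns 0 columns, which shows that it is indeed empty.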

In [8]: len(df1.columns)
Out[8]: 0

In [9]: len(df2.columns)
Out[9]: 2

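Critically, although the second dataframe df2 contains no data, it is not completely empty, because it still reports the empty columns that persist.

Why it matters

Let's add a new column to these dataframes to understand the implications: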

# As expected, the empty column displays 1 series
In [10]: df1['CC'] = [111, 222, 333]
         df1
Out[10]:    CC
         0 111
         1 222
         2 333
In [11]: len(df1.columns)
Out[11]: 1

# Note the persisting series with rows containing `NaN` values in df2
In [12]: df2['CC'] = [111, 222, 333]
         df2
Out[12]:    AA  BB   CC
         0 NaN NaN  111
         1 NaN NaN  222
         2 NaN NaN  333
In [13]: len(df2.columns)
Out[13]: 3

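Clearly, the original columns in df2 have resurfaced. It is therefore prudent to read the length of the columns index with len(pandas.core.frame.DataFrame.columns) to see whether a dataframe is empty.

Practical solution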

# New dataframe df
In [1]: df = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df
Out[1]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

# This data manipulation approach results in an empty df
# because of a subset of values that are not available (`NaN`)
In [2]: df = df[df['AA'] == 5]
        df
Out[2]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

# NOTE: the df is empty, BUT the columns are persistent
In [3]: len(df.columns)
Out[3]: 2

# And accordingly, the other answers on this page
In [4]: len(df.index)  # or len(df)
Out[4]: 0

In [5]: df.empty
Out[5]: True
# SOLUTION: conditionally check for empty columns
In [6]: if len(df.columns) != 0:  # <--- here
            # Do something, e.g. 
            # drop any columns containing rows with `NaN`
            # to make the df really empty
            df = df.dropna(how='all', axis=1)
        df
Out[6]: Empty DataFrame
        Columns: []
        Index: []

# Testing shows it is indeed empty now
In [7]: len(df.columns)
Out[7]: 0

Adding a new data series now works as expected, without the empty columns resurfacing (that is, without any series whose rows contain only NaN):

In [8]: df['CC'] = [111, 222, 333]
        df
Out[8]:    CC
        0 111
        1 222
        2 333
In [9]: len(df.columns)
Out[9]: 1
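
The distinction drawn in this answer can be wrapped in a small helper. This is only a sketch: the function name `emptiness` and the labels it returns are made up for illustration and are not part of pandas.

```python
import pandas as pd


def emptiness(df):
    """Classify a frame by which axis has zero length (hypothetical helper)."""
    if len(df.columns) == 0:
        return 'no columns'
    if len(df.index) == 0:
        return 'no rows'
    return 'has rows and columns'


# A frame created with no data at all: 0 rows and 0 columns.
no_columns = pd.DataFrame()

# A frame filtered down to 0 rows that still keeps its 2 columns.
full = pd.DataFrame({'AA': [1, 2, 3], 'BB': [11, 22, 33]})
no_rows = full[full['AA'] == 5]

for name, frame in [('no_columns', no_columns), ('no_rows', no_rows)]:
    print(name, '->', emptiness(frame), '| df.empty =', frame.empty)
```

Both frames report `df.empty` as True, so only the columns check tells them apart.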

Other answers

You can use the attribute df.empty to check whether it is empty:

if df.empty:
    print('DataFrame is empty!')

Source: pandas documentation
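
For reference, here is a short sketch of what `df.empty` reports for a few kinds of frame. As I read the pandas documentation, it is True when either axis has length 0, and a frame that holds only NaN still counts as non-empty. The dictionary and its labels are just illustrative names.

```python
import numpy as np
import pandas as pd

# Illustrative frames; the labels are only for printing.
frames = {
    'no rows, no columns': pd.DataFrame(),
    'columns but no rows': pd.DataFrame(columns=['AA', 'BB']),
    'rows but no columns': pd.DataFrame(index=[0, 1]),
    'only NaN values': pd.DataFrame({'AA': [np.nan]}),
}

for label, frame in frames.items():
    if frame.empty:
        print(f'{label}: DataFrame is empty!')
    else:
        print(f'{label}: DataFrame is not empty')
```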

I use the len function. It is much faster than empty, and len(df.index) is faster still.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))

def empty(df):
    return df.empty

def lenz(df):
    return len(df) == 0

def lenzi(df):
    return len(df.index) == 0

'''
%timeit empty(df)
%timeit lenz(df)
%timeit lenzi(df)

10000 loops, best of 3: 13.9 µs per loop
100000 loops, best of 3: 2.34 µs per loop
1000000 loops, best of 3: 695 ns per loop

len on index seems to be faster
'''
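
If you are not working in IPython, the same comparison can be sketched with the standard-library `timeit` module. The loop count below is an arbitrary choice of mine, and the numbers will vary with the machine and the pandas version, so treat the output as a rough indication only.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))

# The three checks being compared, keyed by a printable label.
checks = {
    'df.empty': lambda: df.empty,
    'len(df) == 0': lambda: len(df) == 0,
    'len(df.index) == 0': lambda: len(df.index) == 0,
}

loops = 10_000  # arbitrary; raise it for steadier numbers
for label, check in checks.items():
    seconds = timeit.timeit(check, number=loops)
    print(f'{label}: {seconds / loops * 1e9:.0f} ns per call')
```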

If a DataFrame contains a mix of NaN and non-null values and you want to find out whether it holds any data at all, try this code.

When can this situation happen? It happens when a single function is used to plot more than one DataFrame, each passed in as a parameter. In that situation the function tries to plot the data even when a DataFrame holds no data, and so draws an empty figure. It makes more sense to simply display a 'DataFrame has no data' message.

Why? If a DataFrame contains no data at all, it is desirable not to plot it but to put out a message instead. Mind you, a DataFrame filled with NaN values is considered non-empty by df.empty, so that attribute does not help here.

Suppose we have two DataFrames, df1 and df2. The function myfunc takes any DataFrame (df1 and df2 in this case) and prints a message if the DataFrame holds no data, instead of plotting it:

df1              df2
col1  col2       col1  col2
NaN   2          NaN   NaN
2     NaN        NaN   NaN

Function:

def myfunc(df):
    # Count the total number of non-NaN values; the sum is 0 if the DataFrame holds no data
    if df.count().sum() > 0:
        print('not empty')
        df.plot(kind='barh')
    else:
        # Display a message instead of plotting if it is empty
        print('empty')
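
To try the check without plotting, here is a minimal sketch that rebuilds df1 and df2 from the table above. The helper name `has_data` is my own, not something pandas provides.

```python
import numpy as np
import pandas as pd


def has_data(df):
    """Return True if the frame holds at least one non-NaN value (hypothetical helper)."""
    return bool(df.count().sum() > 0)


df1 = pd.DataFrame({'col1': [np.nan, 2], 'col2': [2, np.nan]})
df2 = pd.DataFrame({'col1': [np.nan, np.nan], 'col2': [np.nan, np.nan]})

for name, frame in [('df1', df1), ('df2', df2)]:
    status = 'has data' if has_data(frame) else 'has no data'
    print(name, status, '| df.empty =', frame.empty)
```

Note that df2 has no data by this test even though `df2.empty` is False, because it still has rows and columns.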

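I prefer going the long route. These are the checks I follow to avoid using a try-except clause:

check that the variable is not None,
then check that it is a dataframe, and
make sure it is not empty.

Here, DATA is the suspect variable: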

DATA is not None and isinstance(DATA, pd.DataFrame) and not DATA.empty
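
The same chain can be wrapped in a function so it is easy to reuse. This is just a sketch, and the name `is_usable_frame` is an assumption for illustration.

```python
import pandas as pd


def is_usable_frame(data):
    """Return True only for a non-empty pandas DataFrame (hypothetical helper)."""
    return data is not None and isinstance(data, pd.DataFrame) and not data.empty


print(is_usable_frame(None))
print(is_usable_frame([1, 2, 3]))
print(is_usable_frame(pd.DataFrame()))
print(is_usable_frame(pd.DataFrame({'AA': [1, 2, 3]})))
```

Since isinstance(None, pd.DataFrame) is already False, the explicit None check is redundant but harmless; it mainly documents the intent.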