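How do I check whether a pandas DataFrame is empty? In my case, I want to print a message to the terminal if the DataFrame is empty.
I use the len function. It is much faster than empty, and len(df.index) is faster still.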
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))

def empty(df):
    return df.empty

def lenz(df):
    return len(df) == 0

def lenzi(df):
    return len(df.index) == 0

'''
%timeit empty(df)
%timeit lenz(df)
%timeit lenzi(df)

10000 loops, best of 3: 13.9 µs per loop
100000 loops, best of 3: 2.34 µs per loop
1000000 loops, best of 3: 695 ns per loop

len on index seems to be faster
'''
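For anyone who wants to repeat this comparison outside IPython, here is a minimal sketch using the standard-library timeit module in place of the %timeit magic. The `checks` and `results` names are only illustrative, and the absolute numbers will vary with the pandas version and the machine, so treat the ranking above as something to re-measure rather than a guarantee.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 4), columns=list("ABCD"))

# The three emptiness checks compared in the answer above.
checks = {
    "df.empty": lambda: df.empty,
    "len(df) == 0": lambda: len(df) == 0,
    "len(df.index) == 0": lambda: len(df.index) == 0,
}

results = {}
for label, check in checks.items():
    # Best of 3 repeats of 10,000 calls each, converted to microseconds per call.
    best = min(timeit.repeat(check, number=10_000, repeat=3))
    results[label] = best / 10_000 * 1e6
    print(f"{label:<20} {results[label]:.3f} µs per call")
```

All three checks return the same answer for this DataFrame, so the choice between them is about speed and readability only.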
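I prefer going the long route. These are the checks I follow to avoid using a try-except clause:
1. check that the variable is not None
2. then check that it is a DataFrame, and
3. make sure it is not empty
Here, DATA is the suspect variable:
DATA is not None and isinstance(DATA, pd.DataFrame) and not DATA.empty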
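The same chain of checks can be wrapped in a small helper. This is only a sketch, and `is_nonempty_dataframe` is a name made up for illustration. Since isinstance(None, pd.DataFrame) is already False, the explicit None test is folded into the isinstance check here.

```python
import pandas as pd


def is_nonempty_dataframe(obj):
    """Return True only if obj is a DataFrame with at least one row and one column."""
    # isinstance() is False for None, so no separate None check is needed.
    return isinstance(obj, pd.DataFrame) and not obj.empty


print(is_nonempty_dataframe(None))                       # False
print(is_nonempty_dataframe([1, 2, 3]))                  # False
print(is_nonempty_dataframe(pd.DataFrame()))             # False
print(is_nonempty_dataframe(pd.DataFrame({"A": [1]})))   # True
```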
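To see whether a DataFrame is empty, I argue that one should test the length of its columns index:
if len(df.columns) == 0:
    pass  # handle the empty DataFrame here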
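Reason:
According to the pandas API reference, there is a distinction between:
- an empty DataFrame with 0 rows and 0 columns
- an empty DataFrame with 0 rows but at least 1 column, whose rows would contain only NaN
Arguably, they are not the same. The other answers are imprecise in that df.empty, len(df), and len(df.index) make no distinction between the two: in both cases the index length is 0 and empty is True.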
Examples
Example 1: an empty DataFrame with 0 rows and 0 columns
In [1]: import pandas as pd
df1 = pd.DataFrame()
df1
Out[1]: Empty DataFrame
Columns: []
Index: []
In [2]: len(df1.index) # or len(df1)
Out[2]: 0
In [3]: df1.empty
Out[3]: True
Example 2: a DataFrame that has been emptied to 0 rows but still retains n columns
In [4]: df2 = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
df2
Out[4]:    AA  BB
        0   1  11
        1   2  22
        2   3  33
In [5]: df2 = df2[df2['AA'] == 5]
df2
Out[5]: Empty DataFrame
Columns: [AA, BB]
Index: []
In [6]: len(df2.index) # or len(df2)
Out[6]: 0
In [7]: df2.empty
Out[7]: True
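Now, building on the previous examples, in which the index length is 0 and empty is True: reading the length of the columns index of the first DataFrame, df1, returns 0 columns, which shows that it is indeed empty.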
In [8]: len(df1.columns)
Out[8]: 0
In [9]: len(df2.columns)
Out[9]: 2
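The key point is that although the second DataFrame, df2, contains no data, it is not completely empty, because len(df2.columns) returns the number of empty columns that persist.
Why this matters
Let's add a new column to these DataFrames to understand the implications: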
# As expected, the empty column displays 1 series
In [10]: df1['CC'] = [111, 222, 333]
df1
Out[10]:     CC
         0  111
         1  222
         2  333
In [11]: len(df1.columns)
Out[11]: 1
# Note the persisting series with rows containing `NaN` values in df2
In [12]: df2['CC'] = [111, 222, 333]
df2
Out[12]:     AA  BB   CC
         0  NaN NaN  111
         1  NaN NaN  222
         2  NaN NaN  333
In [13]: len(df2.columns)
Out[13]: 3
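Evidently, the original columns in df2 have resurfaced. It is therefore prudent to read the length of the columns index with len(pandas.core.frame.DataFrame.columns) to see whether a DataFrame is empty.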
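To make the distinction explicit in code, one option is a small classifier that looks at both axes. This is a rough sketch and `emptiness` is a hypothetical helper name. It reports whether a DataFrame has no columns at all, has columns but no rows, or actually has rows.

```python
import pandas as pd


def emptiness(df):
    """Classify a DataFrame by which of its axes are empty."""
    if len(df.columns) == 0:
        return "no columns"
    if len(df.index) == 0:
        return "no rows, columns retained"
    return "has rows"


df1 = pd.DataFrame()
full = pd.DataFrame({"AA": [1, 2, 3], "BB": [11, 22, 33]})
df2 = full[full["AA"] == 5]  # filter that matches nothing

print(emptiness(df1))   # no columns
print(emptiness(df2))   # no rows, columns retained
print(emptiness(full))  # has rows
```

Note that df1.empty and df2.empty are both True here, which is exactly the ambiguity this answer is about.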
Practical solution
# New dataframe df
In [1]: df = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
df
Out[1]:    AA  BB
        0   1  11
        1   2  22
        2   3  33
# This data manipulation approach results in an empty df
# because of a subset of values that are not available (`NaN`)
In [2]: df = df[df['AA'] == 5]
df
Out[2]: Empty DataFrame
Columns: [AA, BB]
Index: []
# NOTE: the df is empty, BUT the columns are persistent
In [3]: len(df.columns)
Out[3]: 2
# And accordingly, the other answers on this page
In [4]: len(df.index) # or len(df)
Out[4]: 0
In [5]: df.empty
Out[5]: True
# SOLUTION: conditionally check for empty columns
In [6]: if len(df.columns) != 0:  # <--- here
            # Do something, e.g.
            # drop any columns containing rows with `NaN`
            # to make the df really empty
            df = df.dropna(how='all', axis=1)
        df
Out[6]: Empty DataFrame
Columns: []
Index: []
# Testing shows it is indeed empty now
In [7]: len(df.columns)
Out[7]: 0
Adding a new data series now works as expected, without the empty columns resurfacing (in fact, without any series that contain only NaN rows):
In [8]: df['CC'] = [111, 222, 333]
df
Out[8]:     CC
        0  111
        1  222
        2  333
In [9]: len(df.columns)
Out[9]: 1
If a DataFrame may contain both NaN and non-null values and you want to find out whether it holds any data at all, try the code below.

When does this come up? It happens when a single function is used to plot several DataFrames that are passed in as parameters. In that situation the function tries to plot even when a DataFrame holds no data, and so it draws an empty figure. It makes more sense to simply display a 'DataFrame has no data' message instead. Keep in mind that pandas considers a DataFrame filled with NaN values to be non-empty, so df.empty does not catch this case.

Suppose we have two DataFrames, df1 and df2. The function myfunc takes any DataFrame (df1 or df2 in this case) and prints a message instead of plotting if the DataFrame holds no data:

df1
   col1  col2
    NaN     2
      2   NaN

df2
   col1  col2
    NaN   NaN
    NaN   NaN

Function:
def myfunc(df):
    if df.count().sum() > 0:  # total number of non-NaN values; 0 if the DataFrame holds no data
        print('not empty')
        df.plot(kind='barh')
    else:
        # display a message instead of plotting if it is empty
        print('empty')
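Here is a quick sketch of how that count-based check behaves on the two example DataFrames, with the plotting left out so that it runs without matplotlib. The `has_data` name is only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"col1": [np.nan, 2], "col2": [2, np.nan]})
df2 = pd.DataFrame({"col1": [np.nan, np.nan], "col2": [np.nan, np.nan]})


def has_data(df):
    # count() returns the number of non-NaN cells per column;
    # the sum is 0 when every cell is NaN or there are no cells at all.
    return int(df.count().sum()) > 0


for name, frame in [("df1", df1), ("df2", df2)]:
    print(name, "empty:", frame.empty, "has_data:", has_data(frame))
# df1 empty: False has_data: True
# df2 empty: False has_data: False
```

df2 has rows, so df.empty is False for it, yet it contains nothing worth plotting. That is the case the count-based check is meant to catch.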