如何检查熊猫数据帧是否为空?在我的情况下,我想打印一些消息在终端,如果DataFrame是空的。
我使用len函数。它比空的快多了。Len (df.index)甚至更快。
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))
def empty(df):
return df.empty
def lenz(df):
return len(df) == 0
def lenzi(df):
return len(df.index) == 0
'''
%timeit empty(df)
%timeit lenz(df)
%timeit lenzi(df)
10000 loops, best of 3: 13.9 µs per loop
100000 loops, best of 3: 2.34 µs per loop
1000000 loops, best of 3: 695 ns per loop
len on index seems to be faster
'''
我更喜欢走长线。以下是我避免使用try-except子句-的检查方法
检查变量是否为None 然后检查它是否是一个数据框架和 确保它不是空的
在这里,DATA是可疑变量-
DATA is not None and isinstance(DATA, pd.DataFrame) and not DATA.empty
要查看一个数据框架是否为空,我认为应该测试一个数据框架的列索引的长度:
if len(df.columns) == 0: 1
原因:
根据Pandas参考API,有以下区别:
一个空的数据框架,0行0列 一个空的数据框架,包含NaN的行,因此至少有一列
可以说,它们是不一样的。其他的答案在df中是不精确的。empty, len(df),或len(df.index)没有区别,返回index为0,在这两种情况下empty为True。
例子
例1:一个空的数据框架,0行0列
In [1]: import pandas as pd
df1 = pd.DataFrame()
df1
Out[1]: Empty DataFrame
Columns: []
Index: []
In [2]: len(df1.index) # or len(df1)
Out[2]: 0
In [3]: df1.empty
Out[3]: True
例2:一个数据帧被清空到0行,但仍然保留n列
In [4]: df2 = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
df2
Out[4]: AA BB
0 1 11
1 2 22
2 3 33
In [5]: df2 = df2[df2['AA'] == 5]
df2
Out[5]: Empty DataFrame
Columns: [AA, BB]
Index: []
In [6]: len(df2.index) # or len(df2)
Out[6]: 0
In [7]: df2.empty
Out[7]: True
现在,基于前面的例子,其中索引为0,空为True。当读取第一个加载的数据帧df1的列索引的长度时,它返回0列以证明它确实是空的。
In [8]: len(df1.columns)
Out[8]: 0
In [9]: len(df2.columns)
Out[9]: 2
关键是,虽然第二个数据框架df2不包含任何数据,但它并不是完全空的,因为它返回了保留的空列的数量。
为什么这很重要
让我们在这些数据框架中添加一个新列来理解其含义:
# As expected, the empty column displays 1 series
In [10]: df1['CC'] = [111, 222, 333]
df1
Out[10]: CC
0 111
1 222
2 333
In [11]: len(df1.columns)
Out[11]: 1
# Note the persisting series with rows containing `NaN` values in df2
In [12]: df2['CC'] = [111, 222, 333]
df2
Out[12]: AA BB CC
0 NaN NaN 111
1 NaN NaN 222
2 NaN NaN 333
In [13]: len(df2.columns)
Out[13]: 3
很明显,df2中的原始列已经重新出现。因此,谨慎的做法是使用len(pandas.core.frame.DataFrame.columns)来读取列索引的长度,看看数据帧是否为空。
实用的解决方案
# New dataframe df
In [1]: df = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
df
Out[1]: AA BB
0 1 11
1 2 22
2 3 33
# This data manipulation approach results in an empty df
# because of a subset of values that are not available (`NaN`)
In [2]: df = df[df['AA'] == 5]
df
Out[2]: Empty DataFrame
Columns: [AA, BB]
Index: []
# NOTE: the df is empty, BUT the columns are persistent
In [3]: len(df.columns)
Out[3]: 2
# And accordingly, the other answers on this page
In [4]: len(df.index) # or len(df)
Out[4]: 0
In [5]: df.empty
Out[5]: True
# SOLUTION: conditionally check for empty columns
In [6]: if len(df.columns) != 0: # <--- here
# Do something, e.g.
# drop any columns containing rows with `NaN`
# to make the df really empty
df = df.dropna(how='all', axis=1)
df
Out[6]: Empty DataFrame
Columns: []
Index: []
# Testing shows it is indeed empty now
In [7]: len(df.columns)
Out[7]: 0
添加一个新的数据序列可以像预期的那样工作,而不需要重新显示空列(实际上,不需要任何只包含NaN行的序列):
In [8]: df['CC'] = [111, 222, 333]
df
Out[8]: CC
0 111
1 222
2 333
In [9]: len(df.columns)
Out[9]: 1
If a DataFrame has got Nan and Non Null values and you want to find whether the DataFrame is empty or not then try this code. when this situation can happen? This situation happens when a single function is used to plot more than one DataFrame which are passed as parameter.In such a situation the function try to plot the data even when a DataFrame is empty and thus plot an empty figure!. It will make sense if simply display 'DataFrame has no data' message. why? if a DataFrame is empty(i.e. contain no data at all.Mind you DataFrame with Nan values is considered non empty) then it is desirable not to plot but put out a message : Suppose we have two DataFrames df1 and df2. The function myfunc takes any DataFrame(df1 and df2 in this case) and print a message if a DataFrame is empty(instead of plotting): df1 df2 col1 col2 col1 col2 Nan 2 Nan Nan 2 Nan Nan Nan
函数:
def myfunc(df):
if (df.count().sum())>0: ##count the total number of non Nan values.Equal to 0 if DataFrame is empty
print('not empty')
df.plot(kind='barh')
else:
display a message instead of plotting if it is empty
print('empty')
推荐文章
- 在Python中获取大文件的MD5哈希值
- 在Python格式字符串中%s是什么意思?
- 如何循环通过所有但最后一项的列表?
- python用什么方法避免默认参数为空列表?
- ValueError: numpy。Ndarray大小改变,可能表示二进制不兼容。期望从C头得到88,从PyObject得到80
- Anaconda /conda -安装特定的软件包版本
- 我在哪里调用Keras的BatchNormalization函数?
- 打印测试执行时间并使用py.test锁定缓慢的测试
- 插入一行到熊猫数据框架
- 要列出Pandas DataFrame列
- 在Django模型中存储电话号码的最佳方法是什么?
- 从导入的模块中模拟函数
- 滚动或滑动窗口迭代器?
- python的方法找到最大值和它的索引在一个列表?
- 如何读取文件的前N行?