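How do I count the NaN values in a column of a pandas DataFrame?

I want to find the number of NaN values in each column of my data.

Current answer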

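Another simple option that has not been suggested yet: to count only the NaNs, select the rows where the column is null and read the first element of shape, which is the number of rows with NaN.

df[df['col_name'].isnull()]['col_name'].shape[0]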
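As a minimal sketch of the approach above, using a made-up DataFrame (the column name and the values are illustrative assumptions, not from the original answer): shape is a tuple, so indexing it with [0] gives a plain integer, and the result should match what isna().sum() reports for the same column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the four values are missing.
df = pd.DataFrame({"col_name": [1.0, np.nan, 3.0, np.nan]})

# Keep only the rows where the column is NaN, then count those rows.
nan_rows = df[df["col_name"].isnull()]["col_name"].shape[0]

# The same count, computed by summing the boolean mask.
nan_sum = int(df["col_name"].isna().sum())

print(nan_rows, nan_sum)
```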

2019-04-22 19:57:13

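Other answers

For the first part, there are several ways to count NaN.

Method 1: count. Since count ignores NaN (unlike size), subtract it from the number of rows.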

print(len(df) - df.count())

Method 2: isnull / isna chained with sum

print(df.isnull().sum())
#print(df.isna().sum())

Method 3: describe / info. Note that these report the count of non-null values, not the NaN count.

print(df.describe())
#print(df.info())

Method 4: numpy

print(np.count_nonzero(np.isnan(df.values),axis=0))
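To check that the counting methods above agree, here is a small sketch on a made-up, all-numeric DataFrame (the data and variable names are assumptions for illustration). Note that np.isnan only accepts numeric arrays, so the numpy variant will raise a TypeError on object columns such as strings.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with 1, 2 and 0 missing values per column.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

# Method 1: total rows minus the non-null count per column.
by_count = len(df) - df.count()

# Method 2: sum the boolean NaN mask per column.
by_isna = df.isna().sum()

# Method 4: count the True entries of np.isnan down each column.
by_numpy = np.count_nonzero(np.isnan(df.to_numpy()), axis=0)

print(by_count.tolist(), by_isna.tolist(), by_numpy.tolist())
```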

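For the second part of the question, if we want to drop columns based on a threshold, we can use dropna:

thresh, optional: require that many non-NA values.

Thresh = n  # number of non-null values required; can also be computed as int(x * len(df)) for a fraction x
df = df.dropna(thresh=Thresh, axis=1)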
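A short sketch of dropna with thresh on made-up data follows. The 50% fraction and the column names are assumptions chosen for illustration. A column is kept only if it has at least the required number of non-NA values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" has 4 non-null values, "b" has 2, "c" has 1.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1.0, np.nan, 3.0, np.nan],
    "c": [np.nan, np.nan, np.nan, 4.0],
})

# Require at least 50% non-null values per column (assumed fraction).
required = int(0.5 * len(df))

# axis=1 drops columns; columns with fewer than `required` non-NA values go.
kept = df.dropna(thresh=required, axis=1)

print(list(kept.columns))
```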

2020-10-15 15:30:37

I use this loop to count the missing values in each column:

# check missing values
import pandas as pd
for col in df:
    # an f-string also works when the column label is not a string
    print(f"{col}: {df[col].isna().sum()}")

2021-06-26 16:37:29

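You can loop over the DataFrame with df.items() (df.iteritems() in pandas versions before 2.0). Inside the for loop, compute the percentage of NaN values in each column and drop the columns whose percentage exceeds the chosen threshold:

# list(...) takes a snapshot of the columns so that dropping inside the loop is safe
for col, val in list(df.items()):
    if (val.isnull().sum() / len(val) * 100) > 30:
        df.drop(columns=col, inplace=True)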
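The same filtering can be written without an explicit loop. This is a sketch of an alternative, not the answer's own code: the 30% threshold comes from the loop above, while the data and variable names are made up. Because the mean of a boolean mask is the fraction of True values, isna().mean() gives the NaN share per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 25% and 50% missing respectively.
df = pd.DataFrame({
    "mostly_full": [1.0, 2.0, 3.0, np.nan],
    "half_empty": [np.nan, np.nan, 3.0, 4.0],
})

threshold = 30  # percent, same cutoff as the loop above

# Percentage of NaN per column.
nan_pct = df.isna().mean() * 100

# Keep only the columns at or below the threshold; df itself is not modified.
filtered = df.loc[:, nan_pct <= threshold]

print(list(filtered.columns))
```

Returning a new DataFrame instead of dropping in place keeps the original data available for comparison.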

2022-02-13 17:01:42

Based on the answers given, with a few improvements, this is my approach:

import numpy as np
import pandas as pd

def PercentageMissin(Dataset):
    """Return the percentage of missing values in each column of a DataFrame."""
    if isinstance(Dataset, pd.DataFrame):
        adict = {}  # maps each column name to the percentage of missing values in that column
        for col in Dataset.columns:
            adict[col] = (np.count_nonzero(Dataset[col].isnull()) * 100) / len(Dataset[col])
        return pd.DataFrame(adict, index=['% of missing'], columns=adict.keys())
    else:
        raise TypeError("can only be used with a pandas DataFrame")
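For comparison, a one-row report of the same shape can likely be produced with built-in pandas methods alone. This is a sketch on made-up data, and the row label is copied from the function above purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 25% and 50% missing respectively.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan],
    "b": [np.nan, np.nan, 3.0, 4.0],
})

# Fraction of NaN per column, scaled to percent, then turned into a
# single-row DataFrame whose columns are the original column names.
report = df.isna().mean().mul(100).to_frame("% of missing").T

print(report)
```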

2017-05-23 16:42:40

You can subtract the count of non-NaN values from the total length:

count_nan = len(df) - df.count()

You should time it on your own data. For a small Series, this gave a 3x speedup over the isnull solution.
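One way to run that timing yourself is the standard-library timeit module. The sketch below uses a tiny made-up Series and an arbitrary repeat count, both assumptions. Timings vary by machine, data size, and pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical small Series with two missing values.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

def via_count():
    # Total length minus the number of non-NaN values.
    return len(s) - s.count()

def via_isnull():
    # Sum of the boolean NaN mask.
    return s.isnull().sum()

# Time 1000 calls of each variant; swap in your own data for a real test.
t_count = timeit.timeit(via_count, number=1000)
t_isnull = timeit.timeit(via_isnull, number=1000)

print(f"count: {t_count:.4f}s  isnull: {t_isnull:.4f}s")
```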

2014-10-08 21:05:52
