How do I remove NaN values from a NumPy array?
[1, 2, NaN, 4, NaN, 8] ⟶ [1, 2, 4, 8]
Current answer
pandas offers a way to represent missing values across all data types; see the user guide:
https://pandas.pydata.org/docs/user_guide/missing_data.html
The np.isnan() function does not work with every data type. For example:
>>> import numpy as np
>>> values = [np.nan, "x", "y"]
>>> np.isnan(values)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The pd.isna() and pd.notna() functions work with many data types, and pandas also introduces the pd.NA value:
>>> import numpy as np
>>> import pandas as pd
>>> values = pd.Series([np.nan, "x", "y"])
>>> values
0    NaN
1      x
2      y
dtype: object
>>> values.loc[pd.isna(values)]
0    NaN
dtype: object
>>> values.loc[pd.isna(values)] = pd.NA
>>> values.loc[pd.isna(values)]
0    <NA>
dtype: object
>>> values
0    <NA>
1       x
2       y
dtype: object
#
# using map with lambda, or a list comprehension
#
>>> values = [np.nan, "x", "y"]
>>> list(map(lambda x: pd.NA if pd.isna(x) else x, values))
[<NA>, 'x', 'y']
>>> [pd.NA if pd.isna(x) else x for x in values]
[<NA>, 'x', 'y']
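To tie this back to the original question, the same pd.notna() check can be used to drop the missing entries rather than replace them. This is only a minimal sketch; the sample data and the names `kept` and `arr` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative mixed-type input (not from the original answer).
values = [np.nan, "x", "y", None, pd.NA]

# Keep only the entries that pandas does not consider missing.
kept = [v for v in values if pd.notna(v)]
print(kept)  # ['x', 'y']

# The same idea with an object-dtype NumPy array and a boolean mask.
arr = np.array([np.nan, "x", "y"], dtype=object)
print(arr[pd.notna(arr)])  # ['x' 'y']
```

A boolean mask from pd.notna() works here where np.isnan() would raise a TypeError, because the array holds strings alongside the NaN.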
Other answers
If you are using NumPy:
import numpy as np
# first build a boolean mask that is True where the values are finite
ii = np.isfinite(x)
# then keep only those values
x = x[ii]
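One caveat worth keeping in mind with this approach: np.isfinite() rejects infinities as well as NaN, whereas ~np.isnan() removes NaN only. A small sketch of the difference, using a made-up sample array:

```python
import numpy as np

# Illustrative data containing NaN and both infinities.
x = np.array([1.0, np.nan, np.inf, 4.0, -np.inf])

finite_only = x[np.isfinite(x)]  # drops NaN and +/-inf
no_nan = x[~np.isnan(x)]         # drops NaN only, infinities remain

print(finite_only)  # [1. 4.]
print(no_nan)
```

If infinities are legitimate values in your data, the ~np.isnan() mask is probably the safer choice.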
For me, @jmetz's answer did not work, but using pandas isnull() did.
x = x[~pd.isnull(x)]
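The accepted answer changes the shape of 2D arrays. Here is a solution that uses the pandas dropna() functionality. It works for 1D and 2D arrays. In the 2D case, you can choose whether to drop the rows or the columns that contain np.nan.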
import pandas as pd
import numpy as np
def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped=pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim==1:
        dropped=dropped.flatten()
    return dropped
x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan ,1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan] ,[1700,1800,np.nan]] )
print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')
print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')
print('\n\n'+'='*20+' y[np.logical_not(np.isnan(y))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna:\n',y[np.logical_not(np.isnan(y))],sep='')
Output:
==================== 1D Case: ====================
Input:
[1400. 1500. 1600.   nan   nan   nan 1700.]
dropna:
[1400. 1500. 1600. 1700.]
==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]
dropna (rows):
[[1400. 1500. 1600.]]
dropna (columns):
[[1500.]
 [   0.]
 [1800.]]
==================== y[np.logical_not(np.isnan(y))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]
dropna:
[1400. 1500. 1600.    0. 1700. 1800.]
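@jmetz's answer is probably what most people need; however, it yields a one-dimensional array, which makes it unusable for, e.g., removing entire rows or columns of a matrix.
To do that, reduce the logical array to one dimension and then index the target array with it. For example, the following removes every row that has at least one NaN value: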
x = x[~numpy.isnan(x).any(axis=1)]
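A small sketch of that reduction on a made-up 2D array, showing both the row and the column variant. The array `y` and the names `nan_mask`, `rows_kept` and `cols_kept` are illustrative only.

```python
import numpy as np

# Illustrative 3x3 matrix with NaNs in the second and third rows.
y = np.array([[1400, 1500, 1600],
              [np.nan, 0, np.nan],
              [1700, 1800, np.nan]])

nan_mask = np.isnan(y)

# Drop every row that contains at least one NaN.
rows_kept = y[~nan_mask.any(axis=1)]
# Drop every column that contains at least one NaN.
cols_kept = y[:, ~nan_mask.any(axis=0)]

print(rows_kept.shape)  # (1, 3)
print(cols_kept.shape)  # (3, 1)
```

Reducing along axis=1 gives one flag per row and axis=0 gives one flag per column, so the result stays two-dimensional in both cases.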