我有一个熊猫数据框架如下:

      itm Date                  Amount 
67    420 2012-09-30 00:00:00   65211
68    421 2012-09-09 00:00:00   29424
69    421 2012-09-16 00:00:00   29877
70    421 2012-09-23 00:00:00   30990
71    421 2012-09-30 00:00:00   61303
72    485 2012-09-09 00:00:00   71781
73    485 2012-09-16 00:00:00     NaN
74    485 2012-09-23 00:00:00   11072
75    485 2012-09-30 00:00:00  113702
76    489 2012-09-09 00:00:00   64731
77    489 2012-09-16 00:00:00     NaN

当我尝试应用一个函数到金额列,我得到以下错误:

ValueError: cannot convert float NaN to integer

我尝试使用数学模块中的.isnan应用一个函数 我已经尝试了pandas .replace属性 我尝试了pandas 0.9中的.sparse data属性 我还尝试了在函数中if NaN == NaN语句。 我也看了这篇文章我如何替换NA值与零在一个R数据框架?同时看一些其他的文章。 我尝试过的所有方法都不起作用或不能识别NaN。 任何提示或解决方案将不胜感激。


当前回答

将所有nan替换为0

df = df.fillna(0)

其他回答

下面的代码适合我。

import pandas

df = pandas.read_csv('somefile.txt')

df = df.fillna(0)

替换熊猫中的na值

df['column_name'].fillna(value_to_be_replaced,inplace=True)

如果inplace = False,它不会更新df (dataframe),而是返回修改后的值。

如果要将其转换为pandas数据框架,也可以使用fillna来完成。

import numpy as np
df=np.array([[1,2,3, np.nan]])

import pandas as pd
df=pd.DataFrame(df)
df.fillna(0)

这将返回以下内容:

     0    1    2   3
0  1.0  2.0  3.0 NaN
>>> df.fillna(0)
     0    1    2    3
0  1.0  2.0  3.0  0.0

我认为这也是值得提及和解释的 fillna()的参数配置 如方法,轴,限制等。

从我们的文档来看:

Series.fillna(value=None, method=None, axis=None, 
                 inplace=False, limit=None, downcast=None)
Fill NA/NaN values using the specified method.

参数

value [scalar, dict, Series, or DataFrame] Value to use to 
 fill holes (e.g. 0), alternately a dict/Series/DataFrame 
 of values specifying which value to use for each index 
 (for a Series) or column (for a DataFrame). Values not in 
 the dict/Series/DataFrame will not be filled. This 
 value cannot be a list.

method [{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, 
 default None] Method to use for filling holes in 
 reindexed Series pad / ffill: propagate last valid 
 observation forward to next valid backfill / bfill: 
 use next valid observation to fill gap axis 
 [{0 or ‘index’}] Axis along which to fill missing values.

inplace [bool, default False] If True, fill 
 in-place. Note: this will modify any other views
 on this object (e.g., a no-copy slice for a 
 column in a DataFrame).

limit [int,defaultNone] If method is specified, 
 this is the maximum number of consecutive NaN 
 values to forward/backward fill. In other words, 
 if there is a gap with more than this number of 
 consecutive NaNs, it will only be partially filled. 
 If method is not specified, this is the maximum 
 number of entries along the entire axis where NaNs
 will be filled. Must be greater than 0 if not None.

downcast [dict, default is None] A dict of item->dtype 
 of what to downcast if possible, or the string ‘infer’ 
 which will try to downcast to an appropriate equal 
 type (e.g. float64 to int64 if possible).

好的。让我们从方法= Parameter this开始 有正向填充(ffill)和反向填充(bfill) Ffill正在复制前面的内容 无缺失值。

例如:

import pandas as pd
import numpy as np
inp = [{'c1':10, 'c2':np.nan, 'c3':200}, {'c1':np.nan,'c2':110, 'c3':210}, {'c1':12,'c2':np.nan, 'c3':220},{'c1':12,'c2':130, 'c3':np.nan},{'c1':12,'c2':np.nan, 'c3':240}]
df = pd.DataFrame(inp)

  c1       c2      c3
0   10.0     NaN      200.0
1   NaN   110.0 210.0
2   12.0     NaN      220.0
3   12.0     130.0 NaN
4   12.0     NaN      240.0

填充:

df.fillna(method="ffill")

    c1     c2      c3
0   10.0      NaN 200.0
1   10.0    110.0   210.0
2   12.0    110.0   220.0
3   12.0    130.0   220.0
4   12.0    130.0   240.0

向后填:

df.fillna(method="bfill")

    c1      c2     c3
0   10.0    110.0   200.0
1   12.0    110.0   210.0
2   12.0    130.0   220.0
3   12.0    130.0   240.0
4   12.0      NaN   240.0

Axis参数帮助我们选择填充的方向:

填补方向:

填充:

Axis = 1 
Method = 'ffill'
----------->
  direction 

df.fillna(method="ffill", axis=1)

       c1   c2      c3
0   10.0     10.0   200.0
1    NaN    110.0   210.0
2   12.0     12.0   220.0
3   12.0    130.0   130.0
4   12.0    12.0    240.0

Axis = 0 # by default 
Method = 'ffill'
|
|       # direction 
|
V
e.g: # This is the ffill default
df.fillna(method="ffill", axis=0)

    c1     c2      c3
0   10.0      NaN   200.0
1   10.0    110.0   210.0
2   12.0    110.0   220.0
3   12.0    130.0   220.0
4   12.0    130.0   240.0

bfill:

axis= 0
method = 'bfill'
^
|
|
|
df.fillna(method="bfill", axis=0)

    c1     c2      c3
0   10.0    110.0   200.0
1   12.0    110.0   210.0
2   12.0    130.0   220.0
3   12.0    130.0   240.0
4   12.0      NaN   240.0

axis = 1
method = 'bfill'
<-----------
df.fillna(method="bfill", axis=1)
        c1     c2       c3
0    10.0   200.0   200.0
1   110.0   110.0   210.0
2    12.0   220.0   220.0
3    12.0   130.0     NaN
4    12.0   240.0   240.0

# alias:
#  'fill' == 'pad' 
#   bfill == backfill

极限参数:

df
    c1     c2      c3
0   10.0      NaN   200.0
1    NaN    110.0   210.0
2   12.0      NaN   220.0
3   12.0    130.0     NaN
4   12.0      NaN   240.0

只替换跨列的第一个NaN元素:

df.fillna(value = 'Unavailable', limit=1)
            c1           c2          c3
0          10.0 Unavailable       200.0
1   Unavailable       110.0       210.0
2          12.0         NaN       220.0
3          12.0       130.0 Unavailable
4          12.0         NaN       240.0

df.fillna(value = 'Unavailable', limit=2)

           c1            c2          c3
0          10.0 Unavailable       200.0
1   Unavailable       110.0       210.0
2          12.0 Unavailable       220.0
3          12.0       130.0 Unavailable
4          12.0         NaN       240.0

低垂的参数:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   c1      4 non-null      float64
 1   c2      2 non-null      float64
 2   c3      4 non-null      float64
dtypes: float64(3)
memory usage: 248.0 bytes

df.fillna(method="ffill",downcast='infer').info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   c1      5 non-null      int64  
 1   c2      4 non-null      float64
 2   c3      5 non-null      int64  
dtypes: float64(1), int64(2)
memory usage: 248.0 bytes

考虑到上表中的特定列Amount是整数类型。以下是一个解决方案:

df['Amount'] = df.Amount.fillna(0).astype(int)

类似地,你可以用各种数据类型来填充它,比如float, str等等。

特别地,我会考虑datatype来比较同一列的不同值。