NumPy or Pandas: Keeping array type as integer while having a NaN value

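Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element inside listed as numpy.nan?

In particular, I am converting an in-house data structure to a Pandas DataFrame. In our structure, we have integer-type columns that still contain NaN values (but the dtype of the column is int). If we make this a DataFrame, it seems to recast everything as float, but we would really like it to stay int.

Thoughts?

Things tried:

I tried using the from_records() function under pandas.DataFrame with coerce_float=False, and this did not help. I also tried using NumPy masked arrays with a NaN fill_value, which did not work either. All of these caused the column data type to become float.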
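The behaviour described in the question can be reproduced with a minimal sketch like the one below. The array and column names (`ints`, `a`) are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# A plain NumPy integer array has no way to store NaN:
# the assignment below raises ValueError instead of storing a missing value.
ints = np.array([1, 2, 3], dtype="int64")
try:
    ints[1] = np.nan
    nan_assignment_failed = False
except ValueError:
    nan_assignment_failed = True

# pandas silently upcasts the whole column to float64 once a NaN is present.
df = pd.DataFrame({"a": [1, np.nan, 3]})
print(df["a"].dtype)
```

NaN is a floating-point value, so neither library can place it in a classic int64 buffer; pandas avoids the error by widening the column to float.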

Current answer

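This capability has been added to pandas beginning with version 0.24.

At this point, it requires the use of the extension dtype 'Int64' (capitalized), rather than the default dtype 'int64' (lowercase).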
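As a minimal sketch of what this looks like in practice (assuming pandas 0.24 or later; the variable and column names are illustrative only):

```python
import numpy as np
import pandas as pd

# Default behaviour: a missing value forces the Series to float64.
s_float = pd.Series([1, 2, np.nan])

# Requesting the nullable extension dtype keeps the values as integers.
s_int = pd.Series([1, 2, np.nan], dtype="Int64")

# The same dtype string works for DataFrame columns built with pd.array.
df = pd.DataFrame({"id": pd.array([10, None, 30], dtype="Int64")})

print(s_float.dtype, s_int.dtype, df["id"].dtype)
```

Note that the dtype string is case-sensitive: 'int64' selects the ordinary NumPy dtype, which cannot hold missing values.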

2018-08-24 03:36:20

Other answers

New in pandas v1.0+

You do not (and cannot) use numpy.nan any more. Now you have pandas.NA.

Please read: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html

IntegerArray is currently experimental. Its API or implementation may change without warning.

Changed in version 1.0.0: Now uses pandas.NA as the missing value rather than numpy.nan.

In Working with missing data, we saw that pandas primarily uses NaN to represent missing data. Because NaN is a float, this forces an array of integers with any missing values to become floating point. In some cases, this may not matter much. But if your integer column is, say, an identifier, casting to float can be problematic. Some integers cannot even be represented as floating point numbers.
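The last point in the quoted passage can be illustrated with a short sketch, assuming pandas 1.0 or later. The identifier value is chosen only because it is the smallest positive integer that float64 cannot represent exactly.

```python
import pandas as pd

big = 2**53 + 1  # not exactly representable as a 64-bit float

# Going through float silently changes the identifier.
as_float = float(big)
float_round_trip_ok = int(as_float) == big

# A nullable integer column keeps the exact value next to a missing one.
ids = pd.Series([big, 7], dtype="int64").astype("Int64")
ids.iloc[1] = pd.NA

print(float_round_trip_ok, ids.iloc[0] == big, ids.iloc[1] is pd.NA)
```

Here the missing entry is pandas.NA rather than numpy.nan, so the stored integers never pass through a floating-point representation.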

2021-04-26 16:35:03

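I know the OP asked for NumPy or Pandas only, but I think it is worth mentioning Polars as an alternative that supports the requested feature.

In Polars, any missing values in an integer column are simply null values, and the column remains an integer column.

For more information, see Polars User Guide > Coming from Pandas.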

2022-08-18 14:00:00

This is not a solution for all cases, but for mine (genomic coordinates) I have resorted to using 0 in place of NaN:

a3['MapInfo'] = a3['MapInfo'].fillna(0).astype(int)

This at least allows the proper 'native' column type to be used, so operations like subtraction and comparison work as expected.
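A small sketch of this sentinel approach and its main caveat follows. The sample values are invented for illustration; only the `a3['MapInfo']` name comes from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing coordinate and one genuine zero.
a3 = pd.DataFrame({"MapInfo": [1500.0, np.nan, 42000.0, 0.0]})

# Remember which rows were missing before the sentinel overwrites them.
was_missing = a3["MapInfo"].isna()

a3["MapInfo"] = a3["MapInfo"].fillna(0).astype(int)

# Integer arithmetic now works on a native integer column.
shifted = a3["MapInfo"] - 1000

# Caveat: after the fill, a real 0 and a missing value look identical.
zeros_after_fill = int((a3["MapInfo"] == 0).sum())
print(zeros_after_fill, int(was_missing.sum()))
```

The approach is only safe when 0 can never be a legitimate value in the column, or when a separate mask such as `was_missing` is kept alongside the data.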

2018-01-12 13:08:53

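If there are blanks in the text data, columns that would normally be integers will be cast to floats with dtype float64, because the int64 dtype cannot handle nulls. This can cause an inconsistent schema if you are loading multiple files, some with blanks (which will end up as float64) and others without (which will end up as int64).

This code will attempt to convert any numeric columns to Int64 (as opposed to int64), since Int64 can handle nulls: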

import pandas as pd
import numpy as np

# mydf is assumed to be an existing DataFrame

# show dtypes before the transformation
print(mydf.dtypes)

for c in mydf.select_dtypes(np.number).columns:
    try:
        mydf[c] = mydf[c].astype('Int64')
        print('cast {} as Int64'.format(c))
    except (TypeError, ValueError):
        # e.g. a float column holding non-integer values cannot be cast safely
        print('could not cast {} to Int64'.format(c))

# show dtypes after the transformation
print(mydf.dtypes)
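The schema inconsistency described above, and one way to avoid it at load time, can be sketched as follows. The CSV contents and column names are made up for illustration, and passing `dtype={"id": "Int64"}` to `read_csv` is an alternative to the post-hoc loop, not something the answer itself proposes.

```python
import io
import pandas as pd

# Two hypothetical files with the same columns; only the first has a blank id.
with_blank = "id,name\n1,a\n,b\n3,c\n"
no_blank = "id,name\n4,d\n5,e\n"

# Default parsing yields different dtypes for the same column.
default_a = pd.read_csv(io.StringIO(with_blank))
default_b = pd.read_csv(io.StringIO(no_blank))
print(default_a["id"].dtype, default_b["id"].dtype)

# Requesting the nullable dtype up front gives both files the same schema.
nullable_a = pd.read_csv(io.StringIO(with_blank), dtype={"id": "Int64"})
nullable_b = pd.read_csv(io.StringIO(no_blank), dtype={"id": "Int64"})
combined = pd.concat([nullable_a, nullable_b], ignore_index=True)
print(combined["id"].dtype)
```

Declaring the dtype per column requires knowing the integer columns in advance; the loop above is more convenient when the columns are not known ahead of time.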

2020-06-17 14:33:46
