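Is there a preferred way to keep the dtype of a NumPy array fixed as int (or int64, or whatever) while still having an element inside it set to numpy.nan?
In particular, I am converting an in-house data structure to a pandas DataFrame. In our structure we have integer-typed columns that still contain NaNs (but the dtype of the column is int). If we make this a DataFrame, it seems to recast everything as float, but we would really like the columns to stay int.
Thoughts?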
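Things tried:
I tried using the from_records() function under pandas.DataFrame with coerce_float=False, and this did not help. I also tried using NumPy masked arrays with a NaN fill_value, which did not work either. All of these caused the column dtype to become float.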
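For reference, the behaviour described above can be reproduced with a minimal sketch like the following. The column names and values are made up for illustration; only the dtype outcome matters.

```python
import numpy as np
import pandas as pd

# Integers mixed with NaN in a plain Series are upcast to float64.
s = pd.Series([1, np.nan, 3])
print(s.dtype)

# Illustrative records (hypothetical data): the "qty" column has one NaN.
records = [(1, 10), (2, np.nan), (3, 30)]
df = pd.DataFrame.from_records(
    records, columns=["id", "qty"], coerce_float=False
)

# coerce_float=False does not prevent the upcast of "qty" to a float dtype.
print(df.dtypes)
```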
New in pandas v1.0+
You do not (and cannot) use numpy.nan for this any more.
Now you have pandas.NA.
Please read: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
IntegerArray is currently experimental. Its API or implementation may
change without warning.
Changed in version 1.0.0: Now uses pandas.NA as the missing value
rather than numpy.nan.
In Working with missing data, we saw that pandas primarily uses NaN to
represent missing data. Because NaN is a float, this forces an array
of integers with any missing values to become floating point. In some
cases, this may not matter much. But if your integer column is, say,
an identifier, casting to float can be problematic. Some integers
cannot even be represented as floating point numbers.
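
As a rough sketch of what this looks like in practice (assuming pandas 1.0 or later), the nullable "Int64" dtype, with a capital "I", keeps the column integer-typed and marks missing entries with pandas.NA. The values-plus-mask layout below is only a guess at what an in-house structure might look like, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Nullable integer column: note the capital "I" in "Int64".
s = pd.Series([1, None, 3], dtype="Int64")
print(s.dtype)        # Int64
print(s[1] is pd.NA)  # True

# Hypothetical in-house layout: integer values plus a boolean "missing" mask.
values = np.array([2**53 + 1, 0, 7], dtype="int64")
missing = np.array([False, True, False])
ids = pd.Series(pd.arrays.IntegerArray(values, missing), name="id")

# float64 cannot represent this identifier exactly ...
print(float(2**53 + 1) == 2**53 + 1)  # False
# ... but the nullable integer column stores it unchanged.
print(int(ids[0]) == 2**53 + 1)       # True
```

Building the column from an int64 array and a mask avoids passing through float at any point, which is why the large identifier survives intact.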