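Converting floats to integers in pandas?

I've been working with data imported from a CSV. Pandas changed some columns to float, so the numbers in these columns are now displayed as floating-point values. However, I need them displayed as integers, or without the comma (decimal separator). Is there a way to convert them to integers, or to not display the comma?

Current answer

Use 'Int64' for NaN support

astype(int) and astype('int64') cannot handle missing values (NumPy int). astype('Int64') (note the capital I) can handle missing values (pandas int).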

df['A'] = df['A'].astype('Int64') # capital I

This assumes you want to keep the missing values as NaN. If you plan to impute them, you can fillna first, as Ryan suggested.
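The difference between the three options can be seen side by side. The following is a minimal sketch on a made-up Series; the fill value 0 is only an arbitrary placeholder for whatever imputation suits your data.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# NumPy-backed int64 has no way to represent NaN, so the cast raises.
try:
    s.astype('int64')
    numpy_cast_failed = False
except (ValueError, TypeError):
    numpy_cast_failed = True

# The nullable extension dtype keeps the gap as <NA>.
nullable = s.astype('Int64')

# Alternatively, impute first (0 is an assumed placeholder), then cast to plain int64.
filled = s.fillna(0).astype('int64')

print(numpy_cast_failed)
print(nullable)
print(filled)
```

Which route is preferable depends on whether a missing value should stay visibly missing or be replaced by a sentinel.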

Examples of 'Int64' (capital I)

If the floats are already rounded, just use astype:

df = pd.DataFrame({'A': [99.0, np.nan, 42.0]})
df['A'] = df['A'].astype('Int64')
#       A
# 0    99
# 1  <NA>
# 2    42

If the floats are not rounded yet, round before astype:

df = pd.DataFrame({'A': [3.14159, np.nan, 1.61803]})
df['A'] = df['A'].round().astype('Int64')
#       A
# 0     3
# 1  <NA>
# 2     2

To read int+NaN data from a file, use dtype='Int64' to avoid the need for converting at all:

csv = io.StringIO('''
id,rating
foo,5
bar,
baz,2
''')
df = pd.read_csv(csv, dtype={'rating': 'Int64'})
#     id  rating
# 0  foo       5
# 1  bar    <NA>
# 2  baz       2

Notes

'Int64' is an alias for Int64Dtype:

df['A'] = df['A'].astype(pd.Int64Dtype()) # same as astype('Int64')

Sized/signed aliases are available:

           lower bound            upper bound
'Int8'     -128                   127
'Int16'    -32768                 32767
'Int32'    -2147483648            2147483647
'Int64'    -9223372036854775808   9223372036854775807
'UInt8'    0                      255
'UInt16'   0                      65535
'UInt32'   0                      4294967295
'UInt64'   0                      18446744073709551615
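The bounds can be checked against NumPy's fixed-width integer types. This sketch assumes that each nullable alias has the same range as the lowercase NumPy dtype of the same width, which is how the table above reads.

```python
import numpy as np
import pandas as pd

aliases = ['Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64']

# Look up the range of the matching NumPy dtype, e.g. 'Int8' -> np.dtype('int8').
bounds = {}
for name in aliases:
    info = np.iinfo(np.dtype(name.lower()))
    bounds[name] = (info.min, info.max)
    print(name, info.min, info.max)

# A narrower alias saves memory when the values are known to fit.
small = pd.Series([1.0, np.nan, 127.0]).astype('Int8')
print(small.dtype)
```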

2022-01-01 12:13:04

Other answers

Use the pandas.DataFrame.astype(<type>) function to manipulate column dtypes.

>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"))
>>> df
          A         B         C         D
0  0.542447  0.949988  0.669239  0.879887
1  0.068542  0.757775  0.891903  0.384542
2  0.021274  0.587504  0.180426  0.574300
>>> df[list("ABCD")] = df[list("ABCD")].astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0
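Every value above becomes 0 because astype(int) drops the fractional part rather than rounding. A small sketch on made-up values shows the difference; call round() first if the nearest integer is what you want.

```python
import pandas as pd

s = pd.Series([0.9, 1.5, 2.5, -0.9])

# astype(int) truncates toward zero.
truncated = s.astype(int)

# round() first; note that exact .5 cases go to the nearest even integer.
rounded = s.round().astype(int)

print(truncated.tolist())  # [0, 1, 2, 0]
print(rounded.tolist())    # [1, 2, 2, -1]
```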

EDIT:

To handle missing values:

>>> df
          A         B     C         D
0  0.475103  0.355453  0.66  0.869336
1  0.260395  0.200287   NaN  0.617024
2  0.517692  0.735613  0.18  0.657106
>>> df[list("ABCD")] = df[list("ABCD")].fillna(0.0).astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0

2014-01-22 18:49:11

The columns that need to be converted to int can also be given in a dictionary, as follows:

df = df.astype({'col1': 'int', 'col2': 'int', 'col3': 'int'})
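A self-contained version of the dictionary form might look like the following. The column names and values are invented for illustration; columns left out of the dictionary keep their dtype, and each listed column can get a different target type.

```python
import pandas as pd

# Hypothetical frame: two float columns and one text column.
df = pd.DataFrame({'col1': [1.0, 2.0], 'col2': [3.0, 4.0], 'label': ['x', 'y']})

# Only the listed columns are cast; 'label' is left alone.
df = df.astype({'col1': 'int64', 'col2': 'Int64'})

print(df.dtypes)
```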

2020-06-11 07:27:38

Convert all float columns to int

>>> df = pd.DataFrame(np.random.rand(5, 4) * 10, columns=list('PQRS'))
>>> print(df)
...     P           Q           R           S
... 0   4.395994    0.844292    8.543430    1.933934
... 1   0.311974    9.519054    6.171577    3.859993
... 2   2.056797    0.836150    5.270513    3.224497
... 3   3.919300    8.562298    6.852941    1.415992
... 4   9.958550    9.013425    8.703142    3.588733

>>> float_col = df.select_dtypes(include=['float64']) # This will select float columns only
>>> # list(float_col.columns.values)

>>> for col in float_col.columns.values:
...     df[col] = df[col].astype('int64')

>>> print(df)
...     P   Q   R   S
... 0   4   0   8   1
... 1   0   9   6   3
... 2   2   0   5   3
... 3   3   8   6   1
... 4   9   9   8   3
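The loop above can also be written as a single astype call by building the column-to-dtype mapping from select_dtypes. This is a sketch of an equivalent alternative on invented data, not the answer's own code.

```python
import pandas as pd

# Hypothetical frame with two float columns and one text column.
df = pd.DataFrame({'P': [4.4, 0.3], 'Q': [0.8, 9.5], 'name': ['a', 'b']})

# Collect the float64 column names, then cast them all in one call.
float_cols = df.select_dtypes(include=['float64']).columns
df = df.astype({col: 'int64' for col in float_cols})

print(df)
```

Non-float columns such as 'name' are not in the mapping, so they pass through unchanged.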

2019-03-22 12:24:43

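The text of the question explains that the data comes from a CSV. So I think it is relevant to show options that do the conversion when the data is read, rather than afterwards.

When importing spreadsheets or CSV files into a DataFrame, "integer-only columns" are commonly converted to float, because Excel stores all numerical values as floats and because of how the underlying libraries work.

When the file is read with read_excel or read_csv, there are a few options that avoid converting after import:

The dtype parameter accepts a dictionary mapping column names to target types, e.g. dtype={"my_column": "Int64"}.

The converters parameter can be used to pass a function that performs the conversion, for example replacing NaN with 0: converters={"my_column": lambda x: int(x) if x else 0}.

The convert_float parameter will "convert integral floats to int (i.e., 1.0 -> 1)", but watch out for corner cases such as NaN. This parameter is only available in read_excel, and it has been deprecated since pandas 1.3.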
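For read_csv, the first two options can be compared on a small in-memory file. The CSV content and column names below are made up; the sketch relies on the converter receiving the raw cell text, so an empty cell arrives as an empty string.

```python
import io
import pandas as pd

raw = "id,rating\nfoo,5\nbar,\nbaz,2\n"

# dtype: read the column as nullable integers, keeping the empty cell as <NA>.
with_na = pd.read_csv(io.StringIO(raw), dtype={'rating': 'Int64'})

# converters: turn each raw string into an int, substituting 0 for an empty cell.
with_zero = pd.read_csv(
    io.StringIO(raw),
    converters={'rating': lambda x: int(x) if x else 0},
)

print(with_na)
print(with_zero)
```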

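To do the conversion in an existing DataFrame, several alternatives have already been given in other answers, but since v1.0.0 pandas has an interesting function for this case: convert_dtypes, which will "convert columns to best possible dtypes using dtypes supporting pd.NA".

For example: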

In [3]: import numpy as np                                                                                                                                                                                         

In [4]: import pandas as pd                                                                                                                                                                                        

In [5]: df = pd.DataFrame( 
   ...:     { 
   ...:         "a": pd.Series([1, 2, 3], dtype=np.dtype("int64")), 
   ...:         "b": pd.Series([1.0, 2.0, 3.0], dtype=np.dtype("float")), 
   ...:         "c": pd.Series([1.0, np.nan, 3.0]), 
   ...:         "d": pd.Series([1, np.nan, 3]), 
   ...:     } 
   ...: )                                                                                                                                                                                                          

In [6]: df                                                                                                                                                                                                         
Out[6]: 
   a    b    c    d
0  1  1.0  1.0  1.0
1  2  2.0  NaN  NaN
2  3  3.0  3.0  3.0

In [7]: df.dtypes                                                                                                                                                                                                  
Out[7]: 
a      int64
b    float64
c    float64
d    float64
dtype: object

In [8]: converted = df.convert_dtypes()                                                                                                                                                                            

In [9]: converted.dtypes                                                                                                                                                                                           
Out[9]: 
a    Int64
b    Int64
c    Int64
d    Int64
dtype: object

In [10]: converted                                                                                                                                                                                                 
Out[10]: 
   a  b     c     d
0  1  1     1     1
1  2  2  <NA>  <NA>
2  3  3     3     3

2021-07-01 16:59:47

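Expanding on the use of the pandas.DataFrame.astype(<type>) method mentioned by @Ryan G, you can pass errors='ignore' to convert only those columns that do not produce an error, which notably simplifies the syntax. Obviously, caution is needed when ignoring errors, but for this task it comes in very handy.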

>>> df = pd.DataFrame(np.random.rand(3, 4), columns=list('ABCD'))
>>> df *= 10
>>> print(df)
...           A       B       C       D
... 0   2.16861 8.34139 1.83434 6.91706
... 1   5.85938 9.71712 5.53371 4.26542
... 2   0.50112 4.06725 1.99795 4.75698

>>> df['E'] = list('XYZ')
>>> df = df.astype(int, errors='ignore')  # astype returns a new frame, so assign it back
>>> print(df)
...     A   B   C   D   E
... 0   2   8   1   6   X
... 1   5   9   5   4   Y
... 2   0   4   1   4   Z

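From the pandas.DataFrame.astype docs:

errors : {'raise', 'ignore'}, default 'raise'
Control raising of exceptions on invalid data for provided dtype.
raise : allow exceptions to be raised
ignore : suppress exceptions. On error return original object
New in version 0.20.0.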

2019-04-02 15:28:57
