I have a pandas DataFrame that I want to write to a CSV file.
I am using:
df.to_csv('out.csv')
and I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in range(128)
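Is there an easy way to get around this (i.e. my DataFrame contains Unicode characters)?
And is there a way to write a tab-delimited file instead of a CSV, using e.g. a 'to-tab' method (which I don't think exists)?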
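When you store a DataFrame object in a CSV file using the to_csv method, you probably don't need to store the index that precedes each row of the DataFrame.
You can avoid that by passing False to the index parameter.
Something like: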
df.to_csv(file_name, encoding='utf-8', index=False)
If your DataFrame object looks like this:
Color Number
0 red 22
1 blue 10
the CSV file will store:
Color,Number
red,22
blue,10
instead of (which is what you get with the default value, True):
,Color,Number
0,red,22
1,blue,10
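To check this yourself, here is a minimal sketch. The Color/Number frame is the one from this answer; the second frame with Greek letters and the temporary file path are just assumptions for illustration, standing in for the data in the question. When no path is passed, to_csv returns the CSV text as a string, which makes the two cases easy to compare.

```python
import os
import tempfile

import pandas as pd

# The example frame from the answer above.
df = pd.DataFrame({"Color": ["red", "blue"], "Number": [22, 10]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()                # default: index=True
without_index = df.to_csv(index=False)  # index column dropped

print(with_index)
print(without_index)

# Hypothetical frame with non-ASCII characters, similar to the question's data.
greek = pd.DataFrame({"Symbol": ["α", "β"], "Value": [1, 2]})

# Write to a temporary file with an explicit encoding, then read it back.
path = os.path.join(tempfile.mkdtemp(), "out.csv")
greek.to_csv(path, encoding="utf-8", index=False)
round_trip = pd.read_csv(path, encoding="utf-8")
print(round_trip)
```

Passing encoding='utf-8' explicitly is what avoids the UnicodeEncodeError from the question on setups where the default codec is ASCII.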
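To write a pandas DataFrame to a CSV file, you will need DataFrame.to_csv. This function offers many arguments with reasonable defaults that you will more often than not need to override to suit your specific use case. For example, you might want to use a different separator, change the datetime format, or drop the index when writing. to_csv has arguments you can pass to address these requirements.
The table below lists some common scenarios of writing to CSV files and the corresponding arguments you can use for them.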
Footnotes
The default separator is assumed to be a comma (','). Don't change this unless you know you need to.
By default, the index of df is written as the first column. If your DataFrame does not have an index (IOW, the df.index is the default RangeIndex), then you will want to set index=False when writing. To explain this in a different way, if your data DOES have an index, you can (and should) use index=True or just leave it out completely (as the default is True).
It would be wise to set the encoding parameter if you are writing string data so that other applications know how to read your data. This will also avoid any potential UnicodeEncodeErrors you might encounter while saving.
Compression is recommended if you are writing large DataFrames (>100K rows) to disk as it will result in much smaller output files. OTOH, it will mean the write time will increase (and consequently, the read time since the file will need to be decompressed).
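The footnotes above can be tried out with a short sketch like the one below. The column names, sample values and temporary file path are made up for illustration; sep, index, date_format, encoding and compression are the to_csv parameters being discussed. Note that sep='\t' also answers the original question about tab-delimited output: there is no separate 'to-tab' method.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: a string column with non-ASCII text and a datetime column.
df = pd.DataFrame({
    "name": ["α", "β"],
    "when": pd.to_datetime(["2020-01-02 03:04:05", "2021-06-07 08:09:10"]),
})

# Tab-separated output, no index, dates written as YYYY-MM-DD only.
tsv_text = df.to_csv(sep="\t", index=False, date_format="%Y-%m-%d")
print(tsv_text)

# Gzip-compressed CSV with an explicit encoding.
path = os.path.join(tempfile.mkdtemp(), "out.csv.gz")
df.to_csv(path, index=False, encoding="utf-8", compression="gzip")

# read_csv infers gzip from the .gz extension and decompresses on the fly.
restored = pd.read_csv(path, encoding="utf-8", parse_dates=["when"])
print(restored)
```

Whether compression pays off depends on your data and disk, so it may be worth timing the write and the read on a realistic sample before committing to it.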