Error tokenizing data

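I'm trying to use pandas to manipulate a .csv file, but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

I have tried reading the pandas docs, but found nothing.

My code is simple: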

import pandas as pd

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)

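How can I resolve this? Should I use the csv module or another language?

The file is from Morningstar.

Current answer

I had the same problem with read_csv: ParserError: Error tokenizing data. I just saved the old csv file as a new csv file. Problem solved!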

2018-11-26 13:32:41

Other answers

Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

The error message itself is the clue: "Expected 2 fields in line 3, saw 12" means the parser expected 2 fields, based on the lines before it, but found 12 fields on line 3.
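To see where the message comes from, here is a minimal sketch that should trigger the same kind of error on purpose. The sample data is made up for illustration: a header and one row with 2 fields, followed by a row with 3.

```python
from io import StringIO

import pandas as pd
from pandas.errors import ParserError

# Made-up data: the first two lines have 2 fields, the third has 3.
sample = "a,b\n1,2\n3,4,5\n"

try:
    pd.read_csv(StringIO(sample))
    message = None
except ParserError as err:
    # The message names the expected count, the line number and the count seen.
    message = str(err)

print(message)
```

The field count the parser "expects" comes from the earlier lines, so the line named in the message is simply the first one that has more fields than they do.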

When you have data like the sample below, skipping the bad rows would throw away most of the data:

data = """1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4"""
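As a rough illustration of how much is lost, the sketch below reads the same sample while skipping bad lines. It assumes pandas 1.3 or newer, where `read_csv` accepts `on_bad_lines="skip"`; older versions used a different option.

```python
from io import StringIO

import pandas as pd

data = """1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4"""

# Rows with more fields than the first row are dropped silently;
# the shorter row "1,2" is kept and padded with NaN.
kept = pd.read_csv(StringIO(data), header=None, on_bad_lines="skip")

print(f"kept {len(kept)} of {len(data.splitlines())} rows")
```

Only the rows that fit the width of the first row survive, which is why skipping is a poor fit for ragged data like this.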

If you don't want to skip any rows, do the following:

import pandas as pd

# First, find the maximum number of columns across all rows.
# Note: a plain split(",") miscounts quoted fields that contain commas.
with open("file_name.csv", "r") as temp_f:
    # Number of columns in each line
    col_count = [len(line.split(",")) for line in temp_f]

# Generate column names (0, 1, 2, ..., maximum columns - 1)
column_names = list(range(max(col_count)))

# The maximum matches the largest "saw N" the error could report,
# e.g. 8 for "Expected 4 fields in line 2, saw 8"
data = pd.read_csv("file_name.csv", header=None, names=column_names)

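Use range instead of setting the names by hand, because typing them out is cumbersome when you have many columns.

In addition, if you need rows of equal length, for example for clustering (k-means), you can fill the NaN values with 0: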

new_data = data.fillna(0)
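Putting the steps together on the sample data from above, a self-contained sketch could look like this. It reads from an in-memory string rather than a file, and it counts fields with the standard `csv` module instead of `split(",")`, a substitution made here so that quoted commas are handled; the variable names are only illustrative.

```python
import csv
from io import StringIO

import pandas as pd

raw = "1,2,3\n1,2,3,4\n1,2,3,4,5\n1,2\n1,2,3,4"

# Width of the widest row, counted with csv.reader.
widest = max(len(row) for row in csv.reader(StringIO(raw)))

# Supplying enough column names lets every row fit; short rows get NaN.
frame = pd.read_csv(StringIO(raw), header=None, names=list(range(widest)))

# Replace the padding NaN values with 0 to get a dense table.
filled = frame.fillna(0)

print(filled)
```

All five rows are kept, and whether 0 is a sensible fill value depends on what the columns mean.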

2020-02-16 09:58:45

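I have run into this problem a few times myself. Almost every time, the reason was that the file I was trying to open was not a properly saved CSV to begin with. By "properly", I mean that every row has the same number of delimiters or columns.

Typically it happened because I had opened the CSV in Excel and then saved it improperly. Even though the file extension was still .csv, the pure CSV format had been altered.

Any file saved with pandas to_csv will be properly formatted and shouldn't have this issue. But if you open it with another program, it may change the structure.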
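To check whether a file still has the same number of fields on every row, one option is to count them with the standard `csv` module before handing the file to pandas. This is only a sketch: `field_count_report` is a hypothetical helper name, and the sample text stands in for a real file.

```python
import csv
from collections import Counter
from io import StringIO


def field_count_report(handle, delimiter=","):
    """Map each field count to the number of rows that have it."""
    return Counter(len(row) for row in csv.reader(handle, delimiter=delimiter))


# Made-up sample: the third line is missing a field.
sample = StringIO("a,b,c\n1,2,3\n4,5\n6,7,8\n")
report = field_count_report(sample)

# A properly rectangular CSV yields exactly one distinct field count.
is_rectangular = len(report) == 1
print(dict(report), is_rectangular)
```

For a real file, pass `open(path, newline="")` instead of the `StringIO` object.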

Hope that helps.

2016-07-07 17:22:00

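Most of the useful answers have already been given, but I suggest saving pandas DataFrames as Parquet files. Parquet files don't have this problem, and they are memory efficient as well.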

2019-06-11 09:47:59

In my case, the separator was not the default "," but Tab.

pd.read_csv('file_name.csv', sep='\\t', lineterminator='\\r', engine='python', header='infer')

Note: in my case "\t" did not work, although some sources suggest it; "\\t" was required.
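If you are not sure which separator a file uses, the standard library's `csv.Sniffer` can make a guess from a sample of the text. The sketch below uses made-up tab-separated data and limits the candidates to comma, tab and semicolon; sniffing is a heuristic, so treat the result as a hint rather than a guarantee.

```python
import csv
from io import StringIO

import pandas as pd

# Made-up tab-separated sample standing in for the start of a real file.
text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

# Ask the sniffer to choose among a few likely delimiters.
dialect = csv.Sniffer().sniff(text, delimiters=",\t;")

# Pass the detected delimiter straight to read_csv.
table = pd.read_csv(StringIO(text), sep=dialect.delimiter)

print(repr(dialect.delimiter), table.shape)
```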

2020-05-04 18:27:09

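In my case, it was because the first two and last two lines of the csv file were formatted differently from the content in the middle of the file.

So what I did was open the csv file as a string, trim the content, and then use read_csv to get a DataFrame.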

from io import StringIO
import pandas as pd

# file_path and file_name are placeholders for your own location
with open(f'{file_path}/{file_name}', 'r') as file:
    content = file.read()

# Normalize line endings from '\r\n' to '\n', then split into lines
lines = content.replace('\r', '').split('\n')

# Drop the first 2 and last 2 lines of the file.
# StringIO can be thought of as a file stored in memory.
df = pd.read_csv(StringIO("\n".join(lines[2:-2])), header=None)
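A possible alternative, sketched here with made-up data, is to let `read_csv` do the trimming through its `skiprows` and `skipfooter` parameters. `skipfooter` is not supported by the C engine, so this assumes `engine='python'`.

```python
from io import StringIO

import pandas as pd

# Made-up file content: 2 preamble lines, 2 data rows, 2 trailer lines.
text = (
    "Report title\n"
    "Generated by some tool\n"
    "1,2,3\n"
    "4,5,6\n"
    "Total rows: 2\n"
    "End of report\n"
)

# Skip 2 lines at the top and 2 at the bottom, keeping only the data rows.
df = pd.read_csv(
    StringIO(text), skiprows=2, skipfooter=2, header=None, engine="python"
)

print(df)
```

This avoids editing the string by hand, but it still relies on knowing how many lines to drop at each end.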

2019-11-27 01:13:44
