Error tokenizing data

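I'm trying to use pandas to manipulate a .csv file, but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

I have tried reading the pandas docs, but found nothing.

My code is simple: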

import pandas as pd

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)

How can I resolve this? Should I use the csv module or another language?

The file is from Morningstar.

Current answer

I ran into this problem when I tried to read a CSV without passing in column names.

df = pd.read_csv(filename, header=None)

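I then specified the column names in a list beforehand and passed them in via names, and that solved it immediately. If you don't have column names, you can create as many placeholder names as the maximum number of columns that might appear in your data.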

col_names = ["col1", "col2", "col3", ...]
df = pd.read_csv(filename, names=col_names)
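If you don't know the maximum column count in advance, one way to find it is to scan the file once with the csv module. This is only a sketch: the sample data in `raw` is made up, and `max_cols` is just an illustrative name.

```python
import csv
import io

import pandas as pd

# Hypothetical ragged data: a 2-field title row followed by 4-field rows.
raw = "Growth Profitability,GOOG\na,b,c,d\n1,2,3,4\n"

# Scan once to find the widest row, then build that many placeholder names.
max_cols = max(len(row) for row in csv.reader(io.StringIO(raw)))
col_names = [f"col{i}" for i in range(1, max_cols + 1)]

# With explicit names, rows that are too short are padded with NaN
# instead of raising a tokenizing error.
df = pd.read_csv(io.StringIO(raw), header=None, names=col_names)
print(df)
```

For a real file, open it with `open(path, newline='')` instead of wrapping a string in `io.StringIO`.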

2019-01-08 18:57:22

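Other answers

In my case, it was because the first and last two lines of the CSV file are formatted differently from the middle of the file.

So what I did was open the CSV file as a string, parse the string content, and then use read_csv to get a dataframe.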

import io
import pandas as pd

with open(f'{file_path}/{file_name}', 'r') as file:
    content = file.read()

# change new line character from '\r\n' to '\n'
lines = content.replace('\r', '').split('\n')

# Drop the first 2 and last 2 elements of lines
# (if the file ends with a newline, the final element is an empty string)
# io.StringIO can be considered as a file stored in memory
df = pd.read_csv(io.StringIO("\n".join(lines[2:-2])), header=None)
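As an alternative to slicing the lines by hand, read_csv can skip leading and trailing lines itself through skiprows and skipfooter. A minimal sketch, assuming two junk lines at each end; the sample content in `raw` is made up.

```python
import io

import pandas as pd

# Hypothetical content: 2 title lines, 3 data rows, 2 footer lines.
raw = (
    "Report title\n"
    "Generated today\n"
    "1,2,3\n"
    "4,5,6\n"
    "7,8,9\n"
    "Total rows: 3\n"
    "End of report"
)

# skipfooter is only supported by the Python engine, so request it explicitly.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    skiprows=2,
    skipfooter=2,
    engine="python",
)
print(df)
```

The counts have to match your file. If the number of junk lines varies, the manual approach above gives you more control.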

2019-11-27 01:13:44

You can try this, in case the file is tab-separated rather than comma-separated:

data = pd.read_csv('file1.csv', sep='\t')
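If you aren't sure which delimiter the file uses, pandas can try to detect it when you pass sep=None with the Python engine (it relies on csv.Sniffer under the hood). A small sketch with made-up tab-separated data:

```python
import io

import pandas as pd

# Hypothetical tab-separated content; the column names are illustrative only.
raw = "date\tprice\tvolume\n1\t2\t3\n4\t5\t6\n"

# sep=None asks the Python engine to sniff the delimiter from the first line.
data = pd.read_csv(io.StringIO(raw), sep=None, engine="python")
print(data.columns.tolist())
```

Detection is a heuristic, so once you know the delimiter it is safer to pass it explicitly.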

2020-09-08 15:58:01

For those who have a similar issue with Python 3 on Linux:

pandas.errors.ParserError: Error tokenizing data. C error: Calling
read(nbytes) on source failed. Try engine='python'.

Try:

df = pd.read_csv('file.csv', encoding='utf8', engine='python')

2019-10-14 14:54:07

My dataset already had row numbers in it, so I used index_col:

pd.read_csv('train.csv', index_col=0)

2017-06-20 05:28:30

Here's what worked for me (I'm posting this answer because I ran into this problem specifically in a Google Colaboratory notebook):

df = pd.read_csv("/path/foo.csv", delimiter=';', skiprows=0, low_memory=False)

2019-08-20 09:37:20
