我需要在c++中加载和使用CSV文件数据。在这一点上,它实际上只是一个以逗号分隔的解析器(即不用担心转义新行和逗号)。主要需要的是逐行解析器,它将在每次调用方法时为下一行返回一个向量。
我发现这篇文章看起来很有前途: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp
我从未使用过Boost's Spirit,但我愿意尝试一下。但前提是我忽略了一个更直接的解决方案。
我需要在c++中加载和使用CSV文件数据。在这一点上,它实际上只是一个以逗号分隔的解析器(即不用担心转义新行和逗号)。主要需要的是逐行解析器,它将在每次调用方法时为下一行返回一个向量。
我发现这篇文章看起来很有前途: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp
我从未使用过Boost's Spirit,但我愿意尝试一下。但前提是我忽略了一个更直接的解决方案。
当前回答
另一个CSV I/O库可以在这里找到:
http://code.google.com/p/fast-cpp-csv-parser/
#include "csv.h"
int main(){
io::CSVReader<3> in("ram.csv");
in.read_header(io::ignore_extra_column, "vendor", "size", "speed");
std::string vendor; int size; double speed;
while(in.read_row(vendor, size, speed)){
// do stuff with the data
}
}
其他回答
您需要做的第一件事是确保文件存在。来完成 这你只需要尝试打开文件流的路径。在你 打开文件流使用stream.fail()查看它是否如预期的那样工作, 与否。
bool fileExists(string fileName)
{
ifstream test;
test.open(fileName.c_str());
if (test.fail())
{
test.close();
return false;
}
else
{
test.close();
return true;
}
}
您还必须验证所提供的文件是正确的文件类型。 要做到这一点,您需要查看提供的文件路径直到 您可以找到文件扩展名。一旦你有了文件扩展名,请确保 它是一个。csv文件。
bool verifyExtension(string filename)
{
int period = 0;
for (unsigned int i = 0; i < filename.length(); i++)
{
if (filename[i] == '.')
period = i;
}
string extension;
for (unsigned int i = period; i < filename.length(); i++)
extension += filename[i];
if (extension == ".csv")
return true;
else
return false;
}
此函数将返回稍后在错误消息中使用的文件扩展名。
string getExtension(string filename)
{
int period = 0;
for (unsigned int i = 0; i < filename.length(); i++)
{
if (filename[i] == '.')
period = i;
}
string extension;
if (period != 0)
{
for (unsigned int i = period; i < filename.length(); i++)
extension += filename[i];
}
else
extension = "NO FILE";
return extension;
}
这个函数实际上会调用上面创建的错误检查,然后解析文件。
void parseFile(string fileName)
{
if (fileExists(fileName) && verifyExtension(fileName))
{
ifstream fs;
fs.open(fileName.c_str());
string fileCommand;
while (fs.good())
{
string temp;
getline(fs, fileCommand, '\n');
for (unsigned int i = 0; i < fileCommand.length(); i++)
{
if (fileCommand[i] != ',')
temp += fileCommand[i];
else
temp += " ";
}
if (temp != "\0")
{
// Place your code here to run the file.
}
}
fs.close();
}
else if (!fileExists(fileName))
{
cout << "Error: The provided file does not exist: " << fileName << endl;
if (!verifyExtension(fileName))
{
if (getExtension(fileName) != "NO FILE")
cout << "\tCheck the file extension." << endl;
else
cout << "\tThere is no file in the provided path." << endl;
}
}
else if (!verifyExtension(fileName))
{
if (getExtension(fileName) != "NO FILE")
cout << "Incorrect file extension provided: " << getExtension(fileName) << endl;
else
cout << "There is no file in the following path: " << fileName << endl;
}
}
下面是Unicode CSV解析器的另一个实现(使用wchar_t)。我写了一部分,乔纳森·莱弗勒写了剩下的部分。
注意:此解析器旨在尽可能地复制Excel的行为,特别是在导入损坏或格式错误的CSV文件时。
这是最初的问题-用多行字段和转义双引号解析CSV文件
这是作为SSCCE(简短,自包含,正确示例)的代码。
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>
extern const wchar_t *nextCsvField(const wchar_t *p, wchar_t sep, bool *newline);
// Returns a pointer to the start of the next field,
// or zero if this is the last field in the CSV
// p is the start position of the field
// sep is the separator used, i.e. comma or semicolon
// newline says whether the field ends with a newline or with a comma
const wchar_t *nextCsvField(const wchar_t *p, wchar_t sep, bool *newline)
{
// Parse quoted sequences
if ('"' == p[0]) {
p++;
while (1) {
// Find next double-quote
p = wcschr(p, L'"');
// If we don't find it or it's the last symbol
// then this is the last field
if (!p || !p[1])
return 0;
// Check for "", it is an escaped double-quote
if (p[1] != '"')
break;
// Skip the escaped double-quote
p += 2;
}
}
// Find next newline or comma.
wchar_t newline_or_sep[4] = L"\n\r ";
newline_or_sep[2] = sep;
p = wcspbrk(p, newline_or_sep);
// If no newline or separator, this is the last field.
if (!p)
return 0;
// Check if we had newline.
*newline = (p[0] == '\r' || p[0] == '\n');
// Handle "\r\n", otherwise just increment
if (p[0] == '\r' && p[1] == '\n')
p += 2;
else
p++;
return p;
}
static wchar_t *csvFieldData(const wchar_t *fld_s, const wchar_t *fld_e, wchar_t *buffer, size_t buflen)
{
wchar_t *dst = buffer;
wchar_t *end = buffer + buflen - 1;
const wchar_t *src = fld_s;
if (*src == L'"')
{
const wchar_t *p = src + 1;
while (p < fld_e && dst < end)
{
if (p[0] == L'"' && p+1 < fld_s && p[1] == L'"')
{
*dst++ = p[0];
p += 2;
}
else if (p[0] == L'"')
{
p++;
break;
}
else
*dst++ = *p++;
}
src = p;
}
while (src < fld_e && dst < end)
*dst++ = *src++;
if (dst >= end)
return 0;
*dst = L'\0';
return(buffer);
}
static void dissect(const wchar_t *line)
{
const wchar_t *start = line;
const wchar_t *next;
bool eol;
wprintf(L"Input %3zd: [%.*ls]\n", wcslen(line), wcslen(line)-1, line);
while ((next = nextCsvField(start, L',', &eol)) != 0)
{
wchar_t buffer[1024];
wprintf(L"Raw Field: [%.*ls] (eol = %d)\n", (next - start - eol), start, eol);
if (csvFieldData(start, next-1, buffer, sizeof(buffer)/sizeof(buffer[0])) != 0)
wprintf(L"Field %3zd: [%ls]\n", wcslen(buffer), buffer);
start = next;
}
}
static const wchar_t multiline[] =
L"First field of first row,\"This field is multiline\n"
"\n"
"but that's OK because it's enclosed in double quotes, and this\n"
"is an escaped \"\" double quote\" but this one \"\" is not\n"
" \"This is second field of second row, but it is not multiline\n"
" because it doesn't start \n"
" with an immediate double quote\"\n"
;
int main(void)
{
wchar_t line[1024];
while (fgetws(line, sizeof(line)/sizeof(line[0]), stdin))
dissect(line);
dissect(multiline);
return 0;
}
这是一个旧线程,但它仍然在搜索结果的顶部,所以我添加我的解决方案使用std::stringstream和一个简单的字符串替换方法由Yves Baumes我在这里找到。
下面的例子将逐行读取文件,忽略以//开头的注释行,并将其他行解析为字符串、int和double的组合。Stringstream进行解析,但希望字段由空格分隔,因此我使用stringreplace首先将逗号转换为空格。它可以处理制表符,但不处理带引号的字符串。
坏的或丢失的输入被简单地忽略,这可能是好事,也可能不是好事,这取决于您的情况。
#include <string>
#include <sstream>
#include <fstream>
void StringReplace(std::string& str, const std::string& oldStr, const std::string& newStr)
// code by Yves Baumes
// http://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string
{
size_t pos = 0;
while((pos = str.find(oldStr, pos)) != std::string::npos)
{
str.replace(pos, oldStr.length(), newStr);
pos += newStr.length();
}
}
void LoadCSV(std::string &filename) {
std::ifstream stream(filename);
std::string in_line;
std::string Field;
std::string Chan;
int ChanType;
double Scale;
int Import;
while (std::getline(stream, in_line)) {
StringReplace(in_line, ",", " ");
std::stringstream line(in_line);
line >> Field >> Chan >> ChanType >> Scale >> Import;
if (Field.substr(0,2)!="//") {
// do your stuff
// this is CBuilder code for demonstration, sorry
ShowMessage((String)Field.c_str() + "\n" + Chan.c_str() + "\n" + IntToStr(ChanType) + "\n" +FloatToStr(Scale) + "\n" +IntToStr(Import));
}
}
}
不好意思,但是为了隐藏几行代码,这似乎是非常复杂的语法。
为什么不这样呢:
/**
Read line from a CSV file
@param[in] fp file pointer to open file
@param[in] vls reference to vector of strings to hold next line
*/
void readCSV( FILE *fp, std::vector<std::string>& vls )
{
vls.clear();
if( ! fp )
return;
char buf[10000];
if( ! fgets( buf,999,fp) )
return;
std::string s = buf;
int p,q;
q = -1;
// loop over columns
while( 1 ) {
p = q;
q = s.find_first_of(",\n",p+1);
if( q == -1 )
break;
vls.push_back( s.substr(p+1,q-p-1) );
}
}
int _tmain(int argc, _TCHAR* argv[])
{
std::vector<std::string> vls;
FILE * fp = fopen( argv[1], "r" );
if( ! fp )
return 1;
readCSV( fp, vls );
readCSV( fp, vls );
readCSV( fp, vls );
std::cout << "row 3, col 4 is " << vls[3].c_str() << "\n";
return 0;
}
您可以使用fopen,fscanf函数打开和读取.csv文件,但重要的是解析数据。使用分隔符解析数据的最简单方法。对于.csv,分隔符为','。
假设你的data1.csv文件如下所示:
A,45,76,01
B,77,67,02
C,63,76,03
D,65,44,04
您可以标记数据并存储在字符数组中,然后使用atoi()等函数进行适当的转换
FILE *fp;
char str1[10], str2[10], str3[10], str4[10];
fp = fopen("G:\\data1.csv", "r");
if(NULL == fp)
{
printf("\nError in opening file.");
return 0;
}
while(EOF != fscanf(fp, " %[^,], %[^,], %[^,], %s, %s, %s, %s ", str1, str2, str3, str4))
{
printf("\n%s %s %s %s", str1, str2, str3, str4);
}
fclose(fp);
[^,], ^ -它颠倒了逻辑,意思是匹配任何不包含逗号的字符串,然后最后,表示匹配终止前一个字符串的逗号。