我需要在c++中加载和使用CSV文件数据。在这一点上,它实际上只是一个以逗号分隔的解析器(即不用担心转义新行和逗号)。主要需要的是逐行解析器,它将在每次调用方法时为下一行返回一个向量。

我发现这篇文章看起来很有前途: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp

我从未使用过Boost's Spirit,但我愿意尝试一下。但前提是我忽略了一个更直接的解决方案。


当前回答

另一个CSV I/O库可以在这里找到:

http://code.google.com/p/fast-cpp-csv-parser/

#include "csv.h"

int main(){
  io::CSVReader<3> in("ram.csv");
  in.read_header(io::ignore_extra_column, "vendor", "size", "speed");
  std::string vendor; int size; double speed;
  while(in.read_row(vendor, size, speed)){
    // do stuff with the data
  }
}

其他回答

另一个CSV I/O库可以在这里找到:

http://code.google.com/p/fast-cpp-csv-parser/

#include "csv.h"

int main(){
  io::CSVReader<3> in("ram.csv");
  in.read_header(io::ignore_extra_column, "vendor", "size", "speed");
  std::string vendor; int size; double speed;
  while(in.read_row(vendor, size, speed)){
    // do stuff with the data
  }
}

我写了一个只有头文件的c++ 11 CSV解析器。它经过了良好的测试,快速,支持整个CSV规范(带引号的字段,引号中的分隔符/结束符,引号转义等),并且可以配置为不符合规范的CSV。

配置是通过一个流畅的接口完成的:

// constructor accepts any input stream
CsvParser parser = CsvParser(std::cin)
  .delimiter(';')    // delimited by ; instead of ,
  .quote('\'')       // quoted fields use ' instead of "
  .terminator('\0'); // terminated by \0 instead of by \r\n, \n, or \r

解析只是一个基于范围的for循环:

#include <iostream>
#include "../parser.hpp"

using namespace aria::csv;

int main() {
  std::ifstream f("some_file.csv");
  CsvParser parser(f);

  for (auto& row : parser) {
    for (auto& field : row) {
      std::cout << field << " | ";
    }
    std::cout << std::endl;
  }
}

我需要一个易于使用的c++库来解析CSV文件,但找不到任何可用的库,所以我最终构建了一个。 Rapidcsv是一个c++ 11的纯头库,它可以直接访问已解析的列(或行),作为选择的数据类型的向量。例如:

#include <iostream>
#include <vector>
#include <rapidcsv.h>

int main()
{
  rapidcsv::Document doc("../tests/msft.csv");

  std::vector<float> close = doc.GetColumn<float>("Close");
  std::cout << "Read " << close.size() << " values." << std::endl;
}

如果您正在使用Visual Studio / MFC,下面的解决方案可能会使您的工作更轻松。它支持Unicode和MBCS,有注释,除了CString之外没有其他依赖项,对我来说工作得很好。它不支持在带引号的字符串中嵌入换行符,但我不在乎,只要它在这种情况下不崩溃,它不会崩溃。

总体策略是,将带引号的字符串和空字符串作为特殊情况处理,其余使用Tokenize。对于带引号的字符串,策略是找到真正的结束引号,跟踪是否遇到了连续的引号对。如果是,则使用Replace将成对转换为单个。毫无疑问,有更有效的方法,但在我的案例中,性能还不够重要,不足以证明进一步优化的合理性。

class CParseCSV {
public:
// Construction
    CParseCSV(const CString& sLine);

// Attributes
    bool    GetString(CString& sDest);

protected:
    CString m_sLine;    // line to extract tokens from
    int     m_nLen;     // line length in characters
    int     m_iPos;     // index of current position
};

CParseCSV::CParseCSV(const CString& sLine) : m_sLine(sLine)
{
    m_nLen = m_sLine.GetLength();
    m_iPos = 0;
}

bool CParseCSV::GetString(CString& sDest)
{
    if (m_iPos < 0 || m_iPos > m_nLen)  // if position out of range
        return false;
    if (m_iPos == m_nLen) { // if at end of string
        sDest.Empty();  // return empty token
        m_iPos = -1;    // really done now
        return true;
    }
    if (m_sLine[m_iPos] == '\"') {  // if current char is double quote
        m_iPos++;   // advance to next char
        int iTokenStart = m_iPos;
        bool    bHasEmbeddedQuotes = false;
        while (m_iPos < m_nLen) {   // while more chars to parse
            if (m_sLine[m_iPos] == '\"') {  // if current char is double quote
                // if next char exists and is also double quote
                if (m_iPos < m_nLen - 1 && m_sLine[m_iPos + 1] == '\"') {
                    // found pair of consecutive double quotes
                    bHasEmbeddedQuotes = true;  // request conversion
                    m_iPos++;   // skip first quote in pair
                } else  // next char doesn't exist or is normal
                    break;  // found closing quote; exit loop
            }
            m_iPos++;   // advance to next char
        }
        sDest = m_sLine.Mid(iTokenStart, m_iPos - iTokenStart);
        if (bHasEmbeddedQuotes) // if string contains embedded quote pairs
            sDest.Replace(_T("\"\""), _T("\""));    // convert pairs to singles
        m_iPos += 2;    // skip closing quote and trailing delimiter if any
    } else if (m_sLine[m_iPos] == ',') {    // else if char is comma
        sDest.Empty();  // return empty token
        m_iPos++;   // advance to next char
    } else {    // else get next comma-delimited token
        sDest = m_sLine.Tokenize(_T(","), m_iPos);
    }
    return true;
}

// calling code should look something like this:

    CStdioFile  fIn(pszPath, CFile::modeRead);
    CString sLine, sToken;
    while (fIn.ReadString(sLine)) { // for each line of input file
        if (!sLine.IsEmpty()) { // ignore blank lines
            CParseCSV   csv(sLine);
            while (csv.GetString(sToken)) {
                // do something with sToken here
            }
        }
    }

我的版本只使用标准c++ 11库。它很好地处理Excel CSV引用:

spam eggs,"foo,bar","""fizz buzz"""
1.23,4.567,-8.00E+09

代码是作为有限状态机编写的,每次只消耗一个字符。我认为这更容易解释。

#include <istream>
#include <string>
#include <vector>

enum class CSVState {
    UnquotedField,
    QuotedField,
    QuotedQuote
};

std::vector<std::string> readCSVRow(const std::string &row) {
    CSVState state = CSVState::UnquotedField;
    std::vector<std::string> fields {""};
    size_t i = 0; // index of the current field
    for (char c : row) {
        switch (state) {
            case CSVState::UnquotedField:
                switch (c) {
                    case ',': // end of field
                              fields.push_back(""); i++;
                              break;
                    case '"': state = CSVState::QuotedField;
                              break;
                    default:  fields[i].push_back(c);
                              break; }
                break;
            case CSVState::QuotedField:
                switch (c) {
                    case '"': state = CSVState::QuotedQuote;
                              break;
                    default:  fields[i].push_back(c);
                              break; }
                break;
            case CSVState::QuotedQuote:
                switch (c) {
                    case ',': // , after closing quote
                              fields.push_back(""); i++;
                              state = CSVState::UnquotedField;
                              break;
                    case '"': // "" -> "
                              fields[i].push_back('"');
                              state = CSVState::QuotedField;
                              break;
                    default:  // end of quote
                              state = CSVState::UnquotedField;
                              break; }
                break;
        }
    }
    return fields;
}

/// Read CSV file, Excel dialect. Accept "quoted fields ""with quotes"""
std::vector<std::vector<std::string>> readCSV(std::istream &in) {
    std::vector<std::vector<std::string>> table;
    std::string row;
    while (!in.eof()) {
        std::getline(in, row);
        if (in.bad() || in.fail()) {
            break;
        }
        auto fields = readCSVRow(row);
        table.push_back(fields);
    }
    return table;
}