我需要在c++中加载和使用CSV文件数据。在这一点上,它实际上只是一个以逗号分隔的解析器(即不用担心转义新行和逗号)。主要需要的是逐行解析器,它将在每次调用方法时为下一行返回一个向量。
我发现这篇文章看起来很有前途: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp
我从未使用过Boost's Spirit,但我愿意尝试一下。但前提是我忽略了一个更直接的解决方案。
我需要在c++中加载和使用CSV文件数据。在这一点上,它实际上只是一个以逗号分隔的解析器(即不用担心转义新行和逗号)。主要需要的是逐行解析器,它将在每次调用方法时为下一行返回一个向量。
我发现这篇文章看起来很有前途: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp
我从未使用过Boost's Spirit,但我愿意尝试一下。但前提是我忽略了一个更直接的解决方案。
当前回答
使用Spirit来解析csv并不过分。Spirit非常适合微解析任务。例如,使用Spirit 2.1,它就像:
bool r = phrase_parse(first, last,
// Begin grammar
(
double_ % ','
)
,
// End grammar
space, v);
向量v被值填满了。在刚刚与Boost 1.41一起发布的新的Spirit 2.1文档中,有一系列教程涉及到这一点。
本教程从简单到复杂。CSV解析器呈现在中间的某个位置,并涉及使用Spirit的各种技术。生成的代码与手写代码一样紧凑。检查生成的汇编程序!
其他回答
您还可以看看Qt库的功能。
它有正则表达式支持,QString类有很好的方法,例如split()返回QStringList,通过使用提供的分隔符分割原始字符串获得的字符串列表。应该足以为csv文件..
要获得具有给定标题名的列,我使用以下方法:c++继承Qt问题qstring
可以使用std::regex。
根据文件大小和可用内存,可以逐行读取,也可以完全在std::string中读取。
读取文件可以使用:
std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
然后你可以和这个相匹配,它实际上是根据你的需要定制的。
std::regex word_regex(",\\s]+");
auto what =
std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();
std::vector<std::string> v;
for (;what!=wend ; wend) {
std::smatch match = *what;
v.push_back(match.str());
}
我写了一个很好的解析CSV文件的方法,我认为我应该把它作为一个答案:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
struct CSVDict
{
std::vector< std::string > inputImages;
std::vector< double > inputLabels;
};
/**
\brief Splits the string
\param str String to split
\param delim Delimiter on the basis of which splitting is to be done
\return results Output in the form of vector of strings
*/
std::vector<std::string> stringSplit( const std::string &str, const std::string &delim )
{
std::vector<std::string> results;
for (size_t i = 0; i < str.length(); i++)
{
std::string tempString = "";
while ((str[i] != *delim.c_str()) && (i < str.length()))
{
tempString += str[i];
i++;
}
results.push_back(tempString);
}
return results;
}
/**
\brief Parse the supplied CSV File and obtain Row and Column information.
Assumptions:
1. Header information is in first row
2. Delimiters are only used to differentiate cell members
\param csvFileName The full path of the file to parse
\param inputColumns The string of input columns which contain the data to be used for further processing
\param inputLabels The string of input labels based on which further processing is to be done
\param delim The delimiters used in inputColumns and inputLabels
\return Vector of Vector of strings: Collection of rows and columns
*/
std::vector< CSVDict > parseCSVFile( const std::string &csvFileName, const std::string &inputColumns, const std::string &inputLabels, const std::string &delim )
{
std::vector< CSVDict > return_CSVDict;
std::vector< std::string > inputColumnsVec = stringSplit(inputColumns, delim), inputLabelsVec = stringSplit(inputLabels, delim);
std::vector< std::vector< std::string > > returnVector;
std::ifstream inFile(csvFileName.c_str());
int row = 0;
std::vector< size_t > inputColumnIndeces, inputLabelIndeces;
for (std::string line; std::getline(inFile, line, '\n');)
{
CSVDict tempDict;
std::vector< std::string > rowVec;
line.erase(std::remove(line.begin(), line.end(), '"'), line.end());
rowVec = stringSplit(line, delim);
// for the first row, record the indeces of the inputColumns and inputLabels
if (row == 0)
{
for (size_t i = 0; i < rowVec.size(); i++)
{
for (size_t j = 0; j < inputColumnsVec.size(); j++)
{
if (rowVec[i] == inputColumnsVec[j])
{
inputColumnIndeces.push_back(i);
}
}
for (size_t j = 0; j < inputLabelsVec.size(); j++)
{
if (rowVec[i] == inputLabelsVec[j])
{
inputLabelIndeces.push_back(i);
}
}
}
}
else
{
for (size_t i = 0; i < inputColumnIndeces.size(); i++)
{
tempDict.inputImages.push_back(rowVec[inputColumnIndeces[i]]);
}
for (size_t i = 0; i < inputLabelIndeces.size(); i++)
{
double test = std::atof(rowVec[inputLabelIndeces[i]].c_str());
tempDict.inputLabels.push_back(std::atof(rowVec[inputLabelIndeces[i]].c_str()));
}
return_CSVDict.push_back(tempDict);
}
row++;
}
return return_CSVDict;
}
另一个类似于Loki Astari的答案的解决方案,在c++ 11中。这里的行是给定类型的std::元组。代码扫描一行,然后扫描到每个分隔符,然后将值直接转换并转储到元组中(使用一些模板代码)。
for (auto row : csv<std::string, int, float>(file, ',')) {
std::cout << "first col: " << std::get<0>(row) << std::endl;
}
优势:
非常干净,使用简单,只有c++ 11。 自动类型转换为std::tuple<t1,…>通过算子>>。
缺少什么:
转义和引用 没有错误处理的情况下畸形的CSV。
主要代码:
#include <iterator>
#include <sstream>
#include <string>
namespace csvtools {
/// Read the last element of the tuple without calling recursively
template <std::size_t idx, class... fields>
typename std::enable_if<idx >= std::tuple_size<std::tuple<fields...>>::value - 1>::type
read_tuple(std::istream &in, std::tuple<fields...> &out, const char delimiter) {
std::string cell;
std::getline(in, cell, delimiter);
std::stringstream cell_stream(cell);
cell_stream >> std::get<idx>(out);
}
/// Read the @p idx-th element of the tuple and then calls itself with @p idx + 1 to
/// read the next element of the tuple. Automatically falls in the previous case when
/// reaches the last element of the tuple thanks to enable_if
template <std::size_t idx, class... fields>
typename std::enable_if<idx < std::tuple_size<std::tuple<fields...>>::value - 1>::type
read_tuple(std::istream &in, std::tuple<fields...> &out, const char delimiter) {
std::string cell;
std::getline(in, cell, delimiter);
std::stringstream cell_stream(cell);
cell_stream >> std::get<idx>(out);
read_tuple<idx + 1, fields...>(in, out, delimiter);
}
}
/// Iterable csv wrapper around a stream. @p fields the list of types that form up a row.
template <class... fields>
class csv {
std::istream &_in;
const char _delim;
public:
typedef std::tuple<fields...> value_type;
class iterator;
/// Construct from a stream.
inline csv(std::istream &in, const char delim) : _in(in), _delim(delim) {}
/// Status of the underlying stream
/// @{
inline bool good() const {
return _in.good();
}
inline const std::istream &underlying_stream() const {
return _in;
}
/// @}
inline iterator begin();
inline iterator end();
private:
/// Reads a line into a stringstream, and then reads the line into a tuple, that is returned
inline value_type read_row() {
std::string line;
std::getline(_in, line);
std::stringstream line_stream(line);
std::tuple<fields...> retval;
csvtools::read_tuple<0, fields...>(line_stream, retval, _delim);
return retval;
}
};
/// Iterator; just calls recursively @ref csv::read_row and stores the result.
template <class... fields>
class csv<fields...>::iterator {
csv::value_type _row;
csv *_parent;
public:
typedef std::input_iterator_tag iterator_category;
typedef csv::value_type value_type;
typedef std::size_t difference_type;
typedef csv::value_type * pointer;
typedef csv::value_type & reference;
/// Construct an empty/end iterator
inline iterator() : _parent(nullptr) {}
/// Construct an iterator at the beginning of the @p parent csv object.
inline iterator(csv &parent) : _parent(parent.good() ? &parent : nullptr) {
++(*this);
}
/// Read one row, if possible. Set to end if parent is not good anymore.
inline iterator &operator++() {
if (_parent != nullptr) {
_row = _parent->read_row();
if (!_parent->good()) {
_parent = nullptr;
}
}
return *this;
}
inline iterator operator++(int) {
iterator copy = *this;
++(*this);
return copy;
}
inline csv::value_type const &operator*() const {
return _row;
}
inline csv::value_type const *operator->() const {
return &_row;
}
bool operator==(iterator const &other) {
return (this == &other) or (_parent == nullptr and other._parent == nullptr);
}
bool operator!=(iterator const &other) {
return not (*this == other);
}
};
template <class... fields>
typename csv<fields...>::iterator csv<fields...>::begin() {
return iterator(*this);
}
template <class... fields>
typename csv<fields...>::iterator csv<fields...>::end() {
return iterator();
}
我在GitHub上放了一个小的工作示例;我一直用它来解析一些数值数据,它达到了它的目的。
c++ String工具箱库(StrTk)有一个令牌网格类,它允许你从文本文件、字符串或字符缓冲区加载数据,并以行-列的方式解析/处理它们。
您可以指定行分隔符和列分隔符,或者只使用默认值。
void foo()
{
std::string data = "1,2,3,4,5\n"
"0,2,4,6,8\n"
"1,3,5,7,9\n";
strtk::token_grid grid(data,data.size(),",");
for(std::size_t i = 0; i < grid.row_count(); ++i)
{
strtk::token_grid::row_type r = grid.row(i);
for(std::size_t j = 0; j < r.size(); ++j)
{
std::cout << r.get<int>(j) << "\t";
}
std::cout << std::endl;
}
std::cout << std::endl;
}
更多的例子可以在这里找到