如何迭代由空格分隔的单词组成的字符串中的单词?

注意,我对C字符串函数或那种字符操作/访问不感兴趣。比起效率,我更喜欢优雅。我当前的解决方案:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

当前回答

最小的解决方案是一个函数,它将std::字符串和一组分隔符(作为std::string)作为输入,并返回std:::字符串的std::向量。

#include <string>
#include <vector>

std::vector<std::string>
tokenize(const std::string& str, const std::string& delimiters)
{
  using ssize_t = std::string::size_type;
  const ssize_t str_ln = str.length();
  ssize_t last_pos = 0;

  // container for the extracted tokens
  std::vector<std::string> tokens;

  while (last_pos < str_ln) {
      // find the position of the next delimiter
      ssize_t pos = str.find_first_of(delimiters, last_pos);

      // if no delimiters found, set the position to the length of string
      if (pos == std::string::npos)
         pos = str_ln;

      // if the substring is nonempty, store it in the container
      if (pos != last_pos)
         tokens.emplace_back(str.substr(last_pos, pos - last_pos));

      // scan past the previous substring
      last_pos = pos + 1;
  }

  return tokens;
}

用法示例:

#include <iostream>

int main()
{
    std::string input_str = "one + two * (three - four)!!---! ";
    const char* delimiters = "! +- (*)";
    std::vector<std::string> tokens = tokenize(input_str, delimiters);

    std::cout << "input = '" << input_str << "'\n"
              << "delimiters = '" << delimiters << "'\n"
              << "nr of tokens found = " << tokens.size() << std::endl;
    for (const std::string& tk : tokens) {
        std::cout << "token = '" << tk << "'\n";
    }

  return 0;
}

其他回答

这类似于堆栈溢出问题:如何在C++中标记字符串?。需要Boost外部库

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token  test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}

我使用这个simpleton是因为我们得到了字符串类“特殊”(即非标准):

void splitString(const String &s, const String &delim, std::vector<String> &result) {
    const int l = delim.length();
    int f = 0;
    int i = s.indexOf(delim,f);
    while (i>=0) {
        String token( i-f > 0 ? s.substring(f,i-f) : "");
        result.push_back(token);
        f=i+l;
        i = s.indexOf(delim,f);
    }
    String token = s.substring(f);
    result.push_back(token);
}

我编写了以下代码。您可以指定分隔符,它可以是字符串。结果类似于Java的String.split,结果中包含空字符串。

例如,如果我们调用split(“ABCPICKABCANYABCTWO:ABC”,“ABC”),结果如下:

0  <len:0>
1 PICK <len:4>
2 ANY <len:3>
3 TWO: <len:4>
4  <len:0>

代码:

vector <string> split(const string& str, const string& delimiter = " ") {
    vector <string> tokens;

    string::size_type lastPos = 0;
    string::size_type pos = str.find(delimiter, lastPos);

    while (string::npos != pos) {
        // Found a token, add it to the vector.
        cout << str.substr(lastPos, pos - lastPos) << endl;
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = pos + delimiter.size();
        pos = str.find(delimiter, lastPos);
    }

    tokens.push_back(str.substr(lastPos, str.size() - lastPos));
    return tokens;
}
#include <iostream>
#include <regex>

using namespace std;

int main() {
   string s = "foo bar  baz";
   regex e("\\s+");
   regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
   regex_token_iterator<string::iterator> end;
   while (i != end)
      cout << " [" << *i++ << "]";
}

IMO,这是最接近python的re.split()。有关regex_token_iterator的更多信息,请参阅cplusplus.com。-1(regex_token_iterator tor中的第四个参数)是序列中不匹配的部分,使用匹配作为分隔符。

这是我解决这个问题的方法:

vector<string> get_tokens(string str) {
    vector<string> dt;
    stringstream ss;
    string tmp; 
    ss << str;
    for (size_t i; !ss.eof(); ++i) {
        ss >> tmp;
        dt.push_back(tmp);
    }
    return dt;
}

此函数返回字符串向量。