如何迭代由空格分隔的单词组成的字符串中的单词?

注意,我对C字符串函数或那种字符操作/访问不感兴趣。比起效率,我更喜欢优雅。我当前的解决方案:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

当前回答

我使用以下方法

void split(string in, vector<string>& parts, char separator) {
    string::iterator  ts, curr;
    ts = curr = in.begin();
    for(; curr <= in.end(); curr++ ) {
        if( (curr == in.end() || *curr == separator) && curr > ts )
               parts.push_back( string( ts, curr ));
        if( curr == in.end() )
               break;
        if( *curr == separator ) ts = curr + 1; 
    }
}

PlasmaHH,我忘记包含删除带有空格的标记的额外检查(curr>ts)。

其他回答

并不是说我们需要更多的答案,但这是我受到埃文·特兰启发后想到的。

std::vector <std::string> split(const string &input, auto delimiter, bool skipEmpty=true) {
  /*
  Splits a string at each delimiter and returns these strings as a string vector.
  If the delimiter is not found then nothing is returned.
  If skipEmpty is true then strings between delimiters that are 0 in length will be skipped.
  */
  bool delimiterFound = false;
  int pos=0, pPos=0;
  std::vector <std::string> result;
  while (true) {
    pos = input.find(delimiter,pPos);
    if (pos != std::string::npos) {
      if (skipEmpty==false or pos-pPos > 0) // if empty values are to be kept or not
        result.push_back(input.substr(pPos,pos-pPos));
      delimiterFound = true;
    } else {
      if (pPos < input.length() and delimiterFound) {
        if (skipEmpty==false or input.length()-pPos > 0) // if empty values are to be kept or not
          result.push_back(input.substr(pPos,input.length()-pPos));
      }
      break;
    }
    pPos = pos+1;
  }
  return result;
}

STL还没有这样的方法。

但是,您可以通过使用std::string::C_str()成员来使用C的strtok()函数,也可以编写自己的函数。下面是我在快速谷歌搜索(“STL字符串分割”)后找到的代码示例:

void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find first "non-delimiter".
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find next "non-delimiter"
        pos = str.find_first_of(delimiters, lastPos);
    }
}

摘自:http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++编程-HOWTO-7.html

如果您对代码示例有疑问,请留下评论,我会解释。

仅仅因为它没有实现称为迭代器的typedef或重载<<运算符,并不意味着它是错误的代码。我经常使用C函数。例如,printf和scanf都比std::cin和std::cout快(很明显),fopen语法对二进制类型更友好,它们也倾向于生成更小的EXE。

不要被这种“优雅胜过性能”的交易所吸引。

下面的代码使用strtok()将字符串拆分为标记,并将标记存储在向量中。

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

using namespace std;


char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[]   = " ,\t\n";
char *token;



int main()
{
   vector<string> vec_String_Lines;
   token = strtok( one_line_string, seps );

   cout << "Extracting and storing data in a vector..\n\n\n";

   while( token != NULL )
   {
      vec_String_Lines.push_back(token);
      token = strtok( NULL, seps );
   }
     cout << "Displaying end result in vector line storage..\n\n";

    for ( int i = 0; i < vec_String_Lines.size(); ++i)
    cout << vec_String_Lines[i] << "\n";
    cout << "\n\n\n";


return 0;
}

使用vector作为基类的快速版本,可完全访问其所有运算符:

    // Split string into parts.
    class Split : public std::vector<std::string>
    {
        public:
            Split(const std::string& str, char* delimList)
            {
               size_t lastPos = 0;
               size_t pos = str.find_first_of(delimList);

               while (pos != std::string::npos)
               {
                    if (pos != lastPos)
                        push_back(str.substr(lastPos, pos-lastPos));
                    lastPos = pos + 1;
                    pos = str.find_first_of(delimList, lastPos);
               }
               if (lastPos < str.length())
                   push_back(str.substr(lastPos, pos-lastPos));
            }
    };

用于填充STL集的示例:

std::set<std::string> words;
Split split("Hello,World", ",");
words.insert(split.begin(), split.end());

这类似于堆栈溢出问题:如何在C++中标记字符串?。需要Boost外部库

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token  test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}