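How do I iterate over the words of a string?

How do I iterate over the words of a string composed of words separated by whitespace?

Note that I'm not interested in C string functions or that kind of character manipulation/access. I prefer elegance over efficiency. My current solution: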

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

Current answer

#include <vector>
#include <string>
#include <sstream>

int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;                 // Have a buffer string
    std::stringstream ss(str);       // Insert the string into a stream

    std::vector<std::string> tokens; // Create vector to hold our words

    while (ss >> buf)
        tokens.push_back(buf);

    return 0;
}

2011-03-06 05:52:15

Other answers

This is similar to the Stack Overflow question "How do I tokenize a string in C++?". It requires the external Boost library.

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token  test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}

2008-10-25 10:58:25

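I use this to split a string by a delimiter. The first overload writes the results into a pre-constructed vector (through an output iterator); the second returns a new vector.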

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');
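To make the empty-token behaviour concrete, here is a minimal sketch. It repeats a compact version of the `split` above and adds `split_skip_empty`, a hypothetical helper that is not part of the original answer, for callers who want empty items dropped.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same behaviour as the split above: empty items between adjacent
// delimiters are kept.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

// Hypothetical helper (an assumption for illustration, not from the answer):
// filters the empty items out of the result of split.
std::vector<std::string> split_skip_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    for (const std::string& item : split(s, delim)) {
        if (!item.empty()) {
            elems.push_back(item);
        }
    }
    return elems;
}
```

With the input "one:two::three" and ':' as the delimiter, `split` yields "one", "two", "" and "three", while `split_skip_empty` yields only the three non-empty items.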

2008-10-25 18:21:27

Here is a simple solution that uses only the standard regex library:

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;

    std::vector<string> result;

    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;

    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) //token could be empty:check
            result.emplace_back( it->str() );
    }

    return result;
}

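The regex argument lets you split on several delimiters at once (spaces, commas, etc.).

I usually split only on whitespace and commas, so I also have this default function: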

std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;

    regex re( "[\\s,]+" );

    return Tokenize( str, re );
}

"[\\s,]+" matches runs of whitespace (\\s) and commas (,).
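As a rough illustration of how that pattern behaves, the sketch below repeats the two functions in compact form. Taking the parameters by const reference is a choice made for this sketch, not something the answer prescribes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Compact restatement of Tokenize; parameters are const references here
// (an assumption of this sketch) to avoid copying the string and the regex.
std::vector<std::string> Tokenize(const std::string& str, const std::regex& re) {
    std::vector<std::string> result;
    // Submatch index -1 selects the text between matches, i.e. the tokens.
    std::sregex_token_iterator it(str.begin(), str.end(), re, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (!it->str().empty()) {  // a leading delimiter produces an empty token
            result.emplace_back(it->str());
        }
    }
    return result;
}

// Splits on runs of whitespace and/or commas.
std::vector<std::string> TokenizeDefault(const std::string& str) {
    static const std::regex re("[\\s,]+");
    return Tokenize(str, re);
}
```

For the input " alpha, beta,,gamma  delta" this produces the four tokens "alpha", "beta", "gamma" and "delta": the `+` collapses adjacent delimiters, and the emptiness check discards the empty token that precedes the leading space.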

Note: if you want to split a wstring instead of a string, change every std::regex to std::wregex and every sregex_token_iterator to wsregex_token_iterator.
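A sketch of what that wide-character variant might look like. The name `TokenizeW` is hypothetical, chosen here only to keep it distinct from the narrow version.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wide-character counterpart of Tokenize: the same logic with
// std::wstring, std::wregex and std::wsregex_token_iterator substituted.
std::vector<std::wstring> TokenizeW(const std::wstring& str, const std::wregex& re) {
    std::vector<std::wstring> result;
    // Submatch index -1 selects the text between matches.
    std::wsregex_token_iterator it(str.begin(), str.end(), re, -1);
    std::wsregex_token_iterator end;
    for (; it != end; ++it) {
        if (!it->str().empty()) {
            result.emplace_back(it->str());
        }
    }
    return result;
}
```

It would be called with wide literals for both the text and the pattern, for example `TokenizeW(L"one, two three", std::wregex(L"[\\s,]+"))`.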

Note: depending on your compiler, you may also want to take the string parameter by reference.

2014-05-06 05:49:21

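For those who are not willing to sacrifice all efficiency for code size, and who see "efficient" as a kind of elegance, the following should hit a sweet spot (and I think the templated container class is a very elegant addition):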

#include <string>

template < class ContainerT >
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
   std::string::size_type pos, lastPos = 0, length = str.length();

   using value_type = typename ContainerT::value_type;
   using size_type  = typename ContainerT::size_type;

   while(lastPos < length + 1)
   {
      pos = str.find_first_of(delimiters, lastPos);
      if(pos == std::string::npos)
      {
         pos = length;
      }

      if(pos != lastPos || !trimEmpty)
         tokens.push_back(value_type(str.data()+lastPos,
               (size_type)pos-lastPos ));

      lastPos = pos + 1;
   }
}

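I usually choose std::vector<std::string> as the type of the second parameter (ContainerT)... but list<> is much faster than vector<> when direct access is not needed, and you can even create your own string class and use something like std::list<subString>, where subString does no copying at all, for a dramatic speed gain.

It is more than twice as fast as the fastest tokenize on this page and almost 5 times faster than some of the others. Also, with well-chosen parameter types you can eliminate all string and list copies for additional speed.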
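One way to get a copy-free substring type without writing your own class is `std::string_view`. This is a substitution on my part, assuming C++17 is available; the answer predates it. The sketch repeats `tokenize` unchanged so that it is self-contained, and `count_words` is a hypothetical helper that shows the call.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <string_view>

template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();

    using value_type = typename ContainerT::value_type;
    using size_type  = typename ContainerT::size_type;

    while (lastPos < length + 1) {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos) {
            pos = length;
        }
        if (pos != lastPos || !trimEmpty) {
            // Works for any value_type constructible from (const char*, size).
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        }
        lastPos = pos + 1;
    }
}

// Hypothetical helper: counts space-separated words without copying them.
// Each std::string_view only points into text, so text must outlive the views.
std::size_t count_words(const std::string& text)
{
    std::list<std::string_view> words;
    tokenize(text, words, " ", true);
    return words.size();
}
```

The catch with views is lifetime: a token stays valid only while the string it was cut from is alive and unmodified, so this fits cases where the tokens are consumed right away.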

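Additionally, it does not return the result (which is extremely inefficient); instead it takes the token container by reference, which also lets you build up tokens across multiple calls if you wish.

Lastly, it lets you specify whether to trim empty tokens from the results via the last optional parameter.

All it needs is std::string... the rest is optional. It does not use streams or the Boost library, but it is flexible enough to accept some of these foreign types naturally.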
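The trimEmpty flag and the by-reference container can be illustrated with a short sketch. `tokenize` is repeated unchanged so the block is self-contained; `tokenize_lines` is a hypothetical helper added here to show several calls accumulating into one container.

```cpp
#include <string>
#include <vector>

template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();

    using value_type = typename ContainerT::value_type;
    using size_type  = typename ContainerT::size_type;

    while (lastPos < length + 1) {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos) {
            pos = length;
        }
        if (pos != lastPos || !trimEmpty) {
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        }
        lastPos = pos + 1;
    }
}

// Hypothetical helper: appends the tokens of every line to one container.
// Nothing is cleared between calls, so the results accumulate.
void tokenize_lines(const std::vector<std::string>& lines,
                    std::vector<std::string>& tokens, bool trimEmpty)
{
    for (const std::string& line : lines) {
        tokenize(line, tokens, " ", trimEmpty);
    }
}
```

With the default `trimEmpty = false`, tokenizing "a,,b" on "," appends three tokens ("a", "" and "b"); with `trimEmpty = true` it appends two. Because the container is never cleared, calling `tokenize` again on the same container simply appends to what is already there.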

2009-09-29 15:12:11

Here is my approach, cut and split:

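#include <string>
#include <vector>

using namespace std;

// Removes and returns the part of str that precedes the first occurrence
// of del. If del is empty or not found, returns all of str and empties it.
string cut (string& str, const string& del)
{
    const string::size_type pos = del.empty() ? string::npos : str.find(del);

    if (pos == string::npos)
    {
        string f = str;
        str.clear();
        return f;
    }

    string f = str.substr(0, pos);
    str = str.substr(pos + del.length());
    return f;
}

vector<string> split (const string& in, const string& del=" ")
{
    vector<string> out;   // "vector<string> out();" would declare a function
    string t = in;

    // Each call to cut consumes one token, so t shrinks until it is empty.
    while (!t.empty())
        out.push_back(cut(t, del));

    return out;
}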

By the way, let me know if there is anything I can do to optimize this.

2014-05-21 07:31:01
