如何迭代由空格分隔的单词组成的字符串中的单词?

注意,我对C字符串函数或那种字符操作/访问不感兴趣。比起效率,我更喜欢优雅。我当前的解决方案:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

当前回答

#include <iostream>
#include <regex>

using namespace std;

int main() {
   string s = "foo bar  baz";
   regex e("\\s+");
   regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
   regex_token_iterator<string::iterator> end;
   while (i != end)
      cout << " [" << *i++ << "]";
}

IMO,这是最接近python的re.split()。有关regex_token_iterator的更多信息,请参阅cplusplus.com。-1(regex_token_iterator tor中的第四个参数)是序列中不匹配的部分,使用匹配作为分隔符。

其他回答

这是我的方法,切割和分割:

string cut (string& str, const string& del)
{
    string f = str;

    if (in.find_first_of(del) != string::npos)
    {
        f = str.substr(0,str.find_first_of(del));
        str = str.substr(str.find_first_of(del)+del.length());
    }

    return f;
}

vector<string> split (const string& in, const string& del=" ")
{
    vector<string> out();
    string t = in;

    while (t.length() > del.length())
        out.push_back(cut(t,del));

    return out;
}

顺便说一下,如果我能做些什么来优化这个。。

C++20终于为我们提供了一个分裂函数。或者更确切地说,是一个范围适配器。螺栓连杆。

#include <iostream>
#include <ranges>
#include <string_view>

namespace ranges = std::ranges;
namespace views = std::views;

using str = std::string_view;

constexpr auto view =
    "Multiple words"
    | views::split(' ')
    | views::transform([](auto &&r) -> str {
        return {
            &*r.begin(),
            static_cast<str::size_type>(ranges::distance(r))
        };
    });

auto main() -> int {
    for (str &&sv : view) {
        std::cout << sv << '\n';
    }
}

没有任何内存分配的C++17版本(std::函数除外)

void iter_words(const std::string_view& input, const std::function<void(std::string_view)>& process_word) {

    auto itr = input.begin();

    auto consume_whitespace = [&]() {
        for(; itr != input.end(); ++itr) {
            if(!isspace(*itr))
                return;
        }
    };

    auto consume_letters = [&]() {
        for(; itr != input.end(); ++itr) {
            if(isspace(*itr))
                return;
        }
    };

    while(true) {
        consume_whitespace();
        if(itr == input.end())
            return;
        auto word_start = itr - input.begin();
        consume_letters();
        auto word_end = itr - input.begin();
        process_word(input.substr(word_start, word_end - word_start));
    }
}

int main() {
    iter_words("foo bar", [](std::string_view sv) {
        std::cout << "Got word: " <<  sv << '\n';
    });
    return 0;
}

LazyString拆分器:

#include <string>
#include <algorithm>
#include <unordered_set>

using namespace std;

class LazyStringSplitter
{
    string::const_iterator start, finish;
    unordered_set<char> chop;

public:

    // Empty Constructor
    explicit LazyStringSplitter()
    {}

    explicit LazyStringSplitter (const string cstr, const string delims)
        : start(cstr.begin())
        , finish(cstr.end())
        , chop(delims.begin(), delims.end())
    {}

    void operator () (const string cstr, const string delims)
    {
        chop.insert(delims.begin(), delims.end());
        start = cstr.begin();
        finish = cstr.end();
    }

    bool empty() const { return (start >= finish); }

    string next()
    {
        // return empty string
        // if ran out of characters
        if (empty())
            return string("");

        auto runner = find_if(start, finish, [&](char c) {
            return chop.count(c) == 1;
        });

        // construct next string
        string ret(start, runner);
        start = runner + 1;

        // Never return empty string
        // + tail recursion makes this method efficient
        return !ret.empty() ? ret : next();
    }
};

我将此方法称为LazyStringSplitter是因为一个原因——它不会一次性拆分字符串。本质上,它的行为类似于python生成器它公开了一个名为next的方法,该方法返回从原始字符串拆分的下一个字符串我使用了c++11STL中的无序集,因此查找分隔符的速度要快得多下面是它的工作原理

测试程序

#include <iostream>
using namespace std;

int main()
{
    LazyStringSplitter splitter;

    // split at the characters ' ', '!', '.', ','
    splitter("This, is a string. And here is another string! Let's test and see how well this does.", " !.,");

    while (!splitter.empty())
        cout << splitter.next() << endl;
    return 0;
}

输出,输出

This
is
a
string
And
here
is
another
string
Let's
test
and
see
how
well
this
does

改进这一点的下一个计划是实施开始和结束方法,以便可以执行以下操作:

vector<string> split_string(splitter.begin(), splitter.end());

每个人都回答了预定义的字符串输入。我认为这个答案将帮助某人进行扫描输入。

我使用令牌向量来保存字符串令牌。这是可选的。

#include <bits/stdc++.h>

using namespace std ;
int main()
{
    string str, token ;
    getline(cin, str) ; // get the string as input
    istringstream ss(str); // insert the string into tokenizer

    vector<string> tokens; // vector tokens holds the tokens

    while (ss >> token) tokens.push_back(token); // splits the tokens
    for(auto x : tokens) cout << x << endl ; // prints the tokens

    return 0;
}


样本输入:

port city international university

样本输出:

port
city
international
university

注意,默认情况下,这将仅适用于空格作为分隔符。您可以使用自定义分隔符。为此,您定制了代码。让分隔符为“,”。所以使用

char delimiter = ',' ;
while(getline(ss, token, delimiter)) tokens.push_back(token) ;

而不是

while (ss >> token) tokens.push_back(token);