如何迭代字符串的单词？

如何迭代由空格分隔的单词组成的字符串中的单词？

注意，我对C字符串函数或那种字符操作/访问不感兴趣。比起效率，我更喜欢优雅。我当前的解决方案：

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

当前回答

虽然有一些答案提供了C++20解决方案，但自从发布以来，已经做了一些更改，并将其作为缺陷报告应用于C++20。正因为如此，解决方案变得更短、更好：

#include <iostream>
#include <ranges>
#include <string_view>

namespace views = std::views;
using str = std::string_view;

constexpr str text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

auto splitByWords(str input) {
    return input
    | views::split(' ')
    | views::transform([](auto &&r) -> str {
        return {r.begin(), r.end()};
    });
}

auto main() -> int {
    for (str &&word : splitByWords(text)) {
        std::cout << word << '\n';
    }
}

到今天为止，它仍然只在GCC的主干分支（Godbolt链接）上可用。它基于两个更改：P1391迭代器构造函数用于std:：string_view和P2210 DR修复std:：views:：split以保留范围类型。

在C++23中，不需要任何转换样板，因为P1989向std:：string_view:添加了一个范围构造函数

#include <iostream>
#include <ranges>
#include <string_view>

namespace views = std::views;

constexpr std::string_view text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

auto main() -> int {
    for (std::string_view&& word : text | views::split(' ')) {
        std::cout << word << '\n';
    }
}

（螺栓连杆）

2021-08-04 17:42:09

其他回答

下面的代码使用strtok（）将字符串拆分为标记，并将标记存储在向量中。

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

using namespace std;


char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[]   = " ,\t\n";
char *token;



int main()
{
   vector<string> vec_String_Lines;
   token = strtok( one_line_string, seps );

   cout << "Extracting and storing data in a vector..\n\n\n";

   while( token != NULL )
   {
      vec_String_Lines.push_back(token);
      token = strtok( NULL, seps );
   }
     cout << "Displaying end result in vector line storage..\n\n";

    for ( int i = 0; i < vec_String_Lines.size(); ++i)
    cout << vec_String_Lines[i] << "\n";
    cout << "\n\n\n";


return 0;
}

2011-12-10 13:58:22

我有一种与其他解决方案非常不同的方法，它提供了很多其他解决方案所缺乏的价值，但当然也有其缺点。这是一个工作实现，示例是在单词周围放置＜tag＞＜/tag＞。

首先，这个问题可以通过一个循环解决，不需要额外的内存，只需考虑四种逻辑情况。从概念上讲，我们对边界感兴趣。我们的代码应该反映出这一点：让我们遍历字符串，一次查看两个字符，记住字符串的开头和结尾都有特殊情况。

缺点是我们必须编写实现，这有点冗长，但大多是方便的样板。

好处是我们编写了实现，因此很容易根据特定的需要定制它，例如区分左和写单词边界，使用任何一组分隔符，或处理其他情况，例如无边界或错误位置。

using namespace std;

#include <iostream>
#include <string>

#include <cctype>

typedef enum boundary_type_e {
    E_BOUNDARY_TYPE_ERROR = -1,
    E_BOUNDARY_TYPE_NONE,
    E_BOUNDARY_TYPE_LEFT,
    E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
    boundary_type_t type;
    int pos;
} boundary_t;

bool is_delim_char(int c) {
    return isspace(c); // also compare against any other chars you want to use as delimiters
}

bool is_word_char(int c) {
    return ' ' <= c && c <= '~' && !is_delim_char(c);
}

boundary_t maybe_word_boundary(string str, int pos) {
    int len = str.length();
    if (pos < 0 || pos >= len) {
        return (boundary_t){.type = E_BOUNDARY_TYPE_ERROR};
    } else {
        if (pos == 0 && is_word_char(str[pos])) {
            // if the first character is word-y, we have a left boundary at the beginning
            return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos};
        } else if (pos == len - 1 && is_word_char(str[pos])) {
            // if the last character is word-y, we have a right boundary left of the null terminator
            return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
        } else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
            // if we have a delimiter followed by a word char, we have a left boundary left of the word char
            return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos + 1};
        } else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
            // if we have a word char followed by a delimiter, we have a right boundary right of the word char
            return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
        }
        return (boundary_t){.type = E_BOUNDARY_TYPE_NONE};
    }
}

int main() {
    string str;
    getline(cin, str);

    int len = str.length();
    for (int i = 0; i < len; i++) {
        boundary_t boundary = maybe_word_boundary(str, i);
        if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
            // whatever
        } else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
            // whatever
        }
    }
}

正如您所看到的，代码非常容易理解和微调，代码的实际使用非常简短和简单。使用C++不应阻止我们编写最简单、最容易定制的代码，即使这意味着不使用STL。我认为这是Linus Torvalds所说的“品味”的一个例子，因为我们已经消除了所有不需要的逻辑，而写作风格自然允许在需要处理的时候处理更多的案件。

可以改进此代码的可能是使用enum类，在maybe_word_boundary中接受指向is_word_char的函数指针，而不是直接调用is_word_char，并传递lambda。

2019-01-16 15:14:15

我用这个分隔符分隔字符串。第一个将结果放入预先构建的向量中，第二个返回新向量。

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

请注意，此解决方案不会跳过空令牌，因此下面将找到4项，其中一项为空：

std::vector<std::string> x = split("one:two::three", ':');

2008-10-25 18:21:27

有一个名为strtok的函数。

#include<string>
using namespace std;

vector<string> split(char* str,const char* delim)
{
    char* saveptr;
    char* token = strtok_r(str,delim,&saveptr);

    vector<string> result;

    while(token != NULL)
    {
        result.push_back(token);
        token = strtok_r(NULL,delim,&saveptr);
    }
    return result;
}

2010-06-14 12:17:08

这类似于堆栈溢出问题：如何在C++中标记字符串？。需要Boost外部库

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token  test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}

2008-10-25 10:58:25

如何迭代字符串的单词？

推荐文章

最新文章

标签