如何迭代由空格分隔的单词组成的字符串中的单词?

注意,我对C字符串函数或那种字符操作/访问不感兴趣。比起效率,我更喜欢优雅。我当前的解决方案:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

当前回答

如果您需要通过非空格符号解析字符串,则字符串流可能很方便:

string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;

istringstream iss(s);
getline(iss, dummy, ':');
getline(iss, name, ';');
getline(iss, dummy, ':');
getline(iss, spouse, ';')

其他回答

我使用这个simpleton是因为我们得到了字符串类“特殊”(即非标准):

void splitString(const String &s, const String &delim, std::vector<String> &result) {
    const int l = delim.length();
    int f = 0;
    int i = s.indexOf(delim,f);
    while (i>=0) {
        String token( i-f > 0 ? s.substring(f,i-f) : "");
        result.push_back(token);
        f=i+l;
        i = s.indexOf(delim,f);
    }
    String token = s.substring(f);
    result.push_back(token);
}

我有一种与其他解决方案非常不同的方法,它提供了很多其他解决方案所缺乏的价值,但当然也有其缺点。这是一个工作实现,示例是在单词周围放置<tag></tag>。

首先,这个问题可以通过一个循环解决,不需要额外的内存,只需考虑四种逻辑情况。从概念上讲,我们对边界感兴趣。我们的代码应该反映出这一点:让我们遍历字符串,一次查看两个字符,记住字符串的开头和结尾都有特殊情况。

缺点是我们必须编写实现,这有点冗长,但大多是方便的样板。

好处是我们编写了实现,因此很容易根据特定的需要定制它,例如区分左和写单词边界,使用任何一组分隔符,或处理其他情况,例如无边界或错误位置。

using namespace std;

#include <iostream>
#include <string>

#include <cctype>

typedef enum boundary_type_e {
    E_BOUNDARY_TYPE_ERROR = -1,
    E_BOUNDARY_TYPE_NONE,
    E_BOUNDARY_TYPE_LEFT,
    E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
    boundary_type_t type;
    int pos;
} boundary_t;

bool is_delim_char(int c) {
    return isspace(c); // also compare against any other chars you want to use as delimiters
}

bool is_word_char(int c) {
    return ' ' <= c && c <= '~' && !is_delim_char(c);
}

boundary_t maybe_word_boundary(string str, int pos) {
    int len = str.length();
    if (pos < 0 || pos >= len) {
        return (boundary_t){.type = E_BOUNDARY_TYPE_ERROR};
    } else {
        if (pos == 0 && is_word_char(str[pos])) {
            // if the first character is word-y, we have a left boundary at the beginning
            return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos};
        } else if (pos == len - 1 && is_word_char(str[pos])) {
            // if the last character is word-y, we have a right boundary left of the null terminator
            return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
        } else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
            // if we have a delimiter followed by a word char, we have a left boundary left of the word char
            return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos + 1};
        } else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
            // if we have a word char followed by a delimiter, we have a right boundary right of the word char
            return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
        }
        return (boundary_t){.type = E_BOUNDARY_TYPE_NONE};
    }
}

int main() {
    string str;
    getline(cin, str);

    int len = str.length();
    for (int i = 0; i < len; i++) {
        boundary_t boundary = maybe_word_boundary(str, i);
        if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
            // whatever
        } else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
            // whatever
        }
    }
}

正如您所看到的,代码非常容易理解和微调,代码的实际使用非常简短和简单。使用C++不应阻止我们编写最简单、最容易定制的代码,即使这意味着不使用STL。我认为这是Linus Torvalds所说的“品味”的一个例子,因为我们已经消除了所有不需要的逻辑,而写作风格自然允许在需要处理的时候处理更多的案件。

可以改进此代码的可能是使用enum类,在maybe_word_boundary中接受指向is_word_char的函数指针,而不是直接调用is_word_char,并传递lambda。

谢谢@Jairo Abdiel Toribio Cisneros。它对我有效,但您的函数返回一些空元素。因此,对于没有空的返回,我编辑了以下内容:

std::vector<std::string> split(std::string str, const char* delim) {
    std::vector<std::string> v;
    std::string tmp;

    for(std::string::const_iterator i = str.begin(); i <= str.end(); ++i) {
        if(*i != *delim && i != str.end()) {
            tmp += *i;
        } else {
            if (tmp.length() > 0) {
                v.push_back(tmp);
            }
            tmp = "";
        }
    }

    return v;
}

使用:

std::string s = "one:two::three";
std::string delim = ":";
std::vector<std::string> vv = split(s, delim.c_str());

仅为方便:

template<class V, typename T>
bool in(const V &v, const T &el) {
    return std::find(v.begin(), v.end(), el) != v.end();
}

基于多个分隔符的实际拆分:

std::vector<std::string> split(const std::string &s,
                               const std::vector<char> &delims) {
    std::vector<std::string> res;
    auto stuff = [&delims](char c) { return !in(delims, c); };
    auto space = [&delims](char c) { return in(delims, c); };
    auto first = std::find_if(s.begin(), s.end(), stuff);
    while (first != s.end()) {
        auto last = std::find_if(first, s.end(), space);
        res.push_back(std::string(first, last));
        first = std::find_if(last + 1, s.end(), stuff);
    }
    return res;
}

用法:

int main() {
    std::string s = "   aaa,  bb  cc ";
    for (auto el: split(s, {' ', ','}))
        std::cout << el << std::endl;
    return 0;
}

这里有一个仅使用标准正则表达式库的正则表达式解决方案。(我有点生疏,所以可能会有一些语法错误,但这至少是一般的想法)

#include <regex.h>
#include <string.h>
#include <vector.h>

using namespace std;

vector<string> split(string s){
    regex r ("\\w+"); //regex matches whole words, (greedy, so no fragment words)
    regex_iterator<string::iterator> rit ( s.begin(), s.end(), r );
    regex_iterator<string::iterator> rend; //iterators to iterate thru words
    vector<string> result<regex_iterator>(rit, rend);
    return result;  //iterates through the matches to fill the vector
}