如何迭代由空格分隔的单词组成的字符串中的单词?
注意,我对C字符串函数或那种字符操作/访问不感兴趣。比起效率,我更喜欢优雅。我当前的解决方案:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string s = "Somewhere down the road";
istringstream iss(s);
do {
string subs;
iss >> subs;
cout << "Substring: " << subs << endl;
} while (iss);
}
根据Galik的回答,我做了这个。这大部分都在这里,所以我不必一遍又一遍地写。C++仍然没有原生拆分函数,这真是太疯狂了。特征:
应该很快。容易理解(我认为)。合并空节。使用多个分隔符(例如“\r\n”)很简单
#include <string>
#include <vector>
#include <algorithm>
std::vector<std::string> split(const std::string& s, const std::string& delims)
{
using namespace std;
vector<string> v;
// Start of an element.
size_t elemStart = 0;
// We start searching from the end of the previous element, which
// initially is the start of the string.
size_t elemEnd = 0;
// Find the first non-delim, i.e. the start of an element, after the end of the previous element.
while((elemStart = s.find_first_not_of(delims, elemEnd)) != string::npos)
{
// Find the first delem, i.e. the end of the element (or if this fails it is the end of the string).
elemEnd = s.find_first_of(delims, elemStart);
// Add it.
v.emplace_back(s, elemStart, elemEnd == string::npos ? string::npos : elemEnd - elemStart);
}
// When there are no more non-spaces, we are done.
return v;
}
每个人都回答了预定义的字符串输入。我认为这个答案将帮助某人进行扫描输入。
我使用令牌向量来保存字符串令牌。这是可选的。
#include <bits/stdc++.h>
using namespace std ;
int main()
{
string str, token ;
getline(cin, str) ; // get the string as input
istringstream ss(str); // insert the string into tokenizer
vector<string> tokens; // vector tokens holds the tokens
while (ss >> token) tokens.push_back(token); // splits the tokens
for(auto x : tokens) cout << x << endl ; // prints the tokens
return 0;
}
样本输入:
port city international university
样本输出:
port
city
international
university
注意,默认情况下,这将仅适用于空格作为分隔符。您可以使用自定义分隔符。为此,您定制了代码。让分隔符为“,”。所以使用
char delimiter = ',' ;
while(getline(ss, token, delimiter)) tokens.push_back(token) ;
而不是
while (ss >> token) tokens.push_back(token);
我使用以下方法
void split(string in, vector<string>& parts, char separator) {
string::iterator ts, curr;
ts = curr = in.begin();
for(; curr <= in.end(); curr++ ) {
if( (curr == in.end() || *curr == separator) && curr > ts )
parts.push_back( string( ts, curr ));
if( curr == in.end() )
break;
if( *curr == separator ) ts = curr + 1;
}
}
PlasmaHH,我忘记包含删除带有空格的标记的额外检查(curr>ts)。
另一种灵活快速的方式
template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
const char* s = input;
const char* e = s;
while (*e != 0) {
e = s;
while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
if (e - s > 0) {
op(s, e - s);
}
s = e + 1;
}
}
要将其与字符串向量一起使用(编辑:由于有人指出不继承STL类…hrmf;):
template<class ContainerType>
class Appender {
public:
Appender(ContainerType& container) : container_(container) {;}
void operator() (const char* s, unsigned length) {
container_.push_back(std::string(s,length));
}
private:
ContainerType& container_;
};
std::vector<std::string> strVector;
Appender v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");
就是这样!这只是使用tokenizer的一种方式,比如如何计数单词:
class WordCounter {
public:
WordCounter() : noOfWords(0) {}
void operator() (const char*, unsigned) {
++noOfWords;
}
unsigned noOfWords;
};
WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
ASSERT( wc.noOfWords == 7 );
受限于想象力;)
这里有一个仅使用标准正则表达式库的正则表达式解决方案。(我有点生疏,所以可能会有一些语法错误,但这至少是一般的想法)
#include <regex.h>
#include <string.h>
#include <vector.h>
using namespace std;
vector<string> split(string s){
regex r ("\\w+"); //regex matches whole words, (greedy, so no fragment words)
regex_iterator<string::iterator> rit ( s.begin(), s.end(), r );
regex_iterator<string::iterator> rend; //iterators to iterate thru words
vector<string> result<regex_iterator>(rit, rend);
return result; //iterates through the matches to fill the vector
}