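How do I iterate over the words of a string composed of words separated by whitespace?
Note that I'm not interested in C string functions or that kind of character manipulation/access. I also prefer elegance over efficiency. My current solution: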
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);
    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}
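One thing worth noting about the loop above: the do/while form tests the stream only after printing, so it appears to print one extra, empty "Substring: " line once extraction fails at the end of the input. A minimal sketch of the usual alternative, which puts the extraction itself in the loop condition, follows. The helper name split_words is hypothetical and not part of the question's code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the whitespace-separated words of a string.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> words;
    std::string word;
    // operator>> skips leading whitespace and returns the stream, which
    // converts to false once no further word can be read.
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

Because the body only runs after a successful extraction, no empty trailing entry is produced, and an input consisting only of whitespace yields an empty vector.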
Here is a simple solution that uses only the standard regex library:
#include <regex>
#include <string>
#include <vector>
std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;
    std::vector<string> result;
    // submatch index -1 selects the text between matches of the regex
    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;
    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) // token could be empty: check
            result.emplace_back( it->str() );
    }
    return result;
}
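The regex argument allows checking for multiple delimiters (spaces, commas, etc.).
I usually only split on spaces and commas, so I also have this default function: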
std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;
    regex re( "[\\s,]+" );
    return Tokenize( str, re );
}
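"[\\s,]+" matches whitespace (\\s) and commas (,).
Note, if you want to split a wstring instead of a string,
change every std::regex to std::wregex and every sregex_token_iterator to wsregex_token_iterator.
Note, depending on your compiler, you might also want to take the string argument by reference.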
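To make the wstring note concrete, here is a rough sketch of what the wide-character version might look like after those substitutions. The name TokenizeWide is hypothetical, and taking both arguments by const reference is my own choice, following the by-reference note above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wide-character counterpart of Tokenize.
std::vector<std::wstring> TokenizeWide(const std::wstring& str, const std::wregex& re) {
    std::vector<std::wstring> result;
    // Submatch index -1 selects the text between matches of the regex.
    std::wsregex_token_iterator it(str.begin(), str.end(), re, -1);
    std::wsregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() != 0) {  // skip empty tokens, e.g. before a leading delimiter
            result.push_back(it->str());
        }
    }
    return result;
}
```

The pattern is then written as a wide literal, for example std::wregex re(L"[\\s,]+"), and the regex object has to outlive the call because the token iterator keeps a pointer to it.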
Yet another flexible and fast way:
#include <cstring> // strchr
template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    const char* e = s;
    while (*e != 0) {
        e = s;
        // advance e to the next delimiter or to the end of the input
        while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
        if (e - s > 0) {
            op(s, e - s); // report the non-empty token [s, e)
        }
        s = e + 1;
    }
}
To use it with a vector of strings (Edit: since someone pointed out not to inherit from STL classes... hrmf ;) ):
template<class ContainerType>
class Appender {
public:
    Appender(ContainerType& container) : container_(container) {}
    void operator() (const char* s, unsigned length) {
        container_.push_back(std::string(s,length));
    }
private:
    ContainerType& container_;
};
std::vector<std::string> strVector;
Appender<std::vector<std::string> > v(strVector); // explicit argument: also compiles before C++17
tokenize(v, "A number of words to be tokenized", " \t");
That's it! And that's just one way to use the tokenizer, for example to just count words:
class WordCounter {
public:
    WordCounter() : noOfWords(0) {}
    void operator() (const char*, unsigned) {
        ++noOfWords;
    }
    unsigned noOfWords;
};
WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
ASSERT( wc.noOfWords == 7 );
Limited only by imagination ;)
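As one more illustration of the same callback idea, here is a sketch of a functor that remembers the longest word. The LongestWord class is hypothetical; the tokenize template is repeated from above (with <cstring> for strchr) only so that the block compiles by itself.

```cpp
#include <cstring>
#include <string>

// Same tokenizer as above, repeated so this block is self-contained.
template <typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    const char* e = s;
    while (*e != 0) {
        e = s;
        while (*e != 0 && std::strchr(delimiters, *e) == 0) ++e;
        if (e - s > 0) {
            op(s, e - s);
        }
        s = e + 1;
    }
}

// Hypothetical functor: keeps the longest token seen so far.
// On ties, the earliest token of that length is kept.
class LongestWord {
public:
    void operator()(const char* s, unsigned length) {
        if (length > longest.size()) {
            longest.assign(s, length);
        }
    }
    std::string longest;
};
```

Since tokenize takes the operator by non-const reference, any state the functor accumulates is still available in the caller's object after the call returns.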
I have rolled my own using strtok and used boost to split a string. The best method I have found is the C++ String Toolkit Library. It is very flexible and fast.
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>
const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";
int main()
{
    {   // normal parsing of a string into a vector of strings
        std::string s("Somewhere down the road");
        std::vector<std::string> result;
        if( strtk::parse( s, whitespace, result ) )
        {
            for(size_t i = 0; i < result.size(); ++i )
                std::cout << result[i] << std::endl;
        }
    }
    {   // parsing a string into a vector of floats with other separators
        // besides spaces
        std::string s("3.0, 3.14; 4.0");
        std::vector<float> values;
        if( strtk::parse( s, whitespace_and_punctuation, values ) )
        {
            for(size_t i = 0; i < values.size(); ++i )
                std::cout << values[i] << std::endl;
        }
    }
    {   // parsing a string into specific variables
        std::string s("angle = 45; radius = 9.9");
        std::string w1, w2;
        float v1, v2;
        if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
        {
            std::cout << "word " << w1 << ", value " << v1 << std::endl;
            std::cout << "word " << w2 << ", value " << v2 << std::endl;
        }
    }
    return 0;
}
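The toolkit has much more flexibility than this simple example shows, but its utility in parsing a string into useful elements is incredible.
Not that we need more answers, but this is what I came up with after being inspired by Evan Teran.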
#include <string>
#include <vector>
// a template parameter is used instead of an auto parameter, which requires C++20
template <typename Delimiter>
std::vector<std::string> split(const std::string &input, Delimiter delimiter, bool skipEmpty=true) {
    /*
    Splits a string at each delimiter and returns these strings as a string vector.
    If the delimiter is not found then nothing is returned.
    If skipEmpty is true then strings between delimiters that are 0 in length will be skipped.
    The search resumes one character past each match, so a single-character delimiter is assumed.
    */
    bool delimiterFound = false;
    std::string::size_type pos=0, pPos=0;
    std::vector<std::string> result;
    while (true) {
        pos = input.find(delimiter,pPos);
        if (pos != std::string::npos) {
            if (skipEmpty==false or pos-pPos > 0) // if empty values are to be kept or not
                result.push_back(input.substr(pPos,pos-pPos));
            delimiterFound = true;
        } else {
            if (pPos < input.length() and delimiterFound) {
                // the remainder is non-empty here, so no further length check is needed
                result.push_back(input.substr(pPos));
            }
            break;
        }
        pPos = pos+1;
    }
    return result;
}
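The function above resumes the search one character past each match, so it seems best suited to single-character delimiters. Below is a sketch of a variant that skips the whole delimiter and therefore also handles multi-character ones. The name split_on is hypothetical, and unlike the function above it returns the entire input as one piece when the delimiter never occurs, which is a deliberate change on my part.

```cpp
#include <string>
#include <vector>

// Hypothetical variant: splits on a delimiter string of any length.
std::vector<std::string> split_on(const std::string& input,
                                  const std::string& delimiter,
                                  bool skipEmpty = true) {
    std::vector<std::string> result;
    if (delimiter.empty()) {  // an empty delimiter would never advance the search
        if (!input.empty() || !skipEmpty) result.push_back(input);
        return result;
    }
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = input.find(delimiter, start);
        // When nothing more is found, take everything up to the end of the input.
        const std::string::size_type len =
            (pos == std::string::npos) ? std::string::npos : pos - start;
        const std::string piece = input.substr(start, len);
        if (!skipEmpty || !piece.empty()) result.push_back(piece);
        if (pos == std::string::npos) break;
        start = pos + delimiter.size();  // skip the whole delimiter
    }
    return result;
}
```

With the input "a, b, , c" and the delimiter ", ", the default call yields three pieces, while passing false for skipEmpty also keeps the empty piece between the second and third delimiter.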
Here is my solution using C++11 and the STL. It should be reasonably efficient:
#include <vector>
#include <string>
#include <cctype>
#include <iostream>
#include <algorithm>
std::vector<std::string> split(const std::string& s)
{
    std::vector<std::string> v;
    const auto end = s.end();
    auto to = s.begin();
    decltype(to) from;
    // unsigned char: passing a negative char value to std::isspace is undefined
    while((from = std::find_if(to, end,
        [](unsigned char c){ return !std::isspace(c); })) != end)
    {
        to = std::find_if(from, end, [](unsigned char c){ return std::isspace(c) != 0; });
        v.emplace_back(from, to);
    }
    return v;
}
int main()
{
    std::string s = "this is the string to split";
    auto v = split(s);
    for(auto&& word: v)
        std::cout << word << '\n';
}
Output:
this
is
the
string
to
split