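How do I iterate over the words of a string composed of words separated by whitespace?
Note that I'm not interested in C string functions or that kind of character manipulation/access. I prefer elegance over efficiency. My current solution: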
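#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);
    string subs;
    // Test the extraction itself: a do/while that checks the stream afterwards
    // prints one extra, empty "Substring: " line once the input is exhausted.
    while (iss >> subs) {
        cout << "Substring: " << subs << endl;
    }
}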
My general implementation for string and u32string, using the boost::algorithm::split signature.
#include <algorithm>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Splits s at every character for which predicate returns true.
template<typename CharT, typename UnaryPredicate>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           UnaryPredicate predicate)
{
    using ST = std::basic_string<CharT>;
    using std::swap;
    std::vector<ST> tmp_result;
    auto iter = s.cbegin(),
         end_iter = s.cend();
    while (true)
    {
        /**
         * edge case: empty str -> push an empty str and exit.
         */
        auto find_iter = std::find_if(iter, end_iter, predicate);
        tmp_result.emplace_back(iter, find_iter);
        if (find_iter == end_iter) { break; }
        iter = ++find_iter;
    }
    swap(tmp_result, split_result);
}

// Splits s at any character contained in char_candidate.
template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const std::basic_string<CharT>& char_candidate)
{
    std::unordered_set<CharT> candidate_set(char_candidate.cbegin(),
                                            char_candidate.cend());
    auto predicate = [&candidate_set](const CharT& c) {
        return candidate_set.count(c) > 0U;
    };
    return split(split_result, s, predicate);
}

// Convenience overload for string literals such as " ," or U" ,".
template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const CharT* literals)
{
    return split(split_result, s, std::basic_string<CharT>(literals));
}
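To make the behaviour of the predicate-based loop concrete, here is a minimal sketch that restates it in condensed form and applies it to both a std::string and a std::u32string. The names split_if, split_on_space and split_on_comma are hypothetical and are not part of the answer above. Returning the vector by value is also a simplification of the boost-style out-parameter.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, condensed restatement of the predicate overload.
// It returns the tokens instead of filling an out-parameter.
template <typename CharT, typename UnaryPredicate>
std::vector<std::basic_string<CharT>> split_if(const std::basic_string<CharT>& s,
                                               UnaryPredicate is_delimiter)
{
    std::vector<std::basic_string<CharT>> result;
    auto iter = s.cbegin();
    const auto end_iter = s.cend();
    while (true) {
        // Each token runs from iter up to the next delimiter (or the end).
        auto find_iter = std::find_if(iter, end_iter, is_delimiter);
        result.emplace_back(iter, find_iter);
        if (find_iter == end_iter) { break; }
        iter = ++find_iter;  // step over the delimiter itself
    }
    return result;
}

// Example helper (hypothetical name) for a narrow string.
std::vector<std::string> split_on_space(const std::string& s)
{
    return split_if(s, [](char c) { return c == ' '; });
}

// Example helper (hypothetical name) for a UTF-32 string.
std::vector<std::u32string> split_on_comma(const std::u32string& s)
{
    return split_if(s, [](char32_t c) { return c == U','; });
}
```

Note that this loop keeps empty tokens: two adjacent delimiters produce an empty string between them, and an empty input produces a single empty string. Callers who only want words would need to filter those out afterwards.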
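If you like using boost but want to use a whole string as the delimiter (instead of the single characters used in most of the previously proposed solutions), you can use boost::split_iterator.
Example code, including a convenient template: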
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

// Splits str on every occurrence of the whole string delim and writes the
// pieces to result.
template<typename OutputIterator>
inline void split(
    const std::string& str,
    const std::string& delim,
    OutputIterator result)
{
    using namespace boost::algorithm;
    typedef split_iterator<std::string::const_iterator> It;
    for (It iter = make_split_iterator(str, first_finder(delim, is_equal()));
         iter != It();
         ++iter)
    {
        *(result++) = boost::copy_range<std::string>(*iter);
    }
}

int main(int argc, char* argv[])
{
    using namespace std;
    vector<string> splitted;
    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));
    // or directly to console, for example
    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
    return 0;
}
Here is a simple solution that uses only the standard regex library:
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;
    vector<string> result;
    // Submatch index -1 selects the text between regex matches, i.e. the tokens.
    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;
    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) //token could be empty:check
            result.emplace_back( it->str() );
    }
    return result;
}
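The regex argument makes it possible to check for several delimiters (spaces, commas, etc.).
I usually only split on spaces and commas, so I also have this default function: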
std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;
    regex re( "[\\s,]+" );
    return Tokenize( str, re );
}
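"[\\s,]+" checks for whitespace (\\s) and commas (,).
Note that if you want to split a wstring instead of a string,
change every std::regex to std::wregex and every sregex_token_iterator to wsregex_token_iterator.
Note that, depending on your compiler, you may also want to take the string arguments by reference.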
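As an illustration of the wstring substitution described above, a wide-character variant might look like the sketch below. The names TokenizeW and TokenizeDefaultW are hypothetical. Passing the arguments by const reference and prefixing the pattern literal with L are assumptions layered on top of the original code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Wide-character counterpart of Tokenize (hypothetical name).
std::vector<std::wstring> TokenizeW(const std::wstring& str, const std::wregex& regex)
{
    std::vector<std::wstring> result;
    // Submatch index -1 selects the text between matches, i.e. the tokens.
    std::wsregex_token_iterator it(str.begin(), str.end(), regex, -1);
    std::wsregex_token_iterator reg_end;
    for (; it != reg_end; ++it) {
        if (!it->str().empty()) {  // e.g. a leading delimiter yields an empty token
            result.emplace_back(it->str());
        }
    }
    return result;
}

// Wide-character counterpart of TokenizeDefault (hypothetical name).
std::vector<std::wstring> TokenizeDefaultW(const std::wstring& str)
{
    const std::wregex re(L"[\\s,]+");  // the pattern literal needs the L prefix
    return TokenizeW(str, re);
}
```

Calling TokenizeDefaultW(L"one, two  three") should give the three tokens one, two and three, since a run of commas and whitespace counts as a single delimiter.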
I use the following approach:
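#include <string>
#include <vector>
using std::string;
using std::vector;

// Splits in at separator and appends the non-empty tokens to parts.
void split(const string& in, vector<string>& parts, char separator) {
    string::const_iterator ts, curr;
    ts = curr = in.begin();
    for (;; ++curr) {
        // Emit a token at a separator or at the end, but only if it is non-empty.
        if ((curr == in.end() || *curr == separator) && curr > ts)
            parts.push_back(string(ts, curr));
        if (curr == in.end())
            break;
        if (*curr == separator) ts = curr + 1;
    }
}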
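PlasmaHH, I forgot to include the extra check (curr > ts), which drops the empty tokens that leading, trailing, or consecutive separators would otherwise produce.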
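To show what that check buys, here is a small sketch of the same idea written with std::string::find instead of iterators. This is an alternative formulation for illustration, not the answer's code. The name split_skip_empty is hypothetical, and the stop > start test plays the role of curr > ts.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits in at separator and skips empty tokens.
std::vector<std::string> split_skip_empty(const std::string& in, char separator)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (start <= in.size()) {
        // Find the end of the current token: next separator or end of string.
        std::string::size_type stop = in.find(separator, start);
        if (stop == std::string::npos) { stop = in.size(); }
        if (stop > start) {  // counterpart of the curr > ts check
            parts.push_back(in.substr(start, stop - start));
        }
        start = stop + 1;  // continue just past the separator
    }
    return parts;
}
```

With this check in place, an input such as "a,,b," split on ',' gives only a and b. Without it, the doubled and trailing separators would each contribute an empty string.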
A quick version that uses vector as the base class, giving full access to all of its operators:
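#include <string>
#include <vector>

// Split string into parts.
class Split : public std::vector<std::string>
{
public:
    // delimList is a null-terminated list of delimiter characters. It is a
    // const char* so that string literals such as "," can be passed.
    Split(const std::string& str, const char* delimList)
    {
        size_t lastPos = 0;
        size_t pos = str.find_first_of(delimList);
        while (pos != std::string::npos)
        {
            // Skip empty tokens between adjacent delimiters.
            if (pos != lastPos)
                push_back(str.substr(lastPos, pos-lastPos));
            lastPos = pos + 1;
            pos = str.find_first_of(delimList, lastPos);
        }
        // Whatever follows the last delimiter is the final token.
        if (lastPos < str.length())
            push_back(str.substr(lastPos));
    }
};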
Example used to populate an STL set:
std::set<std::string> words;
Split split("Hello,World", ",");
words.insert(split.begin(), split.end());