Java有一个方便的分割方法:
String str = "The quick brown fox";
String[] results = str.split(" ");
在c++中有简单的方法来做到这一点吗?
Java有一个方便的分割方法:
String str = "The quick brown fox";
String[] results = str.split(" ");
在c++中有简单的方法来做到这一点吗?
当前回答
pystring是一个小型库,实现了Python的一系列字符串函数,包括split方法:
#include <string>
#include <vector>
#include "pystring.h"
std::vector<std::string> chunks;
pystring::split("this string", chunks);
// also can specify a separator
pystring::split("this-string", chunks, "-");
其他回答
下面是一个示例标记器类,它可以实现您想要的功能
//Header file
class Tokenizer
{
public:
static const std::string DELIMITERS;
Tokenizer(const std::string& str);
Tokenizer(const std::string& str, const std::string& delimiters);
bool NextToken();
bool NextToken(const std::string& delimiters);
const std::string GetToken() const;
void Reset();
protected:
size_t m_offset;
const std::string m_string;
std::string m_token;
std::string m_delimiters;
};
//CPP file
const std::string Tokenizer::DELIMITERS(" \t\n\r");
Tokenizer::Tokenizer(const std::string& s) :
m_string(s),
m_offset(0),
m_delimiters(DELIMITERS) {}
Tokenizer::Tokenizer(const std::string& s, const std::string& delimiters) :
m_string(s),
m_offset(0),
m_delimiters(delimiters) {}
bool Tokenizer::NextToken()
{
return NextToken(m_delimiters);
}
bool Tokenizer::NextToken(const std::string& delimiters)
{
size_t i = m_string.find_first_not_of(delimiters, m_offset);
if (std::string::npos == i)
{
m_offset = m_string.length();
return false;
}
size_t j = m_string.find_first_of(delimiters, i);
if (std::string::npos == j)
{
m_token = m_string.substr(i);
m_offset = m_string.length();
return true;
}
m_token = m_string.substr(i, j - i);
m_offset = j;
return true;
}
例子:
std::vector <std::string> v;
Tokenizer s("split this string", " ");
while (s.NextToken())
{
v.push_back(s.GetToken());
}
/// split a string into multiple sub strings, based on a separator string
/// for example, if separator="::",
///
/// s = "abc" -> "abc"
///
/// s = "abc::def xy::st:" -> "abc", "def xy" and "st:",
///
/// s = "::abc::" -> "abc"
///
/// s = "::" -> NO sub strings found
///
/// s = "" -> NO sub strings found
///
/// then append the sub-strings to the end of the vector v.
///
/// the idea comes from the findUrls() function of "Accelerated C++", chapt7,
/// findurls.cpp
///
void split(const string& s, const string& sep, vector<string>& v)
{
typedef string::const_iterator iter;
iter b = s.begin(), e = s.end(), i;
iter sep_b = sep.begin(), sep_e = sep.end();
// search through s
while (b != e){
i = search(b, e, sep_b, sep_e);
// no more separator found
if (i == e){
// it's not an empty string
if (b != e)
v.push_back(string(b, e));
break;
}
else if (i == b){
// the separator is found and right at the beginning
// in this case, we need to move on and search for the
// next separator
b = i + sep.length();
}
else{
// found the separator
v.push_back(string(b, i));
b = i;
}
}
}
boost库很好,但并不总是可用的。手工做这些事情也是很好的脑力锻炼。这里我们只使用STL中的std::search()算法,参见上面的代码。
这是一个简单的循环,只对标准库文件进行标记
#include <iostream.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <conio.h>
class word
{
public:
char w[20];
word()
{
for(int j=0;j<=20;j++)
{w[j]='\0';
}
}
};
void main()
{
int i=1,n=0,j=0,k=0,m=1;
char input[100];
word ww[100];
gets(input);
n=strlen(input);
for(i=0;i<=m;i++)
{
if(context[i]!=' ')
{
ww[k].w[j]=context[i];
j++;
}
else
{
k++;
j=0;
m++;
}
}
}
在我看来很奇怪的是,SO网站上有这么多注重速度的书呆子,却没有人给出一个使用编译时生成的分隔符查找表的版本(下面是示例实现)。使用查找表和迭代器应该在效率上击败std::regex,如果你不需要击败regex,就使用它,它是c++ 11的标准,超级灵活。
有些人已经建议使用正则表达式,但对于新手来说,这里有一个打包的示例,应该完全符合OP的期望:
std::vector<std::string> split(std::string::const_iterator it, std::string::const_iterator end, std::regex e = std::regex{"\\w+"}){
std::smatch m{};
std::vector<std::string> ret{};
while (std::regex_search (it,end,m,e)) {
ret.emplace_back(m.str());
std::advance(it, m.position() + m.length()); //next start position = match position + match length
}
return ret;
}
std::vector<std::string> split(const std::string &s, std::regex e = std::regex{"\\w+"}){ //comfort version calls flexible version
return split(s.cbegin(), s.cend(), std::move(e));
}
int main ()
{
std::string str {"Some people, excluding those present, have been compile time constants - since puberty."};
auto v = split(str);
for(const auto&s:v){
std::cout << s << std::endl;
}
std::cout << "crazy version:" << std::endl;
v = split(str, std::regex{"[^e]+"}); //using e as delim shows flexibility
for(const auto&s:v){
std::cout << s << std::endl;
}
return 0;
}
如果我们需要更快并接受所有字符必须为8位的约束,我们可以在编译时使用元编程创建一个查找表:
template<bool...> struct BoolSequence{}; //just here to hold bools
template<char...> struct CharSequence{}; //just here to hold chars
template<typename T, char C> struct Contains; //generic
template<char First, char... Cs, char Match> //not first specialization
struct Contains<CharSequence<First, Cs...>,Match> :
Contains<CharSequence<Cs...>, Match>{}; //strip first and increase index
template<char First, char... Cs> //is first specialization
struct Contains<CharSequence<First, Cs...>,First>: std::true_type {};
template<char Match> //not found specialization
struct Contains<CharSequence<>,Match>: std::false_type{};
template<int I, typename T, typename U>
struct MakeSequence; //generic
template<int I, bool... Bs, typename U>
struct MakeSequence<I,BoolSequence<Bs...>, U>: //not last
MakeSequence<I-1, BoolSequence<Contains<U,I-1>::value,Bs...>, U>{};
template<bool... Bs, typename U>
struct MakeSequence<0,BoolSequence<Bs...>,U>{ //last
using Type = BoolSequence<Bs...>;
};
template<typename T> struct BoolASCIITable;
template<bool... Bs> struct BoolASCIITable<BoolSequence<Bs...>>{
/* could be made constexpr but not yet supported by MSVC */
static bool isDelim(const char c){
static const bool table[256] = {Bs...};
return table[static_cast<int>(c)];
}
};
using Delims = CharSequence<'.',',',' ',':','\n'>; //list your custom delimiters here
using Table = BoolASCIITable<typename MakeSequence<256,BoolSequence<>,Delims>::Type>;
有了这些,创建getNextToken函数就很容易了:
template<typename T_It>
std::pair<T_It,T_It> getNextToken(T_It begin,T_It end){
begin = std::find_if(begin,end,std::not1(Table{})); //find first non delim or end
auto second = std::find_if(begin,end,Table{}); //find first delim or end
return std::make_pair(begin,second);
}
使用它也很简单:
int main() {
std::string s{"Some people, excluding those present, have been compile time constants - since puberty."};
auto it = std::begin(s);
auto end = std::end(s);
while(it != std::end(s)){
auto token = getNextToken(it,end);
std::cout << std::string(token.first,token.second) << std::endl;
it = token.second;
}
return 0;
}
这里有一个生动的例子:http://ideone.com/GKtkLQ
Boost有很强的拆分功能:Boost::algorithm::split。
示例程序:
#include <vector>
#include <boost/algorithm/string.hpp>
int main() {
auto s = "a,b, c ,,e,f,";
std::vector<std::string> fields;
boost::split(fields, s, boost::is_any_of(","));
for (const auto& field : fields)
std::cout << "\"" << field << "\"\n";
return 0;
}
输出:
"a"
"b"
" c "
""
"e"
"f"
""