Java有一个方便的分割方法:

String str = "The quick brown fox";
String[] results = str.split(" ");

在c++中有简单的方法来做到这一点吗?


当前回答

我贴出了类似问题的答案。 不要白费力气。我使用过许多库,我遇到过的最快、最灵活的库是:c++ String Toolkit Library。

这里有一个如何使用它的例子,我已经张贴在stackoverflow的其他地方。

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace  = " \t\r\n\f";
const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
       std::string s("Somewhere down the road");
       std::vector<std::string> result;
       if( strtk::parse( s, whitespace, result ) )
       {
           for(size_t i = 0; i < result.size(); ++i )
            std::cout << result[i] << std::endl;
       }
    }

    {  // parsing a string into a vector of floats with other separators
       // besides spaces

       std::string s("3.0, 3.14; 4.0");
       std::vector<float> values;
       if( strtk::parse( s, whitespace_and_punctuation, values ) )
       {
           for(size_t i = 0; i < values.size(); ++i )
            std::cout << values[i] << std::endl;
       }
    }

    {  // parsing a string into specific variables

       std::string s("angle = 45; radius = 9.9");
       std::string w1, w2;
       float v1, v2;
       if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
       {
           std::cout << "word " << w1 << ", value " << v1 << std::endl;
           std::cout << "word " << w2 << ", value " << v2 << std::endl;
       }
    }

    return 0;
}

其他回答

如果你愿意使用C语言,你可以使用strtok函数。在使用它时,您应该注意多线程问题。

您可以简单地使用正则表达式库并使用正则表达式解决该问题。

使用表达式(\w+)和\1中的变量(或$1,取决于正则表达式的库实现)。

在我看来很奇怪的是,SO网站上有这么多注重速度的书呆子,却没有人给出一个使用编译时生成的分隔符查找表的版本(下面是示例实现)。使用查找表和迭代器应该在效率上击败std::regex,如果你不需要击败regex,就使用它,它是c++ 11的标准,超级灵活。

有些人已经建议使用正则表达式,但对于新手来说,这里有一个打包的示例,应该完全符合OP的期望:

std::vector<std::string> split(std::string::const_iterator it, std::string::const_iterator end, std::regex e = std::regex{"\\w+"}){
    std::smatch m{};
    std::vector<std::string> ret{};
    while (std::regex_search (it,end,m,e)) {
        ret.emplace_back(m.str());              
        std::advance(it, m.position() + m.length()); //next start position = match position + match length
    }
    return ret;
}
std::vector<std::string> split(const std::string &s, std::regex e = std::regex{"\\w+"}){  //comfort version calls flexible version
    return split(s.cbegin(), s.cend(), std::move(e));
}
int main ()
{
    std::string str {"Some people, excluding those present, have been compile time constants - since puberty."};
    auto v = split(str);
    for(const auto&s:v){
        std::cout << s << std::endl;
    }
    std::cout << "crazy version:" << std::endl;
    v = split(str, std::regex{"[^e]+"});  //using e as delim shows flexibility
    for(const auto&s:v){
        std::cout << s << std::endl;
    }
    return 0;
}

如果我们需要更快并接受所有字符必须为8位的约束,我们可以在编译时使用元编程创建一个查找表:

template<bool...> struct BoolSequence{};        //just here to hold bools
template<char...> struct CharSequence{};        //just here to hold chars
template<typename T, char C> struct Contains;   //generic
template<char First, char... Cs, char Match>    //not first specialization
struct Contains<CharSequence<First, Cs...>,Match> :
    Contains<CharSequence<Cs...>, Match>{};     //strip first and increase index
template<char First, char... Cs>                //is first specialization
struct Contains<CharSequence<First, Cs...>,First>: std::true_type {}; 
template<char Match>                            //not found specialization
struct Contains<CharSequence<>,Match>: std::false_type{};

template<int I, typename T, typename U> 
struct MakeSequence;                            //generic
template<int I, bool... Bs, typename U> 
struct MakeSequence<I,BoolSequence<Bs...>, U>:  //not last
    MakeSequence<I-1, BoolSequence<Contains<U,I-1>::value,Bs...>, U>{};
template<bool... Bs, typename U> 
struct MakeSequence<0,BoolSequence<Bs...>,U>{   //last  
    using Type = BoolSequence<Bs...>;
};
template<typename T> struct BoolASCIITable;
template<bool... Bs> struct BoolASCIITable<BoolSequence<Bs...>>{
    /* could be made constexpr but not yet supported by MSVC */
    static bool isDelim(const char c){
        static const bool table[256] = {Bs...};
        return table[static_cast<int>(c)];
    }   
};
using Delims = CharSequence<'.',',',' ',':','\n'>;  //list your custom delimiters here
using Table = BoolASCIITable<typename MakeSequence<256,BoolSequence<>,Delims>::Type>;

有了这些,创建getNextToken函数就很容易了:

template<typename T_It>
std::pair<T_It,T_It> getNextToken(T_It begin,T_It end){
    begin = std::find_if(begin,end,std::not1(Table{})); //find first non delim or end
    auto second = std::find_if(begin,end,Table{});      //find first delim or end
    return std::make_pair(begin,second);
}

使用它也很简单:

int main() {
    std::string s{"Some people, excluding those present, have been compile time constants - since puberty."};
    auto it = std::begin(s);
    auto end = std::end(s);
    while(it != std::end(s)){
        auto token = getNextToken(it,end);
        std::cout << std::string(token.first,token.second) << std::endl;
        it = token.second;
    }
    return 0;
}

这里有一个生动的例子:http://ideone.com/GKtkLQ

这是一个简单的stl解决方案(~5行!)使用std::find和std::find_first_not_of来处理重复的分隔符(例如空格或句号),以及开头和结尾的分隔符:

#include <string>
#include <vector>

void tokenize(std::string str, std::vector<string> &token_v){
    size_t start = str.find_first_not_of(DELIMITER), end=start;

    while (start != std::string::npos){
        // Find next occurence of delimiter
        end = str.find(DELIMITER, start);
        // Push back the token found into vector
        token_v.push_back(str.substr(start, end-start));
        // Skip all occurences of the delimiter to find new start
        start = str.find_first_not_of(DELIMITER, end);
    }
}

现场试试吧!

/// split a string into multiple sub strings, based on a separator string
/// for example, if separator="::",
///
/// s = "abc" -> "abc"
///
/// s = "abc::def xy::st:" -> "abc", "def xy" and "st:",
///
/// s = "::abc::" -> "abc"
///
/// s = "::" -> NO sub strings found
///
/// s = "" -> NO sub strings found
///
/// then append the sub-strings to the end of the vector v.
/// 
/// the idea comes from the findUrls() function of "Accelerated C++", chapt7,
/// findurls.cpp
///
void split(const string& s, const string& sep, vector<string>& v)
{
    typedef string::const_iterator iter;
    iter b = s.begin(), e = s.end(), i;
    iter sep_b = sep.begin(), sep_e = sep.end();

    // search through s
    while (b != e){
        i = search(b, e, sep_b, sep_e);

        // no more separator found
        if (i == e){
            // it's not an empty string
            if (b != e)
                v.push_back(string(b, e));
            break;
        }
        else if (i == b){
            // the separator is found and right at the beginning
            // in this case, we need to move on and search for the
            // next separator
            b = i + sep.length();
        }
        else{
            // found the separator
            v.push_back(string(b, i));
            b = i;
        }
    }
}

boost库很好,但并不总是可用的。手工做这些事情也是很好的脑力锻炼。这里我们只使用STL中的std::search()算法,参见上面的代码。