Java有一个方便的分割方法:

String str = "The quick brown fox";
String[] results = str.split(" ");

在c++中有简单的方法来做到这一点吗?


当前回答

这里有许多过于复杂的建议。试试这个简单的std::string解决方案:

using namespace std;

string someText = ...

string::size_type tokenOff = 0, sepOff = tokenOff;
while (sepOff != string::npos)
{
    sepOff = someText.find(' ', sepOff);
    string::size_type tokenLen = (sepOff == string::npos) ? sepOff : sepOff++ - tokenOff;
    string token = someText.substr(tokenOff, tokenLen);
    if (!token.empty())
        /* do something with token */;
    tokenOff = sepOff;
}

其他回答

If the maximum length of the input string to be tokenized is known, one can exploit this and implement a very fast version. I am sketching the basic idea below, which was inspired by both strtok() and the "suffix array"-data structure described Jon Bentley's "Programming Perls" 2nd edition, chapter 15. The C++ class in this case only gives some organization and convenience of use. The implementation shown can be easily extended for removing leading and trailing whitespace characters in the tokens.

基本上,可以将分隔符替换为以字符串结束的'\0'字符,并设置指向修改后字符串中的标记的指针。在极端情况下,当字符串仅由分隔符组成时,将得到字符串长度加1个空标记。复制要修改的字符串是可行的。

头文件:

class TextLineSplitter
{
public:

    TextLineSplitter( const size_t max_line_len );

    ~TextLineSplitter();

    void            SplitLine( const char *line,
                               const char sep_char = ',',
                             );

    inline size_t   NumTokens( void ) const
    {
        return mNumTokens;
    }

    const char *    GetToken( const size_t token_idx ) const
    {
        assert( token_idx < mNumTokens );
        return mTokens[ token_idx ];
    }

private:
    const size_t    mStorageSize;

    char           *mBuff;
    char          **mTokens;
    size_t          mNumTokens;

    inline void     ResetContent( void )
    {
        memset( mBuff, 0, mStorageSize );
        // mark all items as empty:
        memset( mTokens, 0, mStorageSize * sizeof( char* ) );
        // reset counter for found items:
        mNumTokens = 0L;
    }
};

Implementattion文件:

TextLineSplitter::TextLineSplitter( const size_t max_line_len ):
    mStorageSize ( max_line_len + 1L )
{
    // allocate memory
    mBuff   = new char  [ mStorageSize ];
    mTokens = new char* [ mStorageSize ];

    ResetContent();
}

TextLineSplitter::~TextLineSplitter()
{
    delete [] mBuff;
    delete [] mTokens;
}


void TextLineSplitter::SplitLine( const char *line,
                                  const char sep_char   /* = ',' */,
                                )
{
    assert( sep_char != '\0' );

    ResetContent();
    strncpy( mBuff, line, mMaxLineLen );

    size_t idx       = 0L; // running index for characters

    do
    {
        assert( idx < mStorageSize );

        const char chr = line[ idx ]; // retrieve current character

        if( mTokens[ mNumTokens ] == NULL )
        {
            mTokens[ mNumTokens ] = &mBuff[ idx ];
        } // if

        if( chr == sep_char || chr == '\0' )
        { // item or line finished
            // overwrite separator with a 0-terminating character:
            mBuff[ idx ] = '\0';
            // count-up items:
            mNumTokens ++;
        } // if

    } while( line[ idx++ ] );
}

使用的场景是:

// create an instance capable of splitting strings up to 1000 chars long:
TextLineSplitter spl( 1000 );
spl.SplitLine( "Item1,,Item2,Item3" );
for( size_t i = 0; i < spl.NumTokens(); i++ )
{
    printf( "%s\n", spl.GetToken( i ) );
}

输出:

Item1

Item2
Item3

无意冒犯,但对于这样一个简单的问题,你把事情搞得太复杂了。使用Boost有很多理由。但对于这么简单的事情,就像用20号雪橇打苍蝇一样。

void
split( vector<string> & theStringVector,  /* Altered/returned value */
       const  string  & theString,
       const  string  & theDelimiter)
{
    UASSERT( theDelimiter.size(), >, 0); // My own ASSERT macro.

    size_t  start = 0, end = 0;

    while ( end != string::npos)
    {
        end = theString.find( theDelimiter, start);

        // If at end, use length=maxLength.  Else use length=end-start.
        theStringVector.push_back( theString.substr( start,
                       (end == string::npos) ? string::npos : end - start));

        // If at end, use start=maxSize.  Else use start=end+delimiter.
        start = (   ( end > (string::npos - theDelimiter.size()) )
                  ?  string::npos  :  end + theDelimiter.size());
    }
}

例如(以Doug为例),

#define SHOW(I,X)   cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl

int
main()
{
    vector<string> v;

    split( v, "A:PEP:909:Inventory Item", ":" );

    for (unsigned int i = 0;  i < v.size();   i++)
        SHOW( i, v[i] );
}

是的,我们可以split()返回一个新的向量,而不是传入一个。包装和重载是很简单的。但根据我所做的事情,我经常发现重用已有的对象比总是创建新对象更好。(只要我不忘记清空中间的向量!)

参考:http://www.cplusplus.com/reference/string/string/。

(我最初是在写一个回应Doug的问题:基于分隔符的c++字符串修改和提取(关闭)。但由于马丁·约克用这里的指针结束了这个问题……我将泛化我的代码。)

如果你正在使用c++ ranges——完整的range -v3库,而不是c++ 20所接受的有限功能——你可以这样做:

auto results = str | ranges::views::tokenize(" ",1);

... 这是惰性求值。你也可以在这个范围内设置一个向量:

auto results = str | ranges::views::tokenize(" ",1) | ranges::to<std::vector>();

如果str有n个字符组成m个单词,这将占用O(m)个空间和O(n)个时间。

参见标准库自己的标记化示例。

MFC/ATL有一个非常好的标记器。从MSDN:

CAtlString str( "%First Second#Third" );
CAtlString resToken;
int curPos= 0;

resToken= str.Tokenize("% #",curPos);
while (resToken != "")
{
   printf("Resulting token: %s\n", resToken);
   resToken= str.Tokenize("% #",curPos);
};

Output

Resulting Token: First
Resulting Token: Second
Resulting Token: Third

我贴出了类似问题的答案。 不要白费力气。我使用过许多库,我遇到过的最快、最灵活的库是:c++ String Toolkit Library。

这里有一个如何使用它的例子,我已经张贴在stackoverflow的其他地方。

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace  = " \t\r\n\f";
const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
       std::string s("Somewhere down the road");
       std::vector<std::string> result;
       if( strtk::parse( s, whitespace, result ) )
       {
           for(size_t i = 0; i < result.size(); ++i )
            std::cout << result[i] << std::endl;
       }
    }

    {  // parsing a string into a vector of floats with other separators
       // besides spaces

       std::string s("3.0, 3.14; 4.0");
       std::vector<float> values;
       if( strtk::parse( s, whitespace_and_punctuation, values ) )
       {
           for(size_t i = 0; i < values.size(); ++i )
            std::cout << values[i] << std::endl;
       }
    }

    {  // parsing a string into specific variables

       std::string s("angle = 45; radius = 9.9");
       std::string w1, w2;
       float v1, v2;
       if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
       {
           std::cout << "word " << w1 << ", value " << v1 << std::endl;
           std::cout << "word " << w2 << ", value " << v2 << std::endl;
       }
    }

    return 0;
}