How do I tokenize a string in C++?

Java has a convenient split method:

String str = "The quick brown fox";
String[] results = str.split(" ");

Is there an easy way to do this in C++?

Current answer

If you're using C++ ranges (the full range-v3 library, not the limited functionality accepted into C++20), you can do it this way:

auto results = str | ranges::views::tokenize(" ",1);

... and this is lazily evaluated. You can also materialize the range into a vector:

auto results = str | ranges::views::tokenize(" ",1) | ranges::to<std::vector>();

If str has n characters making up m words, this takes O(m) space and O(n) time.

See also the library's own tokenization example.

2020-08-15 22:49:29

Other answers

You can simply use a regular expression library and solve the problem with a regular expression.

Use the expression (\w+) and read the captured value from \1 (or $1, depending on the regular expression library's implementation).
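As a minimal sketch of that idea with the standard <regex> header (C++11), the helper below walks every match of (\w+) and collects the first capture group. The name regex_words is hypothetical and only used for illustration; the answer does not say which regex library it has in mind.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): collect every run of word
// characters matched by the capture group in (\w+).
std::vector<std::string> regex_words(const std::string& text)
{
    static const std::regex word_re{R"((\w+))"};
    std::vector<std::string> words;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it{text.begin(), text.end(), word_re}, end; it != end; ++it)
        words.push_back((*it)[1].str());  // sub-match 1 corresponds to \1 / $1
    return words;
}
```

Note that \w only matches letters, digits and the underscore, so punctuation attached to a word is dropped rather than kept in the token.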

2011-04-22 00:14:36

Another quick way is to use getline. Something like:

stringstream ss("bla bla");
string s;

while (getline(ss, s, ' ')) {
 cout << s << endl;
}

If you want, you can write a simple split() function that returns a vector<string>, which is really useful.
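One possible shape for such a split() helper, sketched on top of the same getline loop. The signature (a const std::string& plus a single char delimiter) is an assumption, since the answer does not spell one out.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch of the split() helper mentioned above; the signature is assumed.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream ss{text};
    std::string token;
    // getline stops at each delimiter and discards it.
    while (std::getline(ss, token, delim))
        tokens.push_back(token);
    return tokens;
}
```

Because getline treats every delimiter as a boundary, two adjacent delimiters produce an empty token between them; filter those out if they are not wanted.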

2008-11-28 04:17:39

Here is a simple loop that tokenizes using only standard library headers:

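#include <cstring>
#include <iostream>

// Fixed-size buffer for one word: at most 19 characters plus the terminator.
class word
{
public:
    char w[20];
    word()
    {
        for (int j = 0; j < 20; j++)   // was j<=20, which wrote past the array
        {
            w[j] = '\0';
        }
    }
};

int main()
{
    char input[100];
    word ww[100];
    std::cin.getline(input, sizeof input);   // replaces the unsafe gets()

    const int n = static_cast<int>(std::strlen(input));
    int j = 0, k = 0;

    for (int i = 0; i < n; i++)   // scan the whole input, not just up to m
    {
        if (input[i] != ' ')
        {
            if (j < 19)   // keep room for the terminator
            {
                ww[k].w[j] = input[i];
                j++;
            }
        }
        else
        {
            k++;   // a space ends the current word
            j = 0;
        }
    }

    for (int i = 0; i <= k; i++)   // print the collected words
        std::cout << ww[i].w << '\n';
}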

2013-05-19 13:42:11

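Adam Pierce's answer provides a hand-rolled tokenizer that takes a const char*. Doing the same with iterators is a bit more problematic, because incrementing a string's end iterator is undefined. That said, given string str{"The quick brown fox"}, we can certainly accomplish this: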

auto start = find(cbegin(str), cend(str), ' ');
vector<string> tokens{ string(cbegin(str), start) };

while (start != cend(str)) {
    const auto finish = find(++start, cend(str), ' ');

    tokens.push_back(string(start, finish));
    start = finish;
}

Live Example

If you want to abstract away the complexity by using standard functionality, strtok is a simple option, as On Freund suggests:

vector<string> tokens;

for (auto i = strtok(data(str), " "); i != nullptr; i = strtok(nullptr, " ")) tokens.push_back(i);

If you don't have access to C++17, you'll need to substitute data(str) as in this example: http://ideone.com/8kAGoa
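One way to get a writable buffer without C++17 is sketched below. This is an assumption about what such a substitution can look like, not a copy of the linked example: the string is copied into a null-terminated std::vector<char>, which also leaves the original untouched. The name strtok_split is hypothetical.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize a copy of the text with strtok (C++11).
std::vector<std::string> strtok_split(const std::string& text, const char* delims)
{
    // strtok writes '\0' into its input, so work on a mutable,
    // null-terminated copy instead of the caller's string.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
        tokens.push_back(tok);
    return tokens;
}
```

strtok skips runs of delimiters, so unlike a getline-based split this never yields empty tokens.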

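Although it isn't demonstrated in the example, strtok doesn't need to use the same delimiter for each token. Along with this advantage, though, there are several drawbacks: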

1. strtok cannot be used on multiple strings at the same time: either a nullptr must be passed to continue tokenizing the current string, or a new char* to tokenize must be passed (there are some non-standard implementations which do support this, however, such as strtok_s).
2. For the same reason, strtok cannot be used on multiple threads simultaneously (this may, however, be implementation defined; for example, Visual Studio's implementation is thread safe).
3. Calling strtok modifies the string it is operating on, so it cannot be used on const strings, const char*s, or literal strings. To tokenize any of these with strtok, or to operate on a string whose contents need to be preserved, str would have to be copied, and then the copy could be operated on.
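To illustrate the point about changing delimiters between calls, here is a small sketch that splits a "key=value;..." string by asking strtok for "=" first and ";" second. The helper name parse_pair and the input format are made up for the illustration, and the function works on a copy because strtok modifies its input.

```cpp
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: read "key=value" from the front of the text,
// using a different delimiter set for each strtok call.
std::pair<std::string, std::string> parse_pair(const std::string& text)
{
    // Mutable, null-terminated copy so the caller's string is preserved.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    const char* key = std::strtok(buffer.data(), "=");   // first token ends at '='
    const char* value = std::strtok(nullptr, ";");       // second token ends at ';'
    return { key ? key : "", value ? value : "" };
}
```

Like any strtok-based code, this relies on strtok's hidden internal state, so it should not be interleaved with another tokenization in progress.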

C++20 provides us with split_view to tokenize strings in a non-destructive manner: https://topanswers.xyz/cplusplus?q=749#a874

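The previous methods cannot generate a tokenized vector in place, meaning that unless they are abstracted into a helper function they cannot initialize const vector<string> tokens. That functionality, along with the ability to accept any whitespace delimiter, can be had by using an istream_iterator. For example, given const string str{"The quick \tbrown \nfox"}, we can do this: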

istringstream is{ str };
const vector<string> tokens{ istream_iterator<string>(is), istream_iterator<string>() };

Live Example

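Constructing the istringstream required by this option costs far more than the previous two options, but that cost is typically hidden in the expense of string allocation.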

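If none of the above options is flexible enough for your tokenization needs, the most flexible option is a regex_token_iterator. Of course, this flexibility comes with greater expense, but again this is likely hidden in the string allocation cost. Say, for example, we want to tokenize on non-escaped commas while also eating whitespace. Given the following input: const string str{" the,qu\\,ick,\tbrown, fox"} we can do this: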

const regex re{ "\\s*((?:[^\\\\,]|\\\\.)*?)\\s*(?:,|$)" };
const vector<string> tokens{ sregex_token_iterator(cbegin(str), cend(str), re, 1), sregex_token_iterator() };

Live Example
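For simpler cases the same iterator can be used the other way around: passing -1 as the sub-match index selects the text between matches, so the regex describes the delimiter instead of the token. The sketch below wraps that in a helper; split_on is a hypothetical name, and it does not handle escaped delimiters the way the regex above does.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split text wherever the delimiter regex matches.
std::vector<std::string> split_on(const std::string& text, const std::regex& delimiter)
{
    // Sub-match index -1 yields the stretches of text between matches.
    std::sregex_token_iterator first(text.begin(), text.end(), delimiter, -1);
    std::sregex_token_iterator last;  // default-constructed end iterator
    return std::vector<std::string>(first, last);
}
```

Being a function, this can also initialize a const vector<string> directly, for example with a delimiter such as std::regex{R"(\s*,\s*)"} to split on commas and swallow the surrounding whitespace.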

2016-07-26 16:51:20
