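How to convert an instance of std::string to lowercase

I want to convert a std::string to lowercase. I am aware of the function tolower(). However, I have had issues with this function in the past, and it is hardly ideal anyway, because using it with a std::string requires iterating over each character.

Is there an alternative that works 100% of the time?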

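Current answer

std::ctype::tolower() from the standard C++ localization library will do this correctly for you. Here is an example taken from the tolower reference page: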

#include <locale>
#include <iostream>

int main () {
  std::locale::global(std::locale("en_US.utf8"));
  std::wcout.imbue(std::locale());
  std::wcout << "In US English UTF-8 locale:\n";
  auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
  std::wstring str = L"HELLo, wORLD!";
  std::wcout << "Lowercase form of the string '" << str << "' is ";
  f.tolower(&str[0], &str[0] + str.size());
  std::wcout << "'" << str << "'\n";
}
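
The example above depends on the en_US.utf8 locale being installed on the machine. As a minimal sketch for plain char strings, the same facet call can be made against the classic "C" locale, which is always available but only maps the ASCII letters. The helper name to_lower_classic is hypothetical and not part of the answer above.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercases a narrow string with the ctype facet of
// the classic "C" locale. Only the ASCII letters A-Z are converted.
std::string to_lower_classic(std::string s) {
    if (s.empty()) return s;
    const auto& facet = std::use_facet<std::ctype<char>>(std::locale::classic());
    // ctype<char>::tolower(char* first, const char* last) converts in place.
    facet.tolower(&s[0], &s[0] + s.size());
    return s;
}
```

Calling to_lower_classic("HELLo, wORLD!") yields "hello, world!". Taking the string by value keeps the caller's original untouched.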

2016-01-29 02:25:50

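Other answers

This is a follow-up to Stefan Mai's response: if you want to place the result of the conversion in another string, you need to pre-allocate its storage before calling std::transform. Since the STL stores the transformed characters at the destination iterator (incrementing it on each iteration of the loop), the destination string is not resized automatically, and you risk writing past the end of its buffer.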

#include <string>
#include <algorithm>
#include <cctype>
#include <iostream>

int main (int argc, char* argv[])
{
  std::string sourceString = "Abc";
  std::string destinationString;

  // Allocate the destination space
  destinationString.resize(sourceString.size());

  // Convert the source string to lower case
  // storing the result in destination string
  // (the lambda takes unsigned char because passing a negative char
  // to std::tolower is undefined behavior)
  std::transform(sourceString.begin(),
                 sourceString.end(),
                 destinationString.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  // Output the result of the conversion
  std::cout << sourceString
            << " -> "
            << destinationString
            << std::endl;
}
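
As an alternative to resizing the destination up front, the characters can be appended through std::back_inserter, which grows the string as needed. This is a minimal sketch, not part of the answer above; the helper name lower_copy is hypothetical, and like the code above it only handles single-byte characters.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: returns a lowercased copy of `source`.
std::string lower_copy(const std::string& source) {
    std::string destination;
    // reserve() is only an optimization; back_inserter appends safely
    // even without it.
    destination.reserve(source.size());
    std::transform(source.begin(), source.end(),
                   std::back_inserter(destination),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return destination;
}
```

With this variant, forgetting to size the destination cannot corrupt memory, at the cost of one push_back per character.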

2013-03-28 06:25:54

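tl;dr

Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware exist.

First you have to answer a question: what is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you are using to convert between upper and lower case know that? (Or does it fail for characters above 0x7f?)

If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as the container, you are deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr() can produce invalid (sub-)strings because you split in the middle of a multibyte sequence.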
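
The .substr() hazard can be made concrete with a small sketch. It assumes the string holds UTF-8, where every continuation byte has the bit pattern 10xxxxxx; both helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if the byte is a UTF-8 continuation byte
// (bit pattern 10xxxxxx).
bool is_utf8_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: true if cutting `s` at byte offset `pos` would land
// in the middle of a multi-byte UTF-8 sequence.
bool splits_utf8_sequence(const std::string& s, std::size_t pos) {
    return pos < s.size() &&
           is_utf8_continuation(static_cast<unsigned char>(s[pos]));
}
```

For example, "ä" is encoded as the two bytes 0xC3 0xA4, so for std::string s = "\xC3\xA4" "bc" the call s.substr(0, 1) returns a lone 0xC3 byte, which is not valid UTF-8, and splits_utf8_sequence(s, 1) reports the problem. This check only covers byte-level validity; it knows nothing about combining characters or grapheme clusters.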

As soon as you try something like std::toupper( 'ß' ), or std::tolower( 'Σ' ) in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß into SS as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ is in the middle of a word (where σ would be correct), or at the end (ς). Another example would be std::tolower( 'I' ), which should yield different results depending on the locale -- virtually everywhere you would expect i, but in Turkey ı (LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).

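So any case conversion that works on one character at a time, or worse, one byte at a time, is broken by design. This includes all the std:: variants in existence at this time.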
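
A minimal sketch of the byte-at-a-time failure, assuming UTF-8 input: a loop that lowercases only the ASCII range never touches the bytes of a non-ASCII letter, so "Ä" stays uppercase. The helper name ascii_lower_bytes is hypothetical.

```cpp
#include <string>

// Hypothetical helper: lowercases only the ASCII letters A-Z, byte by byte.
// Every byte of a multi-byte UTF-8 sequence is >= 0x80 and is left alone.
std::string ascii_lower_bytes(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return s;
}
```

Given "ÄBC" (bytes 0xC3 0x84 'B' 'C'), the result is "Äbc" rather than "äbc" (0xC3 0xA4 'b' 'c'). The failure is silent: the output is still valid UTF-8, just not lowercase.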

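Then there is the point that what the standard library is capable of doing depends on which locales are supported on the machine your software is running on... and what do you do if your target locale is among those not supported on your client's machine?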
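
The dependency on installed locales shows up at run time: constructing a std::locale from a name the system does not provide throws std::runtime_error. A minimal sketch of a fallback follows; the helper name locale_or_classic is hypothetical.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the named locale if the system provides it,
// otherwise falls back to the classic "C" locale.
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        // The requested locale is not installed on this machine.
        return std::locale::classic();
    }
}
```

A call such as locale_or_classic("tr_TR.UTF-8") gives Turkish rules only where that locale is installed; elsewhere the program silently gets "C" behavior, which is exactly the portability problem described above.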

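So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<> variants.

(C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they remain ignorant of Unicode mechanics such as normalization, collation, ...)

While Boost looks nice API-wise, Boost.Locale is basically a wrapper around ICU, and only if Boost is compiled with ICU support. If it isn't, Boost.Locale is limited to the locale support compiled for the standard library.

And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you would have to supply them together with your application, and that opens a whole new can of worms...)

So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly: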

#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>

#include <iostream>

int main()
{
    /*                          "Odysseus" */
    char const * someString = u8"ΟΔΥΣΣΕΥΣ";
    icu::UnicodeString someUString( someString, "UTF-8" );
    // Setting the locale explicitly here for completeness.
    // Usually you would use the user-specified system locale,
    // which *does* make a difference (see ı vs. i above).
    std::cout << someUString.toLower( "el_GR" ) << "\n";
    std::cout << someUString.toUpper( "el_GR" ) << "\n";
    return 0;
}

Compile (with g++ in this example):

g++ -Wall example.cpp -licuuc -licuio

This gives:

ὀδυσσεύς

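Note the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>-based solution can give you that.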

2014-06-05 15:06:39

Have a look at the excellent C++17 cpp-unicodelib (GitHub). It is single-file and header-only.


#include <exception>
#include <iostream>
#include <codecvt>
#include <locale>
#include <string>

// cpp-unicodelib, downloaded from GitHub
#include "unicodelib.h"
#include "unicodelib_encodings.h"

using namespace std;
using namespace unicode;

int main()
{
    // converter that allows displaying a UTF-32 string as UTF-8
    // (std::wstring_convert is deprecated since C++17 but still available)
    wstring_convert<codecvt_utf8<char32_t>, char32_t> converter;

    std::u32string in = U"Je suis là!";
    cout << converter.to_bytes(in) << endl;

    std::u32string lc = to_lowercase(in);
    cout << converter.to_bytes(lc) << endl;
}

Output

Je suis là!
je suis là!

2022-04-25 13:18:34

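C++ does not provide tolower or toupper methods for std::string, but they are available for char. You can easily read each character of the string, convert it to the required case, and put it back into the string. Sample code that does not use any third-party library: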

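#include <cctype>
#include <iostream>
#include <string>

int main(){
    std::string str = std::string("How ARe You");
    for(char &ch : str){
        // Cast to unsigned char: passing a negative char to std::tolower
        // is undefined behavior
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    std::cout<<str<<std::endl;
    return 0;
}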

For character-based operations on a string, see: For every character in string

2019-03-17 14:35:38

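Adding some optional libraries for ASCII string to_lower. Both are production-grade and micro-optimized, and are expected to be faster than the existing answers here (TODO: add benchmark results).

Facebook's Folly: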

void toLowerAscii(char* str, size_t length)

Google's Abseil:

void AsciiStrToLower(std::string* s);

2021-06-22 09:49:44
