给定字符串“ThisStringHasNoSpacesButItDoesHaveCapitals”,什么是在大写字母之前添加空格的最好方法。所以结尾字符串是"This string Has No space But It Does Have大写"

下面是我使用正则表达式的尝试

System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")

当前回答

这里有一个更彻底的解决方案,它没有在单词前面放空格:

注意:我使用了多个regex(不简洁,但它也可以处理首字母缩略词和单字母单词)

Dim s As String = "ThisStringHasNoSpacesButItDoesHaveCapitals"
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z](?=[A-Z])[a-z]*)", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([A-Z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2") // repeat a second time

In:

"ThisStringHasNoSpacesButItDoesHaveCapitals"
"IAmNotAGoat"
"LOLThatsHilarious!"
"ThisIsASMSMessage"

Out:

"This String Has No Spaces But It Does Have Capitals"
"I Am Not A Goat"
"LOL Thats Hilarious!"
"This Is ASMS Message" // (Difficult to handle single letter words when they are next to acronyms.)

其他回答

这个问题包括首字母缩写词和首字母缩写复数,比公认的答案快一点:

public string Sentencify(string value)
{
    if (string.IsNullOrWhiteSpace(value))
        return string.Empty;

    string final = string.Empty;
    for (int i = 0; i < value.Length; i++)
    {
        if (i != 0 && Char.IsUpper(value[i]))
        {
            if (!Char.IsUpper(value[i - 1]))
                final += " ";
            else if (i < (value.Length - 1))
            {
                if (!Char.IsUpper(value[i + 1]) && !((value.Length >= i && value[i + 1] == 's') ||
                                                     (value.Length >= i + 1 && value[i + 1] == 'e' && value[i + 2] == 's')))
                    final += " ";
            }
        }

        final += value[i];
    }

    return final;
}

通过以下测试:

string test1 = "RegularOTs";
string test2 = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";
string test3 = "ThisStringHasNoSpacesButItDoesHaveCapitals";

我开始做一个简单的扩展方法,基于二进制Worrier的代码,它将正确地处理首字母缩略词,并且是可重复的(不会破坏已经间隔的单词)。这是我的结果。

public static string UnPascalCase(this string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    var newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        var currentUpper = char.IsUpper(text[i]);
        var prevUpper = char.IsUpper(text[i - 1]);
        var nextUpper = (text.Length > i + 1) ? char.IsUpper(text[i + 1]) || char.IsWhiteSpace(text[i + 1]): prevUpper;
        var spaceExists = char.IsWhiteSpace(text[i - 1]);
        if (currentUpper && !spaceExists && (!nextUpper || !prevUpper))
                newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}

下面是这个函数通过的单元测试用例。我把他建议的大部分案例都加到了这个清单上。其中三个没有通过的(两个只是罗马数字)被注释掉了:

Assert.AreEqual("For You And I", "ForYouAndI".UnPascalCase());
Assert.AreEqual("For You And The FBI", "ForYouAndTheFBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "AManAPlanACanalPanama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNSServer".UnPascalCase());
Assert.AreEqual("For You And I", "For You And I".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "MountMᶜKinleyNationalPark".UnPascalCase());
Assert.AreEqual("El Álamo Tejano", "ElÁlamoTejano".UnPascalCase());
Assert.AreEqual("The Ævar Arnfjörð Bjarmason", "TheÆvarArnfjörðBjarmason".UnPascalCase());
Assert.AreEqual("Il Caffè Macchiato", "IlCaffèMacchiato".UnPascalCase());
//Assert.AreEqual("Mister Dženan Ljubović", "MisterDženanLjubović".UnPascalCase());
//Assert.AreEqual("Ole King Henry Ⅷ", "OleKingHenryⅧ".UnPascalCase());
//Assert.AreEqual("Carlos Ⅴº El Emperador", "CarlosⅤºElEmperador".UnPascalCase());
Assert.AreEqual("For You And The FBI", "For You And The FBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "A Man A Plan A Canal Panama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNS Server".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "Mount Mᶜ Kinley National Park".UnPascalCase());

我知道这是一个旧的,但这是一个扩展我使用当我需要这样做:

public static class Extensions
{
    public static string ToSentence( this string Input )
    {
        return new string(Input.SelectMany((c, i) => i > 0 && char.IsUpper(c) ? new[] { ' ', c } : new[] { c }).ToArray());
    }
}

这将允许您使用MyCasedString.ToSentence()

在小写字母、大写字母或数字后添加空格的简单方法。

    string AddSpacesToSentence(string value, bool spaceLowerChar = true, bool spaceDigitChar = true, bool spaceSymbolChar = false)
    {
        var result = "";

        for (int i = 0; i < value.Length; i++)
        {
            char currentChar = value[i];
            char nextChar = value[i < value.Length - 1 ? i + 1 : value.Length - 1];

            if (spaceLowerChar && char.IsLower(currentChar) && !char.IsLower(nextChar))
            {
                result += value[i] + " ";
            }
            else if (spaceDigitChar && char.IsDigit(currentChar) && !char.IsDigit(nextChar))
            {
                result += value[i] + " ";
            }
            else if(spaceSymbolChar && char.IsSymbol(currentChar) && !char.IsSymbol(nextChar))
            {
                result += value[i];
            }
            else
            {
                result += value[i];
            }
        }

        return result;
    }

欢迎来到Unicode

所有这些解决方案对于现代文本来说本质上都是错误的。你需要使用能理解大小写的东西。由于Bob要求使用其他语言,我将为Perl提供两种语言。

我提供了四种解决方案,从最坏到最好。只有最好的人才是对的。其他人都有问题。下面是一个测试运行,向您展示什么可行,什么不可行,以及在哪里。我用了下划线,这样你们就能看到空格放在了哪里,我把所有错的地方都标记了出来。

Testing TheLoneRanger
               Worst:    The_Lone_Ranger
               Ok:       The_Lone_Ranger
               Better:   The_Lone_Ranger
               Best:     The_Lone_Ranger
Testing MountMᶜKinleyNationalPark
     [WRONG]   Worst:    Mount_MᶜKinley_National_Park
     [WRONG]   Ok:       Mount_MᶜKinley_National_Park
     [WRONG]   Better:   Mount_MᶜKinley_National_Park
               Best:     Mount_Mᶜ_Kinley_National_Park
Testing ElÁlamoTejano
     [WRONG]   Worst:    ElÁlamo_Tejano
               Ok:       El_Álamo_Tejano
               Better:   El_Álamo_Tejano
               Best:     El_Álamo_Tejano
Testing TheÆvarArnfjörðBjarmason
     [WRONG]   Worst:    TheÆvar_ArnfjörðBjarmason
               Ok:       The_Ævar_Arnfjörð_Bjarmason
               Better:   The_Ævar_Arnfjörð_Bjarmason
               Best:     The_Ævar_Arnfjörð_Bjarmason
Testing IlCaffèMacchiato
     [WRONG]   Worst:    Il_CaffèMacchiato
               Ok:       Il_Caffè_Macchiato
               Better:   Il_Caffè_Macchiato
               Best:     Il_Caffè_Macchiato
Testing MisterDženanLjubović
     [WRONG]   Worst:    MisterDženanLjubović
     [WRONG]   Ok:       MisterDženanLjubović
               Better:   Mister_Dženan_Ljubović
               Best:     Mister_Dženan_Ljubović
Testing OleKingHenryⅧ
     [WRONG]   Worst:    Ole_King_HenryⅧ
     [WRONG]   Ok:       Ole_King_HenryⅧ
     [WRONG]   Better:   Ole_King_HenryⅧ
               Best:     Ole_King_Henry_Ⅷ
Testing CarlosⅤºElEmperador
     [WRONG]   Worst:    CarlosⅤºEl_Emperador
     [WRONG]   Ok:       CarlosⅤº_El_Emperador
     [WRONG]   Better:   CarlosⅤº_El_Emperador
               Best:     Carlos_Ⅴº_El_Emperador

顺便说一下,这里几乎所有人都选择了第一种方式,即标记为“最差”的方式。少数人选择了标记为“OK”的第二种方式。但是在我之前没有人告诉过你如何做“更好”或“最好”的方法。

下面是带有四个方法的测试程序:

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

# First I'll prove these are fine variable names:
my (
    $TheLoneRanger              ,
    $MountMᶜKinleyNationalPark  ,
    $ElÁlamoTejano              ,
    $TheÆvarArnfjörðBjarmason   ,
    $IlCaffèMacchiato           ,
    $MisterDženanLjubović         ,
    $OleKingHenryⅧ              ,
    $CarlosⅤºElEmperador        ,
);

# Now I'll load up some string with those values in them:
my @strings = qw{
    TheLoneRanger
    MountMᶜKinleyNationalPark
    ElÁlamoTejano
    TheÆvarArnfjörðBjarmason
    IlCaffèMacchiato
    MisterDženanLjubović
    OleKingHenryⅧ
    CarlosⅤºElEmperador
};

my($new, $best, $ok);
my $mask = "  %10s   %-8s  %s\n";

for my $old (@strings) {
    print "Testing $old\n";
    ($best = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;

    ($new = $old) =~ s/(?<=[a-z])(?=[A-Z])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Worst:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=\p{Lu})/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Ok:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Better:", $new;

    ($new = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Best:", $new;
}

当你能在这个数据集上得到与“最佳”相同的分数时,你就知道你做对了。在那之前,你还没有。这里没有人比“还行”做得更好,大多数人都做得“最差”。我期待看到有人发布正确的ℂ代码。

我注意到,StackOverflow的高亮代码是悲惨的笨拙再次。他们所做的一切都和这里提到的其他糟糕的方法一样(大部分但不是全部)。难道不是早就该让ASCII停止使用了吗?这已经没有意义了,假装这是你的全部是大错特错的。这会导致糟糕的代码。