给定字符串“ThisStringHasNoSpacesButItDoesHaveCapitals”,什么是在大写字母之前添加空格的最好方法。所以结尾字符串是"This string Has No space But It Does Have大写"
下面是我使用正则表达式的尝试
System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")
给定字符串“ThisStringHasNoSpacesButItDoesHaveCapitals”,什么是在大写字母之前添加空格的最好方法。所以结尾字符串是"This string Has No space But It Does Have大写"
下面是我使用正则表达式的尝试
System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")
当前回答
这里有一个更彻底的解决方案,它没有在单词前面放空格:
注意:我使用了多个regex(不简洁,但它也可以处理首字母缩略词和单字母单词)
Dim s As String = "ThisStringHasNoSpacesButItDoesHaveCapitals"
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z](?=[A-Z])[a-z]*)", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([A-Z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2") // repeat a second time
In:
"ThisStringHasNoSpacesButItDoesHaveCapitals"
"IAmNotAGoat"
"LOLThatsHilarious!"
"ThisIsASMSMessage"
Out:
"This String Has No Spaces But It Does Have Capitals"
"I Am Not A Goat"
"LOL Thats Hilarious!"
"This Is ASMS Message" // (Difficult to handle single letter words when they are next to acronyms.)
其他回答
仅由ASCII字符组成的输入字符串的c#解决方案。regex结合了反向回溯来忽略出现在字符串开头的大写字母。使用Regex.Replace()返回所需的字符串。
参见regex101.com演示。
using System;
using System.Text.RegularExpressions;
public class RegexExample
{
public static void Main()
{
var text = "ThisStringHasNoSpacesButItDoesHaveCapitals";
// Use negative lookbehind to match all capital letters
// that do not appear at the beginning of the string.
var pattern = "(?<!^)([A-Z])";
var rgx = new Regex(pattern);
var result = rgx.Replace(text, " $1");
Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
}
}
预期的输出:
Input: [ThisStringHasNoSpacesButItDoesHaveCapitals]
Output: [This String Has No Spaces But It Does Have Capitals]
更新:这里有一个变种,也将处理首字母缩写(大写字母序列)。
参见regex101.com演示和ideone.com演示。
using System;
using System.Text.RegularExpressions;
public class RegexExample
{
public static void Main()
{
var text = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";
// Use positive lookbehind to locate all upper-case letters
// that are preceded by a lower-case letter.
var patternPart1 = "(?<=[a-z])([A-Z])";
// Used positive lookbehind and lookahead to locate all
// upper-case letters that are preceded by an upper-case
// letter and followed by a lower-case letter.
var patternPart2 = "(?<=[A-Z])([A-Z])(?=[a-z])";
var pattern = patternPart1 + "|" + patternPart2;
var rgx = new Regex(pattern);
var result = rgx.Replace(text, " $1$2");
Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
}
}
预期的输出:
Input: [ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ]
Output: [This String Has No Spaces ASCII But It Does Have Capitals LINQ]
受到二元忧虑者答案的启发,我尝试了一下。
结果如下:
/// <summary>
/// String Extension Method
/// Adds white space to strings based on Upper Case Letters
/// </summary>
/// <example>
/// strIn => "HateJPMorgan"
/// preserveAcronyms false => "Hate JP Morgan"
/// preserveAcronyms true => "Hate JPMorgan"
/// </example>
/// <param name="strIn">to evaluate</param>
/// <param name="preserveAcronyms" >determines saving acronyms (Optional => false) </param>
public static string AddSpaces(this string strIn, bool preserveAcronyms = false)
{
if (string.IsNullOrWhiteSpace(strIn))
return String.Empty;
var stringBuilder = new StringBuilder(strIn.Length * 2)
.Append(strIn[0]);
int i;
for (i = 1; i < strIn.Length - 1; i++)
{
var c = strIn[i];
if (Char.IsUpper(c) && (Char.IsLower(strIn[i - 1]) || (preserveAcronyms && Char.IsLower(strIn[i + 1]))))
stringBuilder.Append(' ');
stringBuilder.Append(c);
}
return stringBuilder.Append(strIn[i]).ToString();
}
测试使用秒表运行10000000次迭代和各种字符串长度和组合。
平均比二进制忧虑者的答案快50%(可能多一点)。
欢迎来到Unicode
所有这些解决方案对于现代文本来说本质上都是错误的。你需要使用能理解大小写的东西。由于Bob要求使用其他语言,我将为Perl提供两种语言。
我提供了四种解决方案,从最坏到最好。只有最好的人才是对的。其他人都有问题。下面是一个测试运行,向您展示什么可行,什么不可行,以及在哪里。我用了下划线,这样你们就能看到空格放在了哪里,我把所有错的地方都标记了出来。
Testing TheLoneRanger
Worst: The_Lone_Ranger
Ok: The_Lone_Ranger
Better: The_Lone_Ranger
Best: The_Lone_Ranger
Testing MountMᶜKinleyNationalPark
[WRONG] Worst: Mount_MᶜKinley_National_Park
[WRONG] Ok: Mount_MᶜKinley_National_Park
[WRONG] Better: Mount_MᶜKinley_National_Park
Best: Mount_Mᶜ_Kinley_National_Park
Testing ElÁlamoTejano
[WRONG] Worst: ElÁlamo_Tejano
Ok: El_Álamo_Tejano
Better: El_Álamo_Tejano
Best: El_Álamo_Tejano
Testing TheÆvarArnfjörðBjarmason
[WRONG] Worst: TheÆvar_ArnfjörðBjarmason
Ok: The_Ævar_Arnfjörð_Bjarmason
Better: The_Ævar_Arnfjörð_Bjarmason
Best: The_Ævar_Arnfjörð_Bjarmason
Testing IlCaffèMacchiato
[WRONG] Worst: Il_CaffèMacchiato
Ok: Il_Caffè_Macchiato
Better: Il_Caffè_Macchiato
Best: Il_Caffè_Macchiato
Testing MisterDženanLjubović
[WRONG] Worst: MisterDženanLjubović
[WRONG] Ok: MisterDženanLjubović
Better: Mister_Dženan_Ljubović
Best: Mister_Dženan_Ljubović
Testing OleKingHenryⅧ
[WRONG] Worst: Ole_King_HenryⅧ
[WRONG] Ok: Ole_King_HenryⅧ
[WRONG] Better: Ole_King_HenryⅧ
Best: Ole_King_Henry_Ⅷ
Testing CarlosⅤºElEmperador
[WRONG] Worst: CarlosⅤºEl_Emperador
[WRONG] Ok: CarlosⅤº_El_Emperador
[WRONG] Better: CarlosⅤº_El_Emperador
Best: Carlos_Ⅴº_El_Emperador
顺便说一下,这里几乎所有人都选择了第一种方式,即标记为“最差”的方式。少数人选择了标记为“OK”的第二种方式。但是在我之前没有人告诉过你如何做“更好”或“最好”的方法。
下面是带有四个方法的测试程序:
#!/usr/bin/env perl
use utf8;
use strict;
use warnings;
# First I'll prove these are fine variable names:
my (
$TheLoneRanger ,
$MountMᶜKinleyNationalPark ,
$ElÁlamoTejano ,
$TheÆvarArnfjörðBjarmason ,
$IlCaffèMacchiato ,
$MisterDženanLjubović ,
$OleKingHenryⅧ ,
$CarlosⅤºElEmperador ,
);
# Now I'll load up some string with those values in them:
my @strings = qw{
TheLoneRanger
MountMᶜKinleyNationalPark
ElÁlamoTejano
TheÆvarArnfjörðBjarmason
IlCaffèMacchiato
MisterDženanLjubović
OleKingHenryⅧ
CarlosⅤºElEmperador
};
my($new, $best, $ok);
my $mask = " %10s %-8s %s\n";
for my $old (@strings) {
print "Testing $old\n";
($best = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
($new = $old) =~ s/(?<=[a-z])(?=[A-Z])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Worst:", $new;
($new = $old) =~ s/(?<=\p{Ll})(?=\p{Lu})/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Ok:", $new;
($new = $old) =~ s/(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Better:", $new;
($new = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Best:", $new;
}
当你能在这个数据集上得到与“最佳”相同的分数时,你就知道你做对了。在那之前,你还没有。这里没有人比“还行”做得更好,大多数人都做得“最差”。我期待看到有人发布正确的ℂ代码。
我注意到,StackOverflow的高亮代码是悲惨的笨拙再次。他们所做的一切都和这里提到的其他糟糕的方法一样(大部分但不是全部)。难道不是早就该让ASCII停止使用了吗?这已经没有意义了,假装这是你的全部是大错特错的。这会导致糟糕的代码。
发现很多这些答案是相当迟钝的,但我还没有完全测试我的解决方案,但它适用于我需要的,应该处理首字母缩略词,并且比其他IMO更紧凑/可读:
private string CamelCaseToSpaces(string s)
{
if (string.IsNullOrEmpty(s)) return string.Empty;
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < s.Length; i++)
{
stringBuilder.Append(s[i]);
int nextChar = i + 1;
if (nextChar < s.Length && char.IsUpper(s[nextChar]) && !char.IsUpper(s[i]))
{
stringBuilder.Append(" ");
}
}
return stringBuilder.ToString();
}
你拥有的一切都很完美。只需要记住将value重新赋值给这个函数的返回值即可。
value = System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0");