什么是一个好的完整正则表达式或其他一些过程,将采取标题:
如何将标题更改为URL的一部分,如堆栈溢出?
然后把它变成
how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow
在堆栈溢出的seo友好的url中使用?
我使用的开发环境是Ruby on Rails,但是如果有一些其他特定于平台的解决方案(。NET, PHP, Django),我也很想看到这些。
我相信我(或其他读者)在不同的平台上也会遇到同样的问题。
我使用自定义路由,我主要想知道如何改变字符串的所有特殊字符被删除,它都是小写的,所有空白被替换。
我们是这样做的。注意,可能有比你第一眼意识到的更多的边缘条件。
这是第二个版本,展开后的性能提高了5倍(是的,我对它进行了基准测试)。我认为我应该优化它,因为这个函数可以在每页被调用数百次。
/// <summary>
/// Produces optional, URL-friendly version of a title, "like-this-one".
/// hand-tuned for speed, reflects performance refactoring contributed
/// by John Gietzen (user otac0n)
/// </summary>
public static string URLFriendly(string title)
{
if (title == null) return "";
const int maxlen = 80;
int len = title.Length;
bool prevdash = false;
var sb = new StringBuilder(len);
char c;
for (int i = 0; i < len; i++)
{
c = title[i];
if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
{
sb.Append(c);
prevdash = false;
}
else if (c >= 'A' && c <= 'Z')
{
// tricky way to convert to lowercase
sb.Append((char)(c | 32));
prevdash = false;
}
else if (c == ' ' || c == ',' || c == '.' || c == '/' ||
c == '\\' || c == '-' || c == '_' || c == '=')
{
if (!prevdash && sb.Length > 0)
{
sb.Append('-');
prevdash = true;
}
}
else if ((int)c >= 128)
{
int prevlen = sb.Length;
sb.Append(RemapInternationalCharToAscii(c));
if (prevlen != sb.Length) prevdash = false;
}
if (i == maxlen) break;
}
if (prevdash)
return sb.ToString().Substring(0, sb.Length - 1);
else
return sb.ToString();
}
要查看被替换的代码的前一个版本(但在功能上与之相当,而且快了5倍),请查看这篇文章的修订历史(单击日期链接)。
另外,RemapInternationalCharToAscii方法的源代码可以在这里找到。
下面是Jeff代码的我的版本。我做了以下修改:
The hyphens were appended in such a way that one could be added, and then need removing as it was the last character in the string. That is, we never want “my-slug-”. This means an extra string allocation to remove it on this edge case. I’ve worked around this by delay-hyphening. If you compare my code to Jeff’s the logic for this is easy to follow.
His approach is purely lookup based and missed a lot of characters I found in examples while researching on Stack Overflow. To counter this, I first peform a normalisation pass (AKA collation mentioned in Meta Stack Overflow question Non US-ASCII characters dropped from full (profile) URL), and then ignore any characters outside the acceptable ranges. This works most of the time...
... For when it doesn’t I’ve also had to add a lookup table. As mentioned above, some characters don’t map to a low ASCII value when normalised. Rather than drop these I’ve got a manual list of exceptions that is doubtless full of holes, but it is better than nothing. The normalisation code was inspired by Jon Hanna’s great post in Stack Overflow question How can I remove accents on a string?.
The case conversion is now also optional.
public static class Slug
{
public static string Create(bool toLower, params string[] values)
{
return Create(toLower, String.Join("-", values));
}
/// <summary>
/// Creates a slug.
/// References:
/// http://www.unicode.org/reports/tr15/tr15-34.html
/// https://meta.stackexchange.com/questions/7435/non-us-ascii-characters-dropped-from-full-profile-url/7696#7696
/// https://stackoverflow.com/questions/25259/how-do-you-include-a-webpage-title-as-part-of-a-webpage-url/25486#25486
/// https://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string
/// </summary>
/// <param name="toLower"></param>
/// <param name="normalised"></param>
/// <returns></returns>
public static string Create(bool toLower, string value)
{
if (value == null)
return "";
var normalised = value.Normalize(NormalizationForm.FormKD);
const int maxlen = 80;
int len = normalised.Length;
bool prevDash = false;
var sb = new StringBuilder(len);
char c;
for (int i = 0; i < len; i++)
{
c = normalised[i];
if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
{
if (prevDash)
{
sb.Append('-');
prevDash = false;
}
sb.Append(c);
}
else if (c >= 'A' && c <= 'Z')
{
if (prevDash)
{
sb.Append('-');
prevDash = false;
}
// Tricky way to convert to lowercase
if (toLower)
sb.Append((char)(c | 32));
else
sb.Append(c);
}
else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' || c == '_' || c == '=')
{
if (!prevDash && sb.Length > 0)
{
prevDash = true;
}
}
else
{
string swap = ConvertEdgeCases(c, toLower);
if (swap != null)
{
if (prevDash)
{
sb.Append('-');
prevDash = false;
}
sb.Append(swap);
}
}
if (sb.Length == maxlen)
break;
}
return sb.ToString();
}
static string ConvertEdgeCases(char c, bool toLower)
{
string swap = null;
switch (c)
{
case 'ı':
swap = "i";
break;
case 'ł':
swap = "l";
break;
case 'Ł':
swap = toLower ? "l" : "L";
break;
case 'đ':
swap = "d";
break;
case 'ß':
swap = "ss";
break;
case 'ø':
swap = "o";
break;
case 'Þ':
swap = "th";
break;
}
return swap;
}
}
关于更多的细节,单元测试,以及为什么Facebook的URL方案比堆栈溢出更聪明的解释,我在我的博客上有一个扩展版本。
假设你的模型类有一个title属性,你可以简单地覆盖模型中的to_param方法,就像这样:
def to_param
title.downcase.gsub(/ /, '-')
end
Railscast的这集有所有的细节。你也可以使用这个来确保标题只包含有效字符:
validates_format_of :title, :with => /^[a-z0-9-]+$/,
:message => 'can only contain letters, numbers and hyphens'
下面是我的(稍慢,但写起来很有趣)Jeff的代码版本:
public static string URLFriendly(string title)
{
char? prevRead = null,
prevWritten = null;
var seq =
from c in title
let norm = RemapInternationalCharToAscii(char.ToLowerInvariant(c).ToString())[0]
let keep = char.IsLetterOrDigit(norm)
where prevRead.HasValue || keep
let replaced = keep ? norm
: prevWritten != '-' ? '-'
: (char?)null
where replaced != null
let s = replaced + (prevRead == null ? ""
: norm == '#' && "cf".Contains(prevRead.Value) ? "sharp"
: norm == '+' ? "plus"
: "")
let _ = prevRead = norm
from written in s
let __ = prevWritten = written
select written;
const int maxlen = 80;
return string.Concat(seq.Take(maxlen)).TrimEnd('-');
}
public static string RemapInternationalCharToAscii(string text)
{
var seq = text.Normalize(NormalizationForm.FormD)
.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
return string.Concat(seq).Normalize(NormalizationForm.FormC);
}
我的测试字符串:
“我喜欢c#, f#, c++,还有……焦糖布丁! !他们看到我在编程……他们hatin”……想抓住我的下流行为……”