什么是一个好的完整正则表达式或其他一些过程,将采取标题:
如何将标题更改为URL的一部分,如堆栈溢出?
然后把它变成
how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow
在堆栈溢出的seo友好的url中使用?
我使用的开发环境是Ruby on Rails,但是如果有一些其他特定于平台的解决方案(。NET, PHP, Django),我也很想看到这些。
我相信我(或其他读者)在不同的平台上也会遇到同样的问题。
我使用自定义路由,我主要想知道如何改变字符串的所有特殊字符被删除,它都是小写的,所有空白被替换。
T-SQL实现,改编自dbo。UrlEncode:
CREATE FUNCTION dbo.Slug(@string varchar(1024))
RETURNS varchar(3072)
AS
BEGIN
DECLARE @count int, @c char(1), @i int, @slug varchar(3072)
SET @string = replace(lower(ltrim(rtrim(@string))),' ','-')
SET @count = Len(@string)
SET @i = 1
SET @slug = ''
WHILE (@i <= @count)
BEGIN
SET @c = substring(@string, @i, 1)
IF @c LIKE '[a-z0-9--]'
SET @slug = @slug + @c
SET @i = @i +1
END
RETURN @slug
END
下面是Jeff代码的我的版本。我做了以下修改:
The hyphens were appended in such a way that one could be added, and then need removing as it was the last character in the string. That is, we never want “my-slug-”. This means an extra string allocation to remove it on this edge case. I’ve worked around this by delay-hyphening. If you compare my code to Jeff’s the logic for this is easy to follow.
His approach is purely lookup based and missed a lot of characters I found in examples while researching on Stack Overflow. To counter this, I first peform a normalisation pass (AKA collation mentioned in Meta Stack Overflow question Non US-ASCII characters dropped from full (profile) URL), and then ignore any characters outside the acceptable ranges. This works most of the time...
... For when it doesn’t I’ve also had to add a lookup table. As mentioned above, some characters don’t map to a low ASCII value when normalised. Rather than drop these I’ve got a manual list of exceptions that is doubtless full of holes, but it is better than nothing. The normalisation code was inspired by Jon Hanna’s great post in Stack Overflow question How can I remove accents on a string?.
The case conversion is now also optional.
public static class Slug
{
public static string Create(bool toLower, params string[] values)
{
return Create(toLower, String.Join("-", values));
}
/// <summary>
/// Creates a slug.
/// References:
/// http://www.unicode.org/reports/tr15/tr15-34.html
/// https://meta.stackexchange.com/questions/7435/non-us-ascii-characters-dropped-from-full-profile-url/7696#7696
/// https://stackoverflow.com/questions/25259/how-do-you-include-a-webpage-title-as-part-of-a-webpage-url/25486#25486
/// https://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string
/// </summary>
/// <param name="toLower"></param>
/// <param name="normalised"></param>
/// <returns></returns>
public static string Create(bool toLower, string value)
{
if (value == null)
return "";
var normalised = value.Normalize(NormalizationForm.FormKD);
const int maxlen = 80;
int len = normalised.Length;
bool prevDash = false;
var sb = new StringBuilder(len);
char c;
for (int i = 0; i < len; i++)
{
c = normalised[i];
if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
{
if (prevDash)
{
sb.Append('-');
prevDash = false;
}
sb.Append(c);
}
else if (c >= 'A' && c <= 'Z')
{
if (prevDash)
{
sb.Append('-');
prevDash = false;
}
// Tricky way to convert to lowercase
if (toLower)
sb.Append((char)(c | 32));
else
sb.Append(c);
}
else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' || c == '_' || c == '=')
{
if (!prevDash && sb.Length > 0)
{
prevDash = true;
}
}
else
{
string swap = ConvertEdgeCases(c, toLower);
if (swap != null)
{
if (prevDash)
{
sb.Append('-');
prevDash = false;
}
sb.Append(swap);
}
}
if (sb.Length == maxlen)
break;
}
return sb.ToString();
}
static string ConvertEdgeCases(char c, bool toLower)
{
string swap = null;
switch (c)
{
case 'ı':
swap = "i";
break;
case 'ł':
swap = "l";
break;
case 'Ł':
swap = toLower ? "l" : "L";
break;
case 'đ':
swap = "d";
break;
case 'ß':
swap = "ss";
break;
case 'ø':
swap = "o";
break;
case 'Þ':
swap = "th";
break;
}
return swap;
}
}
关于更多的细节,单元测试,以及为什么Facebook的URL方案比堆栈溢出更聪明的解释,我在我的博客上有一个扩展版本。