什么是一个好的完整正则表达式或其他一些过程,将采取标题:

如何将标题更改为URL的一部分,如堆栈溢出?

然后把它变成

how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow

在堆栈溢出的seo友好的url中使用?

我使用的开发环境是Ruby on Rails,但是如果有一些其他特定于平台的解决方案(。NET, PHP, Django),我也很想看到这些。

我相信我(或其他读者)在不同的平台上也会遇到同样的问题。

我使用自定义路由,我主要想知道如何改变字符串的所有特殊字符被删除,它都是小写的,所有空白被替换。


当前回答

我们是这样做的。注意,可能有比你第一眼意识到的更多的边缘条件。

这是第二个版本,展开后的性能提高了5倍(是的,我对它进行了基准测试)。我认为我应该优化它,因为这个函数可以在每页被调用数百次。

/// <summary>
/// Produces optional, URL-friendly version of a title, "like-this-one". 
/// hand-tuned for speed, reflects performance refactoring contributed
/// by John Gietzen (user otac0n) 
/// </summary>
public static string URLFriendly(string title)
{
    if (title == null) return "";

    const int maxlen = 80;
    int len = title.Length;
    bool prevdash = false;
    var sb = new StringBuilder(len);
    char c;

    for (int i = 0; i < len; i++)
    {
        c = title[i];
        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        {
            sb.Append(c);
            prevdash = false;
        }
        else if (c >= 'A' && c <= 'Z')
        {
            // tricky way to convert to lowercase
            sb.Append((char)(c | 32));
            prevdash = false;
        }
        else if (c == ' ' || c == ',' || c == '.' || c == '/' || 
            c == '\\' || c == '-' || c == '_' || c == '=')
        {
            if (!prevdash && sb.Length > 0)
            {
                sb.Append('-');
                prevdash = true;
            }
        }
        else if ((int)c >= 128)
        {
            int prevlen = sb.Length;
            sb.Append(RemapInternationalCharToAscii(c));
            if (prevlen != sb.Length) prevdash = false;
        }
        if (i == maxlen) break;
    }

    if (prevdash)
        return sb.ToString().Substring(0, sb.Length - 1);
    else
        return sb.ToString();
}

要查看被替换的代码的前一个版本(但在功能上与之相当,而且快了5倍),请查看这篇文章的修订历史(单击日期链接)。

另外,RemapInternationalCharToAscii方法的源代码可以在这里找到。

其他回答

您需要设置一个自定义路由,将URL指向将处理它的控制器。因为您使用的是Ruby on Rails,这里介绍一下如何使用他们的路由引擎。

在Ruby中,你需要一个你已经知道的正则表达式,下面是要使用的正则表达式:

def permalink_for(str)
    str.gsub(/[^\w\/]|[!\(\)\.]+/, ' ').strip.downcase.gsub(/\ +/, '-')
end

为了更好地衡量,这里是WordPress中的PHP函数…我认为WordPress是使用花哨链接的最受欢迎的平台之一。

    function sanitize_title_with_dashes($title) {
            $title = strip_tags($title);
            // Preserve escaped octets.
            $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
            // Remove percent signs that are not part of an octet.
            $title = str_replace('%', '', $title);
            // Restore octets.
            $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
            $title = remove_accents($title);
            if (seems_utf8($title)) {
                    if (function_exists('mb_strtolower')) {
                            $title = mb_strtolower($title, 'UTF-8');
                    }
                    $title = utf8_uri_encode($title, 200);
            }
            $title = strtolower($title);
            $title = preg_replace('/&.+?;/', '', $title); // kill entities
            $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
            $title = preg_replace('/\s+/', '-', $title);
            $title = preg_replace('|-+|', '-', $title);
            $title = trim($title, '-');
            return $title;
    }

这个函数以及一些支持函数可以在wp-includes/formatting.php中找到。

下面是我的(稍慢,但写起来很有趣)Jeff的代码版本:

public static string URLFriendly(string title)
{
    char? prevRead = null,
        prevWritten = null;

    var seq = 
        from c in title
        let norm = RemapInternationalCharToAscii(char.ToLowerInvariant(c).ToString())[0]
        let keep = char.IsLetterOrDigit(norm)
        where prevRead.HasValue || keep
        let replaced = keep ? norm
            :  prevWritten != '-' ? '-'
            :  (char?)null
        where replaced != null
        let s = replaced + (prevRead == null ? ""
            : norm == '#' && "cf".Contains(prevRead.Value) ? "sharp"
            : norm == '+' ? "plus"
            : "")
        let _ = prevRead = norm
        from written in s
        let __ = prevWritten = written
        select written;

    const int maxlen = 80;  
    return string.Concat(seq.Take(maxlen)).TrimEnd('-');
}

public static string RemapInternationalCharToAscii(string text)
{
    var seq = text.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

    return string.Concat(seq).Normalize(NormalizationForm.FormC);
}

我的测试字符串:

“我喜欢c#, f#, c++,还有……焦糖布丁! !他们看到我在编程……他们hatin”……想抓住我的下流行为……”

你也可以使用这个JavaScript函数在表单中生成鼻涕虫(这个是基于/从Django复制的):

function makeSlug(urlString, filter) {
    // Changes, e.g., "Petty theft" to "petty_theft".
    // Remove all these words from the string before URLifying

    if(filter) {
        removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
        "is", "in", "into", "like", "of", "off", "on", "onto", "per",
        "since", "than", "the", "this", "that", "to", "up", "via", "het", "de", "een", "en",
        "with"];
    }
    else {
        removelist = [];
    }
    s = urlString;
    r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
    s = s.replace(r, '');
    s = s.replace(/[^-\w\s]/g, ''); // Remove unneeded characters
    s = s.replace(/^\s+|\s+$/g, ''); // Trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-'); // Convert spaces to hyphens
    s = s.toLowerCase(); // Convert to lowercase
    return s; // Trim to first num_chars characters
}

我不熟悉Ruby on Rails,但以下是(未经测试的)PHP代码。如果您觉得有用的话,可以很快地将其转换为Ruby on Rails。

$sURL = "This is a title to convert to URL-format. It has 1 number in it!";
// To lower-case
$sURL = strtolower($sURL);

// Replace all non-word characters with spaces
$sURL = preg_replace("/\W+/", " ", $sURL);

// Remove trailing spaces (so we won't end with a separator)
$sURL = trim($sURL);

// Replace spaces with separators (hyphens)
$sURL = str_replace(" ", "-", $sURL);

echo $sURL;
// outputs: this-is-a-title-to-convert-to-url-format-it-has-1-number-in-it

我希望这能有所帮助。