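What is a good complete regular expression, or some other process, that would take the title:

How do you change a title to be part of the URL like Stack Overflow?

and turn it into

how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow

that is used in the SEO-friendly URLs on Stack Overflow?

The development environment I am using is Ruby on Rails, but if there are other platform-specific solutions (.NET, PHP, Django), I would love to see those too.

I am sure I (or another reader) will run into the same problem on a different platform.

I am using custom routes, and I mainly want to know how to alter the string so that all special characters are removed, everything is lowercase, and all whitespace is replaced.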


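Current answer

There is a small Ruby on Rails plugin called PermalinkFu that does this. Its escape method transforms a string into one that is suitable for a URL. Have a look at the code; that method is quite simple.

To remove non-ASCII characters it uses the iconv library to convert from 'utf-8' to 'ascii//ignore//translit'. Spaces are then turned into dashes, everything is downcased, and so on.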
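
The same pipeline can be sketched without iconv, which recent Ruby versions no longer bundle. The following is a minimal sketch, not PermalinkFu's actual code: the method name `to_slug` is made up for illustration, and it assumes Ruby 2.2 or later for `String#unicode_normalize`. Normalizing to NFKD and then dropping non-ASCII characters only approximates what `//translit` does.

```ruby
# Hypothetical helper, not PermalinkFu's actual implementation.
def to_slug(title)
  title
    .unicode_normalize(:nfkd)   # split accented letters into base letter + combining mark
    .encode('ASCII', invalid: :replace, undef: :replace, replace: '') # drop non-ASCII characters
    .downcase
    .gsub(/[^a-z0-9]+/, '-')    # any run of other characters becomes a single dash
    .gsub(/\A-+|-+\z/, '')      # trim leading and trailing dashes
end

puts to_slug('How do you change a title to be part of the URL like Stack Overflow?')
# prints: how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow
```

Characters that have no decomposition, such as ß or ø, are simply dropped by this sketch; a lookup table would be needed to keep them.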

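Other answers

No, no, no. You are all wrong. Apart from the diacritics-fu, you are getting there, but what about Asian characters? (Shame on Ruby developers for not considering their Japanese brethren.)

Firefox and Safari both display non-ASCII characters in the URL, and frankly they look great. It is nice to support links like "http://somewhere.com/news/read/お前たはアホじゃないかい".

Here is some PHP code that I wrote myself and have not stress tested.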

    <?php
    function slug($str)
    {
        // Accept any number of arguments and join the non-empty ones with hyphens.
        // array_filter() returns the filtered array, so its result must be assigned.
        $args = array_filter(func_get_args());
        $slug = mb_strtolower(implode('-', $args), 'UTF-8');

        $real_slug = '';
        $hyphen = '';  // pending separator, emitted only before the next kept character

        // Split into UTF-8 characters. The original relied on a custom
        // SU::mb_str_split() helper; preg_split() with the /u flag does the same job.
        foreach (preg_split('//u', $slug, -1, PREG_SPLIT_NO_EMPTY) as $c)
        {
            if (strlen($c) > 1 && mb_strlen($c, 'UTF-8') === 1)
            {
                // Multibyte (non-ASCII) character: keep it as it is.
                $real_slug .= $hyphen . $c;
                $hyphen = '';
            }
            elseif ($c === '&')
            {
                // An ampersand between two kept characters becomes "-and-".
                $hyphen = $real_slug ? '-and-' : '';
            }
            elseif (preg_match('/^[A-Za-z0-9]$/', $c))
            {
                // ASCII letter or digit.
                $real_slug .= $hyphen . $c;
                $hyphen = '';
            }
            else
            {
                // Anything else collapses into one pending hyphen (never a leading one).
                $hyphen = $hyphen ? $hyphen : ($real_slug ? '-' : '');
            }
        }
        return $real_slug;
    }

Example:

    $str = "~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 コリン ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 トーマス ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 アーノルド ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04";
    echo slug($str);

Output: コリン-and-トーマス-and-アーノルド

The '-and-' is there because each & is changed to '-and-'.
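
Since the question is about Ruby on Rails, here is a rough Ruby counterpart of the same idea. It is only a sketch: the name `unicode_slug` is made up, and it is slightly stricter than the PHP version because it keeps only Unicode letters, combining marks, and digits rather than every multibyte character.

```ruby
# Hypothetical Ruby counterpart of the PHP idea above; not a port of it.
def unicode_slug(*parts)
  text = parts.compact.join(' ').downcase

  # Words in any script, or a lone ampersand; everything else acts as a separator.
  tokens = text.scan(/[\p{L}\p{M}\p{N}]+|&/)

  # Collapse runs of ampersands into one.
  tokens = tokens.each_with_object([]) do |token, kept|
    kept << token unless token == '&' && kept.last == '&'
  end

  # An ampersand only becomes "and" when it sits between two words.
  tokens.shift while tokens.first == '&'
  tokens.pop while tokens.last == '&'

  tokens.map { |token| token == '&' ? 'and' : token }.join('-')
end

puts unicode_slug('コリン & トーマス & アーノルド')
# prints: コリン-and-トーマス-and-アーノルド
```

Whether the resulting path segment is shown as readable text or percent-encoded depends on the browser.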

A rewrite of Jeff's code to make it more concise:

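    // Requires: using System.Collections.Generic;

    // Built once rather than on every call. Keys are the ASCII replacements;
    // values list the lowercase characters that map to them.
    private static readonly Dictionary<string, string> Mappings = new Dictionary<string, string>
    {
        { "a", "àåáâäãą" },
        { "c", "çćčĉ" },
        { "d", "đ" },
        { "e", "èéêëę" },
        { "g", "ğĝ" },
        { "h", "ĥ" },
        { "i", "ìíîïı" },
        { "j", "ĵ" },
        { "l", "ł" },
        { "n", "ñń" },
        { "o", "òóôõöøőð" },
        { "r", "ř" },
        { "s", "śşšŝ" },
        { "ss", "ß" },
        { "th", "þ" },  // lowercase thorn, because the input is lowercased before the lookup
        { "u", "ùúûüŭů" },
        { "y", "ýÿ" },
        { "z", "żźž" }
    };

    public static string RemapInternationalCharToAscii(char c)
    {
        var s = c.ToString().ToLowerInvariant();

        foreach (var mapping in Mappings)
        {
            if (mapping.Value.Contains(s))
                return mapping.Key;
        }

        // Characters without a mapping are dropped.
        return string.Empty;
    }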

Here is my version of Jeff's code. I have made the following changes:

- The hyphens were appended in such a way that one could be added and then need removing because it was the last character in the string. That is, we never want "my-slug-". Removing it means an extra string allocation for this edge case. I have worked around this by delay-hyphening. If you compare my code to Jeff's, the logic for this is easy to follow.
- His approach is purely lookup based and missed a lot of characters I found in examples while researching on Stack Overflow. To counter this, I first perform a normalisation pass (AKA collation, mentioned in the Meta Stack Overflow question "Non US-ASCII characters dropped from full (profile) URL"), and then ignore any characters outside the acceptable ranges. This works most of the time...
- ...and for when it doesn't, I have also had to add a lookup table. As mentioned above, some characters don't map to a low ASCII value when normalised. Rather than drop these, I have a manual list of exceptions that is doubtless full of holes, but it is better than nothing. The normalisation code was inspired by Jon Hanna's great post in the Stack Overflow question "How can I remove accents on a string?".
- The case conversion is now also optional.

    using System;
    using System.Text;

    public static class Slug
    {
        public static string Create(bool toLower, params string[] values)
        {
            return Create(toLower, String.Join("-", values));
        }

        /// <summary>
        /// Creates a slug.
        /// References:
        /// http://www.unicode.org/reports/tr15/tr15-34.html
        /// https://meta.stackexchange.com/questions/7435/non-us-ascii-characters-dropped-from-full-profile-url/7696#7696
        /// https://stackoverflow.com/questions/25259/how-do-you-include-a-webpage-title-as-part-of-a-webpage-url/25486#25486
        /// https://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string
        /// </summary>
        /// <param name="toLower"></param>
        /// <param name="value"></param>
        /// <returns></returns>
        public static string Create(bool toLower, string value)
        {
            if (value == null)
                return "";

            var normalised = value.Normalize(NormalizationForm.FormKD);

            const int maxlen = 80;
            int len = normalised.Length;
            bool prevDash = false;
            var sb = new StringBuilder(len);
            char c;

            for (int i = 0; i < len; i++)
            {
                c = normalised[i];
                if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
                {
                    if (prevDash)
                    {
                        sb.Append('-');
                        prevDash = false;
                    }
                    sb.Append(c);
                }
                else if (c >= 'A' && c <= 'Z')
                {
                    if (prevDash)
                    {
                        sb.Append('-');
                        prevDash = false;
                    }
                    // Tricky way to convert to lowercase
                    if (toLower)
                        sb.Append((char)(c | 32));
                    else
                        sb.Append(c);
                }
                else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' || c == '_' || c == '=')
                {
                    // Only remember that a dash is due; it is written before the next kept character.
                    if (!prevDash && sb.Length > 0)
                    {
                        prevDash = true;
                    }
                }
                else
                {
                    string swap = ConvertEdgeCases(c, toLower);
                    if (swap != null)
                    {
                        if (prevDash)
                        {
                            sb.Append('-');
                            prevDash = false;
                        }
                        sb.Append(swap);
                    }
                }

                // >= rather than ==: one iteration can append more than one character
                // (a dash plus a character, or "ss"/"th"), so the length can step past maxlen.
                if (sb.Length >= maxlen)
                    break;
            }
            return sb.ToString();
        }

        static string ConvertEdgeCases(char c, bool toLower)
        {
            string swap = null;
            switch (c)
            {
                case 'ı':
                    swap = "i";
                    break;
                case 'ł':
                    swap = "l";
                    break;
                case 'Ł':
                    swap = toLower ? "l" : "L";
                    break;
                case 'đ':
                    swap = "d";
                    break;
                case 'ß':
                    swap = "ss";
                    break;
                case 'ø':
                    swap = "o";
                    break;
                case 'Þ':
                    swap = "th";
                    break;
            }
            return swap;
        }
    }
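
For Rails readers, the delay-hyphen idea from this answer can be sketched in Ruby as well. This is an illustration under assumptions rather than a port: `create_slug`, `EDGE_CASES`, and `SEPARATORS` are made-up names, the lookup table is as incomplete as the C# one, the result is always lowercased, and Ruby 2.4 or later is assumed for Unicode-aware `downcase`.

```ruby
# Hypothetical sketch of the delay-hyphen loop; names and tables are illustrative.
EDGE_CASES = { 'ı' => 'i', 'ł' => 'l', 'đ' => 'd', 'ß' => 'ss', 'ø' => 'o', 'þ' => 'th' }.freeze
SEPARATORS = " ,./\\-_=".freeze

def create_slug(value, max_length: 80)
  return '' if value.nil?

  slug = +''
  pending_dash = false # a dash is only written once another kept character follows

  value.unicode_normalize(:nfkd).downcase.each_char do |c|
    piece = c.match?(/\A[a-z0-9]\z/) ? c : EDGE_CASES[c]
    if piece
      slug << '-' if pending_dash
      pending_dash = false
      slug << piece
    elsif SEPARATORS.include?(c)
      pending_dash = true unless slug.empty? # never start the slug with a dash
    end
    # Anything else (combining marks, other symbols) is skipped.
    break if slug.length >= max_length # may overshoot by a character or two
  end

  slug
end

puts create_slug('My Slug - ')
# prints: my-slug
```

Because the dash is deferred, no trailing dash is ever written and nothing has to be trimmed afterwards.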

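For more details, the unit tests, and an explanation of why Facebook's URL scheme is a little smarter than Stack Overflow's, I have an expanded version of this on my blog.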

For good measure, here is the PHP function in WordPress that does it. I'd think that WordPress is one of the more popular platforms that uses fancy links.

    function sanitize_title_with_dashes($title) {
            $title = strip_tags($title);
            // Preserve escaped octets.
            $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
            // Remove percent signs that are not part of an octet.
            $title = str_replace('%', '', $title);
            // Restore octets.
            $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
            $title = remove_accents($title);
            if (seems_utf8($title)) {
                    if (function_exists('mb_strtolower')) {
                            $title = mb_strtolower($title, 'UTF-8');
                    }
                    $title = utf8_uri_encode($title, 200);
            }
            $title = strtolower($title);
            $title = preg_replace('/&.+?;/', '', $title); // kill entities
            $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
            $title = preg_replace('/\s+/', '-', $title);
            $title = preg_replace('|-+|', '-', $title);
            $title = trim($title, '-');
            return $title;
    }

This function, as well as some of the supporting functions, can be found in wp-includes/formatting.php.