什么是一个好的完整正则表达式或其他一些过程,将采取标题:

如何将标题更改为URL的一部分,如堆栈溢出?

然后把它变成

how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow

在堆栈溢出的seo友好的url中使用?

我使用的开发环境是Ruby on Rails,但是如果有一些其他特定于平台的解决方案(。NET, PHP, Django),我也很想看到这些。

我相信我(或其他读者)在不同的平台上也会遇到同样的问题。

我使用自定义路由,我主要想知道如何改变字符串的所有特殊字符被删除,它都是小写的,所有空白被替换。


当前回答

如果你正在使用Rails edge,你可以依赖Inflector。参数化-下面是文档中的例子:

  class Person
    def to_param
      "#{id}-#{name.parameterize}"
    end
  end

  @person = Person.find(1)
  # => #<Person id: 1, name: "Donald E. Knuth">

  <%= link_to(@person.name, person_path(@person)) %>
  # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>

另外,如果你需要在以前版本的Rails中处理更多的外来字符,比如重音(éphémère),你可以混合使用PermalinkFu和DiacriticsFu:

DiacriticsFu::escape("éphémère")
=> "ephemere"

DiacriticsFu::escape("räksmörgås")
=> "raksmorgas"

其他回答

为了更好地衡量,这里是WordPress中的PHP函数…我认为WordPress是使用花哨链接的最受欢迎的平台之一。

    function sanitize_title_with_dashes($title) {
            $title = strip_tags($title);
            // Preserve escaped octets.
            $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
            // Remove percent signs that are not part of an octet.
            $title = str_replace('%', '', $title);
            // Restore octets.
            $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
            $title = remove_accents($title);
            if (seems_utf8($title)) {
                    if (function_exists('mb_strtolower')) {
                            $title = mb_strtolower($title, 'UTF-8');
                    }
                    $title = utf8_uri_encode($title, 200);
            }
            $title = strtolower($title);
            $title = preg_replace('/&.+?;/', '', $title); // kill entities
            $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
            $title = preg_replace('/\s+/', '-', $title);
            $title = preg_replace('|-+|', '-', $title);
            $title = trim($title, '-');
            return $title;
    }

这个函数以及一些支持函数可以在wp-includes/formatting.php中找到。

你也可以使用这个JavaScript函数在表单中生成鼻涕虫(这个是基于/从Django复制的):

function makeSlug(urlString, filter) {
    // Changes, e.g., "Petty theft" to "petty_theft".
    // Remove all these words from the string before URLifying

    if(filter) {
        removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
        "is", "in", "into", "like", "of", "off", "on", "onto", "per",
        "since", "than", "the", "this", "that", "to", "up", "via", "het", "de", "een", "en",
        "with"];
    }
    else {
        removelist = [];
    }
    s = urlString;
    r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
    s = s.replace(r, '');
    s = s.replace(/[^-\w\s]/g, ''); // Remove unneeded characters
    s = s.replace(/^\s+|\s+$/g, ''); // Trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-'); // Convert spaces to hyphens
    s = s.toLowerCase(); // Convert to lowercase
    return s; // Trim to first num_chars characters
}

您可以使用以下帮助器方法。它可以转换Unicode字符。

public static string ConvertTextToSlug(string s)
{
    StringBuilder sb = new StringBuilder();

    bool wasHyphen = true;

    foreach (char c in s)
    {
        if (char.IsLetterOrDigit(c))
        {
            sb.Append(char.ToLower(c));
            wasHyphen = false;
        }
        else
            if (char.IsWhiteSpace(c) && !wasHyphen)
            {
                sb.Append('-');
                wasHyphen = true;
            }
    }

    // Avoid trailing hyphens
    if (wasHyphen && sb.Length > 0)
        sb.Length--;

    return sb.ToString().Replace("--","-");
}

Brian的Ruby代码:

title.downcase.strip.gsub(/\ /, '-').gsub(/[^\w\-]/, '')

Downcase将字符串转换为小写,strip删除开头和结尾的空格,第一个gsub调用全局地用破折号替换空格,第二个调用删除所有不是字母或破折号的内容。

我喜欢这种不使用正则表达式的方式,所以我将其移植到PHP。我刚刚添加了一个名为is_between的函数来检查字符:

function is_between($val, $min, $max)
{
    $val = (int) $val; $min = (int) $min; $max = (int) $max;

    return ($val >= $min && $val <= $max);
}

function international_char_to_ascii($char)
{
    if (mb_strpos('àåáâäãåa', $char) !== false)
    {
        return 'a';
    }

    if (mb_strpos('èéêëe', $char) !== false)
    {
        return 'e';
    }

    if (mb_strpos('ìíîïi', $char) !== false)
    {
        return 'i';
    }

    if (mb_strpos('òóôõö', $char) !== false)
    {
        return 'o';
    }

    if (mb_strpos('ùúûüuu', $char) !== false)
    {
        return 'u';
    }

    if (mb_strpos('çccc', $char) !== false)
    {
        return 'c';
    }

    if (mb_strpos('zzž', $char) !== false)
    {
        return 'z';
    }

    if (mb_strpos('ssšs', $char) !== false)
    {
        return 's';
    }

    if (mb_strpos('ñn', $char) !== false)
    {
        return 'n';
    }

    if (mb_strpos('ýÿ', $char) !== false)
    {
        return 'y';
    }

    if (mb_strpos('gg', $char) !== false)
    {
        return 'g';
    }

    if (mb_strpos('r', $char) !== false)
    {
        return 'r';
    }

    if (mb_strpos('l', $char) !== false)
    {
        return 'l';
    }

    if (mb_strpos('d', $char) !== false)
    {
        return 'd';
    }

    if (mb_strpos('ß', $char) !== false)
    {
        return 'ss';
    }

    if (mb_strpos('Þ', $char) !== false)
    {
        return 'th';
    }

    if (mb_strpos('h', $char) !== false)
    {
        return 'h';
    }

    if (mb_strpos('j', $char) !== false)
    {
        return 'j';
    }
    return '';
}

function url_friendly_title($url_title)
{
    if (empty($url_title))
    {
        return '';
    }

    $url_title = mb_strtolower($url_title);

    $url_title_max_length   = 80;
    $url_title_length       = mb_strlen($url_title);
    $url_title_friendly     = '';
    $url_title_dash_added   = false;
    $url_title_char = '';

    for ($i = 0; $i < $url_title_length; $i++)
    {
        $url_title_char     = mb_substr($url_title, $i, 1);

        if (strlen($url_title_char) == 2)
        {
            $url_title_ascii    = ord($url_title_char[0]) * 256 + ord($url_title_char[1]) . "\r\n";
        }
        else
        {
            $url_title_ascii    = ord($url_title_char);
        }

        if (is_between($url_title_ascii, 97, 122) || is_between($url_title_ascii, 48, 57))
        {
            $url_title_friendly .= $url_title_char;

            $url_title_dash_added = false;
        }
        elseif(is_between($url_title_ascii, 65, 90))
        {
            $url_title_friendly .= chr(($url_title_ascii | 32));

            $url_title_dash_added = false;
        }
        elseif($url_title_ascii == 32 || $url_title_ascii == 44 || $url_title_ascii == 46 || $url_title_ascii == 47 || $url_title_ascii == 92 || $url_title_ascii == 45 || $url_title_ascii == 47 || $url_title_ascii == 95 || $url_title_ascii == 61)
        {
            if (!$url_title_dash_added && mb_strlen($url_title_friendly) > 0)
            {
                $url_title_friendly .= chr(45);

                $url_title_dash_added = true;
            }
        }
        else if ($url_title_ascii >= 128)
        {
            $url_title_previous_length = mb_strlen($url_title_friendly);

            $url_title_friendly .= international_char_to_ascii($url_title_char);

            if ($url_title_previous_length != mb_strlen($url_title_friendly))
            {
                $url_title_dash_added = false;
            }
        }

        if ($i == $url_title_max_length)
        {
            break;
        }
    }

    if ($url_title_dash_added)
    {
        return mb_substr($url_title_friendly, 0, -1);
    }
    else
    {
        return $url_title_friendly;
    }
}