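I have a code snippet written in PHP that pulls a block of text from a database and sends it to a widget on a web page. The original text can be a lengthy article or just a sentence or two, but the widget cannot display more than 200 characters. I could use substr() to cut the text at 200 characters, but that would cut in the middle of a word. What I really want is to cut the text at the end of the last word before 200 characters.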


Current answer

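As far as I can tell, all the solutions here only work when the starting point is fixed: they turn a long passage into its first few words followed by an ellipsis. What if you want to truncate the text around a specific keyword instead?

Truncating the text surrounding a specific keyword.

The goal is to be able to convert this:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam.

into this:

...consectetur adipisicing elit, sed do eiusmod tempor...

This is a very common situation when displaying search results, excerpts, and so on. To achieve it, we can combine the following two functions: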

    /**
     * Return the index of the $haystack matching $needle,
     * or NULL if there is no match.
     *
     * This function is case-insensitive  
     * 
     * @param string $needle
     * @param array $haystack
     * @return int|null
     */
    function regexFindInArray(string $needle, array $haystack): ?int
    {
        for ($i = 0; $i < count($haystack); $i++) {
            if (preg_match('/' . preg_quote($needle, '/') . '/i', $haystack[$i]) === 1) {
                return $i;
            }
        }
        return null;
    }

    /**
     * If the keyword is not present, it returns the maximum number of full 
     * words that the max number of characters provided by $maxLength allow,
     * starting from the left.
     *
     * If the keyword is present, it adds words to both sides of the keyword
     * keeping a balance between the length of the suffix and the prefix.
     *
     * @param string $text
     * @param string $keyword
     * @param int $maxLength
     * @param string $ellipsis
     * @return string
     */
    function truncateWordSurroundingsByLength(string $text, string $keyword, 
            int $maxLength, string $ellipsis): string
    {
        if (strlen($text) <= $maxLength) {
            return $text;
        }

        $pattern = '/' . '^(.*?)\s' .
                   '([^\s]*' . preg_quote($keyword, '/') . '[^\s]*)' .
                   '\s(.*)$' . '/i';
        preg_match($pattern, $text, $matches);

        // break everything into words except the matching keywords, 
        // which can contain spaces
        if (count($matches) == 4) {
            $words = preg_split("/\s+/", $matches[1], -1, PREG_SPLIT_NO_EMPTY);
            $words[] = $matches[2];
            $words = array_merge($words, 
                              preg_split("/\s+/", $matches[3], -1, PREG_SPLIT_NO_EMPTY));
        } else {
            $words = preg_split("/\s+/", $text, -1, PREG_SPLIT_NO_EMPTY);
        }

        // find the index of the matching word
        $firstMatchingWordIndex = regexFindInArray($keyword, $words) ?? 0;

        $length = false;
        $prefixLength = $suffixLength = 0;
        $prefixIndex = $firstMatchingWordIndex - 1;
        $suffixIndex = $firstMatchingWordIndex + 1;

        // Initialize the text with the matching word
        $text = $words[$firstMatchingWordIndex];

        while (($prefixIndex >= 0 or $suffixIndex < count($words))
                and strlen($text) < $maxLength and strlen($text) !== $length) {
            $length = strlen($text);
            if (isset($words[$prefixIndex])
                and (strlen($text) + strlen($words[$prefixIndex]) <= $maxLength)
                and ($prefixLength <= $suffixLength 
                     or strlen($text) + strlen($words[$suffixIndex] ?? '') <= $maxLength)) {
                $prefixLength += strlen($words[$prefixIndex]);
                $text = $words[$prefixIndex] . ' ' . $text;
                $prefixIndex--;
            }
            if (isset($words[$suffixIndex])
                and (strlen($text) + strlen($words[$suffixIndex]) <= $maxLength)
                and ($suffixLength <= $prefixLength 
                     or strlen($text) + strlen($words[$prefixIndex] ?? '') <= $maxLength)) {
                $suffixLength += strlen($words[$suffixIndex]);
                $text = $text . ' ' . $words[$suffixIndex];
                $suffixIndex++;
            }
        }

        if ($prefixIndex >= 0) {
            $text = $ellipsis . ' ' . $text;
        }
        if ($suffixIndex < count($words)) {
            $text = $text . ' ' . $ellipsis;
        }

        return $text;
    }

Now you can do:

$text = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do ' .
        'eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ' .
        'ad minim veniam.';

$text = truncateWordSurroundingsByLength($text, 'elit', 25, '...');

var_dump($text); // string(32) "... adipisicing elit, sed do ..."


Other answers

Based on @Justin Poliey's regex:

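// Trim very long text to 120 characters. Add an ellipsis if the text is trimmed.
$trimmed_text = $very_long_text; // short text is left unchanged
if (strlen($very_long_text) > 120) {
  $matches = array();
  // Capture at most 120 characters that are followed by whitespace;
  // $matches[1] excludes that trailing whitespace, unlike $matches[0].
  if (preg_match('/^(.{1,120})\s/s', $very_long_text, $matches) === 1) {
    $trimmed_text = $matches[1] . '...';
  }
}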

Keep in mind whenever you're splitting by "word" anywhere that some languages such as Chinese and Japanese do not use a space character to split words. Also, a malicious user could simply enter text without any spaces, or using some Unicode look-alike to the standard space character, in which case any solution you use may end up displaying the entire text anyway. A way around this may be to check the string length after splitting it on spaces as normal, then, if the string is still above an abnormal limit - maybe 225 characters in this case - going ahead and splitting it dumbly at that limit.
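
The fallback described above could be sketched roughly as follows. This is only an illustration: the function name `truncateWithHardLimit` and the parameters `$softLimit` and `$hardLimit` are made up for this example, and it uses byte-based functions, so it assumes single-byte text.

```php
<?php
// Hypothetical helper, not taken from any answer in this thread.
// $softLimit is the normal word-boundary limit; $hardLimit is the
// "abnormal" limit at which the text is cut blindly.
function truncateWithHardLimit(string $text, int $softLimit = 200, int $hardLimit = 225): string
{
    if (strlen($text) <= $softLimit) {
        return $text;
    }

    // Normal case: keep at most $softLimit bytes that are followed by whitespace.
    if (preg_match('/^(.{1,' . $softLimit . '})\s/s', $text, $matches) === 1) {
        $text = $matches[1];
    }

    // No usable whitespace was found and the text is still far too long:
    // cut it at the hard limit, even if that lands in the middle of a word.
    if (strlen($text) > $hardLimit) {
        $text = substr($text, 0, $hardLimit);
    }

    return $text;
}
```

With the default limits, a 300-character string without any spaces comes back as its first 225 characters, while ordinary prose is cut at the last whitespace within the first 200.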

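A similar caveat applies to non-ASCII characters: PHP's standard strlen() may report strings containing them as longer than they really are, because a single character may take two or more bytes instead of one. If you only use strlen()/substr() to split strings, you may split a string in the middle of a character! When in doubt, mb_strlen()/mb_substr() are the safer choice.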
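
A minimal sketch of a multibyte-aware variant, assuming the text is UTF-8 and the mbstring extension is available. The helper name `truncateAtWordMb` is hypothetical, and only a plain space counts as a word separator here.

```php
<?php
// Hypothetical helper: truncate to at most $limit characters (not bytes),
// cutting at the last space when there is one.
function truncateAtWordMb(string $text, int $limit): string
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }

    // Take one extra character so a space right after the limit still counts
    // as a word boundary.
    $slice = mb_substr($text, 0, $limit + 1, 'UTF-8');
    $lastSpace = mb_strrpos($slice, ' ', 0, 'UTF-8');

    // No usable space: fall back to a hard cut on a character boundary.
    if ($lastSpace === false || $lastSpace === 0) {
        return mb_substr($text, 0, $limit, 'UTF-8');
    }

    return mb_substr($slice, 0, $lastSpace, 'UTF-8');
}

var_dump(truncateAtWordMb('héllo wörld foo', 11)); // "héllo wörld"
```

Because every position is counted in characters, the cut never lands inside a multibyte sequence, which is the risk with plain substr().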

Dave's and AmalMurali's code, with an IF/ELSEIF added to handle strings without spaces:

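// Position of the last space within the first 200 characters, or false if none.
$lastSpace = strrpos(substr($string, 0, 200), ' ');
if (strlen($string) <= 200) {
    $WidgetText = $string;                        // short enough: keep as is
} elseif ($lastSpace !== false) {
    $WidgetText = substr($string, 0, $lastSpace); // cut at the last space
} else {
    $WidgetText = substr($string, 0, 200);        // no space to cut at: hard cut
}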

Use this:

The following code cuts the string at the last ','. If you need a different character or substring, use it in place of ','.

substr($string, 0, strrpos(substr($string, 0, $comparingLength), ','))

// if you have another string to account for

substr($string, 0, strrpos(substr($string, 0, $comparingLength-strlen($currentString)), ','))