如何检查字符串是否包含特定单词？

考虑：

$a = 'How are you?';

if ($a contains 'are')
    echo 'true';

假设我有上面的代码，如果（$a包含“are”），写语句的正确方法是什么？

当前回答

查看strpos（）：

<?php
$mystring = 'abc';
$findme   = 'a';
$pos = strpos($mystring, $findme);

// Note our use of ===. Simply, == would not work as expected
// because the position of 'a' was the 0th (first) character.
if ($pos === false) {
    echo "The string '$findme' was not found in the string '$mystring'.";
} else {
    echo "The string '$findme' was found in the string '$mystring',";
    echo " and exists at position $pos.";
}

2010-12-06 13:16:30

其他回答

我有点印象深刻，这里没有一个使用strpos、strstr和类似函数的答案提到多字节字符串函数（2015-05-08）。

基本上，如果您在查找某些语言（如德语、法语、葡萄牙语、西班牙语等）特定字符的单词时遇到困难（例如：ä，é，ô，ç，º，ñ），您可能需要在函数前面加上mb_。因此，接受的答案将使用mb_strpos或mb_stripos（用于不区分大小写的匹配）：

if (mb_strpos($a,'are') !== false) {
    echo 'true';
}

如果不能保证所有数据都是100%的UTF-8格式，则可能需要使用mb_函数。

乔尔·斯波尔斯基（Joel Spolsky）的一篇很好的文章解释了为什么每个软件开发人员都必须了解Unicode和字符集（没有借口！）。

2015-05-08 16:18:40

参考SamGoody和Lego Stormtropr的评论。

如果您正在寻找基于多个单词的接近度/相关性对搜索结果进行排名的PHP算法这里有一种仅使用PHP生成搜索结果的快速简便方法：

其他布尔搜索方法（如strpos（）、preg_match（）、strstr（）或stristr（

无法搜索多个单词结果未排名

基于向量空间模型和tf idf（术语频率–反向文档频率）的PHP方法：

这听起来很难，但却出奇地容易。

如果我们想搜索字符串中的多个单词，核心问题是如何为每个单词分配权重？

如果我们可以根据字符串作为一个整体的代表性来加权字符串中的项，我们可以按照与查询最匹配的结果排序。

这是向量空间模型的思想，与SQL全文搜索的工作原理相距不远：

function get_corpus_index($corpus = array(), $separator=' ') {

    $dictionary = array();

    $doc_count = array();

    foreach($corpus as $doc_id => $doc) {

        $terms = explode($separator, $doc);

        $doc_count[$doc_id] = count($terms);

        // tf–idf, short for term frequency–inverse document frequency, 
        // according to wikipedia is a numerical statistic that is intended to reflect 
        // how important a word is to a document in a corpus

        foreach($terms as $term) {

            if(!isset($dictionary[$term])) {

                $dictionary[$term] = array('document_frequency' => 0, 'postings' => array());
            }
            if(!isset($dictionary[$term]['postings'][$doc_id])) {

                $dictionary[$term]['document_frequency']++;

                $dictionary[$term]['postings'][$doc_id] = array('term_frequency' => 0);
            }

            $dictionary[$term]['postings'][$doc_id]['term_frequency']++;
        }

        //from http://phpir.com/simple-search-the-vector-space-model/

    }

    return array('doc_count' => $doc_count, 'dictionary' => $dictionary);
}

function get_similar_documents($query='', $corpus=array(), $separator=' '){

    $similar_documents=array();

    if($query!=''&&!empty($corpus)){

        $words=explode($separator,$query);

        $corpus=get_corpus_index($corpus, $separator);

        $doc_count=count($corpus['doc_count']);

        foreach($words as $word) {

            if(isset($corpus['dictionary'][$word])){

                $entry = $corpus['dictionary'][$word];


                foreach($entry['postings'] as $doc_id => $posting) {

                    //get term frequency–inverse document frequency
                    $score=$posting['term_frequency'] * log($doc_count + 1 / $entry['document_frequency'] + 1, 2);

                    if(isset($similar_documents[$doc_id])){

                        $similar_documents[$doc_id]+=$score;

                    }
                    else{

                        $similar_documents[$doc_id]=$score;

                    }
                }
            }
        }

        // length normalise
        foreach($similar_documents as $doc_id => $score) {

            $similar_documents[$doc_id] = $score/$corpus['doc_count'][$doc_id];

        }

        // sort from  high to low

        arsort($similar_documents);

    }   

    return $similar_documents;
}

案例1

$query = 'are';

$corpus = array(
    1 => 'How are you?',
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

结果

Array
(
    [1] => 0.52832083357372
)

案例2

$query = 'are';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

结果

Array
(
    [1] => 0.54248125036058
    [3] => 0.21699250014423
)

案例3

$query = 'we are done';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

结果

Array
(
    [3] => 0.6813781191217
    [1] => 0.54248125036058
)

还有很多改进要做但是该模型提供了从自然查询获得良好结果的方法，它没有布尔运算符，例如strpos（）、preg_match（）、strstr（）或stritr（）。

不可接受的

可选地，在搜索单词之前消除冗余

从而减少索引大小并减少存储需求更少的磁盘I/O更快的索引和因此更快的搜索。

1.标准化

将所有文本转换为小写

2.停止字消除

从文本中删除没有实际意义的单词（如“and”、“or”、“the”、“for”等）

3.字典替换

将具有相同或相似含义的单词替换为其他单词。（例如：将“饥饿”和“饥饿”替换为“饥饿”）可以执行进一步的算法度量（滚雪球）以进一步将单词减少到其基本含义。用十六进制等价物替换颜色名称通过降低精度来减少数值是规范文本的其他方式。

资源

http://linuxgazette.net/164/sephton.htmlhttp://snowball.tartarus.org/MySQL全文搜索分数说明http://dev.mysql.com/doc/internals/en/full-text-search.htmlhttp://en.wikipedia.org/wiki/Vector_space_modelhttp://en.wikipedia.org/wiki/Tf%E2%80%93idfhttp://phpir.com/simple-search-the-vector-space-model/

2014-10-15 19:21:34

在PHP中，验证字符串是否包含某个子字符串的最佳方法是使用一个简单的助手函数，如下所示：

function contains($haystack, $needle, $caseSensitive = false) {
    return $caseSensitive ?
            (strpos($haystack, $needle) === FALSE ? FALSE : TRUE):
            (stripos($haystack, $needle) === FALSE ? FALSE : TRUE);
}

说明：

strpos查找字符串中第一个区分大小写的子字符串的位置。stripos查找字符串中不区分大小写的子字符串第一次出现的位置。myFunction（$haystack，$needle）===假？FALSE：TRUE确保myFunction始终返回布尔值，并修复子字符串索引为0时的意外行为。$case敏感？A：B选择strpos或stripos来完成工作，具体取决于$caseSensitive的值。

输出：

var_dump(contains('bare','are'));            // Outputs: bool(true)
var_dump(contains('stare', 'are'));          // Outputs: bool(true)
var_dump(contains('stare', 'Are'));          // Outputs: bool(true)
var_dump(contains('stare', 'Are', true));    // Outputs: bool(false)
var_dump(contains('hair', 'are'));           // Outputs: bool(false)
var_dump(contains('aren\'t', 'are'));        // Outputs: bool(true)
var_dump(contains('Aren\'t', 'are'));        // Outputs: bool(true)
var_dump(contains('Aren\'t', 'are', true));  // Outputs: bool(false)
var_dump(contains('aren\'t', 'Are'));        // Outputs: bool(true)
var_dump(contains('aren\'t', 'Are', true));  // Outputs: bool(false)
var_dump(contains('broad', 'are'));          // Outputs: bool(false)
var_dump(contains('border', 'are'));         // Outputs: bool(false)

2016-02-21 18:39:50

如果要避免“虚假”和“真实”问题，可以使用subst_count：

if (substr_count($a, 'are') > 0) {
    echo "at least one 'are' is present!";
}

它比strpos慢一点，但它避免了比较问题。

2013-07-09 08:38:53

要确定一个字符串是否包含另一个字符串，可以使用PHP函数strpos（）。

int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )`

<?php

$haystack = 'how are you';
$needle = 'are';

if (strpos($haystack,$needle) !== false) {
    echo "$haystack contains $needle";
}

?>

警告：

如果您正在搜索的针位于干草堆的开始位置，它将返回位置0，如果您进行==比较，但这不起作用，则需要执行===

A==符号是一个比较，用于测试左侧的变量/表达式/常量是否与右侧的变量/表达/常量具有相同的值。

A==符号是比较两个变量/表达式/常量是否相等并且具有相同的类型，即两者都是字符串或两者都是整数。

使用这种方法的优点之一是，与str_contains（）不同，每个PHP版本都支持此函数。

2010-12-06 14:06:12

如何检查字符串是否包含特定单词？

推荐文章

最新文章

标签