Comparison of full-text search engines - Lucene, Sphinx, PostgreSQL, MySQL?

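Good to see someone has chimed in about Lucene, because I know nothing about it.

Sphinx, on the other hand, I know quite well, so let's see if I can help.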

- Result relevance ranking is the default. You can set up your own sorting should you wish, and give specific fields higher weightings.
- Indexing speed is super-fast, because it talks directly to the database. Any slowness will come from complex SQL queries, un-indexed foreign keys and other such problems. I've never noticed any slowness in searching either.
- I'm a Rails guy, so I've no idea how easy it is to implement with Django. There is a Python API that comes with the Sphinx source though.
- The search service daemon (searchd) is pretty low on memory usage, and you can set limits on how much memory the indexer process uses too.
- Scalability is where my knowledge is more sketchy, but it's easy enough to copy index files to multiple machines and run several searchd daemons. The general impression I get from others though is that it's pretty damn good under high load, so scaling it out across multiple machines isn't something that needs to be dealt with.
- There's no support for 'did-you-mean', etc., although these can be done with other tools easily enough. Sphinx does stem words though using dictionaries, so 'driving' and 'drive' (for example) would be considered the same in searches.
- Sphinx doesn't allow partial index updates for field data though. The common approach to this is to maintain a delta index with all the recent changes, and re-index this after every change (and those new results appear within a second or two). Because of the small amount of data, this can take a matter of seconds. You will still need to re-index the main dataset regularly though (although how regularly depends on the volatility of your data: every day? every hour?). The fast indexing speeds keep this all pretty painless though.
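To make the main + delta idea a bit more concrete, here is a minimal Python sketch. It is a toy in-memory model, not Sphinx's actual internals or API: the class `MainPlusDelta`, the helper `build_index` and the whitespace tokenizer are all made up for illustration. It only shows the shape of the scheme: a large main index rebuilt rarely, a small delta rebuilt after every change, and searches that combine the two.

```python
from collections import defaultdict


def build_index(docs):
    """Build a tiny inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


class MainPlusDelta:
    """Toy model of the main + delta scheme (hypothetical, not Sphinx code)."""

    def __init__(self, docs):
        self.main_docs = dict(docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)
        self.delta = build_index(self.delta_docs)

    def change(self, doc_id, text):
        # Only the small delta is rebuilt after each change, which is cheap.
        self.delta_docs[doc_id] = text
        self.delta = build_index(self.delta_docs)

    def search(self, term):
        term = term.lower()
        # Main hits for documents with a newer copy in the delta are stale.
        main_hits = {d for d in self.main.get(term, set())
                     if d not in self.delta_docs}
        return sorted(main_hits | self.delta.get(term, set()))

    def reindex_main(self):
        # The periodic full rebuild folds the delta back into the main index.
        self.main_docs.update(self.delta_docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)
        self.delta = build_index(self.delta_docs)


idx = MainPlusDelta({1: "fast car", 2: "slow boat"})
idx.change(2, "fast boat")   # updated document goes into the delta
idx.change(3, "fast train")  # so does a brand-new one
print(idx.search("fast"))    # [1, 2, 3]
print(idx.search("slow"))    # []
```

The expensive step (`reindex_main`) runs on a schedule, while the cheap step (`change`) runs on every write, which is why the setup stays painless as long as the delta stays small.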

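I don't know whether this applies to your situation, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret (a Ruby port of Lucene) and Solr) and ran some benchmarks. Could be useful, I guess.

I haven't plumbed the depths of MySQL's full-text search, but I know it can't compete with Sphinx, Lucene or Solr on either speed or features.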

2009-04-10 15:08:56

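I'm surprised there isn't more information about Solr. Solr is quite similar to Sphinx but has more advanced features (I haven't used Sphinx, only read about it).

The answer linked below details a few things about Sphinx which also apply to Solr: Comparison of full-text search engines - Lucene, Sphinx, Postgresql, MySQL?

Solr also provides the following additional features: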

1. Supports replication
2. Multiple cores (think of these as separate databases with their own configuration and own indexes)
3. Boolean searches
4. Highlighting of keywords (fairly easy to do in application code if you have regex-fu; however, why not let a specialized tool do a better job for you)
5. Update index via XML or delimited file
6. Communicate with the search server via HTTP (it can even return JSON, native PHP/Ruby/Python)
7. PDF, Word document indexing
8. Dynamic fields
9. Facets
10. Aggregate fields
11. Stop words, synonyms, etc.
12. More Like This...
13. Index directly from the database with custom queries
14. Auto-suggest
15. Cache autowarming
16. Fast indexing (compare to MySQL full-text search indexing times) -- Lucene uses a binary inverted index format.
17. Boosting (custom rules for increasing relevance of a particular keyword or phrase, etc.)
18. Fielded searches (if a search user knows the field he/she wants to search, they narrow down their search by typing the field, then the value, and ONLY that field is searched rather than everything -- much better user experience)

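BTW, there are tons more features; however, I've listed only the ones I have actually used in production. Also, MySQL supports #1, #3 and #11 (limited) from the list above. For the features you are looking for, a relational database isn't going to cut it. I'd eliminate those straight away.

Another benefit is that Solr (well, Lucene actually) is a document database (e.g. NoSQL), so many of the benefits of any other document database can be realized with Solr. In other words, you can use it for more than just search (i.e. performance). Get creative with it :)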
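As a rough illustration of a few items from the feature list (HTTP access with JSON responses, fielded search, boosting, highlighting and facets), here is a small Python sketch that only builds the request URL and sends nothing. The parameter names (`q`, `wt`, `hl`, `hl.fl`, `facet`, `facet.field`) and the `/select` handler are standard Solr; the host, the core name `articles`, the field names and the helper `solr_select_url` are assumptions made up for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def solr_select_url(base_url, core, field, value, boost=None, facet_field=None):
    """Build a URL for Solr's /select handler; nothing is sent over the network."""
    # Fielded search: only `field` is searched, e.g. title:"full text"
    term = f'{field}:"{value}"'
    if boost is not None:
        # Query-time boost in Lucene query syntax, e.g. title:"full text"^2
        term += f"^{boost}"
    params = [("q", term), ("wt", "json"), ("hl", "true"), ("hl.fl", field)]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host, core and fields.
url = solr_select_url("http://localhost:8983/solr", "articles",
                      "title", "full text", boost=2, facet_field="category")
print(url)
# Decode the query string again to check what Solr would receive.
print(parse_qs(urlparse(url).query)["q"])
```

The resulting URL can be fetched with any HTTP client, which is what makes Solr easy to use from Django, Rails or anything else.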

2010-12-09 04:20:35

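SearchTools-Avi said: "MySQL text search, which doesn't even index words of three letters or fewer."

FYI, the minimum word length for MySQL fulltext has been adjustable since MySQL 5.0. Google 'mysql fulltext min length' for simple instructions.

That said, MySQL fulltext has limitations: for one, it gets slow once you reach a million records or so, ...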

2009-09-28 00:51:39

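I'm looking at PostgreSQL full-text search right now. It has all the right features of a modern search engine, really good extended character and multilingual support, and nice tight integration with text fields in the database.

But it doesn't have user-friendly search operators like + or AND (it uses & | !), and I'm not thrilled with how it works on their documentation site. While it does bold the matched terms in the result snippets, the default algorithm for choosing which terms to match is not great. Also, if you want to index RTF, PDF or MS Office files, you have to find and integrate a file format converter.

OTOH, it's way better than the MySQL text search, which doesn't even index words of three letters or fewer. That is the default for MediaWiki search, and I really think it's no good for end users: http://www.searchtools.com/analysis/mediawiki-search/

In all the cases I've seen, Lucene/Solr and Sphinx are really great. They are solid code and have evolved with significant improvements in usability, so the tools are all there to build search that satisfies almost everyone.

For SHAILI: SOLR includes the Lucene search code library and has the components to be a nice stand-alone search engine.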
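The operator gap can be papered over in application code. Below is a minimal Python sketch, assuming you only want to accept AND, OR, NOT and the +/- prefixes and turn them into the & | ! syntax before handing the string to `to_tsquery` as a bound parameter. The function name `to_tsquery_text` is hypothetical, and the sketch ignores phrases and parentheses.

```python
def to_tsquery_text(user_query):
    """Translate 'cat AND dog', 'cat OR dog', '+cat -dog' into & | ! syntax."""
    parts = []
    op = "&"        # operator placed before the next term; AND is the default
    negate = False  # whether the next term should be prefixed with !
    for token in user_query.split():
        keyword = token.upper()
        if keyword == "AND":
            op = "&"
        elif keyword == "OR":
            op = "|"
        elif keyword == "NOT":
            negate = True
        else:
            if token.startswith("-"):
                negate, token = True, token[1:]
            elif token.startswith("+"):
                token = token[1:]
            # Keep only letters and digits so a term cannot smuggle in operators.
            term = "".join(ch for ch in token if ch.isalnum())
            if term:
                if parts:
                    parts.append(op)
                parts.append(("!" if negate else "") + term)
            op, negate = "&", False
    return " ".join(parts)


print(to_tsquery_text("cat AND dog"))       # cat & dog
print(to_tsquery_text("cat OR dog"))        # cat | dog
print(to_tsquery_text("+cat -dog"))         # cat & !dog
print(to_tsquery_text("cat AND NOT fish"))  # cat & !fish
```

The output would then go into something like `to_tsquery('english', %s)` through your database driver's parameter binding rather than string concatenation.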

2009-09-17 22:57:50

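Here's my take on this old question. I would highly recommend taking a look at ElasticSearch.

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.

The advantages over other FTS (full-text search) engines are:

- RESTful interface
- Better scalability
- Large community
- Built by Lucene developers
- Extensive documentation
- Many open-source libraries available (including for Django)

We are using this search engine in our project and are very happy with it.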
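To show what the RESTful interface with JSON documents looks like in practice, here is a minimal Python sketch that prepares, but does not send, a search request using only the standard library. The `_search` endpoint and the `match` query are part of Elasticsearch's query DSL; the host, the index name `articles`, the field name and the helper `build_search_request` are assumptions for the example.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, field, text):
    """Prepare (but do not send) a POST to the index's _search endpoint."""
    # A full-text `match` query on a single field.
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index and field.
req = build_search_request("http://localhost:9200", "articles",
                           "title", "full text search")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would run the search against a live server; in a real project one of the client libraries mentioned above is probably the more comfortable route.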

2014-01-21 19:07:12
