

Lucene/Lucene with Compass/Solr
Sphinx
PostgreSQL built-in full-text search
MySQL built-in full-text search


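Result relevance and ranking
Searching and indexing speed
Ease of use and ease of integration with Django
Resource requirements: the site will be hosted on a VPS, so ideally the search engine wouldn't require a lot of RAM and CPU
Scalability
Extra features such as "did you mean?", related searches, etc.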


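Edit: As for indexing needs: users are constantly entering data into the site, so that data will need to be indexed continuously. It doesn't have to be real-time, but ideally new data should show up in the index within 15-30 minutes.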


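We just switched from Elasticsearch to Postgres full-text search. Since we were already using Postgres, we no longer have the hassle of keeping a separate index up to date. But this only applies to full-text search. There are use cases where Elasticsearch is clearly better, perhaps facets or something similar.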








Result relevance ranking is the default. You can set up your own sorting should you wish, and give specific fields higher weightings.

Indexing speed is super-fast, because it talks directly to the database. Any slowness will come from complex SQL queries, un-indexed foreign keys and other such problems. I've never noticed any slowness in searching either.

I'm a Rails guy, so I've no idea how easy it is to implement with Django. There is a Python API that comes with the Sphinx source, though.

The search service daemon (searchd) is pretty low on memory usage, and you can set limits on how much memory the indexer process uses too.

Scalability is where my knowledge is more sketchy, but it's easy enough to copy index files to multiple machines and run several searchd daemons. The general impression I get from others, though, is that it's pretty damn good under high load, so scaling it out across multiple machines isn't something that needs to be dealt with.

There's no support for "did you mean?" etc., although these can be done with other tools easily enough. Sphinx does stem words, though, using dictionaries, so 'driving' and 'drive' (for example) would be considered the same in searches.

Sphinx doesn't allow partial index updates for field data, though. The common approach is to maintain a delta index with all the recent changes and re-index it after every change (the new results appear within a second or two). Because of the small amount of data, this takes a matter of seconds. You will still need to re-index the main dataset regularly (how regularly depends on the volatility of your data: every day? every hour?). The fast indexing speeds keep this all pretty painless.
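To make the main + delta approach concrete, here is a toy sketch in Python. It is not Sphinx's implementation or API: the InvertedIndex and MainDeltaIndex classes are hypothetical, and the rule that a document present in the delta hides its stale copy in the main index is an assumption (roughly the job Sphinx's kill-lists do). Writes only become searchable after the cheap rebuild_delta(); the expensive rebuild_main() folds everything back in and empties the delta.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of document IDs (whitespace tokenizer, no stemming)."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.doc_ids = set()

    def add(self, doc_id, text):
        self.doc_ids.add(doc_id)
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return set(self.postings.get(term.lower(), ()))


class MainDeltaIndex:
    """Hypothetical main + delta scheme: big index rebuilt rarely, small one rebuilt often."""

    def __init__(self, documents):
        self.source = dict(documents)  # stands in for the database table
        self.changed = set()           # IDs modified since the last main rebuild
        self.main = InvertedIndex()
        self.delta = InvertedIndex()
        self.rebuild_main()

    def upsert(self, doc_id, text):
        # Writes go to the source only; they become searchable after rebuild_delta().
        self.source[doc_id] = text
        self.changed.add(doc_id)

    def rebuild_delta(self):
        # Cheap: re-index only the recently changed rows (could run every few minutes).
        self.delta = InvertedIndex()
        for doc_id in self.changed:
            self.delta.add(doc_id, self.source[doc_id])

    def rebuild_main(self):
        # Expensive: re-index everything, then start again with an empty delta.
        self.main = InvertedIndex()
        for doc_id, text in self.source.items():
            self.main.add(doc_id, text)
        self.changed.clear()
        self.delta = InvertedIndex()

    def search(self, term):
        # Delta wins: drop main hits for any document that was re-indexed in the delta.
        main_hits = self.main.search(term) - self.delta.doc_ids
        return sorted(main_hits | self.delta.search(term))
```

Scheduled as two jobs (delta every few minutes, main nightly, say), a setup like this would comfortably meet a 15-30 minute freshness target, since new rows only wait for the next delta run.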

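I don't know whether this applies to your situation, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret (a Ruby port of Lucene) and Solr) and ran some benchmarks. I thought it might be useful.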



The answer linked below goes into some detail about Sphinx, and much of it also applies to Solr: Comparison of full-text search engines - Lucene, Sphinx, Postgresql, MySQL?


Supports replication
Multiple cores (think of these as separate databases with their own configuration and own indexes)
Boolean searches
Highlighting of keywords (fairly easy to do in application code if you have regex-fu; however, why not let a specialized tool do a better job for you?)
Update the index via XML or a delimited file
Communicate with the search server via HTTP (it can even return JSON, or native PHP/Ruby/Python)
PDF and Word document indexing
Dynamic fields
Facets
Aggregate fields
Stop words, synonyms, etc.
More Like This
Index directly from the database with custom queries
Auto-suggest
Cache autowarming
Fast indexing (compare to MySQL full-text search indexing times); Lucene uses a binary inverted index format
Boosting (custom rules for increasing the relevance of a particular keyword or phrase, etc.)
Fielded searches (if users know which field they want to search, they can narrow the search by typing the field, then the value, and ONLY that field is searched rather than everything; a much better user experience)
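Because Solr is driven over HTTP, several of the features listed above (JSON output, facets, highlighting, fielded searches) come down to request parameters. The sketch below only builds a query URL with Python's standard library. It assumes a Solr server at the default http://localhost:8983 with a core named articles and fields title, body and category; the helper name, core and field names are all hypothetical. The parameters q, wt, rows, facet, facet.field, hl and hl.fl are standard Solr query parameters, but check them against the reference guide for your Solr version.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, facet_fields=(), highlight_fields=(), rows=10):
    """Hypothetical helper: build a Solr /select URL using standard query parameters."""
    # "q" accepts fielded syntax such as "title:django"; "wt=json" asks for a JSON response.
    params = [("q", query), ("wt", "json"), ("rows", str(rows))]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on.
        params.extend(("facet.field", field) for field in facet_fields)
    if highlight_fields:
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"
```

The resulting URL could be fetched with urllib.request.urlopen (or any HTTP client) and the response body parsed with json.loads.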





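But it has no user-friendly search operators like + or AND (it uses & | ! instead), and I'm not very happy with how it works on their documentation site. Although it bolds the matches in result snippets, the default algorithm for choosing which matches to show isn't good. Also, if you want to index RTF, PDF or MS Office files, you have to find and integrate a file format converter yourself.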
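One way to paper over the missing + / AND operators is to translate them in application code before calling to_tsquery. The helper below is a hypothetical sketch, not part of PostgreSQL or Django: it maps uppercase AND and leading + to &, OR to |, NOT and leading - to !, inserts an implicit & between adjacent terms, and drops characters that to_tsquery treats specially. It does not handle parentheses or phrases, and it relies on PostgreSQL's own precedence (! binds tighter than &, which binds tighter than |). On PostgreSQL 11 and later, websearch_to_tsquery may make this unnecessary, since it already accepts quoted phrases, "or" and "-".

```python
def to_tsquery_syntax(user_query):
    """Hypothetical helper: rewrite a simple user query into to_tsquery syntax."""
    parts = []
    pending_op = None    # operator to emit before the next term ("&" or "|")
    negate_next = False  # set by a standalone NOT
    for token in user_query.split():
        if token in ("AND", "OR"):
            pending_op = "&" if token == "AND" else "|"
            continue
        if token == "NOT":
            negate_next = True
            continue
        negate = negate_next or token.startswith("-")
        negate_next = False
        # Drop the +/- prefixes and any character that is special in tsquery syntax.
        term = "".join(ch for ch in token.lstrip("+-") if ch.isalnum() or ch == "_")
        if not term:
            continue
        if parts:
            parts.append(pending_op or "&")  # adjacent terms default to AND
        parts.append("!" + term if negate else term)
        pending_op = None
    return " ".join(parts)
```

For example, "+django -mysql" becomes "django & !mysql". The result would then be passed as a bound query parameter (e.g. to_tsquery('english', %s)) rather than interpolated into the SQL string.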



For SHAILI: Solr includes the Lucene search codebase and components, and makes a good standalone search engine.