How does the "Did you mean?" algorithm work?

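I've been developing an internal website for a portfolio management tool. There is a lot of text data, company names, etc. I've been really impressed by some search engines' ability to respond very quickly with "Did you mean: xxxx".

I need to be able to take a user's query intelligently and respond not only with the raw search results but also with a "Did you mean?" suggestion when there is a highly likely alternative answer.

I'm developing in ASP.NET (VB - don't hold it against me!)

Update: OK, how can I mimic this without millions of "paying users"?

Generate typos for each "known" or "correct" term and perform lookups? Or is there some other, more elegant method?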

Current answer

Normally a production spelling corrector uses several methods to provide spelling suggestions. Some of them:

Decide on a way to determine whether spelling correction is required. Triggers may include insufficient results, results which are not specific or accurate enough (according to some measure), etc. Then:

- Use a large body of text or a dictionary in which all, or most, words are known to be correctly spelled. These are easily found online, in places such as LingPipe. To determine the best suggestion, look for the word that is the closest match based on several measures. The most intuitive one is similar characters. Research and experimentation have shown that matching on two- or three-character sequences (bigrams and trigrams) works better. To further improve results, give a higher weight to a match at the beginning or end of the word. For performance reasons, index all these words by their trigrams or bigrams, so that when you perform a lookup you convert the query to n-grams and look them up via a hash table or trie.

- Use heuristics related to potential keyboard mistakes based on character location, so that "hwllo" should be "hello" because 'w' is close to 'e'.

- Use a phonetic key (Soundex, Metaphone) to index the words and look up possible corrections. In practice this normally returns worse results than the n-gram indexing described above.

In each case you must select the best correction from a list of candidates. The ranking may use a distance metric such as Levenshtein, the keyboard metric, etc.

For a multi-word phrase, only one word may be misspelled, in which case you can use the remaining words as context in determining the best match.
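To make the n-gram indexing idea concrete, here is a minimal Python sketch (not the asker's VB, and not taken from any particular library). It indexes a vocabulary by padded character trigrams, collects candidates that share at least one trigram with the query, and ranks them by Levenshtein distance. The function names, the padding scheme, and the `max_distance=2` cutoff are all assumptions chosen for illustration.

```python
from collections import defaultdict


def trigrams(word):
    """Return the set of character trigrams of a word. The word is padded
    with spaces so that its start and end produce their own trigrams."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def build_index(vocabulary):
    """Map every trigram to the set of known words that contain it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in trigrams(word):
            index[gram].add(word)
    return index


def suggest(query, index, max_distance=2):
    """Return known words close to the query, best match first."""
    # Count how many trigrams each candidate shares with the query.
    shared = defaultdict(int)
    for gram in trigrams(query):
        for word in index.get(gram, ()):
            shared[word] += 1
    # Rank by edit distance, breaking ties by trigram overlap.
    scored = [
        (levenshtein(query.lower(), word.lower()), -count, word)
        for word, count in shared.items()
    ]
    return [word for dist, _, word in sorted(scored) if dist <= max_distance]


index = build_index(["Microsoft", "Micron", "Minecraft", "Apple"])
print(suggest("Microsft", index))
```

Only words sharing a trigram with the query are ever passed to the (more expensive) edit-distance step, which is the point of the index. The extra weighting for matches at the start or end of a word is only approximated here by the padding.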

2009-04-16 18:07:37

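Other answers

Google apparently suggests the queries with the best results, not the ones that are spelled correctly. But in this case a spelling corrector would probably be more feasible. Of course, you could store some value for every query, based on how good the results it returns are.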

So,

1) You need a dictionary (English or based on your data).

2) Generate a word trellis and calculate probabilities for the transitions using your dictionary.

3) Add a decoder to calculate the minimum error distance using your trellis. Of course you should take care of insertions and deletions when calculating distances. It also helps to take the QWERTY layout into account, since hitting a neighbouring key is a likely slip ("cae" would turn into "car", "cay" would turn into "cat").

4) Return the word which has the minimum distance.

5) Then you could compare that to your query database and check whether there are better results for other close matches.
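A full probabilistic trellis needs transition probabilities estimated from data, which is more than fits here. As a simplified stand-in, the sketch below computes a weighted edit distance (itself a minimum-cost path through an alignment grid) in which substituting a neighbouring QWERTY key is cheaper than substituting a distant one. The cost values (0.5 and 1.0), the unstaggered key grid, and the function names are assumptions for illustration, not something the answer specifies.

```python
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each key to a (row, column) position. Real keyboards stagger the
# rows; this grid ignores that, which is a simplifying assumption.
KEY_POSITION = {
    key: (row, col)
    for row, keys in enumerate(QWERTY_ROWS)
    for col, key in enumerate(keys)
}


def substitution_cost(typed, intended):
    """Cost of having typed one character in place of another."""
    if typed == intended:
        return 0.0
    a = KEY_POSITION.get(typed)
    b = KEY_POSITION.get(intended)
    if a is not None and b is not None:
        if abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1:
            return 0.5  # neighbouring keys: a likely slip (assumed cost)
    return 1.0


def keyboard_distance(typed, candidate):
    """Edit distance where substitutions are weighted by key proximity."""
    previous = [float(j) for j in range(len(candidate) + 1)]
    for i, ct in enumerate(typed, start=1):
        current = [float(i)]
        for j, cc in enumerate(candidate, start=1):
            current.append(min(
                previous[j] + 1.0,     # extra character was typed
                current[j - 1] + 1.0,  # a character was left out
                previous[j - 1] + substitution_cost(ct, cc),
            ))
        previous = current
    return previous[-1]


def best_match(typed, dictionary):
    """Return the dictionary word with the smallest weighted distance."""
    return min(dictionary, key=lambda word: keyboard_distance(typed, word))


words = ["car", "cat", "cab"]
print(best_match("cae", words), best_match("cay", words))
```

With plain Levenshtein all three words would tie at distance 1 for either typo; the keyboard weighting is what breaks the tie in favour of "car" for "cae" and "cat" for "cay".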

2008-11-21 01:17:17

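Here is the explanation straight from the source (almost):

Search 101!

At minute 22:03

Worth watching!

Basically, according to Douglas Merrill, former CTO of Google, it works like this:

1) You type a (misspelled) word into Google

2) You don't find what you wanted (you don't click on any results)

3) You realize you misspelled the word, so you retype it in the search box.

4) You find what you want (you click on the first link)

This pattern, multiplied millions of times, shows which misspellings are most common and which corrections are most "common".

This way Google can offer spelling correction in every language almost instantly.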
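The four steps above can be approximated on your own search logs, even at a much smaller scale. The sketch below is only an illustration under stated assumptions: it assumes a hypothetical log format in which each session is an ordered list of `(query, clicked)` pairs, counts cases where a query with no click is immediately followed by a similar query that did get a click, and suggests the most frequent such rewrite. The similarity threshold and `min_count` are arbitrary values, and `difflib.SequenceMatcher` from the standard library is used here simply as a convenient similarity measure.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def mine_corrections(sessions, min_similarity=0.8):
    """Count 'no click, then similar query with a click' rewrites.

    sessions: iterable of sessions, each an ordered list of
    (query, clicked) pairs. This log format is hypothetical.
    """
    corrections = defaultdict(Counter)
    for events in sessions:
        # Look at each pair of consecutive searches in the session.
        for (first, first_clicked), (second, second_clicked) in zip(events, events[1:]):
            if first_clicked or not second_clicked or first == second:
                continue
            # Skip unrelated follow-up searches; keep near-identical ones.
            if SequenceMatcher(None, first, second).ratio() >= min_similarity:
                corrections[first][second] += 1
    return corrections


def did_you_mean(query, corrections, min_count=2):
    """Return the most frequent rewrite of the query, or None."""
    counts = corrections.get(query)
    if not counts:
        return None
    suggestion, count = counts.most_common(1)[0]
    return suggestion if count >= min_count else None


sessions = [
    [("gogle", False), ("google", True)],
    [("gogle", False), ("google", True)],
    [("gogle", False), ("goggle", True)],
    [("gogle", False), ("weather", True)],
    [("google", True)],
]
table = mine_corrections(sessions)
print(did_you_mean("gogle", table))
```

On an internal tool the log will be far smaller than Google's, so a low `min_count` may produce noisy suggestions; combining this with a dictionary-based corrector is probably safer until enough data accumulates.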