Find the similarity metric between two strings

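How do I get the probability that one string is similar to another string in Python?

I want to get a decimal value such as 0.9 (meaning 90%). Preferably using standard Python and its standard library.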

e.g.

similar("Apple","Appel") #would have a high prob.

similar("Apple","Mango") #would have a lower prob.

Current answer

There is a built-in for this.

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

Using it:

>>> similar("Apple","Appel")
0.8
>>> similar("Apple","Mango")
0.0
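
To see where the 0.8 comes from: ratio() is documented as 2.0*M/T, where M is the number of matched characters and T is the combined length of both strings. The sketch below recomputes that by hand; explain_ratio is a hypothetical helper name used only for illustration, not part of difflib.

```python
from difflib import SequenceMatcher


def explain_ratio(a, b):
    """Return (matched, total, ratio) so the 2.0*M/T formula is visible."""
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a zero-length sentinel block,
    # which contributes nothing to the sum.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # Mirror ratio(): two empty sequences are treated as identical.
    ratio = 2.0 * matched / total if total else 1.0
    return matched, total, ratio


print(explain_ratio("Apple", "Appel"))  # (4, 10, 0.8)
```

The comparison is case-sensitive, which is why "Apple" and "Mango" score 0.0 even though both contain the letter a in some case. Lowercasing both strings first is one option if case should not matter.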

2013-06-30 08:18:52

Other answers

The built-in SequenceMatcher is very slow on large inputs. Here is how it can be done with diff-match-patch:

from diff_match_patch import diff_match_patch

def compute_similarity_and_diff(text1, text2):
    dmp = diff_match_patch()
    dmp.Diff_Timeout = 0.0
    diff = dmp.diff_main(text1, text2, False)

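    # similarity: characters in unchanged segments (op == 0 is DIFF_EQUAL)
    # divided by the length of the longer input
    common_text = sum(len(txt) for op, txt in diff if op == 0)
    text_length = max(len(text1), len(text2))
    # two empty strings are treated as identical to avoid dividing by zero
    sim = common_text / text_length if text_length else 1.0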

    return sim, diff

2018-04-30 14:24:03

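I think you may be looking for an algorithm that describes the distance between strings. Here are some you can refer to:

Hamming distance, Levenshtein distance, Damerau-Levenshtein distance, Jaro-Winkler distance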
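
As a rough illustration of one of these, here is a minimal pure-Python sketch of Levenshtein distance, plus one possible way to turn it into a 0 to 1 score. The function names and the normalization (dividing by the longer string's length) are my own assumptions for this example, not a standard API.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("Apple", "Appel"))             # 2
print(levenshtein_similarity("Apple", "Mango"))  # 0.0
```

Plain Levenshtein counts the swapped "le"/"el" as two substitutions. Damerau-Levenshtein would count it as a single transposition.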

2013-06-30 08:45:51

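TextDistance:

TextDistance is a Python library for comparing the distance between two or more sequences using many algorithms. It has 30+ algorithms; a pure Python implementation; simple usage; comparison of more than two sequences; more than one implementation of some algorithms within a single class; and optional numpy usage for maximum speed.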

Example 1:

import textdistance
textdistance.hamming('test', 'text')

Output:

1

Example 2:

import textdistance

textdistance.hamming.normalized_similarity('test', 'text')

Output:

0.75

Thanks and cheers!

2020-10-19 19:38:27


Here's what I came up with:

import string

def match(a,b):
    a,b = a.lower(), b.lower()
    error = 0
    for i in string.ascii_lowercase:
        error += abs(a.count(i) - b.count(i))
    total = len(a) + len(b)
    return (total-error)/total

if __name__ == "__main__":
    print(match("pple inc", "Apple Inc."))

2020-12-01 21:22:34
