维基百科上的余弦相似度文章

你能在这里(以列表或其他形式)显示向量吗? 然后算一算,看看是怎么回事?


当前回答

简单的JAVA代码计算余弦相似度

/**
   * Method to calculate cosine similarity of vectors
   * 1 - exactly similar (angle between them is 0)
   * 0 - orthogonal vectors (angle between them is 90)
   * @param vector1 - vector in the form [a1, a2, a3, ..... an]
   * @param vector2 - vector in the form [b1, b2, b3, ..... bn]
   * @return - the cosine similarity of vectors (ranges from 0 to 1)
   */
  private double cosineSimilarity(List<Double> vector1, List<Double> vector2) {

    double dotProduct = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < vector1.size(); i++) {
      dotProduct += vector1.get(i) * vector2.get(i);
      normA += Math.pow(vector1.get(i), 2);
      normB += Math.pow(vector2.get(i), 2);
    }
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
  }

其他回答

简单的JAVA代码计算余弦相似度

/**
   * Method to calculate cosine similarity of vectors
   * 1 - exactly similar (angle between them is 0)
   * 0 - orthogonal vectors (angle between them is 90)
   * @param vector1 - vector in the form [a1, a2, a3, ..... an]
   * @param vector2 - vector in the form [b1, b2, b3, ..... bn]
   * @return - the cosine similarity of vectors (ranges from 0 to 1)
   */
  private double cosineSimilarity(List<Double> vector1, List<Double> vector2) {

    double dotProduct = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < vector1.size(); i++) {
      dotProduct += vector1.get(i) * vector2.get(i);
      normA += Math.pow(vector1.get(i), 2);
      normB += Math.pow(vector2.get(i), 2);
    }
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
  }

这是一个简单的Python代码,实现余弦相似度。

from scipy import linalg, mat, dot
import numpy as np

In [12]: matrix = mat( [[2, 1, 0, 2, 0, 1, 1, 1],[2, 1, 1, 1, 1, 0, 1, 1]] )

In [13]: matrix
Out[13]: 
matrix([[2, 1, 0, 2, 0, 1, 1, 1],
        [2, 1, 1, 1, 1, 0, 1, 1]])
In [14]: dot(matrix[0],matrix[1].T)/np.linalg.norm(matrix[0])/np.linalg.norm(matrix[1])
Out[14]: matrix([[ 0.82158384]])

为了简单起见,我化简了向量a和b:

Let :
    a : [1, 1, 0]
    b : [1, 0, 1]

那么余弦相似度(Theta)

 (Theta) = (1*1 + 1*0 + 0*1)/sqrt((1^2 + 1^2))* sqrt((1^2 + 1^2)) = 1/2 = 0.5

cos0.5的逆是60度。

这是我在c#中的实现。

using System;

namespace CosineSimilarity
{
    class Program
    {
        static void Main()
        {
            int[] vecA = {1, 2, 3, 4, 5};
            int[] vecB = {6, 7, 7, 9, 10};

            var cosSimilarity = CalculateCosineSimilarity(vecA, vecB);

            Console.WriteLine(cosSimilarity);
            Console.Read();
        }

        private static double CalculateCosineSimilarity(int[] vecA, int[] vecB)
        {
            var dotProduct = DotProduct(vecA, vecB);
            var magnitudeOfA = Magnitude(vecA);
            var magnitudeOfB = Magnitude(vecB);

            return dotProduct/(magnitudeOfA*magnitudeOfB);
        }

        private static double DotProduct(int[] vecA, int[] vecB)
        {
            // I'm not validating inputs here for simplicity.            
            double dotProduct = 0;
            for (var i = 0; i < vecA.Length; i++)
            {
                dotProduct += (vecA[i] * vecB[i]);
            }

            return dotProduct;
        }

        // Magnitude of the vector is the square root of the dot product of the vector with itself.
        private static double Magnitude(int[] vector)
        {
            return Math.Sqrt(DotProduct(vector, vector));
        }
    }
}

这段Python代码是我实现算法的快速而肮脏的尝试:

import math
from collections import Counter

def build_vector(iterable1, iterable2):
    counter1 = Counter(iterable1)
    counter2 = Counter(iterable2)
    all_items = set(counter1.keys()).union(set(counter2.keys()))
    vector1 = [counter1[k] for k in all_items]
    vector2 = [counter2[k] for k in all_items]
    return vector1, vector2

def cosim(v1, v2):
    dot_product = sum(n1 * n2 for n1, n2 in zip(v1, v2) )
    magnitude1 = math.sqrt(sum(n ** 2 for n in v1))
    magnitude2 = math.sqrt(sum(n ** 2 for n in v2))
    return dot_product / (magnitude1 * magnitude2)


l1 = "Julie loves me more than Linda loves me".split()
l2 = "Jane likes me more than Julie loves me or".split()


v1, v2 = build_vector(l1, l2)
print(cosim(v1, v2))