Is it possible to specify your own distance function using scikit-learn K-Means clustering?
Current answer
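# Assumption: kmeans is pyclustering's (the call signature matches); trs_data and initial_centers are defined elsewhere.
import numpy as np
from pyclustering.cluster.kmeans import kmeans

def distance_metrics(dist_metrics):
    kmeans_instance = kmeans(trs_data, initial_centers, metric=dist_metrics)
    kmeans_instance.process()                   # run the clustering
    clusters = kmeans_instance.get_clusters()   # one list of sample indices per cluster
    label = np.zeros(len(trs_data), dtype=int)  # the original hard-coded 210 samples
    for i in range(0, len(clusters)):
        for index, j in enumerate(clusters[i]):
            label[j] = i
    return label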
Other answers
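Yes, in the current stable version of sklearn (scikit-learn 1.1.3), you can easily use your own distance metric. All you have to do is create a class that inherits from sklearn.cluster.KMeans and override its _transform method.
The example below uses the IOU distance from the YOLOv2 paper.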
import sklearn.cluster
import numpy as np


def anchor_iou(box_dims, centroid_box_dims):
    # Pairwise IoU between boxes and centroids, both given as (width, height).
    box_w, box_h = box_dims[..., 0], box_dims[..., 1]
    centroid_w, centroid_h = centroid_box_dims[..., 0], centroid_box_dims[..., 1]
    inter_w = np.minimum(box_w[..., np.newaxis], centroid_w[np.newaxis, ...])
    inter_h = np.minimum(box_h[..., np.newaxis], centroid_h[np.newaxis, ...])
    inter_area = inter_w * inter_h
    centroid_area = centroid_w * centroid_h
    box_area = box_w * box_h
    return inter_area / (
        centroid_area[np.newaxis, ...] + box_area[..., np.newaxis] - inter_area
    )


class IOUKMeans(sklearn.cluster.KMeans):
    def __init__(
        self,
        n_clusters=8,
        *,
        init="k-means++",
        n_init=10,
        max_iter=300,
        tol=1e-4,
        verbose=0,
        random_state=None,
        copy_x=True,
        algorithm="lloyd",
    ):
        super().__init__(
            n_clusters=n_clusters,
            init=init,
            n_init=n_init,
            max_iter=max_iter,
            tol=tol,
            verbose=verbose,
            random_state=random_state,
            copy_x=copy_x,
            algorithm=algorithm,
        )

    def _transform(self, X):
        # Private hook called by transform(); returns IoU to each centroid.
        return anchor_iou(X, self.cluster_centers_)


rng = np.random.default_rng(12345)
num_boxes = 10
num_clusters = 3  # not defined in the original snippet; any value <= num_boxes works
# low=1 avoids zero-width or zero-height boxes, whose IoU could become 0/0
bboxes = rng.integers(low=1, high=100, size=(num_boxes, 2))
kmeans = IOUKMeans(num_clusters).fit(bboxes)
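Yes, you can use a different metric function; however, by definition, the k-means clustering algorithm relies on the Euclidean distance from the mean of each cluster.
You could use a different metric, so even though you are still calculating the mean you could use something like the Mahalanobis distance.
Spectral Python's k-means allows the use of the L1 (Manhattan) distance.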
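The idea in this answer (assign points with an arbitrary metric, but keep updating the centers as plain means) can be sketched in NumPy as follows. This is only an illustration, not scikit-learn's or Spectral Python's implementation; `kmeans_custom_metric`, `make_mahalanobis`, the covariance matrix, and the toy data are all hypothetical names and values chosen for the example.

```python
import numpy as np


def kmeans_custom_metric(X, k, metric, n_iter=50, seed=0):
    """Lloyd-style loop: assign with `metric`, update centers as plain means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # metric(X, centers) must return an (n_samples, k) distance matrix.
        labels = metric(X, centers).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the centers that are returned.
    labels = metric(X, centers).argmin(axis=1)
    return labels, centers


def make_mahalanobis(cov):
    """Return a pairwise Mahalanobis distance for a fixed covariance matrix."""
    vi = np.linalg.inv(cov)

    def metric(A, B):
        diff = A[:, None, :] - B[None, :, :]  # shape (n, k, d)
        return np.sqrt(np.einsum("nkd,de,nke->nk", diff, vi, diff))

    return metric


# Toy data (assumed): two blobs that are tight in x and spread out in y.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=[1.0, 5.0], size=(50, 2)),
    rng.normal(loc=[12.0, 0.0], scale=[1.0, 5.0], size=(50, 2)),
])
cov = np.diag([1.0, 25.0])  # assumed within-cluster covariance
labels, centers = kmeans_custom_metric(X, k=2, metric=make_mahalanobis(cov))
```

With a single fixed covariance, the Mahalanobis distance is just the Euclidean distance after a linear rescaling, so the mean remains a sensible center. For metrics where that is not true, the mean update is the part that stops being justified.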
The Affinity Propagation algorithm from the sklearn library allows you to pass a similarity matrix instead of the samples. So, you can use your metric to compute the similarity matrix (not the dissimilarity matrix) and pass it to fit after setting the "affinity" parameter to "precomputed": https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation.fit
As for K-Means, I think it is also possible, but I have not tried it. However, as the other answers stated, finding the mean under a different metric will be the issue. Instead, you can use the PAM (K-Medoids) algorithm: it evaluates the change in Total Deviation (TD) rather than computing means, so it does not depend on a particular distance metric. https://python-kmedoids.readthedocs.io/en/latest/#fasterpam
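To make the medoid idea concrete, here is a minimal NumPy sketch that clusters from a precomputed dissimilarity matrix. It uses a simple alternating scheme (assign, then re-pick each medoid), which is not the PAM or FasterPAM algorithm from the linked library; the function name `k_medoids` and the toy data are assumptions made for the example.

```python
import numpy as np


def k_medoids(D, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed (n, n) dissimilarity matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        # Assign every sample to its nearest medoid.
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members):
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return labels, medoids


# Any dissimilarity works here; Manhattan (L1) is used purely as an illustration.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
labels, medoids = k_medoids(D, k=2)
```

Because the centers are always actual samples, only the matrix D is needed, so the metric never has to support averaging.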
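Unfortunately no: scikit-learn's current implementation of k-means only uses Euclidean distances.
It is not trivial to extend k-means to other distances, and denis's answer above is not the correct way to implement k-means for other metrics.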
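One way to see why the extension is not trivial: the mean is the point that minimizes the sum of squared Euclidean distances, but under another metric the best center is generally something else. The small check below (toy numbers chosen for illustration) compares the mean and the median under a squared-Euclidean cost and an L1 cost.

```python
import numpy as np

x = np.array([0.0, 0.0, 10.0])
mean, median = x.mean(), np.median(x)


def sq_cost(c):
    # Sum of squared distances to a candidate center c.
    return ((x - c) ** 2).sum()


def l1_cost(c):
    # Sum of absolute (Manhattan) distances to a candidate center c.
    return np.abs(x - c).sum()


print("squared cost:", sq_cost(mean), "at the mean vs", sq_cost(median), "at the median")
print("L1 cost:", l1_cost(mean), "at the mean vs", l1_cost(median), "at the median")
```

The mean wins under the squared cost and the median wins under the L1 cost, so swapping only the distance in the assignment step while keeping the mean update no longer minimizes a single consistent objective.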