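What is the difference between classification and clustering in data mining?

Can someone explain the difference between classification and clustering in data mining?

If you can, please give an example of each to get the main idea across.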

Current answer

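With clustering, you can group data according to the properties you want, such as the number, shape, and other properties of the extracted clusters. In classification, by contrast, the number and shape of the groups are fixed. Many clustering algorithms, k-means among them, take the number of clusters as an input parameter. There are, however, approaches for finding a suitable number of clusters.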
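As a rough illustration of the number of clusters being a parameter, here is a minimal sketch of k-means on one-dimensional data, followed by a loop that compares the within-cluster error for several values of k (the so-called elbow heuristic, which is only one of several ways to pick k). The names `assign`, `kmeans_1d`, and `sse`, the toy data, and the deterministic initialisation are all assumptions made for this example, not anything prescribed by the answer.

```python
def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
        clusters[nearest].append(p)
    return clusters


def kmeans_1d(points, k, iterations=20):
    """Lloyd-style k-means on 1-D data; k, the number of clusters, is an input."""
    pts = sorted(points)
    # Assumption for reproducibility: start from k evenly spaced sorted points
    # instead of the random initialisation that is more common in practice.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = assign(pts, centers)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(pts, centers)


def sse(centers, clusters):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum((p - m) ** 2 for m, c in zip(centers, clusters) for p in c)


# Toy data with three visually obvious groups (hypothetical values).
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

# Try several values of k and record the error for each one.
errors = {k: sse(*kmeans_1d(data, k)) for k in range(1, 5)}
for k, err in errors.items():
    print(k, round(err, 2))
```

On this toy data the error falls steeply until k = 3 and then flattens, which is the kind of "elbow" people look for when they have to choose the number of clusters themselves.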

2017-09-02 05:34:27

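Other answers

The main difference between classification and clustering is this: classification is the process of assigning data to categories with the help of class labels. Clustering is similar to classification, but there are no predefined class labels. Classification falls under supervised learning, whereas clustering is a form of unsupervised learning. Classification methods are given labeled training samples; clustering methods are not given labeled training data.

Hope this helps!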

2019-01-01 07:45:43

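I'm new to data mining, but as my textbook says, classification is supposed to be supervised learning and clustering is supposed to be unsupervised learning. The difference between supervised and unsupervised learning can be found here.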

2011-03-09 17:40:58

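In general, in classification you have a set of predefined classes and want to know which class a new object belongs to.

Clustering tries to group a set of objects and to find out whether there is some relationship between the objects.

In the context of machine learning, classification is supervised learning and clustering is unsupervised learning.

Also have a look at Classification and Clustering on Wikipedia.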
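Since the question asks for examples, here is a small sketch of the two tasks side by side, under some loud assumptions: the points and the labels "small" and "large" are made up, `classify_1nn` is a plain nearest-neighbour rule, and `cluster_by_distance` is a simple distance-threshold grouping chosen for brevity. Neither is meant as the canonical algorithm for its task.

```python
import math


def classify_1nn(train, new_point):
    """Classification: return the label of the closest labeled example."""
    _, nearest_label = min(train, key=lambda pair: math.dist(pair[0], new_point))
    return nearest_label


def cluster_by_distance(points, eps):
    """Clustering: group points that are chained together by gaps smaller than eps."""
    clusters = []
    unvisited = list(points)
    while unvisited:
        frontier = [unvisited.pop()]
        group = []
        while frontier:
            p = frontier.pop()
            group.append(p)
            # Pull every not-yet-grouped point that lies close to p into this group.
            close = [q for q in unvisited if math.dist(p, q) < eps]
            for q in close:
                unvisited.remove(q)
            frontier.extend(close)
        clusters.append(sorted(group))
    return sorted(clusters)


# Classification: the classes are predefined and every training point carries one.
labeled = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.9), "small"),
    ((5.0, 5.1), "large"),
    ((5.2, 4.9), "large"),
]
print(classify_1nn(labeled, (4.8, 5.0)))  # which known class does the new object belong to?

# Clustering: the same points without labels; the groups are discovered, not named.
unlabeled = [point for point, _ in labeled]
print(cluster_by_distance(unlabeled, eps=1.0))
```

The classifier can only ever answer with one of the labels it was shown, while the clustering step returns anonymous groups whose meaning is left for you to interpret.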

2011-02-21 10:44:56

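Classification

is assigning predefined classes to new observations, based on learning from examples.

It is one of the key tasks in machine learning.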

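Clustering (or cluster analysis)

Although it is popularly dismissed as "unsupervised classification", it is quite different.

Contrary to what many machine learners will teach you, it is not about assigning "classes" to objects, only without having them predefined. That is the narrow view of people who have done too much classification; a typical case of "if you have a hammer (a classifier), everything looks like a nail (a classification problem)". But it is also why classification people never get the hang of clustering.

Instead, think of it as structure discovery. The task of clustering is to find structure (for example, groups) in your data that you did not know about before. Clustering has succeeded if you learned something new. It has failed if you only got structure you already knew.

Cluster analysis is a key task of data mining (and the ugly duckling of machine learning, so don't listen to machine learners who dismiss clustering).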

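"Unsupervised learning" is something of an oxymoron

This has been repeated up and down the literature, but the term "unsupervised learning" is nonsense. It does not exist; it is an oxymoron, much like "military intelligence".

Either an algorithm learns from examples (then it is "supervised learning"), or it does not learn. If all clustering methods are "learning", then computing the minimum, maximum, and average of a data set is "unsupervised learning" too. Then any computation has "learned" its output. So the term "unsupervised learning" is completely meaningless: it means everything and nothing.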

Some "unsupervised learning" algorithms do, however, fall into the optimization category. For example, k-means is a least-squares optimization. Such methods are all over statistics, so I don't think we need to label them "unsupervised learning"; we should instead continue to call them "optimization problems". That is more precise and more meaningful. There are plenty of clustering algorithms that do not involve optimization and that do not fit well into machine-learning paradigms. So stop squeezing them in under the umbrella of "unsupervised learning".
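To spell out the k-means remark: the least-squares objective usually associated with k-means can be written as below. The notation (C_j for the clusters, mu_j for their means, k for the number of clusters) is chosen here for illustration and does not come from the answer itself.

```latex
\[
  \min_{C_1, \dots, C_k} \; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
  \qquad
  \mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
\]
```

Read this way, k-means searches for the partition of the data that minimizes the total squared distance from each point to the mean of its own cluster, which is why it can be described as an optimization problem.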

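There is some "learning" associated with clustering, but it is not the program that learns. It is the user who is supposed to learn something new about their data set.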

2015-08-19 12:53:23

First, as many of the answers here have said: classification is supervised learning and clustering is unsupervised. This means:

Classification needs labeled data so that a classifier can be trained on it and then classify new, unseen data based on what it has learned. Unsupervised methods such as clustering do not use labeled data; what they actually do is discover intrinsic structure in the data, such as groups.

Another difference between the two techniques (related to the previous one) is that classification is a form of discrete regression problem whose output is a categorical dependent variable, whereas the output of clustering is a set of subsets called groups.

The way to evaluate the two kinds of model also differs, for the same reason. In classification you often check precision and recall, and watch for things like overfitting and underfitting; these tell you how good the model is. In clustering you usually need the eye of an expert to interpret what you find, because you don't know in advance what type of structure (what type of group or cluster) you have. That's why clustering belongs to exploratory data analysis.

Finally, I would say that the applications are the main difference between the two. Classification, as the word says, is used to discriminate between instances that belong to one class or another, for example a man or a woman, a cat or a dog. Clustering is often used in the diagnosis of medical illness, the discovery of patterns, etc.
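To make the evaluation point a little more concrete, here is a minimal sketch of how precision and recall could be computed for one class of a classifier's output. The function name `precision_recall` and the cat/dog labels and predictions are invented for illustration; a real project would more likely rely on an existing metrics library.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground-truth labels and classifier predictions.
y_true = ["cat", "cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "cat", "cat", "dog"]

precision, recall = precision_recall(y_true, y_pred, positive="cat")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Both numbers need the true labels to be known, which is exactly what is usually missing in clustering; hence the reliance on expert interpretation there.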

2018-11-29 13:43:37
