Difference between classification and clustering in data mining?

Can someone explain the difference between classification and clustering in data mining?

If possible, please give examples of both so I can understand the main idea.

Current answer

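I'm a newbie to data mining, but as my textbook says, classification is supposed to be supervised learning, and clustering is supposed to be unsupervised learning. The difference between supervised and unsupervised learning can be found here.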

2011-03-09 17:40:58

Other answers

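The goal of clustering is to find groups in the data. "Cluster" is an intuitive concept and has no rigorous mathematical definition. Members of a cluster should be similar to one another and dissimilar to members of other clusters. A clustering algorithm operates on an unlabeled data set Z and produces a partition of it.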
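To make "operates on an unlabeled data set Z and produces a partition" concrete, here is a minimal sketch, assuming one-dimensional numeric data, a hypothetical helper `gap_partition`, and a user-chosen threshold `max_gap`. It is single-linkage clustering cut at a fixed distance: neighbouring sorted values closer than the threshold share a cluster. No labels are used anywhere, and every point ends up in exactly one group.

```python
def gap_partition(z, max_gap):
    """Partition unlabeled 1-D values: start a new group wherever two
    neighbouring sorted values are more than max_gap apart.
    (In 1-D this equals single-linkage clustering cut at max_gap.)"""
    if not z:
        return []
    values = sorted(z)
    clusters = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])      # large gap: open a new cluster
        else:
            clusters[-1].append(cur)    # small gap: extend the current one
    return clusters

# Hypothetical unlabeled data set Z
Z = [1.0, 1.2, 0.9, 5.1, 5.3, 9.8, 10.0]
print(gap_partition(Z, max_gap=1.0))
# [[0.9, 1.0, 1.2], [5.1, 5.3], [9.8, 10.0]]
```

The groups come out unnamed; deciding what each group means is left to whoever looks at the result.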

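As for classes and class labels: a class contains similar objects, while objects from different classes are dissimilar. Some classes have a clear meaning and, in the simplest case, are mutually exclusive. For example, in signature verification a signature is either genuine or forged. The true class is one of the two, even though we may not be able to guess it correctly from the particular features we observe.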

2014-04-23 15:13:33

There are two notions in data mining: "supervised" and "unsupervised" learning. When someone tells the computer (algorithm, code, ...) that this thing is an apple and that thing is an orange, that is supervised learning; using such supervision (e.g. a tag for each sample in the data set) to classify the data gives you classification. If, on the other hand, you let the computer work out by itself what is what by distinguishing between features of the given data set, i.e. learning unsupervised, grouping the data this way is called clustering. In this case the data fed to the algorithm has no tags, and the algorithm has to find the different classes itself.
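A minimal sketch of the supervised case, assuming made-up fruit data with two features (redness and roughness, both on a 0 to 1 scale) and hypothetical helpers `fit_centroids` and `predict`. The tags are exactly what this nearest-centroid classifier learns from; a clustering algorithm given the same `X` without `tags` could at best return two anonymous groups, not the names "apple" and "orange".

```python
import math

def fit_centroids(samples, labels):
    """Supervised step: average the feature vectors of each tagged class."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

# Hypothetical tagged samples: (redness, roughness)
X = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.5, 0.8), (0.4, 0.9), (0.45, 0.85)]
tags = ["apple", "apple", "apple", "orange", "orange", "orange"]

centroids = fit_centroids(X, tags)
print(predict(centroids, (0.8, 0.1)))   # apple
print(predict(centroids, (0.5, 0.9)))   # orange
```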

2017-02-27 21:19:44


First, as many answers here say: classification is supervised learning and clustering is unsupervised. This means:

Classification needs labeled data so that the classifier can be trained on it and then classify new, unseen data based on what it has learned. Unsupervised learning such as clustering does not use labeled data; instead it discovers intrinsic structures in the data, such as groups.

Another difference between the two techniques (related to the previous one) is that classification can be seen as a discrete counterpart of regression: the output is a categorical dependent variable. Clustering, by contrast, outputs a set of subsets called groups (clusters).

For the same reason, the two are evaluated differently. In classification you typically check precision and recall, overfitting and underfitting, and so on; these tell you how good the model is. In clustering you usually need an expert to interpret what you found, because you don't know in advance what kind of structure (type of group or cluster) you have. That's why clustering belongs to exploratory data analysis.

Finally, I would say that the applications are the main difference between the two. Classification, as the name says, is used to discriminate between instances that belong to one class or another, for example man or woman, cat or dog. Clustering is often used in medical diagnosis, pattern discovery, etc.
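As an illustration of the classification-side evaluation mentioned above, here is a minimal sketch of precision and recall, assuming a binary cat/dog problem with made-up true labels and predictions and a hypothetical helper `precision_recall`. Both metrics require the true labels, which is exactly what a clustering task usually does not have.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall of a binary classifier for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted cats, how many are cats
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real cats, how many were found
    return precision, recall

# Hypothetical ground truth vs. classifier output
y_true = ["cat", "cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "dog", "cat", "cat", "dog"]
p, r = precision_recall(y_true, y_pred, positive="cat")
print(p, r)  # 2 true positives, 1 false positive, 1 false negative: both are 2/3
```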

2018-11-29 13:43:37

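Classification

is assigning predefined classes to new observations, based on learning from examples.

It is one of the key tasks in machine learning.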
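"Learning from examples" can be as simple as remembering them. A minimal sketch of a k-nearest-neighbour classifier, assuming one-dimensional features, made-up labelled examples, and a hypothetical helper `knn_predict`: a new observation gets whichever predefined class is most common among its k closest examples.

```python
from collections import Counter

def knn_predict(examples, x, k=3):
    """Classify x by majority vote among the k closest labelled examples.
    `examples` is a list of (feature_value, label) pairs with 1-D features."""
    nearest = sorted(examples, key=lambda e: abs(e[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled examples: (body length in cm, species)
examples = [(25, "cat"), (30, "cat"), (28, "cat"), (60, "dog"), (75, "dog"), (55, "dog")]
print(knn_predict(examples, 33))  # nearest are 30, 28, 25, so "cat"
print(knn_predict(examples, 58))  # nearest are 60, 55, 75, so "dog"
```

The set of classes is fixed by the examples; the classifier can never output a class it was not shown.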

Clustering (or cluster analysis)

Although it is commonly referred to as "unsupervised classification", it is something quite different.

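Contrary to what many machine learners will teach you, it is not about assigning "classes" to objects that simply were not predefined. That is the very limited view of people who have done too much classification; a typical case of "if you have a hammer (a classifier), everything looks like a nail (a classification problem)". But it is also why classification people don't get the hang of clustering.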

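Instead, think of it as structure discovery. The task of clustering is to find structure (e.g. groups) in your data that you did not know before. Clustering has succeeded if you learned something new. It has failed if you only found structure you already knew.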

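Cluster analysis is a key task of data mining (and the ugly duckling of machine learning, so don't listen to machine learners dismissing clustering).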

"Unsupervised learning" is something of an oxymoron

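This has been repeated all over the literature, but "unsupervised learning" is nonsense. It does not exist; it is an oxymoron like "military intelligence".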

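Either the algorithm learns from examples (then it is "supervised learning"), or it does not learn. If all clustering methods count as "learning", then computing the minimum, maximum, and mean of a data set is "unsupervised learning" too. Then any computation "learns" its output. Hence the term "unsupervised learning" is totally meaningless: it means everything and nothing.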

Some "unsupervised learning" algorithms do, however, fall into the optimization category. For example, k-means is a least-squares optimization. Such methods are all over statistics, so I don't think we need to label them "unsupervised learning"; instead we should continue to call them "optimization problems". That is more precise and more meaningful. There are plenty of clustering algorithms that do not involve optimization and do not fit machine-learning paradigms well. So stop squeezing them in under the umbrella of "unsupervised learning".
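To illustrate the "least-squares optimization" view of k-means: the objective is the sum of squared distances from each point to the center of its assigned cluster, sum over k of sum over x in C_k of ||x - mu_k||^2. A minimal sketch of Lloyd's algorithm, assuming 1-D data, hand-picked initial centers, and hypothetical helpers `sse` and `kmeans_1d`; it records the objective each round so you can see it never increases.

```python
def nearest(points, centers):
    """Assignment step: index of the closest center for every point."""
    return [min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            for p in points]

def sse(points, centers, assign):
    """Least-squares objective: sum of squared distances to assigned centers."""
    return sum((p - centers[a]) ** 2 for p, a in zip(points, assign))

def kmeans_1d(points, centers, iters=10):
    """Lloyd's algorithm on 1-D data; returns centers, assignment, SSE history."""
    centers = list(centers)
    history = []
    for _ in range(iters):
        assign = nearest(points, centers)
        history.append(sse(points, centers, assign))
        # Update step: the mean minimises squared error within each cluster;
        # an empty cluster keeps its old center.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    assign = nearest(points, centers)
    history.append(sse(points, centers, assign))
    return centers, assign, history

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers, assign, history = kmeans_1d(points, [1.0, 2.0])
print(centers, assign)  # [1.5, 8.5] [0, 0, 0, 1, 1, 1]
print(history[:4])      # [127.5, 9.109375, 1.0, 1.0]
```

Each step solves a small least-squares subproblem (best assignment for fixed centers, best centers for a fixed assignment), which is why the objective is monotone; it converges to a local optimum, not necessarily the global one.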

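There is some "learning" involved in clustering, but it is not the program that learns. It is the user who is supposed to learn something new about his data set.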

2015-08-19 12:53:23
