Difference between classification and clustering in data mining?

Can someone explain the difference between classification and clustering in data mining?

If possible, please give examples of both so I can understand the main idea.

Current answer

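If you ask anyone in data mining or machine learning this question, they will use the terms supervised learning and unsupervised learning to explain the difference between clustering and classification. So let me first explain the two key words, supervised and unsupervised.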

Supervised learning: suppose you have a basket filled with fresh fruits, and your task is to arrange fruits of the same type in one place. Suppose the fruits are apples, bananas, cherries, and grapes. From your previous work you already know the shape of each fruit, so it is easy to put fruits of the same type together. Here, your previous work is what data mining calls training data. You have already learned from your training data, because it contains a response variable that tells you that a fruit with such-and-such features is a grape, and likewise for every other fruit.

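You obtain this kind of knowledge from the training data. This type of learning is called supervised learning, and this type of problem is classification. Because you have already learned these things, you can do the job confidently.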
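The supervised fruit example can be sketched as a k-nearest-neighbors classifier, which is one of many possible classifiers and not something the answer itself names. This is a minimal sketch under stated assumptions: the features (a redness score from 0 to 1 and a length in cm), the sample values, and the function name `classify` are all hypothetical. The labels in `TRAINING_DATA` play the role of the response variable.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (redness 0..1, length in cm) -> fruit label.
# The label is the "response variable" the answer talks about.
TRAINING_DATA = [
    ((0.9, 8.0), "apple"),
    ((0.8, 7.5), "apple"),
    ((0.1, 18.0), "banana"),
    ((0.2, 20.0), "banana"),
    ((0.95, 2.0), "cherry"),
    ((0.9, 1.8), "cherry"),
    ((0.15, 2.2), "grape"),
    ((0.1, 1.9), "grape"),
]

def classify(features, training_data=TRAINING_DATA, k=3):
    """Predict a label for an unseen fruit by majority vote of its k nearest labeled neighbors."""
    # Sort the labeled examples by Euclidean distance to the new fruit.
    by_distance = sorted(training_data, key=lambda item: math.dist(features, item[0]))
    # Count the labels of the k closest examples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

For example, `classify((0.9, 2.1))` returns `"cherry"`. The features are not rescaled here; with real data you would normally standardize them so that one feature does not dominate the distance.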

Unsupervised learning: suppose you have a basket filled with fresh fruits, and your task is to arrange fruits of the same type in one place.

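This time you know nothing about the fruits; you are seeing them for the first time. So how will you arrange fruits of the same type together?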

The first thing you do is pick up a fruit and choose one of its physical characteristics. Suppose you choose color.

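Then you arrange them by color, and the groups look like this:
Red group: apples and cherries.
Green group: bananas and grapes.
Next you take another physical characteristic, size, and the groups become:
Red and large: apples.
Red and small: cherries.
Green and large: bananas.
Green and small: grapes.
Job done, happy ending!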
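The manual color-then-size grouping can be mimicked with a simple group-by on observed attributes. This is a sketch, not a general clustering algorithm: the `basket` data, the `id` field, and the helper `group_by` are hypothetical, and the fruits deliberately carry no names, since in the unsupervised setting no label is ever seen.

```python
from collections import defaultdict

# Hypothetical unlabeled basket: only observable properties, no fruit names.
basket = [
    {"id": 1, "color": "red", "size": "large"},
    {"id": 2, "color": "green", "size": "small"},
    {"id": 3, "color": "red", "size": "small"},
    {"id": 4, "color": "green", "size": "large"},
    {"id": 5, "color": "red", "size": "large"},
    {"id": 6, "color": "green", "size": "small"},
]

def group_by(items, *attributes):
    """Put items with identical values for the given attributes into the same group."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(item[attribute] for attribute in attributes)
        groups[key].append(item["id"])
    return dict(groups)

# First pass: group by color only.
by_color = group_by(basket, "color")
# Second pass: refine the groups using size as well.
by_color_and_size = group_by(basket, "color", "size")
```

Here `by_color` yields two groups and `by_color_and_size` splits them into four, mirroring the red/green and large/small groups in the answer. Practical clustering algorithms such as k-means work on distances between numeric features rather than exact attribute matches.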

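Here you have not learned anything beforehand, which means there is no training data and no response variable. This type of learning is called unsupervised learning, and clustering falls under unsupervised learning.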

2014-12-28 18:18:50

Other answers

Please read the following information:

2016-08-09 03:03:28

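I think classification is assigning the records in a dataset to predefined classes, or even defining the classes on the fly. I see it as a prerequisite for any worthwhile data mining, and I like to think of it as unsupervised learning, i.e., when mining the data one does not know what he/she is looking for, and classification serves as a good starting point.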

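Clustering, on the other hand, falls under supervised learning, i.e., one knows which parameters to look for, the correlations between them, and the critical levels. I believe this requires some understanding of statistics and mathematics.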

2013-08-19 21:07:17

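Classification:
- predicts categorical class labels;
- classifies data (constructs a model) based on the training set and the values (class labels) of a class label attribute;
- uses that model to classify new data.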
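The two classification steps (construct a model from a labeled training set, then use it on new data) can be illustrated with a nearest-centroid classifier. This classifier is an assumed choice for illustration only; the `fit`/`predict` names and the cat/dog measurements (weight in kg, height in cm) are hypothetical.

```python
import math
from collections import defaultdict

def fit(training_set):
    """Construct the model: the centroid (mean feature vector) of each class label."""
    rows_by_label = defaultdict(list)
    for features, label in training_set:
        rows_by_label[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in rows_by_label.items()
    }

def predict(model, features):
    """Use the model: return the label whose centroid is closest to the new sample."""
    return min(model, key=lambda label: math.dist(features, model[label]))

# Hypothetical training set: (weight_kg, height_cm) -> class label.
training_set = [
    ((4.0, 25.0), "cat"),
    ((5.0, 24.0), "cat"),
    ((3.5, 23.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((25.0, 60.0), "dog"),
    ((30.0, 58.0), "dog"),
]
model = fit(training_set)
```

Calling `predict(model, (4.5, 26.0))` returns `"cat"`. The model here is nothing more than one centroid per class label, which keeps the "construct, then use" split easy to see.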

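Cluster: a collection of data objects that are
- similar to one another within the same cluster;
- dissimilar to the objects in other clusters.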
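The cluster definition (similar within a cluster, dissimilar across clusters) can be checked numerically by comparing the mean distance between points in the same cluster with the mean distance between points in different clusters. This is a rough sketch assuming Euclidean distance, at least two clusters, and at least one cluster with two or more points; the function names and sample points are hypothetical. The silhouette coefficient is a standard, more careful version of the same idea.

```python
import math
from itertools import combinations

def mean_distance(pairs):
    """Average Euclidean distance over a list of point pairs (must be non-empty)."""
    distances = [math.dist(a, b) for a, b in pairs]
    return sum(distances) / len(distances)

def within_and_between(clusters):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    # Every pair of points that share a cluster.
    within_pairs = [pair for cluster in clusters for pair in combinations(cluster, 2)]
    # Every pair of points taken from two different clusters.
    between_pairs = [
        (a, b)
        for first, second in combinations(clusters, 2)
        for a in first
        for b in second
    ]
    return mean_distance(within_pairs), mean_distance(between_pairs)

# Hypothetical, well-separated clusters.
clusters = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0)],
]
within, between = within_and_between(clusters)
```

For these sample clusters the within-cluster mean is 1.0 while the between-cluster mean is about 14, which is the pattern the definition describes.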

2012-01-11 14:15:21

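Clustering is a way of grouping objects so that objects with similar features end up together and objects with dissimilar features are kept apart. It is a common statistical data analysis technique used in machine learning and data mining.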
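The answer does not name a specific clustering algorithm; k-means is one widely used choice, sketched here in plain Python. Assumptions: points are numeric tuples, distance is Euclidean, and the first `k` points serve as initial centroids (a naive choice; library implementations typically use random or k-means++ initialization). The function name `k_means` and the sample data are hypothetical.

```python
import math

def k_means(points, k, max_iterations=100):
    """Minimal k-means: alternate assigning points to the nearest centroid and recomputing centroids."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    # If max_iterations is reached first, clusters reflect the previous centroids.
    return centroids, clusters

# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
```

With the sample `points` and `k=2`, the loop converges to two groups of three points, with centroids near (1.17, 1.17) and (8.5, 8.5). No labels are used anywhere.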

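Classification is the process of recognizing, differentiating, and understanding objects on the basis of a training set of data. It is a supervised learning technique in which a training set of correctly labeled observations is available.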

2017-05-03 19:08:05

First, as many answers here say: classification is supervised learning and clustering is unsupervised learning. This means:

Classification needs labeled data so that classifiers can be trained on it and then start classifying new, unseen data based on what they have learned. Unsupervised learning such as clustering does not use labeled data; what it actually does is discover intrinsic structure in the data, such as groups.

Another difference between the two techniques (related to the previous one) is that classification is a form of discrete regression problem in which the output is a categorical dependent variable, whereas clustering yields a set of subsets called groups.

For the same reason, the two are evaluated differently. In classification you usually check precision and recall, look for overfitting and underfitting, and so on; these tell you how good the model is. In clustering you usually need an expert to interpret what you find, because you don't know in advance what kind of structure (type of group or cluster) you have. That is why clustering belongs to exploratory data analysis.

Finally, I would say that the applications are the main difference between the two. Classification, as the word says, is used to discriminate instances that belong to one class or another, for example a man or a woman, or a cat or a dog. Clustering is often used in diagnosing medical illness, discovering patterns, and so on.
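The point about evaluating classifiers can be made concrete with precision and recall for one positive class, computed from true and predicted labels. A minimal sketch: the function name, the labels, and the convention of returning 0.0 when a denominator is zero are assumptions. In practice, scikit-learn's `sklearn.metrics.precision_score` and `recall_score` compute the same quantities.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall of predictions for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly predicted positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical ground truth and classifier output.
y_true = ["cat", "cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
precision, recall = precision_recall(y_true, y_pred, positive="cat")
```

Here precision is 1.0 (every predicted cat really is a cat) and recall is 2/3 (one real cat was missed). Clustering has no ground-truth labels to compare against in this way, which is why the answer says its results usually need expert interpretation.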

2018-11-29 13:43:37
