在人工智能和机器学习方面,监督学习和无监督学习的区别是什么? 你能举个例子简单地解释一下吗?


当前回答

既然你问了这个非常基本的问题,似乎有必要详细说明机器学习本身是什么。

Machine Learning is a class of algorithms which is data-driven, i.e. unlike "normal" algorithms it is the data that "tells" what the "good answer" is. Example: a hypothetical non-machine learning algorithm for face detection in images would try to define what a face is (round skin-like-colored disk, with dark area where you expect the eyes etc). A machine learning algorithm would not have such coded definition, but would "learn-by-examples": you'll show several images of faces and not-faces and a good algorithm will eventually learn and be able to predict whether or not an unseen image is a face.

这个特殊的人脸检测的例子是有监督的,这意味着你的例子必须被标记,或者明确地说哪些是人脸,哪些不是。

在无监督算法中,你的例子没有标记,也就是说你什么都不说。当然,在这种情况下,算法本身不能“发明”人脸是什么,但它可以尝试将数据聚类到不同的组中,例如,它可以区分人脸与风景非常不同,而风景与马非常不同。

Since another answer mentions it (though, in an incorrect way): there are "intermediate" forms of supervision, i.e. semi-supervised and active learning. Technically, these are supervised methods in which there is some "smart" way to avoid a large number of labeled examples. In active learning, the algorithm itself decides which thing you should label (e.g. it can be pretty sure about a landscape and a horse, but it might ask you to confirm if a gorilla is indeed the picture of a face). In semi-supervised learning, there are two different algorithms which start with the labeled examples, and then "tell" each other the way they think about some large number of unlabeled data. From this "discussion" they learn.

其他回答

监督式学习

训练数据包括输入向量的示例及其相应的目标向量的应用被称为监督学习问题。

无监督学习

在其他模式识别问题中,训练数据由一组输入向量x组成,没有任何对应的目标值。这种无监督学习问题的目标可能是在数据中发现相似的例子组,在这里它被称为聚类

模式识别和机器学习(Bishop, 2006)

我一直认为无监督学习和有监督学习之间的区别是随意的,有点令人困惑。这两种情况之间没有真正的区别,相反,在一系列情况下,算法可以或多或少地“监督”。半监督学习的存在是界限模糊的一个明显例子。

我倾向于认为监督是对算法提供关于应该首选哪些解决方案的反馈。对于传统的监督设置,比如垃圾邮件检测,你告诉算法“不要在训练集上犯任何错误”;对于传统的无监督设置,比如聚类,你告诉算法“彼此接近的点应该在同一个聚类中”。很巧的是,第一种反馈形式比后者更具体。

简而言之,当有人说“有监督”时,想想分类,当他们说“无监督”时,想想聚类,尽量不要过于担心除此之外的问题。

监督式机器学习 算法从训练数据集中学习的过程 预测产出。”

预测输出精度与训练数据(长度)成正比

监督学习是指你有输入变量(x)(训练数据集)和输出变量(Y)(测试数据集),你使用一种算法来学习从输入到输出的映射函数。

Y = f(X)

主要类型:

分类(离散y轴) 预测(连续y轴)

算法:

分类算法: 神经网络 Naïve贝叶斯分类器 费雪线性判别 然而, 决策树 超级向量机 预测算法: 最近的邻居 线性回归,多元回归

应用领域:

将电子邮件分类为垃圾邮件 患者是否有 疾病与否 语音识别 预测HR是否会选择特定的候选人 预测股票市场价格

既然你问了这个非常基本的问题,似乎有必要详细说明机器学习本身是什么。

Machine Learning is a class of algorithms which is data-driven, i.e. unlike "normal" algorithms it is the data that "tells" what the "good answer" is. Example: a hypothetical non-machine learning algorithm for face detection in images would try to define what a face is (round skin-like-colored disk, with dark area where you expect the eyes etc). A machine learning algorithm would not have such coded definition, but would "learn-by-examples": you'll show several images of faces and not-faces and a good algorithm will eventually learn and be able to predict whether or not an unseen image is a face.

这个特殊的人脸检测的例子是有监督的,这意味着你的例子必须被标记,或者明确地说哪些是人脸,哪些不是。

在无监督算法中,你的例子没有标记,也就是说你什么都不说。当然,在这种情况下,算法本身不能“发明”人脸是什么,但它可以尝试将数据聚类到不同的组中,例如,它可以区分人脸与风景非常不同,而风景与马非常不同。

Since another answer mentions it (though, in an incorrect way): there are "intermediate" forms of supervision, i.e. semi-supervised and active learning. Technically, these are supervised methods in which there is some "smart" way to avoid a large number of labeled examples. In active learning, the algorithm itself decides which thing you should label (e.g. it can be pretty sure about a landscape and a horse, but it might ask you to confirm if a gorilla is indeed the picture of a face). In semi-supervised learning, there are two different algorithms which start with the labeled examples, and then "tell" each other the way they think about some large number of unlabeled data. From this "discussion" they learn.

监督学习是指你为算法提供的数据被“标记”或“标记”,以帮助你的逻辑做出决策。

示例:贝叶斯垃圾邮件过滤,您必须将一个项目标记为垃圾邮件以优化结果。

无监督学习是一种试图在原始数据之外没有任何外部输入的情况下找到相关性的算法。

例如:数据挖掘聚类算法。