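We use logistic regression when we have to predict the value of a categorical (or discrete) outcome. As I understand it, we also use linear regression to predict the value of an outcome given the input values.
So what is the difference between the two methods?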
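In linear regression, the outcome (dependent variable) is continuous. It can take any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values.
For example, if X contains the area of houses in square feet and Y contains the corresponding sale prices of those houses, you could use linear regression to predict the sale price as a function of house size. While the possible sale price may not literally be any value, there are so many possible values that a linear regression model would be chosen.
If, instead, you wanted to predict, based on size, whether a house would sell for more than $200K, you would use logistic regression. The possible outputs are either Yes, the house will sell for more than $200K, or No, the house will not.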
Linear regression output as probabilities
It's tempting to use the linear regression output as probabilities, but that is a mistake: the output can be negative or greater than 1, whereas a probability cannot. Because linear regression can produce "probabilities" below 0 or above 1, logistic regression was introduced.
Source: http://gerardnico.com/wiki/data_mining/simple_logistic_regression
Outcome
In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values.
The dependent variable
Logistic regression is used when the response variable is categorical in nature. For instance, yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc. Linear regression is used when your response variable is continuous. For instance, weight, height, number of hours, etc.
Equation
Linear regression gives an equation of the form Y = mX + C, that is, an equation of degree 1. Logistic regression gives an equation of the form Y = e^X / (1 + e^X), which is the same as 1 / (1 + e^-X).
Coefficient interpretation
In linear regression, the interpretation of the coefficients of the independent variables is quite straightforward (i.e. holding all other variables constant, with a unit increase in this variable, the dependent variable is expected to increase/decrease by xxx). However, in logistic regression, and in generalized linear models more broadly, the interpretation depends on the family (binomial, Poisson, etc.) and link (log, logit, inverse-log, etc.) you use.
Error minimization technique
Linear regression uses the ordinary least squares method to minimise the errors and arrive at the best possible fit, while logistic regression uses the maximum likelihood method to arrive at the solution. Linear regression is usually solved by minimizing the least squares error of the model to the data, so large errors are penalized quadratically.
Logistic regression behaves differently. With the logistic loss function, large errors are penalized roughly linearly rather than quadratically, and a prediction that lies far on the correct side of the boundary costs almost nothing. Consider linear regression on categorical {0, 1} outcomes to see why this matters. If your model predicts 38 when the truth is 1, you have lost nothing as far as classification goes. Linear regression would try to reduce that 38; logistic regression wouldn't (as much).
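The "predicts 38 when the truth is 1" case can be checked numerically. This is a small sketch with hypothetical helper names, treating 38 as the raw (pre-sigmoid) score in the logistic case; it compares the squared error with the logistic (log) loss.

```python
import math

def squared_loss(y_true, prediction):
    """Squared error, as minimised by ordinary least squares."""
    return (prediction - y_true) ** 2

def logistic_loss(y_true, score):
    """Log loss for a label in {0, 1} and a raw (pre-sigmoid) score."""
    # Flip the sign for the negative class so that a large positive
    # margin always means "confidently correct".
    margin = score if y_true == 1 else -score
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

print(squared_loss(1, 38))   # large penalty although the class is right
print(logistic_loss(1, 38))  # practically zero
print(logistic_loss(0, 38))  # confidently wrong: grows about linearly
```

Least squares charges a large penalty for a prediction that is on the correct side, while the logistic loss only charges for predictions on the wrong side.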
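The basic difference:
Linear regression is basically a regression model, which means it gives a non-discrete (continuous) output of a function. So this approach gives a value. For example: given x, what is f(x)?
For example, given a training set of different factors and the price of a property, after training we can provide the required factors to determine what the property price will be.
Logistic regression is basically a binary classification algorithm, which means the output of the function is discrete. For example: for a given x, if f(x) > threshold classify it as 1, otherwise classify it as 0.
For example, given a set of brain tumour sizes as training data, we can use the size as input to determine whether a tumour is benign or malignant. The output here is therefore either 0 or 1.
The function here is basically the hypothesis function.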
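They are both quite similar in how they are solved, but as others have said, one (logistic regression) is for predicting which category something "fits" (Y/N or 1/0), and the other (linear regression) is for predicting a value.
So if you want to predict whether you have cancer, Y/N (or a probability), use logistic regression. If you want to know how many years you will live, use linear regression!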
Just to add on to the previous answers.
Linear regression
Is meant to solve the problem of predicting/estimating the output value for a given element X (say f(x)). The result of the prediction is a continuous value, which may be positive or negative. In this case you normally have an input dataset with lots of examples and the output value for each one of them. The goal is to fit a model to this dataset so that you can predict the output for new, never-before-seen elements. The classical example is fitting a line to a set of points, but in general linear regression can be used to fit more complex models (using higher polynomial degrees).
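Solving the problem
Linear regression can be solved in two different ways:
1. Normal equation (a direct way to solve the problem)
2. Gradient descent (an iterative approach)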
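The two approaches can be compared side by side. This is a rough sketch for a single feature, where the normal equation reduces to the familiar slope/intercept formulas. The function names, learning rate and step count are illustrative choices, not part of the answer above.

```python
def fit_normal_equation(xs, ys):
    """Closed-form least squares for one feature: y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Iteratively minimise half the mean squared error."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # generated from y = 1 + 2x
direct = fit_normal_equation(xs, ys)
iterative = fit_gradient_descent(xs, ys)
print(direct, iterative)
```

Both should land on roughly the same parameters. The direct method needs no learning rate, while gradient descent scales better when there are many features.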
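Logistic regression
Is meant to solve classification problems where, given an element, you have to classify it into one of N categories. Typical examples are, given an email, classifying it as spam or not, or, given a vehicle, finding which category it belongs to (car, truck, van, etc.). Basically the output is a finite set of discrete values.
Solving the problem
Logistic regression has no closed-form solution, so it is solved iteratively, typically with gradient descent. The formulation is in general very similar to linear regression; the only difference is the use of a different hypothesis function. In linear regression the hypothesis has the form: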
h(x) = theta_0 + theta_1*x_1 + theta_2*x_2 ..
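where theta is the model we are trying to fit and [1, x_1, x_2, ..] is the input vector. In logistic regression the hypothesis function is different; the value computed above is passed through the sigmoid function g: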
g(x) = 1 / (1 + e^-x)
This function has a nice property: it maps any value into the range (0, 1), which is appropriate for handling probabilities during classification. For example, in binary classification g(X) can be interpreted as the probability of belonging to the positive class. In this case you normally have different classes separated by a decision boundary, which is basically a curve that decides the separation between the classes.
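As a rough illustration of the hypothesis g(theta_0 + theta_1*x) and of fitting it by gradient descent, here is a minimal one-feature sketch. The tumour sizes, labels, learning rate and step count are invented for the example.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^-z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, learning_rate=0.1, steps=20000):
    """Gradient descent on the mean log loss for h(x) = g(theta0 + theta1*x)."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

def predict_class(theta, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    theta0, theta1 = theta
    return 1 if sigmoid(theta0 + theta1 * x) >= threshold else 0

# Hypothetical data: tumour sizes with labels 0 (benign) and 1 (malignant).
sizes = [1, 2, 3, 4, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
theta = fit_logistic(sizes, labels)
print(predict_class(theta, 2), predict_class(theta, 8))
```

With one feature the decision boundary is the single point where theta_0 + theta_1*x = 0; with more features it becomes a line or surface.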
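You can also use the code below to fit the linear regression and plot a Q-Q plot of its residuals (details_df and q3_rechange_delay_df are the author's own data and are not defined here):

    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    from sklearn.model_selection import train_test_split

    # details_df (features) and q3_rechange_delay_df (target) come from
    # the author's own dataset.
    q_df = details_df
    # q_df = pd.get_dummies(q_df)
    # One-hot encode the categorical columns named "1" .. "9".
    q_df = pd.get_dummies(
        q_df, columns=["1", "2", "3", "4", "5", "6", "7", "8", "9"]
    )
    # NOTE: get_dummies replaces the listed columns with indicator columns,
    # so the next two lines only work if columns with these exact names
    # still exist afterwards.
    q_1_df = q_df["1"]
    q_df = q_df.drop(["2", "3", "4", "5"], axis=1)

    # Add the intercept column, then split into train and test sets.
    x = sm.add_constant(q_df)
    train_x, test_x, train_y, test_y = train_test_split(
        x, q3_rechange_delay_df, test_size=0.2, random_state=123
    )

    # Fit ordinary least squares and inspect the fit.
    lmod = sm.OLS(train_y, train_x).fit()
    lmod.summary()
    lmod.predict()[:10]
    lmod.get_prediction().summary_frame()[:10]

    # Q-Q plot of the residuals against a normal distribution.
    sm.qqplot(lmod.resid, line="q")
    plt.title("Q-Q plot of Standardized Residuals")
    plt.show()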
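I couldn't agree more with the comments above. Beyond that, there are some more differences:
In linear regression, residuals are assumed to be normally distributed. In logistic regression, residuals need to be independent but are not normally distributed.
Linear regression assumes that a constant change in the value of the explanatory variable results in a constant change in the response variable. This assumption does not hold if the value of the response variable represents a probability (as in logistic regression).
A generalized linear model (GLM) does not assume a linear relationship between the dependent and independent variables. However, in the logit model it assumes a linear relationship between the link function and the independent variables.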
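The last two points can be seen with a few numbers. In this sketch the coefficients are arbitrary, chosen only for illustration. A one-unit step in x changes the probability by different amounts depending on where you are, while the change in the logit (log-odds) stays constant.

```python
import math

def sigmoid(z):
    """Logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds, the link function of logistic regression."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for p(x) = sigmoid(THETA0 + THETA1 * x).
THETA0, THETA1 = -5.0, 1.0

def probability(x):
    return sigmoid(THETA0 + THETA1 * x)

# Change in probability for a one-unit step at two different places.
step_mid = probability(5) - probability(4)
step_tail = probability(9) - probability(8)

# Change in log-odds for the same two steps.
logit_step_mid = logit(probability(5)) - logit(probability(4))
logit_step_tail = logit(probability(9)) - logit(probability(8))

print(step_mid, step_tail)
print(logit_step_mid, logit_step_tail)
```

The probability steps differ by almost an order of magnitude, whereas both log-odds steps equal THETA1.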
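Put simply, if more cases arrive in a linear regression model that lie far away from the threshold (say 0.5) used to predict y=1 versus y=0, the hypothesis changes and becomes worse. That is why the linear regression model is not used for classification problems.
Another problem is that when the classes are y=0 and y=1, h(x) can be > 1 or < 0. So we use logistic regression, where 0 <= h(x) <= 1.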
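Both problems show up on a toy dataset. The sketch below uses invented tumour sizes and a hypothetical `fit_line` helper. It fits a least-squares line to 0/1 labels, checks the range of h(x), and then adds one extreme but correctly labelled point to see where h(x) crosses 0.5.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def crossing_point(intercept, slope, level=0.5):
    """x at which the fitted line reaches the given level."""
    return (level - intercept) / slope

sizes = [1, 2, 3, 4, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_line(sizes, labels)
low = b0 + b1 * 1    # h(x) at the smallest size: below 0
high = b0 + b1 * 9   # h(x) at the largest size: above 1
cut_before = crossing_point(b0, b1)

# Add one very large malignant tumour and refit.
b0_out, b1_out = fit_line(sizes + [40], labels + [1])
cut_after = crossing_point(b0_out, b1_out)

print(low, high)
print(cut_before, cut_after)
```

After the extra point the 0.5 crossing moves past size 6, so a tumour that was classified as malignant before is now classified as benign.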
| Basis | Linear | Logistic |
|---|---|---|
| Basic | The data is modelled using a straight line. | The probability of an event is modelled through its log-odds, which are a linear function of the predictor variables. |
| Linear relationship between dependent and independent variables | Is required | Not required |
| The independent variables | Could be correlated with each other (especially in multiple linear regression). | Should not be correlated with each other (no multicollinearity should exist). |
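Logistic regression is used to predict categorical outputs such as yes/no, low/medium/high, etc. There are basically two types of logistic regression: binary logistic regression (yes/no, approved/disapproved) and multi-class logistic regression (low/medium/high, digits 0-9, etc.).
Linear regression, on the other hand, is used when the dependent variable (y) is continuous. y = mx + c is a simple linear regression equation (m = slope and c is the y-intercept). Multiple linear regression has more than one independent variable (x1, x2, x3, etc.).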
Regression means the output is a continuous variable; linear means there is a linear relation between y and x. Example: you are trying to predict salary from the number of years of experience. Here salary is the dependent variable (y) and years of experience is the independent variable (x). y = b0 + b1*x1. We are trying to find the optimum values of the constants b0 and b1 that give the best-fitting line for the observed data. It is the equation of a line, which gives a continuous value from x = 0 up to very large values. This line is called the linear regression model.
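Logistic regression is a classification technique. Don't be misled by the term regression. Here we predict whether y = 0 or 1.
Here we first need to find p(y=1) (the probability that y=1) for a given x from the formula below.
The probability p is related to y by the formula below.
Example: we can classify a tumour with more than a 50% chance of being cancerous as 1 and a tumour with less than a 50% chance of being cancerous as 0.
Here the red points will be predicted as 0, whereas the green points will be predicted as 1.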
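The formulas this answer refers to are not reproduced in the text. Assuming they are the standard logistic model, written with the same b0 and b1 notation as the linear case, they would read:

$$
p(y = 1 \mid x) = \frac{1}{1 + e^{-(b_0 + b_1 x_1)}}, \qquad \ln\frac{p}{1 - p} = b_0 + b_1 x_1
$$

Under that assumption, classifying as 1 when p > 0.5 is the same as checking whether b_0 + b_1 x_1 > 0.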
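In linear regression the outcome is continuous, whereas in logistic regression the outcome has only a limited number of possible values (discrete).
Example: if the given value of x is the size of a plot in square feet, then predicting y, the rate of the plot, comes under linear regression.
If, instead, you wanted to predict, based on size, whether the plot would sell for more than 300,000 Rs, you would use logistic regression. The possible outputs are either Yes, the plot will sell for more than 300,000 Rs, or No.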
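In linear regression the outcome is continuous, whereas in logistic regression the outcome is discrete (not continuous).
To perform linear regression we require a linear relationship between the dependent and independent variables. To perform logistic regression we do not require a linear relationship between the dependent and independent variables.
Linear regression is all about fitting a straight line to the data, while logistic regression is about fitting a curve to the data.
Linear regression is a regression algorithm for machine learning, while logistic regression is a classification algorithm for machine learning.
Linear regression assumes a Gaussian (or normal) distribution of the dependent variable. Logistic regression assumes a binomial distribution of the dependent variable.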