Classification vs. Regression
Regression outputs continuous values, while classification outputs only discrete values (0, 1, etc.).
Binary classification problems have output values 0 or 1.
Using linear regression and mapping all predictions greater than 0.5 to 1 and all predictions less than 0.5 to 0 doesn't work well, because classification is not actually a linear function.
Logistic function (Sigmoid)
The hypothesis uses the logistic (sigmoid) function: $h_{\theta}(x) = g(\theta^{T}x)$, where $g(z) = \frac{1}{1 + e^{-z}}$. Its graph is an S-shaped curve that maps any real number into the interval (0, 1).
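A minimal sketch of the sigmoid in NumPy (the function name `sigmoid` is my own choice, not from the notes), showing how it squashes any real input into (0, 1):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```

Note that $g(0) = 0.5$, which is what makes 0.5 the natural classification threshold below.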
$h_{\theta}(x)$ gives us the probability of output = 1 given $x$; on the other hand, $1 - h_{\theta}(x)$ gives us the probability of output = 0 given $x$.
When $h_{\theta}(x) \geq 0.5$, output = 1; otherwise output = 0.
Therefore, we can say that when $\theta^{T}x \geq 0$, we have output = 1, otherwise output = 0.
Decision Boundary
A property of the logistic function that separates the region where we predict $h_{\theta}(x) = 1$ from the region where we predict $h_{\theta}(x) = 0$.
The boundary itself is the set of points that yields $h_{\theta}(x) = 0.5$, i.e. where $\theta^{T}x = 0$.
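To make the boundary concrete, here is a small sketch with hypothetical parameters $\theta = (-3, 1, 1)$, for which the boundary is the line $x_1 + x_2 = 3$; any point on that line yields exactly 0.5:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical parameters: boundary is -3 + x1 + x2 = 0, i.e. the line x1 + x2 = 3
theta = np.array([-3.0, 1.0, 1.0])

on_boundary = np.array([1.0, 1.5, 1.5])  # intercept term, then x1 + x2 = 3
h = sigmoid(theta @ on_boundary)
print(h)  # 0.5 -- points on the boundary yield exactly 0.5
```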
Cost function
The linear regression cost function won't work, because the logistic function makes it non-convex (it has many local optima).
The cost function for logistic regression is:

$\mathrm{Cost}(h_{\theta}(x), y) = \begin{cases} -\log(h_{\theta}(x)) & \text{if } y = 1 \\ -\log(1 - h_{\theta}(x)) & \text{if } y = 0 \end{cases}$

For y = 1, the function is $-\log(h_{\theta}(x))$: the cost is 0 when $h_{\theta}(x) = 1$ and grows to infinity as $h_{\theta}(x) \to 0$.
For y = 0, the function is $-\log(1 - h_{\theta}(x))$: the cost is 0 when $h_{\theta}(x) = 0$ and grows to infinity as $h_{\theta}(x) \to 1$.
These properties mean confident wrong predictions are penalized heavily, and the resulting optimization problem is convex.
We can further compress the two cases into a single expression:

$\mathrm{Cost}(h_{\theta}(x), y) = -y\log(h_{\theta}(x)) - (1 - y)\log(1 - h_{\theta}(x))$
Then, averaging over all $m$ training examples, the full cost function is:

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_{\theta}(x^{(i)})) + (1 - y^{(i)})\log(1 - h_{\theta}(x^{(i)}))\right]$
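The compressed cost over a training set can be sketched in vectorized form. The tiny dataset and parameter values below are hypothetical, chosen only to exercise the function:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Vectorized logistic cost: -(1/m) * [y . log(h) + (1-y) . log(1-h)]."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# tiny hypothetical dataset (intercept column of ones included)
X = np.array([[1.0, 2.0], [1.0, -2.0]])
y = np.array([1.0, 0.0])

# with theta = 0, h = 0.5 everywhere, so the cost is log(2) for any labels
print(cost(np.zeros(2), X, y))  # ~0.693
print(cost(np.array([0.0, 1.0]), X, y))  # lower: this theta fits both examples
```

Note the cost with all-zero parameters is $\log 2 \approx 0.693$, a useful sanity check when implementing this from scratch.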