Training set:
x = “input” variable / feature
y = “output” variable / target
m = number of training examples
(x, y) = single training example
(x^(i), y^(i)) = i-th training example
ŷ : estimated value of y (prediction)
cost function
Model : f(x) = wx + b
w, b : parameters (also called coefficients or weights)
ŷ^(i) = f(x^(i)) = wx^(i) + b
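A minimal sketch of the model in Python; the parameter and input values below are illustrative, not taken from the notes:

```python
# Linear model f(x) = w*x + b with hypothetical parameter values
w, b = 2.0, 1.0        # assumed parameters, for illustration only
x_i = 3.0              # one input feature
y_hat_i = w * x_i + b  # prediction ŷ^(i) -> 7.0
```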
cost function: J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2
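A short Python sketch of this cost, assuming the training set is held in plain lists; the example data is made up so that w = 2, b = 0 gives zero cost:

```python
def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1/2m) * sum((ŷ^(i) - y^(i))^2)."""
    m = len(x)
    total = 0.0
    for x_i, y_i in zip(x, y):
        y_hat_i = w * x_i + b          # model prediction f(x^(i))
        total += (y_hat_i - y_i) ** 2  # squared error for this example
    return total / (2 * m)

# Hypothetical training set: y = 2x, so w=2, b=0 fits it exactly
x_train = [1.0, 2.0, 3.0]
y_train = [2.0, 4.0, 6.0]
print(compute_cost(x_train, y_train, w=2.0, b=0.0))  # 0.0
```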
goal :minimize J(w, b)
gradient descent
outline: start with some w, b
keep changing w, b to reduce J(w, b)
until we settle at or near a minimum
w = w - \alpha \frac{\partial}{\partial w} J(w, b)
b = b - \alpha \frac{\partial}{\partial b} J(w, b)
α : learning rate / the size of each step
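A sketch of these updates in Python, assuming the squared-error cost above; differentiating J(w, b) gives ∂J/∂w = (1/m) Σ (ŷ^(i) − y^(i)) x^(i) and ∂J/∂b = (1/m) Σ (ŷ^(i) − y^(i)), and the learning rate and iteration count below are illustrative:

```python
def gradient_descent_step(x, y, w, b, alpha):
    """One update of w and b using the gradients of the squared-error cost."""
    m = len(x)
    dj_dw = 0.0
    dj_db = 0.0
    for x_i, y_i in zip(x, y):
        err = (w * x_i + b) - y_i   # ŷ^(i) - y^(i)
        dj_dw += err * x_i
        dj_db += err
    dj_dw /= m
    dj_db /= m
    # Simultaneous update: both gradients are computed before either parameter changes
    return w - alpha * dj_dw, b - alpha * dj_db

# Hypothetical training set (y = 2x), repeated steps until J(w, b) stops shrinking
x_train = [1.0, 2.0, 3.0]
y_train = [2.0, 4.0, 6.0]
w, b = 0.0, 0.0
for _ in range(1000):  # illustrative iteration count
    w, b = gradient_descent_step(x_train, y_train, w, b, alpha=0.1)
print(w, b)            # approaches w ≈ 2, b ≈ 0 for this data
```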