Machine Learning Notes
- Linear Regression with one variable
- Linear Regression with multiple variables
- Linear Algebra:
- Octave Tutorial
Linear Regression with one variable
Notation
- m = Number of training examples
- x’s = “input” variable / features
- y’s = “output” variable / “target” variable
Hypothesis
\[ h_{\theta}(x) = \theta_0 + \theta_1x \]
Cost function
\[
J(\theta)
= \frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2
= \frac{1}{2m} \sum_{i=1}^m (\theta_0+\theta_1x^{(i)} - y^{(i)})^2
\]
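A minimal Octave sketch of this cost function (the name computeCost and the scalar theta0/theta1 arguments are just illustrative; as a function it would live in its own computeCost.m file):

```octave
function J = computeCost(x, y, theta0, theta1)
  % x, y: m-by-1 column vectors of inputs and targets.
  % theta0, theta1: current parameter values (scalars).
  m = length(y);                  % number of training examples
  h = theta0 + theta1 * x;        % hypothesis h_theta(x) for every example
  J = (1 / (2 * m)) * sum((h - y) .^ 2);
end
```

For example, `computeCost(x, y, 0, 0)` would give the cost of the all-zero hypothesis.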
Gradient descent
\[
\begin{align}
\theta_j &= \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\Theta) \\
&= \theta_j - \alpha \frac{\partial}{\partial\theta_j} (\frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2) \\
&= \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})(x^{(i)}_j)
\end{align}
\]
- j = 0
\[
\theta_0 = \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m(\theta_0+\theta_1x^{(i)}-y^{(i)})
\]
- j = 1
\[
\theta_1 = \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m(\theta_0+\theta_1x^{(i)}-y^{(i)})x^{(i)}
\]
(simultaneously update \(\theta_j\) for all j)
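A loop-based Octave sketch of this update rule (a sketch only: x, y, alpha, num_iters, and the initial theta0/theta1 are assumed to be defined; the temporary variables keep the two updates simultaneous):

```octave
m = length(y);                                           % number of training examples
for iter = 1:num_iters
  h = theta0 + theta1 * x;                               % current predictions
  temp0 = theta0 - alpha * (1 / m) * sum(h - y);         % j = 0 update (x_0 = 1)
  temp1 = theta1 - alpha * (1 / m) * sum((h - y) .* x);  % j = 1 update
  theta0 = temp0;                                        % assign only after both are computed
  theta1 = temp1;
end
```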
Linear Regression with multiple variables
Notation
- \(n\) = number of features
- \(x^{(i)}\) = input of \(i^{th}\) training example
- \(x^{(i)}_j\) = value of feature \(j\) in the \(i^{th}\) training example.
- E.g. \(x^{(2)}_3\) is the value of the \(3^{rd}\) feature of the \(2^{nd}\) training example.
Hypothesis derivation
- Previously: \( h_{\theta}(x) = \theta_0 + \theta_1x \)
- New hypothesis (e.g. with \(n = 3\) features): \( h_{\theta}(x) = \theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_3 \)
\(
X =
\begin{bmatrix} x_0\\x_1\\x_2\\\vdots\\x_n \end{bmatrix}
\in \mathbb{R}^{n+1}
\)
\(
\Theta =
\begin{bmatrix} \theta_0\\\theta_1\\\theta_2\\\vdots\\\theta_n \end{bmatrix}
\in \mathbb{R}^{n+1}
\)
- For convenience of notation, define \(x_0=1\)
\(
h_{\theta}(x)
= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_3
= \Theta^TX
\)
Hypothesis
\[
\begin{align}
h_{\theta}(x)
&= \Theta^TX \\
&= \theta_0x_0 + \theta_1x_1 + \theta_2x_2+ \cdots + \theta_nx_n
\end{align}
\]
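In Octave the hypothesis becomes a single inner product once \(x_0 = 1\) is included. A sketch, assuming theta and x are \((n+1)\)-by-1 column vectors and X is an m-by-\((n+1)\) design matrix whose first column is all ones:

```octave
h = theta' * x;            % prediction for a single example x (with x(1) = 1)
predictions = X * theta;   % predictions for all m training examples at once
```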
Cost function
\[
\begin{align}
J(\theta)
&= J(\theta_0, \theta_1, \cdots, \theta_n) \\
&= \frac{1}{2m} \sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2
\end{align}
\]
Gradient descent
\[
\begin{align}
\theta_j &= \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\Theta) \\
&= \theta_j - \alpha \frac{\partial}{\partial\theta_j} (\frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2) \\
&= \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})(x^{(i)}_j)
\end{align}
\]
- j = 0
\[
\theta_0 = \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_0
\]
- j = 1
\[
\theta_1 = \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_1
\]
- … and so on for \(j = 2, \ldots, n\) (see the sketch below).
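A sketch of one such iteration in Octave, looping over the parameters (X is the m-by-\((n+1)\) design matrix with a leading column of ones, y is m-by-1, theta is \((n+1)\)-by-1, alpha is the learning rate):

```octave
m = length(y);
h = X * theta;           % h_theta(x^(i)) for every training example
theta_new = theta;       % buffer so all theta_j are updated simultaneously
for j = 1:length(theta)  % Octave indexes from 1, so column j holds feature j-1
  theta_new(j) = theta(j) - alpha * (1 / m) * sum((h - y) .* X(:, j));
end
theta = theta_new;
```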
Feature Scaling
- Make sure features are on a similar scale.
- Get every feature into approximately a \(-1 \leq x_i \leq 1\) range.
Mean normalization
Replace \(x_i\) with \(x_i-\mu_i\) to make features have approximately zero mean
(Do not apply to \(x_0 = 1\)).
E.g.
\[ x_1 = \frac{\text{size}-1000}{2000} \]
\[ x_2 = \frac{\text{bedrooms}-2}{5} \]
More generally, also divide by the standard deviation (or the range) \(\sigma_i\):
\[ x_1 = \frac{x_1-\mu_1}{\sigma_1} \]
\[ x_2 = \frac{x_2-\mu_2}{\sigma_2} \]
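A sketch of this in Octave (X here holds only the raw features, without the \(x_0\) column of ones; mu, sigma, and X_norm are illustrative names):

```octave
mu     = mean(X);            % 1-by-n row vector of per-feature means
sigma  = std(X);             % 1-by-n row vector of per-feature standard deviations
X_norm = (X - mu) ./ sigma;  % broadcast the row vectors across all m rows
```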
Learning rate
While running gradient descent:
\[
\theta_j = \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\Theta)
\]
“Debugging”: How to make sure gradient descent is working correctly.
How to choose learning rate
Summary:
- If \(\alpha\) too small: slow convergence
- If \(\alpha\) too large: \(J(\theta)\) may not decrease on every iteration; may not converge.
To choose \(\alpha\), try
\(\cdots, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, \cdots\)
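To check that gradient descent is working, record \(J(\theta)\) after every iteration and plot it against the iteration number: the curve should decrease on every iteration, and a rising or oscillating curve suggests \(\alpha\) is too large. A sketch in Octave, assuming X, y, theta, alpha, m, and num_iters are defined as above:

```octave
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  theta = theta - alpha * (1 / m) * (X' * (X * theta - y));     % one gradient step
  J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);  % cost after the step
end
plot(1:num_iters, J_history);
xlabel('number of iterations'); ylabel('J(\theta)');
```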
Normal Equation
Normal equation: a method to solve for \(\theta\) analytically.
\[ \theta = (X^TX)^{-1}X^Ty \]
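In Octave this is a single line (a sketch; X is the m-by-\((n+1)\) design matrix including the column of ones, y the m-by-1 target vector, and pinv is used rather than inv so the line still behaves if \(X^TX\) happens to be singular):

```octave
theta = pinv(X' * X) * X' * y;   % closed-form least-squares solution
```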
| | Gradient Descent | Normal Equation |
|---|---|---|
| 1 | Need to choose \(\alpha\) | No need to choose \(\alpha\) |
| 2 | Needs many iterations | Does not need to iterate |
| 3 | Works well even when \(n\) is large | Need to compute \((X^TX)^{-1}\): time complexity \(O(n^3)\) |
| 4 | | Slow if \(n\) is very large |
Linear Algebra:
Matrices and vectors
- Matrix:
- Dimension of matrix:
- Matrix Elements
- Vector
Addition and scalar multiplication
Matrix Addition
Scalar Multiplication
Matrix-vector multiplication
Matrix-matrix multiplication
Matrix multiplication properties
Inverse and transpose
- Inverse
- Transpose
Octave Tutorial
Basic operations
Moving data around
Computing on data
Plotting data
Control statements: for, while, if statements
Vectorized implementation
\[
\begin{split}
&\theta_j = \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\Theta) \\
&= \theta_j - \alpha \frac{\partial}{\partial\theta_j} (\frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2) \\
&= \theta_j - \alpha \frac{\partial}{\partial\theta_j} (\frac{1}{2m} \sum_{i=1}^m (\theta_0+\theta_1x^{(i)} - y^{(i)})^2) \\
&= \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (\theta_0+\theta_1x^{(i)} - y^{(i)})(x^{(i)}_j)
\end{split}
\]
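Stacking all the \(\theta_j\) updates into one vector operation gives \(\Theta = \Theta - \frac{\alpha}{m} X^T (X\Theta - y)\), which replaces the per-parameter loop with a single Octave line (same X, y, theta, alpha, and m as in the sketches above):

```octave
theta = theta - (alpha / m) * X' * (X * theta - y);   % all theta_j updated at once
```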