statduck
Basis Expansions & Regularization
We cannot assume that the true function is linear in X.
To handle nonlinearity, we can fit the model to transformations of X instead of the original X.
$$ f(X)=\sum^M_{m=1}\beta_mh_m(X) $$
The model $f(X)$ is linear in the basis functions $h_m$, even though each $h_m(X)$ may be nonlinear in $X$.
| Form | Model |
|---|---|
| $h_m(X)=X_m$ | Original linear model |
| $h_m(X)=X_j^2$ or $h_m(X)=X_jX_k$ | Polynomial model |
| $h_m(X)=\log(X_j),\ \sqrt{X_j}$ | Nonlinear transformation of a single input (log, square root) |
| $h_m(X)=I(L_m\leq X_k \leq U_m)$ | Indicator for a region of $X_k$ (for analyzing the data locally) |
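As a rough illustration of the table above, the sketch below builds a design matrix from a few of the listed transformations and fits it by ordinary least squares. The synthetic data, the function name `expand_basis`, and the particular choice of transformations are assumptions made for this example only.

```python
import numpy as np

def expand_basis(X):
    """Map an (n, 2) input with positive entries to an (n, 6) matrix of basis functions."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([
        np.ones(len(X)),   # intercept
        x1,                # h(X) = X_1 (original linear term)
        x2,                # h(X) = X_2
        x1 ** 2,           # polynomial term X_1^2
        x1 * x2,           # interaction term X_1 * X_2
        np.log(x2),        # log transform of X_2
    ])

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(200, 2))   # synthetic inputs (assumed)
y = 1.0 + 2.0 * X[:, 0] ** 2 - 3.0 * np.log(X[:, 1]) + rng.normal(0, 0.1, 200)

H = expand_basis(X)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # the problem is linear in beta
y_hat = H @ beta
```

Once the columns of `H` are computed, the fit is an ordinary linear regression, which is the point of the basis-expansion view.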
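Because the set of basis functions can grow large, we need a way to control the complexity of the model:
| Method | Description |
|---|---|
| Restriction | Limit the class of functions in advance, e.g. to additive models |
| Selection | Include only the basis functions that contribute significantly to the fit |
| Regularization | Use the full set of basis functions but constrain the coefficients |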
Natural Cubic Spline
$$
N_1(X)=1,\; N_2(X)=X,\; N_{k+2}(X)=d_k(X)-d_{K-1}(X) \\
d_k(X)=\frac{(X-\xi_k)^3_+-(X-\xi_K)^3_+}{\xi_K-\xi_k}
$$
$$
\hat{\theta}=(N^TN+\lambda\Omega_N)^{-1}N^Ty \\
\hat{f}(x)=\sum^N_{j=1}N_j(x)\hat{\theta}_j
$$
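The penalized estimate above can be sketched numerically as follows. This is only an illustration under several assumptions: the knots are a small fixed grid rather than one knot per data point, $\Omega_N$ is taken to be the matrix of integrals $\int N_j''(t)N_k''(t)\,dt$ and is approximated with the trapezoid rule, and the names `nat_basis` and `fit_smoothing_spline` are made up for this example.

```python
import numpy as np

def nat_basis(x, knots, deriv2=False):
    """Natural cubic spline basis N_1..N_K, or its second derivative if deriv2=True."""
    x = np.asarray(x, dtype=float)
    K = len(knots)

    def d(k):
        if deriv2:
            num = 6 * (np.maximum(0.0, x - knots[k]) - np.maximum(0.0, x - knots[-1]))
        else:
            num = np.maximum(0.0, x - knots[k]) ** 3 - np.maximum(0.0, x - knots[-1]) ** 3
        return num / (knots[-1] - knots[k])

    # N_1 = 1 and N_2 = x have zero second derivative
    first_two = [np.zeros_like(x), np.zeros_like(x)] if deriv2 else [np.ones_like(x), x]
    return np.column_stack(first_two + [d(k) - d(K - 2) for k in range(K - 2)])

def fit_smoothing_spline(x, y, knots, lam, n_grid=2001):
    N = nat_basis(x, knots)
    # Approximate Omega[j, k] = integral of N_j''(t) N_k''(t) dt on a uniform grid
    t = np.linspace(knots[0], knots[-1], n_grid)
    D2 = nat_basis(t, knots, deriv2=True)
    w = np.full(n_grid, t[1] - t[0])
    w[[0, -1]] *= 0.5                       # trapezoid weights
    Omega = D2.T @ (w[:, None] * D2)
    theta = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
    return theta, Omega

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 1.0, 100))     # synthetic data (assumed)
y = np.sin(4 * x) + rng.normal(0, 0.2, 100)
knots = np.linspace(0.0, 1.0, 8)            # fewer knots than data points (assumed)

theta_rough, Omega = fit_smoothing_spline(x, y, knots, lam=0.0)
theta_smooth, _ = fit_smoothing_spline(x, y, knots, lam=1e6)
```

With `lam=0.0` the fit is plain least squares on the spline basis. As `lam` grows, the roughness penalty $\theta^T\Omega_N\theta$ shrinks and the fit moves toward a straight line.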
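import numpy as np

class spline:
    # Cubic spline with two knots, fitted in two stages:
    # a global cubic fit, then a fit of its residuals on the knot terms.
    def __init__(self, x, y):
        x = np.asarray(x, dtype=float)
        # Knots at one third and two thirds of the observed range of x
        self.b1 = x.min() + (x.max()-x.min())/3
        self.b2 = x.min() + 2*(x.max()-x.min())/3
        self.x = self.cubic(x)         # columns: 1, x, x^2, x^3
        self.xk = self.knot_terms(x)   # columns: (x-b1)^3_+, (x-b2)^3_+
        self.y = np.asarray(y, dtype=float)
    def cubic(self, x):
        return np.transpose(np.array([np.ones(x.shape[0]), x, np.power(x,2), np.power(x,3)]))
    def knot_terms(self, x):
        return np.transpose(np.array([np.power(np.maximum(x-self.b1,0),3),
                                      np.power(np.maximum(x-self.b2,0),3)]))
    def training(self):
        x = self.x
        y = self.y
        xk = self.xk
        xt = np.transpose(x)
        beta = np.linalg.solve(xt@x, xt@y)
        # Fit again on the residuals of the cubic fit; only the response changes.
        # Note: this two-stage fit is not identical to a joint least-squares fit of all six coefficients.
        y_fit = y-(x@beta)
        xkt = np.transpose(xk)
        # A small ridge term (0.01) keeps the second-stage system well conditioned
        beta_k = np.linalg.solve(xkt@xk + np.diag([0.01]*xk.shape[1]), xkt@y_fit)
        self.beta = beta
        self.beta_k = beta_k
        return(np.concatenate([beta, beta_k]))
    def prediction(self, X_test):
        # X_test is a 1-D array of input values; training() must be called first
        X_test = np.asarray(X_test, dtype=float)
        y_pred = self.cubic(X_test)@self.beta + self.knot_terms(X_test)@self.beta_k
        return(y_pred)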
Piecewise Polynomials and Splines
✏️ Local regression using indicator functions

$$ f(X)=\beta_1I(X<\xi_1)+\beta_2I(\xi_1\leq X<\xi_2)+\beta_3I(\xi_2 \leq X) $$
In this case, each estimated coefficient $\hat{\beta}_m$ equals the mean of the target values in the corresponding region.
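The claim that each coefficient equals the regional mean can be checked numerically. The sketch below uses synthetic data and arbitrary knot positions, both assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 3.0, 300)              # synthetic inputs (assumed)
y = np.sin(2 * x) + rng.normal(0, 0.2, 300)
xi1, xi2 = 1.0, 2.0                         # knots chosen for illustration

# Indicator basis: one column per region
H = np.column_stack([x < xi1, (xi1 <= x) & (x < xi2), xi2 <= x]).astype(float)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

# Mean of the target inside each region
region_means = np.array([y[H[:, m] == 1].mean() for m in range(3)])
```

Because the three indicator columns are mutually orthogonal, the least-squares problem separates by region, and `beta` matches `region_means` up to floating-point error.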

$$ \begin{split} f(X) = & \beta_1I(X<\xi_1)+\beta_2I(\xi_1\leq X<\xi_2)+\beta_3I(\xi_2 \leq X)+ \\ & \beta_4I(X<\xi_1)X+\beta_5I(\xi_1\leq X<\xi_2)X+\beta_6I(\xi_2\leq X)X \\ & (f(\xi_1^-)=f(\xi_1^+), f(\xi_2^-)=f(\xi_2^+)) \end{split} $$
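Under the continuity constraints, the same function can be written with the basis $h_1(X)=1$, $h_2(X)=X$, $h_3(X)=(X-\xi_1)_+$, $h_4(X)=(X-\xi_2)_+$, where $(X-\xi_1)_+$ denotes the positive part, $\max(0,X-\xi_1)$.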
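To make the positive-part notation concrete, here is a minimal sketch that fits a continuous piecewise-linear function using the columns $1$, $X$, $\max(0,X-\xi_1)$, $\max(0,X-\xi_2)$. The data, the knot positions, and the helper names `linear_spline_basis` and `f_hat` are assumptions for this example.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Columns: 1, x, and max(0, x - knot) for each knot."""
    x = np.asarray(x, dtype=float)
    hinges = [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack([np.ones_like(x), x, *hinges])

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 3.0, 200))     # synthetic inputs (assumed)
y = np.abs(x - 1.5) + rng.normal(0, 0.1, 200)
knots = [1.0, 2.0]

beta, *_ = np.linalg.lstsq(linear_spline_basis(x, knots), y, rcond=None)

def f_hat(x_new):
    return linear_spline_basis(x_new, knots) @ beta

# Values just left and right of the first knot agree, so the fit is continuous there
eps = 1e-8
gap = abs(f_hat([1.0 - eps])[0] - f_hat([1.0 + eps])[0])
```

Each hinge column is itself continuous, so continuity at the knots holds by construction and needs no explicit constraint.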
✏️ Piecewise Cubic Polynomials

$$
f(X)=\beta_1+\beta_2X+\beta_3X^2+\beta_4X^3+\beta_5(X-\xi_1)^3_++\beta_6(X-\xi_2)^3_+
$$
This function satisfies three constraints at each knot: the function itself, its first derivative, and its second derivative are all continuous. The truncated term $(X-\xi_k)^3_+$ respects all three constraints because it and its first two derivatives are zero at $X=\xi_k$.
Number of parameters
(# of regions) $\times$ (# of parameters per region) $-$ (# of knots) $\times$ (# of constraints per knot) $= 3\times4-2\times3=6$
In the method of Lagrange multipliers, these two problems are equivalent:
- Maximize $f(x,y)$ subject to $g(x,y)=k$
- Find a stationary point of $h(x,y,d)=f(x,y)+d(g(x,y)-k)$
Each constraint contributes one term to the Lagrangian and removes one degree of freedom. Thus, we subtract the number of constraints when counting the parameters above.
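The parameter count can also be checked with linear algebra. The sketch below writes the six continuity constraints for three cubic pieces as a linear system on the 12 raw coefficients and computes its rank. The knot positions and the helper `derivative_row` are assumptions for this example.

```python
import numpy as np

def derivative_row(xi, order):
    """Coefficients of the order-th derivative of a + b x + c x^2 + d x^3 at x = xi."""
    if order == 0:
        return np.array([1.0, xi, xi ** 2, xi ** 3])
    if order == 1:
        return np.array([0.0, 1.0, 2 * xi, 3 * xi ** 2])
    return np.array([0.0, 0.0, 2.0, 6 * xi])

knots = [1.0, 2.0]                 # illustrative knot locations (assumed)
n_regions, n_per_region = 3, 4     # three cubic pieces, four coefficients each

rows = []
for j, xi in enumerate(knots):
    for order in range(3):         # continuity of f, f', and f''
        row = np.zeros(n_regions * n_per_region)
        row[4 * j: 4 * j + 4] = derivative_row(xi, order)                # piece left of the knot
        row[4 * (j + 1): 4 * (j + 1) + 4] = -derivative_row(xi, order)   # piece right of the knot
        rows.append(row)
A = np.array(rows)                 # constraints written as A @ coefficients = 0

n_constraints = np.linalg.matrix_rank(A)
free_parameters = n_regions * n_per_region - n_constraints
```

The six constraints are linearly independent, so the constrained coefficient space has dimension $12-6=6$, matching the six terms of the truncated power basis.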
✏️ Weaknesses of local polynomial regression

- The fit behaves erratically near the boundaries
- Extrapolation beyond the boundaries is unreliable
The boundaries are the minimum and maximum of the input variable. Near them, the variance of the predicted value becomes large.
$$
\text{Pointwise variance}=\mathrm{Var}[\hat{f}(x_0)]
$$
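For a least-squares fit with a fixed design, the pointwise variance can be computed as $\sigma^2 h(x_0)^T(H^TH)^{-1}h(x_0)$, where $H$ is the matrix of basis functions evaluated at the training points. The sketch below evaluates it for a global cubic polynomial. The design points, the noise variance, and the function names are assumptions for this example.

```python
import numpy as np

def cubic_basis(x):
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

x_train = np.linspace(0.0, 1.0, 50)    # fixed design (assumed)
sigma2 = 1.0                           # noise variance (assumed known)

H = cubic_basis(x_train)
cov_beta = sigma2 * np.linalg.inv(H.T @ H)   # covariance of the least-squares coefficients

def pointwise_var(x0):
    h0 = cubic_basis(x0)
    # Row-wise quadratic form h0_i^T cov_beta h0_i
    return np.einsum("ij,jk,ik->i", h0, cov_beta, h0)

var_left, var_mid, var_right = pointwise_var([0.0, 0.5, 1.0])
```

In this setup the variance at both ends of the range is several times larger than at the center, which is the boundary problem described above.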
Natural Cubic Spline
To overcome the weakness of local polynomial regression, the natural cubic spline adds the constraint that the function is linear beyond the boundary knots. To impose this constraint, consider the following form.
$$
f(X)=\beta_1+\beta_2X+\beta_3(d_1(X)-d_{K-1}(X))+\cdots+\beta_K(d_{K-2}(X)-d_{K-1}(X))
$$
$$
d_k(X)=\dfrac{(X-\xi_k)^3_+-(X-\xi_K)^3_+}{\xi_K-\xi_k}
$$
Proof: https://statkwon.github.io/ml/natural-spline/
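The basis functions can be sketched directly from the definition of $d_k(X)$. The knot positions and the function names `d` and `natural_spline_basis` below are assumptions for this example; the check at the end illustrates linearity beyond the last knot rather than proving it.

```python
import numpy as np

def d(x, knots, k):
    """d_k(x) from the formula above; k is a 0-based index into knots."""
    xi_k, xi_K = knots[k], knots[-1]
    num = np.maximum(0.0, x - xi_k) ** 3 - np.maximum(0.0, x - xi_K) ** 3
    return num / (xi_K - xi_k)

def natural_spline_basis(x, knots):
    """Return the (n, K) matrix with columns N_1, ..., N_K for K knots."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = len(knots)
    cols = [np.ones_like(x), x]
    d_last = d(x, knots, K - 2)            # d_{K-1} in 1-based notation
    for k in range(K - 2):                 # k = 1, ..., K-2 in 1-based notation
        cols.append(d(x, knots, k) - d_last)
    return np.column_stack(cols)

knots = np.array([0.0, 1.0, 2.0, 3.0])     # illustrative knots (assumed)
x_out = np.array([4.0, 5.0, 6.0])          # equally spaced points beyond the last knot
N = natural_spline_basis(x_out, knots)

# A function that is linear on equally spaced points has zero second differences
second_diff = N[0] - 2 * N[1] + N[2]
```

Every column has zero second difference beyond the boundary knot, so any linear combination of the columns is linear there as well.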
Reference
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning. New York, NY, USA: Springer New York Inc.