Zarathu Co., Ltd.
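GAM (Generalized Additive Model) is a statistical method for modeling nonlinear relationships.
There are several variants, depending on the type of the dependent variable.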
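Linear model:
\[\begin{align} Y=\beta_0+\beta_1 x_{1}+\beta_2 x_2+\cdots+\epsilon \end{align}\]
GAM, where the linear term \(\beta_1 x_1\) is replaced by a smooth function \(f(x_1)\):
\[\begin{align} Y=\beta_0+ f(x_1)+\beta_2 x_2+\cdots+\epsilon \end{align}\]
A term of the form \(f(x_1,x_2)\) is also possible.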
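As a sketch, assuming the `colon` data from the survival package used later in this deck, the three forms map onto mgcv formulas as below. The object names `fit_lm`, `fit_gam`, `fit_2d` are hypothetical. For the two-variable term, `te()` (tensor product) is used here instead of an isotropic `s(age, nodes)` because age and nodes are on different scales.

```r
library(mgcv)      # gam(), s(), te()
library(survival)  # provides the colon dataset

# Linear model: every covariate enters linearly
fit_lm  <- gam(time ~ age + sex, data = colon)
# Additive model: age enters through a smooth function f(age)
fit_gam <- gam(time ~ s(age) + sex, data = colon)
# Two-variable smooth f(age, nodes); rows with missing nodes are dropped
fit_2d  <- gam(time ~ te(age, nodes) + sex, data = colon)

# fit_lm and fit_gam use the same rows, so their AIC values are comparable
AIC(fit_lm, fit_gam)
```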
Locally weighted scatterplot smoothing (LOESS)
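A minimal LOESS sketch with base R's `loess()`, assuming the same `colon` data; `span = 0.75` is simply the function's default, written out to show which knob controls the amount of smoothing.

```r
library(survival)  # colon dataset

# LOESS: at each x, fit a weighted local regression on the nearest `span`
# fraction of the data (tricube weights, local quadratic by default)
fit_lo <- loess(time ~ age, data = colon, span = 0.75)

age.grid <- seq(min(colon$age), max(colon$age), by = 1)
pred_lo  <- predict(fit_lo, newdata = data.frame(age = age.grid))

with(colon, plot(age, time, col = "grey", xlab = "Age", ylab = "Time"))
lines(age.grid, pred_lo, lwd = 2)
```

A smaller `span` follows the data more closely; a larger one gives a smoother curve.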
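Cubic spline: piecewise cubic (third-degree) polynomials joined smoothly at the knots
Natural cubic spline: cubic between the knots, but linear beyond the boundary knots (at both ends)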
library(splines)   # bs() and ns()
library(survival)  # provides the colon dataset
# Cubic B-spline basis: knots given explicitly vs. chosen from df
cs1 <- glm(time ~ bs(age, knots = c(40, 50, 60, 70)) + sex, data = colon)
cs2 <- glm(time ~ bs(age, df = 4) + sex, data = colon)
# Natural cubic spline basis: same two ways of placing knots
ns1 <- glm(time ~ ns(age, knots = c(40, 50, 60, 70)) + sex, data = colon)
ns2 <- glm(time ~ ns(age, df = 4) + sex, data = colon)
# Ages at which to draw the fitted curves (sex fixed at 1)
age.grid <- seq(min(colon$age), max(colon$age), by = 1)
newd <- data.frame(age = age.grid, sex = 1)
with(colon, plot(age, time, col = "grey", xlab = "Age", ylab = "Time"))
lines(age.grid, predict(cs1, newdata = newd), col = 1, lwd = 1)
lines(age.grid, predict(cs2, newdata = newd), col = 2, lwd = 2)
lines(age.grid, predict(ns1, newdata = newd), col = 3, lwd = 3)
lines(age.grid, predict(ns2, newdata = newd), col = 4, lwd = 4)
# Mark the knot locations
abline(v = c(40, 50, 60, 70), lty = 2, col = "black")
legend("topleft", c("cs:knots", "cs:df", "ns:knots", "ns:df"), col = 1:4, lwd = 1:4)
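To check the "linear at both ends" property of the natural spline, one can predict at ages below the youngest patient (beyond the lower boundary knot, which `ns()` sets to the data range). This is a small sketch; `young` and `pred_ns` are hypothetical names. A straight line has zero second differences.

```r
library(splines)
library(survival)

ns1 <- glm(time ~ ns(age, knots = c(40, 50, 60, 70)) + sex, data = colon)

# Ages below the observed minimum, i.e. beyond the lower boundary knot
young   <- data.frame(age = seq(0, min(colon$age) - 1, by = 1), sex = 1)
pred_ns <- predict(ns1, newdata = young)

# Second differences are (numerically) zero: the fit is linear out there
max(abs(diff(pred_ns, differences = 2)))
```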
Default option in the mgcv R package.
Loess, Cubic spline
Smoothing (penalized) spline
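The summary output below looks like it comes from a call of the following form; the original code is not shown, so this is an assumed reconstruction, and `gam1` is a hypothetical name. With the Gaussian family and no `method` argument, mgcv chooses the smoothing parameter by GCV, which matches the `GCV =` line in the output.

```r
library(mgcv)
library(survival)

# Penalized regression spline for age (mgcv default: thin plate basis, k = 10);
# the smoothing parameter is estimated automatically
gam1 <- gam(time ~ s(age) + sex, data = colon)
summary(gam1)
plot(gam1, shade = TRUE)  # estimated f(age) with a confidence band
```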
Family: gaussian
Link function: identity
Formula:
time ~ s(age) + sex
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1533.194 31.619 48.489 <2e-16 ***
sex 8.354 43.851 0.191 0.849
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(age) 7.584 8.447 2.725 0.00437 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.00938 Deviance explained = 1.4%
GCV = 8.9245e+05 Scale est. = 8.8784e+05 n = 1858
s(age)
0.04120719
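The value printed under `s(age)` is presumably the estimated smoothing parameter \(\lambda\) for the age smooth. Assuming so, it can be read from the `sp` component of a fitted mgcv model, as sketched here alongside the corresponding effective degrees of freedom.

```r
library(mgcv)
library(survival)

gam1 <- gam(time ~ s(age) + sex, data = colon)
gam1$sp                          # estimated smoothing parameter(s), one per smooth
summary(gam1)$s.table[, "edf"]   # effective degrees of freedom of s(age)
```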
Smoothing spline: a sum of basis functions
\[s(x) = \sum_{k = 1}^K \beta_k b_k(x)\]
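9 basis functions (mgcv's default basis dimension k = 10, minus one removed by the identifiability constraint)
Multiplying the model matrix by the coefficients gives the y values of the fitted curve.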
(Intercept) sex s(age).1 s(age).2 s(age).3 s(age).4
1533.193607 8.353593 776.245594 1393.793067 232.785846 -1154.982151
s(age).5 s(age).6 s(age).7 s(age).8 s(age).9
-324.817013 -1018.225598 274.926370 3106.242791 -783.531680
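A sketch of the statement above: `predict(..., type = "lpmatrix")` returns the model matrix (intercept, sex, and the 9 basis functions \(b_k(\text{age})\) evaluated at each row), and multiplying it by the coefficient vector reproduces the fitted values. Object names are hypothetical.

```r
library(mgcv)
library(survival)

gam1 <- gam(time ~ s(age) + sex, data = colon)

# Model matrix: columns (Intercept), sex, s(age).1 ... s(age).9
X <- predict(gam1, type = "lpmatrix")
dim(X)

# X %*% beta reproduces the fitted curve (identity link: equals fitted values)
eta <- as.numeric(X %*% coef(gam1))
all.equal(eta, as.numeric(fitted(gam1)))

# Contribution of s(age) alone: sum_k beta_k * b_k(age)
idx   <- grep("s(age)", colnames(X), fixed = TRUE)
s_age <- as.numeric(X[, idx] %*% coef(gam1)[idx])
```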
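\(k = 6\): limits the basis dimension to 6, so the smooth can use at most 5 effective df (one is absorbed by the identifiability constraint)
sp = 1000: fixes \(\lambda\) at 1000, a heavy penalty that makes the curve nearly a straight line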
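A minimal comparison of the two options, assuming the same `colon` model as above; the model names are hypothetical. The effective df of `s(age)` should stay at or below 5 with `k = 6` and drop well below the default fit when \(\lambda\) is fixed at 1000.

```r
library(mgcv)
library(survival)

gam_def <- gam(time ~ s(age) + sex, data = colon)             # defaults
gam_k6  <- gam(time ~ s(age, k = 6) + sex, data = colon)      # basis dimension 6
gam_sp  <- gam(time ~ s(age, sp = 1000) + sex, data = colon)  # lambda fixed at 1000

edf <- sapply(list(default = gam_def, k6 = gam_k6, sp1000 = gam_sp),
              function(m) summary(m)$s.table[1, "edf"])
edf
```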
family = binomial
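A logistic GAM sketch. It assumes one row per patient by keeping only `etype == 2` (death) records of `colon`, so `status` is the 0/1 death indicator; this subset and the name `gam_bin` are illustrative choices, not from the original.

```r
library(mgcv)
library(survival)

colon_death <- subset(colon, etype == 2)  # one row per patient

# Logistic GAM: logit P(status = 1) = beta_0 + f(age) + beta_2 * sex
gam_bin <- gam(status ~ s(age) + sex, family = binomial, data = colon_death)
summary(gam_bin)
exp(coef(gam_bin)["sex"])  # odds ratio for sex
```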
family = cox.ph
- response is the follow-up time; pass the event indicator as weights = status
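A Cox proportional hazards GAM sketch with mgcv's `cox.ph` family, again assuming the `etype == 2` subset (time to death) as an illustrative choice. The survival time is the response and the event indicator goes in `weights`.

```r
library(mgcv)
library(survival)

colon_death <- subset(colon, etype == 2)  # one row per patient: time to death

# Response = follow-up time; event indicator (1 = died, 0 = censored) as weights
gam_cox <- gam(time ~ s(age) + sex, family = cox.ph(), weights = status,
               data = colon_death)
summary(gam_cox)
exp(coef(gam_cox)["sex"])  # hazard ratio for sex
```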
family = poisson
- log link: exponentiate (exp) the coefficients to interpret them as rate ratios
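A Poisson GAM sketch, assuming `nodes` (number of positive lymph nodes) as an example count outcome from `colon`; this outcome choice is illustrative, and rows with missing `nodes` are dropped by mgcv's default `na.action`.

```r
library(mgcv)
library(survival)

colon_death <- subset(colon, etype == 2)  # one row per patient

# Count outcome with log link: log E[nodes] = beta_0 + f(age) + beta_2 * sex
gam_pois <- gam(nodes ~ s(age) + sex, family = poisson, data = colon_death)

# exp transformation: rate ratio for sex
exp(coef(gam_pois)["sex"])
```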
When the Poisson assumption that the mean equals the variance does not hold (overdispersion):
family = quasipoisson
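A sketch of checking for overdispersion and refitting with `quasipoisson`, using the same illustrative `nodes` outcome. The Pearson chi-square divided by the residual df is a rough dispersion estimate; values well above 1 suggest the Poisson mean = variance assumption fails.

```r
library(mgcv)
library(survival)

colon_death <- subset(colon, etype == 2)
gam_pois  <- gam(nodes ~ s(age) + sex, family = poisson,      data = colon_death)
gam_quasi <- gam(nodes ~ s(age) + sex, family = quasipoisson, data = colon_death)

# Rough overdispersion check: about 1 if the Poisson assumption holds
phi <- sum(residuals(gam_pois, type = "pearson")^2) / df.residual(gam_pois)
phi

# quasipoisson estimates the dispersion instead of fixing it at 1
summary(gam_quasi)$dispersion
```

The point estimates are essentially the same as in the Poisson fit; the standard errors are scaled by the estimated dispersion.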
www.zarathu.com