Channel: Active questions tagged r - Stack Overflow

How to get X axis on Fig 5.3 in Elements of Statistical Learning?

I am trying to reproduce Figure 5.3 in The Elements of Statistical Learning using the South African Heart Disease data. So far I have computed the pointwise variances and plotted them against "sbp", one of the model's predictor variables. I chose "sbp" partly because my pointwise variance vector is 462 by 1, so the only thing I can plot it against is one of the predictor variables, each of which also has 462 data points. With that, I get a plot that looks like this:

[Image: my plot of pointwise variance against sbp]

Eyeballing this plot, I can see knots at the 33% (123) and 66% (162) points for the cubic spline model with df = 6 - 1 (the -1 is because there is an intercept). This agrees with the knots at 0.33 and 0.66 given in the caption of Figure 5.3. I think I am getting close, but my problem is that the variance is not plotted against X from 0 to 1 with 50 points, as the figure describes. Here is what the figure should display in principle:

[Image: Figure 5.3 from The Elements of Statistical Learning]
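The knot positions read off the first plot can also be checked directly instead of by eye. The sketch below assumes that bs() with df = 5 and no intercept places its two interior knots at the 1/3 and 2/3 quantiles of the predictor. The vector sbp_demo is a simulated stand-in used only so the snippet runs on its own; with the real data it would be replaced by the sbp column.

```r
library(splines)

set.seed(1)                                   # arbitrary seed, for reproducibility only
sbp_demo <- round(runif(462, 100, 220))       # stand-in values, NOT the SAheart data

basis <- bs(sbp_demo, df = 5)                 # cubic B-spline basis, no intercept column

knots_used <- attr(basis, "knots")            # interior knots that bs() actually chose
knots_expected <- quantile(sbp_demo, c(1/3, 2/3))

print(knots_used)
print(knots_expected)
```

If the two printed pairs agree on the real sbp values, the knots seen in the plot are quantiles of sbp rather than the fixed positions 0.33 and 0.66 on a unit interval.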

The code for my figure is written in R and currently only attempts the cubic spline model. For the natural cubic spline, I would just replace the bs() function used for the cubic spline with ns() to build the required H matrix of basis functions. Here is the code showing how I construct the cubic spline model:

 library(sqldf)
 library(splines)
 library(gam)
 library(mgcv)
 SAheart <- read.table("SAheart.data", 
                sep =  ",", head=T,row.names = 1)

 SAheart.var<-sqldf("select    sbp,tobacco,ldl,famhist,obesity,alcohol,age,chd from SAheart")
 attach(SAheart.var)
 sbp<-SAheart.var[,1]
 tobacco<-SAheart.var[,2]
 ldl.bsf<-SAheart.var[,3]
 famhist<-SAheart.var[,4]
 obesity<-SAheart.var[,5]
 alcohol<-SAheart.var[,6]
 age<-SAheart.var[,7]
 chd<-SAheart.var[,8]

#Ignore these two models since they are simply dummy models for the natural cubic spline and global linear
SAheartGlobalLinear<-gam(chd~ sbp,data=SAheart)
SAheartNaturalCubicSpline<-gam(chd~ns(sbp,df=5),method="REML",data=SAheart)

#SAheartCubicSpline
sbp.bs <- bs(sbp,df=5)
tobacco.bs<-bs(tobacco,df=5)
ldl.bsf.bs<-bs(ldl.bsf,df=5)
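#0/1 indicator for famhist; comparing to "Present" works whether the column was read as a factor or as character
famhist<-as.numeric(famhist=="Present")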
obesity.bs<-bs(obesity,df=5)
alcohol.bs<-bs(alcohol,df=5)
age.bs<-bs(age,df=5)
chd.bs<-bs(chd,df=5)

#build required H matrix of basis functions using df=6-1 degrees of freedom
H <-cbind(sbp.bs,tobacco.bs,ldl.bsf.bs,famhist,obesity.bs,age.bs)

#centering the columns of H, intercept column is not centered
#producing another basis of the column space
H<-cbind(rep(1,dim(SAheart)[1]),scale(H,scale=FALSE))
#obtain coefficients with glm.fit
SAheartCubicSpline<-glm.fit(H,chd, family = binomial())
coeff<-SAheartCubicSpline$coefficients
#make weight matrix W, 462 by 462
W= diag(SAheartCubicSpline$weights)
#construct covariance matrix
#Note: solve() gives the matrix inverse; (t(H)%*%W%*%H)^-1 only inverts each element, so it is not a valid alternative
Sigma = solve(t(H)%*%W%*%H)
#Calculate pointwise variance for one single predictor "sbp"
pw.var<-diag(H[,2:6]%*%Sigma[2:6,2:6]%*%t(H[,2:6]))
#make plot
plot(sbp,pw.var) 
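On the swap of bs() for ns() mentioned earlier: the two functions interpret df differently, so the replacement changes the number of knots as well as the basis. The snippet below is a small illustration on a simulated stand-in predictor (x is not the SAheart data).

```r
library(splines)

set.seed(42)                          # arbitrary seed
x <- runif(200, 100, 220)             # stand-in predictor, NOT the SAheart data

B_bs <- bs(x, df = 5)                 # cubic B-spline basis, no intercept column
B_ns <- ns(x, df = 5)                 # natural cubic spline basis, no intercept column

n_knots_bs <- length(attr(B_bs, "knots"))   # interior knots chosen by bs()
n_knots_ns <- length(attr(B_ns, "knots"))   # interior knots chosen by ns()

print(dim(B_bs))
print(dim(B_ns))
print(c(bs = n_knots_bs, ns = n_knots_ns))
```

Both bases have five columns, but with df = 5 bs() uses two interior knots while ns() uses four, because the natural spline gives up degrees of freedom to its linearity constraints beyond the boundary knots.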

My problem is that the pointwise variance is not plotted against X from 0 to 1 with 50 points, because my pointwise variance vector has 462 points. How does plotting the pointwise variance against X, taken as 50 random points from U[0,1], produce the cubic spline curve seen in Figure 5.3? If possible, I would also like to know how to fit the global cubic polynomial and global linear models. Above all, I would love to know where I am going wrong with the x-axis of Figure 5.3. Thanks in advance!
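One reading of the figure's description is that its x-axis is simulated rather than taken from the heart disease data: X is 50 random draws from U[0,1], and the curves are the diagonal of H (H'H)^-1 H' for each basis under a constant error variance. The following is a minimal sketch under that assumption, not a confirmed reconstruction of the book's figure. The cubic spline knots at 0.33 and 0.66 come from the description above. The natural spline settings (boundary knots at 0.1 and 0.9 with four evenly spaced interior knots) are my recollection of the caption and should be checked against the book. The helper pw_var is a hypothetical name introduced here.

```r
library(splines)

set.seed(1)                       # arbitrary seed, for reproducibility only
x <- sort(runif(50))              # 50 points drawn at random from U[0,1]

# Pointwise variance of the fitted values in units of sigma^2:
# the diagonal of H (H'H)^{-1} H'
pw_var <- function(H) diag(H %*% solve(crossprod(H), t(H)))

H_linear <- cbind(1, x)                                    # global linear, 2 columns
H_cubic  <- cbind(1, poly(x, 3))                           # global cubic polynomial, 4 columns
H_bs     <- bs(x, knots = c(0.33, 0.66), intercept = TRUE) # cubic spline, 6 columns
H_ns     <- ns(x, knots = seq(0.1, 0.9, length.out = 6)[2:5],
               Boundary.knots = c(0.1, 0.9),
               intercept = TRUE)                           # natural cubic spline, 6 columns (assumed knots)

v <- cbind(linear = pw_var(H_linear),
           cubic  = pw_var(H_cubic),
           bs     = pw_var(H_bs),
           ns     = pw_var(H_ns))

matplot(x, v, type = "b", pch = 19, lty = 1,
        xlab = "X", ylab = "Pointwise variances")
legend("top", legend = colnames(v), col = 1:4, lty = 1, pch = 19)
```

Each column of v sums to the number of basis functions in its model (2, 4, 6 and 6), which is a quick sanity check on the bases. The exact shape of the curves depends on the random draw of x.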
