Channel: Active questions tagged r - Stack Overflow

Segmented linear regression with discontinuous data


I have a dataset that looks to be piecewise linear. I would like to perform a segmented linear regression in R. The issue is that there is a discontinuity at the breakpoint. By using some pieces of code from this question I managed to get something, but I am not satisfied.

Dataset

Here is a dummy dataset.

[Figure: equation of the dataset]

NB = 100
A1 = 2 # slope of the first part
A2 = 1 # slope of the second part
B1 = 0 # intercept of the first part
B2 = 300 # intercept of the second part
# predictor: NB random integers between 1 and 500
df = data.frame(n = sample(500, size=NB, replace=TRUE))
# integer noise between -9 and 10
df$noise = sample(20, size=NB, replace=TRUE)-10
# piecewise-linear response with a jump at n = 100
my_func <- function(n, noise) {
    if(n < 100) {
        return(A1*n+B1 + noise)
    }
    else {
        return(A2*n+B2 + noise)
    }
}
df$fn = mapply(my_func, df$n, df$noise)
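
To make the size of the discontinuity explicit, here is a small sketch that evaluates both branches of the function at n = 100, ignoring the noise. It reuses the coefficients above; the names left_limit, right_value and jump are only illustrative.

```r
# Coefficients copied from the dataset definition above
A1 <- 2; B1 <- 0     # first part (n < 100)
A2 <- 1; B2 <- 300   # second part (n >= 100)

# Value the first line reaches at n = 100, and value of the second line there
left_limit  <- A1 * 100 + B1
right_value <- A2 * 100 + B2

# Height of the jump at the breakpoint (noise not included)
jump <- right_value - left_limit
print(jump)
```

So the noise-free response jumps by 200 at the breakpoint, which is large compared to the noise (between -9 and 10).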

Using segmented package

This is quite straightforward: we fit a classical linear regression and pass it to segmented.

library(segmented)
library(ggplot2)
model_segmented = segmented(lm(fn~n, data=df), seg.Z = ~ n)
predict_segmented = data.frame(n = df$n, fn = broken.line(model_segmented)$fit)
ggplot(df, aes(x = n, y = fn)) +
    geom_point() + geom_line(data = predict_segmented, color = 'blue')

Gives:

[Figure: plot of the raw data and its linear regression using segmented]

segmented fits a continuous broken line, so the two segments are forced to meet at the breakpoint. That is not the case here, so the regression is not correct.
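
As a rough illustration of the problem, the sketch below uses only base R and assumes the breakpoint (100) is known. It compares a continuous broken-line fit with one that also allows a shift in intercept. The hinge term pmax(n - psi, 0) is my own way of writing a continuous broken line by hand and is not taken from the segmented code; the data are regenerated with a fixed seed so the block runs on its own.

```r
set.seed(1)
n <- sample(500, size = 100, replace = TRUE)
noise <- sample(20, size = 100, replace = TRUE) - 10
fn <- ifelse(n < 100, 2 * n, n + 300) + noise

psi <- 100  # assumption: the true breakpoint is known here

# Continuous broken line: the hinge term is zero at psi,
# so the slope can change there but the line cannot jump.
fit_cont <- lm(fn ~ n + pmax(n - psi, 0))

# Same model plus an indicator, which lets the intercept shift at psi.
fit_jump <- lm(fn ~ n + pmax(n - psi, 0) + I(n >= psi))

sigma_cont <- summary(fit_cont)$sigma
sigma_jump <- summary(fit_jump)$sigma
print(c(continuous = sigma_cont, with_jump = sigma_jump))
```

The residual standard error of the continuous fit should be much larger than that of the fit with the extra indicator, which is the same effect seen in the plot.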

“Manual” method

This method is more tedious. First, we find the break-point by trying every candidate value and keeping the one that yields the lowest residual standard error. Then, we add a new factor to the linear regression that indicates whether the predictor variable is below the break-point or at or above it.

# Computation of the break-point
Break<-sort(unique(df$n))
Break<-Break[2:(length(Break)-1)]
d<-numeric(length(Break))
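for (i in seq_along(Break)) {
    # separate intercept and slope on each side of the candidate break-point
    model_manual<-lm(fn~(n<Break[i])*n + (n>=Break[i])*n, data=df)
    # residual standard error of this candidate fit
    d[i]<-summary(model_manual)$sigma
}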
}
breakpoint = Break[which.min(d)]

# Linear regression using this break-point
df$group = df$n >= breakpoint
model_manual<-lm(fn~n*group, data=df)
dat_pred = data.frame(n = df$n, fn = predict(model_manual, df))
ggplot(df, aes(x = n, y = fn)) +
    geom_point() +
    geom_line(data=dat_pred[dat_pred$n < breakpoint,], color = 'blue') +
    geom_line(data=dat_pred[dat_pred$n >= breakpoint,], color = 'blue')

Gives:

[Figure: plot of the raw data and its linear regression using the manual method]

Here, the regression fits both segments well.
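
For reference, the same grid search can be wrapped in a small function. This is only a sketch of the manual method in base R, not a function from any package; find_break is a hypothetical name, and the data are regenerated with a fixed seed so the block is self-contained.

```r
# Hypothetical helper: try every interior unique value of x as a break-point
# and return the one with the lowest residual standard error.
find_break <- function(x, y) {
    candidates <- sort(unique(x))
    candidates <- candidates[2:(length(candidates) - 1)]
    sigmas <- vapply(candidates, function(b) {
        # separate intercept and slope on each side of b
        summary(lm(y ~ x * I(x >= b)))$sigma
    }, numeric(1))
    candidates[which.min(sigmas)]
}

# Same kind of dummy data as above, with a fixed seed
set.seed(1)
n <- sample(500, size = 100, replace = TRUE)
noise <- sample(20, size = 100, replace = TRUE) - 10
fn <- ifelse(n < 100, 2 * n, n + 300) + noise

bp <- find_break(n, fn)
group <- n >= bp
model <- lm(fn ~ n * group)
print(bp)
```

Because only observed values are tried, the selected break-point is the first observed n on the right-hand segment rather than exactly 100.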

Question

Is there a better way to achieve this goal? Can the segmented package take discontinuous data, or is there a package that can do this?

My concern is that the second method is a bit long and not very readable.

