# Re: segmented regression resend

Luc 2744 (LWOUTERS@JANBELC1.SSW.JNJ.COM)
09 Jan 1998 09:44:09 EST

I have the impression that the message I sent just before Christmas
didn't come through. That's why I resend it now.

===============

Some time ago, I came across the same problem. An algorithm, due to
Hudson, for the two-line segmented is given by Seber & Wild in
Nonlinear Regression, Wiley 1989, pp.456-457.

The following is a somewhat ugly implementation in S-Plus.

As I recall, the algorithm requires distinct x-values.

Luc Wouters lwouters@janbe.jnj.com
Janssen Research Foundation
Beerse
Belgium

"segmented.curve.fit"<-function(x, y)
{
n <- length(x)
X <- cbind(rep(1, n), x)
XX <- matrix(0, n, 4)
res <- rep(NA, 9)
res1 <- res
res2 <- res
SST <- 1e+038 # Type I minimum
Sol1 <- rep(1e+038, 9)
Sol2 <- Sol1
for(i in 2:(n - 2)) {
first <- (1:n) <= i
f1 <- lm(y[first] ~ x[first])
f2 <- lm(y[!first] ~ x[!first])
ss <- sum(f1\$residuals^2) + sum(f2\$residuals^2)
b1 <- f1\$coefficients
b2 <- f2\$coefficients
a <- (b1[1] - b2[1])/(b2[2] - b1[2])
res <- rbind(res, c(i, ss, a, x[i], x[i + 1], b1, b2))
}
res <- matrix(res[-1, ], ncol = 9)
Type1 <- (res[, 3] > res[, 4]) & (res[, 3] < res[, 5])
if(sum(Type1) > 0) {
res1 <- matrix(res[Type1, ], ncol = 9)
SS <- res1[, 2]
Sol1 <- res1[order(SS)[1], ]
SST <- Sol1[2]
}
Type2 <- (!Type1) & (res[, 2] < SST)
if(sum(Type2) > 0) {
res2 <- matrix(res[Type2, ], ncol = 9)
for(i in 1:sum(Type2)) {
SS1 <- res2[i, 2]
SS2s <- rep(1e+038, 2)
Theta <- matrix(res2[i, 6:9], nrow = 4)
temp <- matrix(1e+038, nrow = 2, ncol = 6)
for(j in 1:2) {
B <- matrix(c(1, res2[i, 3 + j], -1, -
res2[i, 3 + j]), nrow = 1)
XX <- matrix(0, n, 4)
n1 <- sum(x <= res2[i, 3 + j])
if(n - n1 > 1) {
XX[1:n1, 1:2] <- X[1:n1, ]
XX[(n1 + 1):n, 3:4] <- X[(n1 + 1):n, ]
XpXi <- solve(t(XX) %*% XX)
BXpXi <- solve(B %*% XpXi %*% t(B))
Comp2 <- t(Theta) %*% t(B) %*% BXpXi %*%
B %*% Theta
Thetac <- Theta - XpXi %*% t(B) %*% BXpXi
%*% B %*% Theta
temp[j, ] <- c((SS1 + Comp2), res2[i, 3
+ j], t(Thetac))
}
# end if
}
# end for loop j
sel <- order(temp[, 1])
res2[i, c(2, 3, 6:9)] <- temp[sel[1], ]
}
# end for loop i
res2 <- matrix(res2, ncol = 9)
SS <- res2[, 2]
Sol2 <- res2[order(SS)[1], ]
}
# end if Type2 > 0
if(Sol2[2] < SST)
Sol <- Sol2
else Sol <- Sol1
lin <- lm(y ~ x)
ssl <- sum(lin\$residuals^2)
mse <- Sol[2]/(n - 4)
F0 <- (ssl - Sol[2])/(3 * mse)
Prob <- 1 - pf(F0, 3, n - 4)
first <- x <= Sol[3]
XX[first, 1:2] <- X[first, ]
XX[!first, 3:4] <- X[!first, ]
XpXi <- solve(t(XX) %*% XX)
se <- sqrt(mse * diag(XpXi))
Probcoef <- 2 * (1 - pt(Sol[6:9]/se, n - 4))
res <- list(Sol[c(3, 6:9)], se, Probcoef, c(n, Sol[2], mse), c(F0,
Prob), res1, res2,
Type1, Type2)
names(res) <- c("coefficients", "stderr", "prob.coef",
"statistics", "test", "res1",
"res2", "Type1", "Type2")
names(res\$coefficients) <- c("changeover", "intercept1", "slope1",
"intercept2", "slope2")
names(res\$stderr) <- c("se.intercept1", "se.slope1",
"se.intercept2", "se.slope2")
names(res\$prob.coef) <- c("p.intercept1", "p.slope1",
"p.intercept2", "p.slope2")
names(res\$statistics) <- c("n", "SS", "MSE")
names(res\$test) <- c("F", "P-value")
res # result
}

Subject: segmented regression
Author: SSENDER (INTERNET.SSENDER1) at SWIFTGATE
Date: 24/11/97 16:00

(Soft*Switch Central V4L40P1A);
24 Nov 1997 16:00:29 EST
by imail-gw1.jnj.com (8.8.8/8.8.8) with SMTP id QAA03419
for <lwouters@JANBELC1.SSW.jnj.com>; Mon, 24 Nov 1997 16:00:09 -0500
(EST) Received: by utstat; Mon Nov 24 11:12 EST 1997 Received: from
smtpgate.nbs.gov by teal.cr.usgs.gov (SMI-8.6/SMI-SVR4)
id JAA13566; Mon, 24 Nov 1997 09:12:55 -0700
(IMA Internet Exchange 2.1 Enterprise) id 001105E7; Mon, 24 Nov 97 09:12:35
-0700 Mime-Version: 1.0 Date: Mon, 24 Nov 1997 11:12:31 -0700 Message-ID:
segmented regression To: s-news@utstat.toronto.edu Content-Type: text/plain;
charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Description: cc:Mail
note part

Has anyone carried out segmented regression in S-PLUS? I am
interested in finding the parameters (a, b, c, d, and k) that describe
two straight lines
y = a + bx, for x<k and
y = c + dx, for x>k
that intersect at the point k,
i.e., a + bk = c + dk.
This may also be known as two-phase regression or change-point
regression. I understand that this can be solved using maximum
likelihood, but I'm not sure how to program this. If anyone has any
suggestions, tips, or code they are willing to share, I would be most
grateful.

Thanks.

JVA