mcp: regression with multiple change points

I’ve made an R package, and I’m happy to report that it is likely to arrive on CRAN before Christmas! The last CRAN review asked me to change a comma, so the outlook is good.

mcp has grown quite ambitious, and I think it now qualifies as a general-purpose package for analyzing change points, superseding the other change point packages in most respects. You can read a lot more about mcp in the extensive documentation at the mcp website.

This Twitter thread is a more bite-sized introduction:

Working on an R package `mcp` to infer Multiple Change Points. Good progress yesterday:

segments = list(
  score ~ 1 + year,  # intercept + slope
  1 ~ 0 + year,  # joined slope
  1 ~ 0,  # joined plateau
  1 ~ 1  # disjoined plateau
)

fit = mcp(data, segments)

pic.twitter.com/SY6L2rOiGL
— Jonas K. Lindeløv (@jonaslindeloev) October 8, 2019

Followed by this update:

mcp 0.2 is up and a CRAN submission is imminent! Let me introduce the new in this thread. (1/n) https://t.co/H2WrANjh1q #rstats
— Jonas K. Lindeløv (@jonaslindeloev) November 29, 2019
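
As a rough illustration of what the four segments in the first tweet describe, here is a minimal base R sketch of the mean function suggested by the comments: a line, a second line that starts where the first ended, a plateau at the level the second line reached, and a plateau at a new level. It does not call mcp or reproduce its internals. The helper piecewise_mean, its parameter names, and all the numbers are made up for the example.

```r
# Hypothetical helper, not part of mcp: mean of a four-segment model with
# three change points. All names and values here are illustrative assumptions.
piecewise_mean <- function(year, cp, int_1, slope_1, slope_2, int_4) {
  stopifnot(length(cp) == 3, !is.unsorted(cp))
  end_1 <- int_1 + slope_1 * cp[1]            # level where segment 1 ends
  end_2 <- end_1 + slope_2 * (cp[2] - cp[1])  # level where segment 2 ends
  ifelse(year < cp[1], int_1 + slope_1 * year,             # intercept + slope
  ifelse(year < cp[2], end_1 + slope_2 * (year - cp[1]),   # joined slope
  ifelse(year < cp[3], end_2,                              # joined plateau
         int_4)))                                          # disjoined plateau
}

year <- 0:99
mu <- piecewise_mean(year, cp = c(30, 60, 80),
                     int_1 = 10, slope_1 = 0.5, slope_2 = -0.2, int_4 = 5)
```

Calling plot(year, mu, type = "l") shows a rise, a decline that starts exactly where the rise ended, a flat stretch at the level the decline reached, and finally a jump to a new level. In an actual analysis the change points and coefficients are unknowns to be inferred from the data rather than fixed by hand as they are here.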

History

mcp is the consequence of a long sequence of events:

Looking at Complex Span data from a large project I did with researchers from Aarhus University, I saw strong performance discontinuities at a single-person level.
I then learned that performance discontinuities play an important role in the estimation of (human) working memory capacity.
I then learned that people mostly use ad-hoc methods to extract such points, including eye-balling graphs. That made me nervous.
I then remembered a very small example from the book Bayesian Cognitive Modeling on inferring change points. I remember it because I was puzzled that Gibbs samplers could sample such points effectively.
I then applied this to a Poisson model of our data and it worked beautifully. It outperformed all other models of this data.
Around here was the first time I looked for other R packages to identify change points. There are a lot of them, but none could handle the hierarchical model I needed and most of them yielded point estimates rather than posteriors.
As I began writing a paper, I decided to apply the model to subitizing as well. However, while subitizing accuracy has a single change point, subitizing reaction times seem to have two or three.
I implemented a three-change-points model, and immediately saw how easy it would be to extend it to N change points. That took around a day. That was the time of the first tweet above.
Without much forethought, I took a brief look into R packages, pkgdown, etc., and before I knew it, it became a package.
I began to see a roadmap for the package around the fall holiday, and spent most evenings during it refactoring so that the unpacking of the formulas was decoupled from the generation of the JAGS code.
The new architecture was incredibly easy to extend, and then things went really fast. In particular, I looked a lot to brms for inspiration on the API. I wanted to make sure that the design could accommodate virtually endless opportunities.
Here we are!

Why “mcp”?

mcp was the first name I came up with. I like it because it means multiple things to me:

Multiple Change Points as in “more than one change point”.
Multiple Change Points as in “multiple kinds of models and change points”.
MC for “Markov Chains” to hint at the underlying MCMC sampling. We could give the package the pet name “Markov Chainge Points”.

I think I would have renamed it to “rcp” or “cpr” (“Regression with change points”) if they weren’t taken. “changepoint” is also a great name that’s already taken.