TY - JOUR
T1 - gfpop
T2 - An R Package for Univariate Graph-Constrained Change-Point Detection
AU - Runge, Vincent
AU - Hocking, Toby Dylan
AU - Romano, Gaetano
AU - Afghah, Fatemeh
AU - Fearnhead, Paul
AU - Rigaill, Guillem
N1 - Publisher Copyright:
© 2023, American Statistical Association. All rights reserved.
PY - 2023
Y1 - 2023
N2 - In a world with data that change rapidly and abruptly, it is important to detect those changes accurately. In this paper we describe an R package implementing a generalized version of an algorithm recently proposed by Hocking, Rigaill, Fearnhead, and Bourque (2020) for penalized maximum likelihood inference of constrained multiple change-point models. This algorithm can be used to pinpoint the precise locations of abrupt changes in large data sequences. There are many application domains for such models, such as medicine, neuroscience or genomics. Often, practitioners have prior knowledge about the changes they are looking for. For example in genomic data, biologists sometimes expect peaks: up changes followed by down changes. Taking advantage of such prior information can substantially improve the accuracy with which we can detect and estimate changes. Hocking et al. (2020) described a graph framework to encode many examples of such prior information and a generic algorithm to infer the optimal model parameters, but implemented the algorithm for just a single scenario. We present the gfpop package that implements the algorithm in a generic manner in R/C++. gfpop works for a user-defined graph that can encode prior assumptions about the types of changes that are possible and implements several loss functions (Gauss, Poisson, binomial, biweight, and Huber). We then illustrate the use of gfpop on isotonic simulations and several applications in biology. For a number of graphs the algorithm runs in a matter of seconds or minutes for 10^5 data points.
AB - In a world with data that change rapidly and abruptly, it is important to detect those changes accurately. In this paper we describe an R package implementing a generalized version of an algorithm recently proposed by Hocking, Rigaill, Fearnhead, and Bourque (2020) for penalized maximum likelihood inference of constrained multiple change-point models. This algorithm can be used to pinpoint the precise locations of abrupt changes in large data sequences. There are many application domains for such models, such as medicine, neuroscience or genomics. Often, practitioners have prior knowledge about the changes they are looking for. For example in genomic data, biologists sometimes expect peaks: up changes followed by down changes. Taking advantage of such prior information can substantially improve the accuracy with which we can detect and estimate changes. Hocking et al. (2020) described a graph framework to encode many examples of such prior information and a generic algorithm to infer the optimal model parameters, but implemented the algorithm for just a single scenario. We present the gfpop package that implements the algorithm in a generic manner in R/C++. gfpop works for a user-defined graph that can encode prior assumptions about the types of changes that are possible and implements several loss functions (Gauss, Poisson, binomial, biweight, and Huber). We then illustrate the use of gfpop on isotonic simulations and several applications in biology. For a number of graphs the algorithm runs in a matter of seconds or minutes for 10^5 data points.
KW - change-point detection
KW - constrained inference
KW - dynamic programming
KW - maximum likelihood inference
KW - robust losses
UR - http://www.scopus.com/inward/record.url?scp=85152666905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152666905&partnerID=8YFLogxK
U2 - 10.18637/jss.v106.i06
DO - 10.18637/jss.v106.i06
M3 - Article
AN - SCOPUS:85152666905
SN - 1548-7660
VL - 106
JO - Journal of Statistical Software
JF - Journal of Statistical Software
ER -