Abstract
In data sequences measured over space or time, an important problem is accurate detection of abrupt changes. In partially labeled data, it is important to correctly predict presence/absence of changes in positive/negative labeled regions, in both the train and test sets. One existing dynamic programming algorithm is designed for prediction in unlabeled test regions (and ignores the labels in the train set); another is for accurate fitting of train labels (but does not predict changepoints in unlabeled test regions). We resolve these issues by proposing a new optimal changepoint detection model that is guaranteed to fit the labels in the train data, and can also provide predictions of unlabeled changepoints in test data. We propose a new dynamic programming algorithm, Labeled Optimal Partitioning, and we provide a formal proof that it solves the resulting non-convex optimization problem. We provide theoretical and empirical analysis of the time complexity of our algorithm, in terms of the number of labels and the size of the data sequence to segment. Finally, we provide empirical evidence that our algorithm is more accurate than the existing baselines, in terms of train and test label error.
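The label error mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `label_errors`, the `(start, end, kind)` label format, and the inclusive region boundaries are all assumptions made for illustration. It also assumes a positive label is satisfied by at least one predicted changepoint in its region; the paper may use a stricter rule (for example, exactly one change).

```python
from bisect import bisect_left, bisect_right


def label_errors(changepoints, labels):
    """Count label errors of predicted changepoints (illustrative sketch).

    changepoints: sorted list of predicted change positions.
    labels: list of (start, end, kind) tuples, kind is "positive" or
        "negative"; a change c lies in a region when start <= c <= end
        (inclusive bounds are an assumption of this sketch).

    Returns (false_positives, false_negatives).
    """
    false_positives = 0
    false_negatives = 0
    for start, end, kind in labels:
        # Number of predicted changes inside the labeled region.
        n_changes = bisect_right(changepoints, end) - bisect_left(changepoints, start)
        if kind == "positive":
            # Assumed rule: a positive label expects at least one change.
            if n_changes == 0:
                false_negatives += 1
        elif kind == "negative":
            # A negative label expects no change at all.
            if n_changes > 0:
                false_positives += 1
        else:
            raise ValueError(f"unknown label kind: {kind!r}")
    return false_positives, false_negatives


# Hypothetical predictions and labels, for illustration only.
predicted = [10, 25, 40]
regions = [
    (5, 15, "positive"),   # contains the change at 10
    (20, 30, "negative"),  # contains the change at 25
    (31, 39, "negative"),  # contains no change
    (50, 60, "positive"),  # contains no change
]
fp, fn = label_errors(predicted, regions)
```

Applying the same function to labeled regions from the train set and from the test set gives the two error counts the abstract refers to as train and test label error.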
Original language | English (US) |
---|---|
Pages (from-to) | 461-480 |
Number of pages | 20 |
Journal | Computational Statistics |
Volume | 38 |
Issue number | 1 |
State | Published - Mar 2023 |
Externally published | Yes |
Keywords
- Changepoints
- Constraints
- Labels
- Penalized
- Segmentation
- Supervised
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
- Computational Mathematics