Many learning algorithms are formulated as finding model parameters that minimize a data-fitting loss function plus a regularizer. When the regularizer involves the ℓ0 pseudo-norm, the resulting regularization path consists of a finite set of models. The fastest existing algorithm for computing the breakpoints in the regularization path is quadratic in the number of models, so it scales poorly to high-dimensional problems. We provide new formal proofs that a dynamic programming algorithm computes the breakpoints in linear time. Our empirical results analyze the proposed algorithm in the context of several learning problems (regression, changepoint detection, clustering, matrix factorization), and a detailed study of changepoint detection demonstrates the improved accuracy and speed of our approach relative to grid search and a previous quadratic-time algorithm.
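To make the breakpoint computation concrete, here is a minimal sketch (not the paper's exact algorithm or proofs) of how the exact path of λ ↦ argmin_t L_t + λ·t can be traced with a stack in linear time, assuming the losses L_t are strictly decreasing in the model complexity t; the function name path_breakpoints and its interface are illustrative choices, not taken from the paper.

```python
def path_breakpoints(loss):
    """Exact breakpoints of lambda -> argmin_t loss[t] + lambda * (t + 1).

    A sketch assuming loss[t] is strictly decreasing in t (model complexity
    t + 1). Returns [(model_index, lambda_start), ...] with lambda_start
    increasing; each model is optimal on [lambda_start, next lambda_start).
    """
    T = len(loss)
    stack = [(T - 1, 0.0)]           # most complex model is optimal at lambda = 0
    for i in range(T - 2, -1, -1):   # consider simpler models one at a time
        j, lam_j = stack[-1]
        lam = (loss[i] - loss[j]) / (j - i)  # lambda where model i ties model j
        # pop models whose optimality interval would be empty
        while len(stack) > 1 and lam <= lam_j:
            stack.pop()
            j, lam_j = stack[-1]
            lam = (loss[i] - loss[j]) / (j - i)
        stack.append((i, lam))
    return stack
```

Each model is pushed once and popped at most once, so the total work is linear in the number of models. For example, path_breakpoints([10.0, 6.0, 4.0, 3.5]) returns [(3, 0.0), (2, 0.5), (1, 2.0), (0, 4.0)]: the four-parameter model is selected for λ < 0.5, the three-parameter model for 0.5 ≤ λ < 2.0, and so on.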