TY - JOUR
T1 - Learning smoothing models of copy number profiles using breakpoint annotations
AU - Hocking, Toby Dylan
AU - Schleiermacher, Gudrun
AU - Janoueix-Lerosey, Isabelle
AU - Boeva, Valentina
AU - Cappo, Julie
AU - Delattre, Olivier
AU - Bach, Francis
AU - Vert, Jean Philippe
N1 - Funding Information:
Thanks to Edouard Pauwels for many helpful discussions and comments to simplify the mathematics on an early draft of the paper. This work was supported by Digiteo [DIGITEO-BIOVIZ-2009-25D to T.D.H.]; the European Research Council [SIERRA-ERC-239993 to F.B.; SMAC-ERC-280032 to J-P.V.]; the French National Research Agency [ANR-09-BLAN-0051-04 to J-P.V.]; the Annenberg Foundation [to G.S.]; the French Programme Hospitalier de Recherche Clinique [PHRC IC2007-09 to G.S.]; the French National Cancer Institute [INCA-2007-1-RT-4-IC to G.S.]; and the French Anti-Cancer League.
PY - 2013/5/22
Y1 - 2013/5/22
N2 - Background: Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious to decide which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using various heuristics. Results: We present three contributions for copy number profile smoothing model selection. First, we propose to select the model and degree of smoothness that maximizes agreement with visual breakpoint region annotations. Second, we develop cross-validation procedures to estimate the error of the trained models. Third, we apply these methods to compare 17 smoothing models on a new database of 575 annotated neuroblastoma copy number profiles, which we make available as a public benchmark for testing new algorithms. Conclusions: Whereas previous studies have been qualitative or limited to simulated data, our annotation-guided approach is quantitative and suggests which algorithms are fastest and most accurate in practice on real data. In the neuroblastoma data, the equivalent pelt.n and cghseg.k methods were the best breakpoint detectors, and exhibited reasonable computation times.
AB - Background: Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious to decide which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using various heuristics. Results: We present three contributions for copy number profile smoothing model selection. First, we propose to select the model and degree of smoothness that maximizes agreement with visual breakpoint region annotations. Second, we develop cross-validation procedures to estimate the error of the trained models. Third, we apply these methods to compare 17 smoothing models on a new database of 575 annotated neuroblastoma copy number profiles, which we make available as a public benchmark for testing new algorithms. Conclusions: Whereas previous studies have been qualitative or limited to simulated data, our annotation-guided approach is quantitative and suggests which algorithms are fastest and most accurate in practice on real data. In the neuroblastoma data, the equivalent pelt.n and cghseg.k methods were the best breakpoint detectors, and exhibited reasonable computation times.
UR - http://www.scopus.com/inward/record.url?scp=84878010425&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878010425&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-14-164
DO - 10.1186/1471-2105-14-164
M3 - Article
C2 - 23697330
AN - SCOPUS:84878010425
SN - 1471-2105
VL - 14
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 164
ER -