Abstract
Joint peak detection is a central problem when comparing samples in epigenomic data analysis, but current algorithms for this task are unsupervised and limited to at most two sample types. We propose PeakSegPipeline, a new genome-wide multi-sample peak calling pipeline for epigenomic data sets. It performs peak detection using a constrained maximum likelihood segmentation model with essentially only one free parameter that needs to be tuned: the number of peaks. To select the number of peaks, we propose to learn a penalty function based on user-provided labels that indicate genomic regions with or without peaks in specific samples. In comparisons with state-of-the-art peak detection algorithms, PeakSegPipeline achieves similar or better accuracy, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples. Our novel approach is able to learn that predicted peak sizes vary by experiment type.
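The role of the penalty in choosing the number of peaks can be illustrated with a minimal sketch. It is not the PeakSegPipeline implementation: the loss values, the candidate penalties, the set of acceptable peak counts, and the function name `select_peaks` are all hypothetical, and a simple grid search stands in for the learned penalty function described in the abstract.

```python
def select_peaks(loss_by_peaks, penalty):
    """Return the peak count k that minimizes loss(k) + penalty * k."""
    return min(loss_by_peaks, key=lambda k: loss_by_peaks[k] + penalty * k)


# Hypothetical segmentation losses for models with 0 to 3 peaks.
# The loss decreases as more peaks are allowed.
loss_by_peaks = {0: 100.0, 1: 60.0, 2: 50.0, 3: 48.0}

# Hypothetical label information: peak counts consistent with the labels
# a user drew for this sample.
acceptable_counts = {1, 2}

# Grid search (an assumed stand-in for penalty learning): keep the
# candidate penalties whose selected model agrees with the labels.
candidate_penalties = [1.0, 5.0, 20.0, 50.0]
consistent = [
    p for p in candidate_penalties
    if select_peaks(loss_by_peaks, p) in acceptable_counts
]

for p in candidate_penalties:
    print(p, select_peaks(loss_by_peaks, p))
print("penalties consistent with labels:", consistent)
```

In this toy setting, a small penalty selects many peaks and a large penalty selects none, so the labels narrow the penalty down to an intermediate range.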
Original language | English (US) |
---|---|
Pages (from-to) | 367-378 |
Number of pages | 12 |
Journal | Pacific Symposium on Biocomputing |
Volume | 25 |
Issue number | 2020 |
State | Published - 2020 |
Externally published | Yes |
Event | 25th Pacific Symposium on Biocomputing, PSB 2020 - Big Island, United States; Duration: Jan 3 2020 → Jan 7 2020 |
Keywords
- ATAC-seq
- ChIP-seq
- Epigenome
- Joint
- Machine Learning
- Multi-sample
- Peak Detection
- Supervised
ASJC Scopus subject areas
- Biomedical Engineering
- Computational Theory and Mathematics