Machine learning algorithms for simultaneous supervised detection of peaks in multiple samples and cell types

Toby Dylan Hocking, Guillaume Bourque

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

Joint peak detection is a central problem when comparing samples in epigenomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegPipeline, a new genome-wide multi-sample peak calling pipeline for epigenomic data sets. It performs peak detection using a constrained maximum likelihood segmentation model with essentially only one free parameter that needs to be tuned: the number of peaks. To select the number of peaks, we propose to learn a penalty function based on user-provided labels that indicate genomic regions with or without peaks in specific samples. In comparisons with state-of-the-art peak detection algorithms, PeakSegPipeline achieves similar or better accuracy, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples. Our novel approach is able to learn that predicted peak sizes vary by experiment type.
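
As a rough illustration of how labels can guide the choice of the number of peaks, the sketch below scores several candidate peak sets against labeled regions and keeps the peak count with the fewest label errors. It is not the PeakSegPipeline implementation, and it does not learn a penalty function as the abstract describes. The function names, the label kinds "peak" and "noPeak", and the toy coordinates are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Peak = Tuple[int, int]        # (start, end) of a predicted peak, half-open
Label = Tuple[int, int, str]  # (start, end, kind); kind is "peak" or "noPeak" (hypothetical names)


def overlaps(peak: Peak, start: int, end: int) -> bool:
    """Return True if the peak interval intersects the region [start, end)."""
    return peak[0] < end and start < peak[1]


def label_errors(peaks: List[Peak], labels: List[Label]) -> int:
    """Count the labels that the predicted peaks contradict."""
    errors = 0
    for start, end, kind in labels:
        has_peak = any(overlaps(p, start, end) for p in peaks)
        if kind == "peak" and not has_peak:
            errors += 1  # labeled region should contain a peak but does not
        elif kind == "noPeak" and has_peak:
            errors += 1  # labeled region should be peak-free but is not
    return errors


def select_peak_count(models: Dict[int, List[Peak]], labels: List[Label]) -> int:
    """Return the peak count with the fewest label errors (ties go to fewer peaks)."""
    return min(sorted(models), key=lambda k: label_errors(models[k], labels))


# Toy candidate models for one sample, keyed by number of peaks (made-up coordinates).
models = {
    0: [],
    1: [(100, 200)],
    2: [(100, 200), (500, 600)],
    3: [(100, 200), (300, 350), (500, 600)],
}
# Toy labels: two regions that should contain a peak, one that should not.
labels = [(90, 210, "peak"), (280, 400, "noPeak"), (480, 620, "peak")]

best = select_peak_count(models, labels)
```

In this toy setting the models with too few peaks miss "peak" labels and the model with too many peaks violates the "noPeak" label, so the labels single out an intermediate peak count.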

Original language: English (US)
Pages (from-to): 367-378
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
Volume: 25
Issue number: 2020
State: Published - 2020
Event: 25th Pacific Symposium on Biocomputing, PSB 2020 - Big Island, United States
Duration: Jan 3 2020 to Jan 7 2020

Keywords

  • ATAC-seq
  • ChIP-seq
  • Epigenome
  • Joint
  • Machine Learning
  • Multi-sample
  • Peak Detection
  • Supervised

ASJC Scopus subject areas

  • Biomedical Engineering
  • Computational Theory and Mathematics
