
Detection of Differentially Interacting Chromatin Regions From Multiple Hi-C Datasets

#bioc2020 workshop

Mikhail Dozmorov

Virginia Commonwealth University

2020-07-30, 10:00-10:55am ET


HiCcompareWorkshop resources


The 3D structure of the genome

  • The human genome is large: ~3.2 billion base pairs
  • ~2 meters (~6 ft) of DNA per cell are packed into a ~10 μm nucleus
  • Laid end to end, the DNA from all cells of the human body would span ~500 times the distance from Earth to the Sun


Chromosome conformation capture technologies

The core strategy in 3D genome mapping is nuclear proximity ligation (Cullen et al., 1993), which detects genomic segments that lie close together in 3D space even though they are far apart along the linear genome.

Lieberman-Aiden, Erez, Nynke L. van Berkum, Louise Williams, Maxim Imakaev, Tobias Ragoczy, Agnes Telling, Ido Amit, et al. “Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome.” Science, 2009


Hi-C Data as a matrix

  • The genome (chromosome) is split into equally sized regions
  • Data are represented by a symmetric contact matrix $C$, where entry $C_{ij}$ is the number of times region $i$ comes into contact with region $j$
  • Off-diagonal view of the data: moving away from the diagonal corresponds to increasing genomic distance between the interacting regions
  • Power-law decay of interactions with increasing distance
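
The matrix view above can be illustrated with a minimal sketch. It simulates a symmetric contact matrix whose expected counts fall off as a power law of the bin distance, then averages each off-diagonal to recover the decay. The function names, the `scale` and `exponent` parameters, and the Gaussian noise model are assumptions made for illustration, not part of any Hi-C package.

```python
import random

def simulate_contacts(n_bins, scale=1000.0, exponent=1.0, seed=0):
    """Build a symmetric contact matrix with power-law decay (toy model)."""
    rng = random.Random(seed)
    C = [[0] * n_bins for _ in range(n_bins)]
    for i in range(n_bins):
        for j in range(i, n_bins):
            d = j - i  # distance between regions, in bins
            expected = scale / (d + 1) ** exponent
            # Assumed noise model: Gaussian with variance equal to the mean
            count = max(0, round(rng.gauss(expected, expected ** 0.5)))
            C[i][j] = count
            C[j][i] = count  # enforce symmetry: C[i][j] == C[j][i]
    return C

def mean_by_distance(C):
    """Average contact count on each off-diagonal (index = bin distance)."""
    n = len(C)
    return [sum(C[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]

C = simulate_contacts(20)
decay = mean_by_distance(C)
print(decay[:5])  # average contacts at distances 0..4 bins
```

Plotting `decay` against distance on log-log axes would show the roughly linear trend expected of a power law.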


Biases in Hi-C data

  • Hi-C data suffer from many biases: sequence-driven (e.g., mappability, GC content) and technology-driven (e.g., type of restriction enzyme, sequencing platform)

  • Most normalization methods operate on one Hi-C dataset at a time

  • Normalizing each dataset individually does not perform well when the goal is to compare datasets

Lyu, Hongqiang, Erhu Liu, and Zhifang Wu. “Comparison of Normalization Methods for Hi-C Data.” BioTechniques 68, no. 2 (2020)

Zheng, Ye, Peigen Zhou, and Sündüz Keleş. “FreeHi-C Spike-in Simulations for Benchmarking Differential Chromatin Interaction Detection.” Methods, July 2020


Joint Normalization on the MD plot

  • An MD plot represents data from two Hi-C matrices on a single plot

  • Similar to the MA plot (Bland-Altman plot)

  • Y-axis (M): difference in log interaction frequencies, $M = \log_2(IF_2 / IF_1)$

  • X-axis (D): genomic distance between the interacting regions
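
The MD coordinates described above can be computed as in the following sketch. It assumes two equally sized, symmetric contact matrices given as nested lists, measures distance in bins, and skips pairs with a zero count in either matrix because the log ratio is undefined there. The function name `md_points` is hypothetical.

```python
import math

def md_points(mat1, mat2):
    """Return (D, M) pairs for the upper triangle of two contact matrices."""
    n = len(mat1)
    points = []
    for i in range(n):
        for j in range(i, n):
            if1, if2 = mat1[i][j], mat2[i][j]
            if if1 > 0 and if2 > 0:  # log ratio undefined for zero counts
                d = j - i                    # X-axis: distance in bins
                m = math.log2(if2 / if1)     # Y-axis: M = log2(IF2 / IF1)
                points.append((d, m))
    return points

mat1 = [[8, 4], [4, 8]]
mat2 = [[16, 4], [4, 16]]
print(md_points(mat1, mat2))  # [(0, 1.0), (1, 0.0), (0, 1.0)]
```

In this toy input the second matrix has twice the counts on the diagonal, so M is 1 at distance 0 and 0 at distance 1. A distance-dependent offset of this kind is what joint normalization aims to remove.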


Joint Loess Normalization of Hi-C Data

Lyu, Hongqiang, Erhu Liu, and Zhifang Wu. “Comparison of Normalization Methods for Hi-C Data.” BioTechniques, 2020


Cyclic loess normalization of multiple Hi-C datasets

Cyclic loess (Ballman et al. 2004): normalize each pair of datasets in turn, and repeat until convergence

  1. Choose two of the N total samples and generate an MD plot
  2. Fit a loess curve f(d) to the MD plot
  3. Remove half of the fitted trend from each dataset on the log2 scale: with M = log2(IF2/IF1), add f(d)/2 to log2(IF1) and subtract f(d)/2 from log2(IF2)
  4. Repeat steps 1-3 for every unique pair of samples
  5. Repeat the full cycle over all pairs until convergence
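
The cyclic procedure can be sketched as follows. This is a simplified illustration, not the multiHiCcompare implementation: the loess fit is replaced by a per-distance mean of M (a crude stand-in named `fit_trend`), each sample is assumed to be a dict mapping bin pairs `(i, j)` to positive interaction frequencies with identical keys across samples, and `max_iter` and `tol` are hypothetical parameters. M is taken as log2 of the second sample over the first, so the correction raises the first sample and lowers the second by f(d)/2 on the log2 scale.

```python
import math
from itertools import combinations

def fit_trend(distances, m_values):
    """Per-distance mean of M; a crude stand-in for the loess curve f(d)."""
    sums, counts = {}, {}
    for d, m in zip(distances, m_values):
        sums[d] = sums.get(d, 0.0) + m
        counts[d] = counts.get(d, 0) + 1
    return {d: sums[d] / counts[d] for d in sums}

def cyclic_normalize(samples, max_iter=10, tol=1e-6):
    """Jointly normalize N samples by cycling over all unique pairs."""
    keys = sorted(samples[0])
    dist = [j - i for i, j in keys]
    logs = [[math.log2(s[k]) for k in keys] for s in samples]
    for _ in range(max_iter):                      # step 5: repeat cycles
        largest = 0.0
        for a, b in combinations(range(len(logs)), 2):  # steps 1 and 4
            m = [y - x for x, y in zip(logs[a], logs[b])]  # M = log2(IF_b / IF_a)
            f = fit_trend(dist, m)                 # step 2 (simplified fit)
            for idx, d in enumerate(dist):         # step 3: split the correction
                logs[a][idx] += f[d] / 2
                logs[b][idx] -= f[d] / 2
                largest = max(largest, abs(f[d]))
        if largest < tol:                          # converged: no trend left
            break
    return [{k: 2 ** v for k, v in zip(keys, lg)} for lg in logs]

s1 = {(0, 0): 100, (0, 1): 50, (1, 1): 100}
s2 = {(0, 0): 200, (0, 1): 50, (1, 1): 200}   # inflated at distance 0
s3 = {(0, 0): 100, (0, 1): 100, (1, 1): 100}  # inflated at distance 1
normalized = cyclic_normalize([s1, s2, s3])
```

Because each pairwise step adds and subtracts the same amount, the geometric mean across samples is preserved at every bin pair; the samples are pulled toward a common distance trend rather than toward any single reference sample.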

Ballman, Karla V., Diane E. Grill, Ann L. Oberg, and Terry M. Therneau. “Faster Cyclic Loess: Normalizing RNA Arrays via Linear Models.” Bioinformatics (Oxford, England) 20, no. 16 (November 1, 2004)


Distance-centric chromatin interaction difference detection

Zheng, Ye, Peigen Zhou, and Sündüz Keleş. “FreeHi-C Spike-in Simulations for Benchmarking Differential Chromatin Interaction Detection.” Methods, 2020


Distance-centric chromatin interaction difference detection

  • Exact test

    • For comparing 2 groups without other covariates
    • Similar to Fisher's exact test
  • GLM Methods

    • For more complex experimental designs, use the generalized linear model (GLM) framework
    • The vector of covariates $x_i$ can be linked with $\mu_{dgj}$ through a log-linear model: $\log(\mu_{dgj}) = x_i^T \beta_{dg} + \log(M_{dj})$
  • Both approaches are implemented in the edgeR R package
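
As a small numeric illustration of the log-linear model, the sketch below evaluates the expected count mu = exp(x^T beta + log(M)) for a two-group design. The coefficient values, the design vectors, and the treatment of M as a plain multiplicative offset are all assumptions chosen for the example; this does not call edgeR or reproduce its estimation procedure.

```python
import math

def expected_count(x, beta, offset_m):
    """Expected count under the log-linear model: exp(x^T beta + log(M))."""
    linear = sum(xi * bi for xi, bi in zip(x, beta))
    return math.exp(linear + math.log(offset_m))

# Hypothetical coefficients: baseline of 20 contacts, 2-fold group effect
beta = [math.log(20.0), math.log(2.0)]

mu_control = expected_count([1, 0], beta, offset_m=1.0)  # intercept only
mu_treated = expected_count([1, 1], beta, offset_m=1.0)  # intercept + group
print(mu_control, mu_treated)
```

On the log scale the group coefficient is a log fold change, so testing whether it equals zero is the GLM analogue of the two-group comparison.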


Interpretation of differentially interacting chromatin regions (DIRs)

  • Visualization of DIRs. A Manhattan-like plot of DIRs may point to abnormalities or reveal chromosome- and site-specific enrichment of differentially interacting regions

  • Overlap between differentially expressed genes and DIRs. If gene expression measurements are available, test whether differentially expressed genes overlap DIRs more often than expected, linking DIRs to changes in gene expression

  • Functional enrichment of genes overlapping DIRs. DIRs may disrupt specific pathways or functions, so test whether genes overlapping DIRs are enriched in a canonical pathway or share a common function


Interpretation of differentially interacting chromatin regions (DIRs)

  • Overlap enrichment between TAD boundaries and DIRs. DIRs may correspond to TAD boundaries that are deleted or created, so test DIRs for significant overlap with TAD boundaries detected in either condition, or only with boundaries that changed between the conditions

  • Overlap between DIRs and binding sites. DIRs may correspond to locations where proteins bind to DNA, such as CTCF sites, so test for overlap between binding site locations and DIRs
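
One simple way to run the overlap tests listed on these slides is a one-sided hypergeometric test at the level of genomic bins, sketched below. The choice of test and the function name `overlap_pvalue` are assumptions for illustration; permutation-based tests are a common alternative, and the slides do not prescribe a specific method.

```python
from math import comb

def overlap_pvalue(n_total, n_dir, n_feature, n_overlap):
    """P(X >= n_overlap) when n_dir bins are drawn at random from n_total
    bins, of which n_feature carry the feature (TAD boundary, CTCF site,
    differentially expressed gene, ...)."""
    upper = min(n_dir, n_feature)
    total = comb(n_total, n_dir)
    tail = sum(
        comb(n_feature, k) * comb(n_total - n_feature, n_dir - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 1000 bins, 50 DIRs, 100 boundary bins, 15 DIRs on boundaries
p = overlap_pvalue(1000, 50, 100, 15)
print(p)
```

With these toy numbers about 5 overlaps are expected by chance, so observing 15 yields a small p-value. In practice the background set of bins should be restricted to regions that could have been tested for differential interaction.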


Summary


HiCcompareWorkshop resources

Get in touch on Twitter @mikhaildozmorov
or by e-mail mdozmorov at vcu dot edu

This research was supported by the American Cancer Society [IRG-14-192-40] and the National Institute of Environmental Health Sciences of the National Institutes of Health [T32ES007334]
