1. R tools for HiC data visualization
Nathalie Vialaneix, INRAE/MIAT
Chrocogen, January 31st 2020
2. First of all... a bit of bibliography
3. What did I use to make this presentation?
a GitHub repository with a bunch of references, classified by theme:
https://github.com/mdozmorov/HiC_tools#visualization
two reviews on the topic: [Yardimci & Noble, 2017] (5 tools, no R package),
[Ing-Simmons & Vaquerizas, CB 2019] (12 tools, 2/3 R packages, half of the
tools are interactive)
the bioconductor search tool
and I identified...
HiTC - HiCBricks - DNARchitect (with a shiny interface) - GENOVA - Gviz
and GenomicInteractions - Sushi - HiCeekR (the last one sent by a colleague
the day after I finished my slides)
(+ Pierre's package adjclust, which is on CRAN)
4. What did I learn from that?
a large number of interactive tools already exist
in another review [Lin et al, WSBM 2019], you can also find tools for 3D
visualization of Hi-C data
however, most tools seem to propose very common approaches for Hi-C
data visualization, even the interactive tools
a problem still remains: find appropriate standards to store the data and
load them into the software (multiple standards currently exist, with no
clear consensus yet)
6. Import and format of the different tools
HiTC (bioconductor, 7.5 years): (function importC)
input: mandatory: a TSV (tab-separated) file with bin pairs and a BED
file describing the bins (chr | start | end | bin nb) ⇒ outputs of
HiC-Pro
class: HTCexp (for submatrices) or HTClist (for all matrices) with
slots intdata (interaction matrix; can be sparse) and xgi/ygi
(GRanges objects describing the bins) ⇒ can be used directly
HiCBricks (bioconductor, 1 year): (functions Create_many_Bricks +
Brick_load_matrix / Create_many_Bricks_from_mcool)
input: mandatory: TXT (space separated) files with the count matrices
for every chromosome and a BED file describing the bins (chr | start |
end) in order of appearance in the matrix, OR .mcool files; .hic files
should be supported soon
class: BrickContainer that does not incorporate the data
themselves but only information on the chromosomes (names and
lengths) and on files in which the information (bin description and
interactions) is stored. When creating this object, a directory is
created with HDF (Hierarchical Data Format) files with the data in
them
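The bin-pair layout listed above for HiTC (a bin description plus one line per pair of bins) can be turned back into an interaction matrix with a few lines of base R. This is only a minimal sketch on made-up values: the column names, the toy counts and the helper `pairs_to_matrix` are assumptions made for the illustration, not part of HiTC or HiCBricks.

```r
# Bin description, BED-like: chr | start | end | bin nb (toy values)
bins <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(0L, 40000L, 80000L, 0L),
  end   = c(40000L, 80000L, 120000L, 40000L),
  bin   = 1:4
)

# Bin pairs, one line per non-zero count: bin1 | bin2 | count (toy values)
pairs <- data.frame(
  bin1  = c(1, 1, 2, 3),
  bin2  = c(1, 2, 3, 4),
  count = c(10, 4, 7, 1)
)

# Hypothetical helper: rebuild the dense symmetric interaction matrix
pairs_to_matrix <- function(pairs, bins) {
  n <- nrow(bins)
  mat <- matrix(0, nrow = n, ncol = n)
  labels <- paste0(bins$chr, ":", bins$start, "-", bins$end)
  dimnames(mat) <- list(labels, labels)
  # translate bin numbers into row/column positions
  idx <- cbind(match(pairs$bin1, bins$bin), match(pairs$bin2, bins$bin))
  mat[idx] <- pairs$count
  # mirror the entries so that the matrix is symmetric
  mat[idx[, 2:1, drop = FALSE]] <- pairs$count
  mat
}

hic_mat <- pairs_to_matrix(pairs, bins)
```

For real data a sparse representation is preferable, since most bin pairs have a zero count.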
7. Import and format of the different tools
DNA_Rchitect (web shiny interface at
http://shiny.immgen.org/DNARchitect/)
input: TXT file, separated by commas, semicolons or tabs, with
the following columns (chrom1 | start1 | end1 | chrom2 | start2 |
end2 | score | samplenumber) ⇒ BEDPE files
GENOVA (github repository https://github.com/robinweide/GENOVA, not
properly documented and full of bugs): (functions read_bedpe,
read.hicpro.matrix)
input: mandatory: a TSV (tab-separated) file with bin pairs and a BED
file describing the bins (chr | start | end | bin nb) ⇒ outputs of
HiC-Pro OR BEDPE files. It is said to handle .cool and .hic files as
well, but I have not found where
class(?): contacts that contains the slots MAT (triplet interaction
matrix), IDX (bin descriptions, BED), CHRS (chr description),
CENTROMERES (location of the centromeres, BED)
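To make the BEDPE-like layout (chrom1 | start1 | end1 | chrom2 | start2 | end2 | score | samplenumber) concrete, here is a minimal base R sketch that reads such a table and runs a few sanity checks. The content of `bedpe_txt` is invented for the example, and reading from a string stands in for reading from a file.

```r
# Toy BEDPE-like content (tab separated); all values are made up
bedpe_txt <- paste(
  "chrom1\tstart1\tend1\tchrom2\tstart2\tend2\tscore\tsamplenumber",
  "chr7\t0\t40000\tchr7\t80000\t120000\t12\t1",
  "chr7\t40000\t80000\tchr7\t160000\t200000\t3\t1",
  "chr7\t0\t40000\tchr8\t0\t40000\t1\t1",
  sep = "\n"
)

# With a real file, replace `text = bedpe_txt` by the file path
bedpe <- read.table(text = bedpe_txt, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)

# Basic sanity checks on the column layout and on the coordinates
expected <- c("chrom1", "start1", "end1",
              "chrom2", "start2", "end2", "score", "samplenumber")
stopifnot(identical(names(bedpe), expected))
stopifnot(all(bedpe$start1 < bedpe$end1), all(bedpe$start2 < bedpe$end2))

# Flag intra-chromosomal (cis) versus inter-chromosomal (trans) pairs
bedpe$cis <- bedpe$chrom1 == bedpe$chrom2
```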
8. Import and format of the different tools
GenomicInteractions (bioconductor, 5 years, based on Gviz): (functions
makeGenomicInteractionsFromFile or directly using
GenomicInteractions)
input: mandatory: BEDPE files OR HOMER files (TXT files with 20
columns;
http://homer.ucsd.edu/homer/interactions/HiCinteractions.html)
class: GenomicInteractions that contains two GRanges objects (bin
pairs) and a count object (numeric vector) ⇒ can be used directly
Sushi (bioconductor, 5.5 years):
input: BEDPE files or interaction matrix (with genomic coordinates in
row/column names) as TXT files. No dedicated import function; data
passed to the package functions as simple data.frame
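Since no dedicated import function exists for the matrix input, a plain `read.table` call is enough. The sketch below, in base R only, reads a toy interaction matrix whose row and column names are genomic coordinates, then derives a long table of bin pairs from it. The toy values and the object names are assumptions made for the illustration.

```r
# Toy interaction matrix as TXT: the header has one field fewer than the
# data lines, so read.table uses the first column as row names
mat_txt <- paste(
  "0\t40000\t80000",
  "0\t10\t4\t0",
  "40000\t4\t8\t7",
  "80000\t0\t7\t9",
  sep = "\n"
)

# check.names = FALSE keeps the numeric coordinates as column names
hic_df <- read.table(text = mat_txt, header = TRUE, sep = "\t",
                     check.names = FALSE)

# Long format (one line per non-zero pair), upper triangle only
hic_mat <- as.matrix(hic_df)
starts <- as.numeric(rownames(hic_mat))
upper <- which(upper.tri(hic_mat, diag = TRUE) & hic_mat > 0, arr.ind = TRUE)
long <- data.frame(
  start1 = starts[upper[, "row"]],
  start2 = as.numeric(colnames(hic_mat))[upper[, "col"]],
  count  = hic_mat[upper]
)
```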
9. Import and format of the different tools
HiCeekR (github repository https://github.com/lucidif/HiCeekR, 1 year,
well documented, shiny application to run locally)
input: BAM file and FASTA reference. Performs all the processing,
creates local files and stores intermediate results (a report is also created)
adjclust (CRAN, 2 years)
input: a TSV (tab-separated) file with bin pairs OR an interaction
matrix OR HTCexp objects. No dedicated import function; data
passed to the package functions as simple (sparse) matrices
10. Summary
X: possible
XX: tested (by myself)
~: possible but not quite direct
TXT file (bin pairs) | TXT file (matrix) | BEDPE | .cool | .hic | custom
HiTC XX ~XX ~X XX
HiCBricks X X X?
DNA_Rchitect X
GENOVA X X X X ?
GenomicInteractions ~XX X XX
Sushi X X X X
adjclust XX XX ~X XX
Only very recent (and still immature) tools handle Hi-C-specific formats like
.cool and .hic. HiCeekR handles only raw BAM files.
13. Recommendations for heatmaps
whole-genome heatmaps are used to highlight genomic rearrangements /
zoomed heatmaps are used to highlight TADs and loops
colour coding should scale with log$_{10}$ rather than linearly and should
be made with a colour scale consisting of only one colour to avoid artificial
transitions (also use multiple hues for colour-blind readers). Two colour
scales can be used to represent a correlation matrix (compartments) or a
comparison between matrices (see below)
comparisons can be made with side-by-side heatmaps or (better) with a
heatmap of the log$_2$ ratio
linear tracks can be added to heatmaps and in this case triangular
heatmaps should be preferred (the tracks are then placed below)
tools that contain heatmaps: HiTC, HiCBricks, GENOVA, Sushi and
adjclust (no heatmaps in DNA_Rchitect or in GenomicInteractions)
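The two colour recommendations above (a single-hue scale on log10 counts, and a two-colour scale for the log2 ratio of two matrices) can be sketched with base R graphics. Everything here is an assumption made for the illustration: the simulated counts, the pseudo-count of 1 used to avoid log(0), and the chosen colours. None of the packages discussed in the slides is used.

```r
set.seed(1)
n <- 20
# Toy symmetric count matrices whose expected counts decay with distance
dist_mat <- abs(outer(1:n, 1:n, "-"))
make_counts <- function(scale) {
  m <- matrix(rpois(n * n, lambda = scale / (1 + dist_mat)), n, n)
  m[lower.tri(m)] <- t(m)[lower.tri(m)]  # copy the upper triangle below
  m
}
cond_a <- make_counts(200)
cond_b <- make_counts(100)

pseudo <- 1  # assumed pseudo-count, avoids log(0)

# Single-hue scale applied to log10 counts
log_a <- log10(cond_a + pseudo)
single_hue <- colorRampPalette(c("white", "darkred"))(100)

# Two-colour (diverging) scale for the log2 ratio, centred on 0
log_ratio <- log2((cond_a + pseudo) / (cond_b + pseudo))
lim <- max(abs(log_ratio))
diverging <- colorRampPalette(c("blue", "white", "red"))(101)

pdf(NULL)  # null device so the sketch runs headless; remove to see the plots
image(log_a, col = single_hue, axes = FALSE, main = "log10 counts")
image(log_ratio, col = diverging, zlim = c(-lim, lim), axes = FALSE,
      main = "log2 ratio")
invisible(dev.off())
```

Using a symmetric `zlim` keeps the neutral colour at a ratio of 1, so that gains and losses are read on the same scale.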
14. Features for heatmaps
rectangular | triangular | custom colors | zoom | comparison | linear tracks
HiTC: XX (genome); XX (chr); log, pos/neg col; prior to plot (start/end); X (2, triangular); X (only genomic int.)
HiCBricks: X; X; X (palette and log); X (start/end/dist); X (2)
GENOVA: XX; X(?); X (but limited); X (2); X (?)
Sushi: X; X; X (palette); X (start/end/dist)
HiCeekR: X; X (start/end); X (numeric/2)
adjclust: XX; XX (palette and log)
15. Features for heatmaps
In addition: HiCBricks and adjclust can show TADs on the heatmap (maybe
also GENOVA) and GENOVA can highlight loops with circles on the maps.
17. Critical assessment of the tools
The simplest, most complete and nicest visualization function for heatmaps is
in HiCBricks (even if it cannot display linear tracks) but unfortunately, the
import format of the tool is rather hard to use.
GENOVA is promising (including many functions to extract features (IS, TADs,
loops, ...) from HiC matrices) but impossible to use at this stage because of the
20. Interactions as arcs (or networks)
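library(Gviz)                 # provides plotTracks
library(GenomicInteractions)  # provides InteractionTrack
# maria_90_chr7 must be a GenomicInteractions object
interaction_track <- InteractionTrack(maria_90_chr7, name = "HiC",
                                      chromosome = "7")
plotTracks(interaction_track, chromosome = "7", from = 0, to = 50000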
21. Interactions as arcs (or networks)
plotTracks(interaction_track, chromosome = "7", from = 0, to = 50000
22. Recommendations for arcs
useful mainly to superimpose annotations or qualitative/quantitative
tracks (Gviz offers plenty of solutions to do so)
but becomes unreadable for large regions and is unable to show the
interaction intensity (a solution would be to threshold the interaction
intensity beforehand)
alternatives display the data as networks (but the linear structure of the
genome is lost and it is also restricted to very small regions) or as circos
plots (thresholding the interactions to keep only the strongest is mandatory,
even for a single chromosome)
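The thresholding step mentioned above can be sketched in base R before handing the data to any arc or circos plotting function. The helper `filter_for_arcs`, its `prop` parameter and the toy values are hypothetical and only serve the illustration: the idea is to restrict the interactions to a window and keep the strongest ones.

```r
# Toy intra-chromosomal interactions (bin starts and counts are made up)
interactions <- data.frame(
  start1 = c(0, 0, 40000, 80000, 120000, 0),
  start2 = c(40000, 200000, 160000, 120000, 400000, 480000),
  count  = c(50, 3, 12, 40, 1, 2)
)

# Hypothetical helper: keep the interactions that lie inside [from, to]
# and whose count is in the top `prop` fraction within that window
filter_for_arcs <- function(x, from, to, prop = 0.5) {
  in_window <- x$start1 >= from & x$start2 <= to
  x <- x[in_window, , drop = FALSE]
  cutoff <- quantile(x$count, probs = 1 - prop, names = FALSE)
  x[x$count >= cutoff, , drop = FALSE]
}

strong <- filter_for_arcs(interactions, from = 0, to = 250000, prop = 0.5)
```

A quantile-based cutoff is only one possible choice; a fixed count threshold would work the same way.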
23. Critical assessment of tools
DNA_Rchitect, Sushi and GenomicInteractions display the
interactions as arcs
DNA_Rchitect is interactive but I never managed to use it, even on the
example dataset (too much annotation information is required for
proper use)
the other two offer approximately the same types of features
(GenomicInteractions is maybe more complete but Sushi is easier to
customize)
HiCeekR can represent the data as a(n interactive) network, for a whole
chromosome or a selected region and with/without a threshold for the
edge value
25. Example of visualization with annotation tracks
with circlize (a bit complicated to use, similar to Gviz)
26. Other (quality control) graphics
in HiCeekR: quality control of the alignment (fragment length
distribution, insert size distributions)
in HiTC: inter/intra interaction barplot, interaction versus distance dot
plot, interaction distribution (histogram) for CIS/TRANS
in GenomicInteractions: inter/intra donut graphs (forget them!),
interaction distribution (histogram but cut; also forget them), donut
graphs with annotation of the interactions
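Two of the quality control summaries listed above (the inter/intra counts behind a barplot and the interaction versus distance relationship) only need simple aggregation. The sketch below uses base R on made-up bin pairs; the column names and the values are assumptions, and it does not reproduce the HiTC or GenomicInteractions functions themselves.

```r
# Toy bin pairs (all values are made up)
pairs <- data.frame(
  chrom1 = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr1"),
  start1 = c(0, 0, 40000, 0, 0, 40000),
  chrom2 = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
  start2 = c(40000, 80000, 80000, 0, 40000, 120000),
  count  = c(30, 10, 26, 2, 28, 6)
)

# Inter/intra summary: total count for cis and trans pairs
pairs$type <- ifelse(pairs$chrom1 == pairs$chrom2, "cis", "trans")
cis_trans <- tapply(pairs$count, pairs$type, sum)
# barplot(cis_trans) would draw the corresponding barplot

# Interaction versus distance: mean count per genomic distance (cis only)
cis <- pairs[pairs$type == "cis", ]
cis$distance <- cis$start2 - cis$start1
by_distance <- aggregate(count ~ distance, data = cis, FUN = mean)
# plot(by_distance$distance, by_distance$count, log = "y") would draw it
```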
27. References
Ing-Simmons E, Vaquerizas JM (2019) Visualising three-dimensional genome
organisation in two dimensions. Development, 146(19): dev177162.
Lin D, Bonora G, Yardimci GG, Noble WS (2019) Computational methods for
analyzing and modeling genome structure and organization. WIREs Systems
Biology and Medicine, 11: e1435.
Yardimci GG, Noble WS (2017) Software tools for visualizing Hi-C data. Genome
Biology, 18: 26.