Integrative & Network Analysis of
Transcriptomics and Proteomics Data
Prof. N. Krasnogor
Biology, Neuroscience and Computing (BNC) Research Group
School of Computing Science
Centre for Synthetic Biology and Bioexploitation
Newcastle University

E-mail: Natalio.Krasnogor@Newcastle.ac.uk
Twitter: @Nkrasnogor
Skype: Natalio.Krasnogor
Outline
• Introduction: goals and data sets
• Part I: ArrayMining.net, a tool set for microarray analysis
• Part II: TopoGSA (topological analysis of gene/protein networks), PathExpand (inference of missing pathway links) and EnrichNet (network-based enrichment)
• Part III: Visualisation & exploration (network surfing)


Gibson G (2003) Microarray Analysis.
PLoS Biol 1(1): e15.
doi:10.1371/journal.pbio.0000015
Introduction

• Typical problem in biosciences: How to make effective use of multiple, large-scale data sources?
• Typical problem in computer science: How to exploit the strengths of different algorithms?

→ GOAL: Develop new methods (and build on existing ones) that combine diverse data sources and algorithms
Reference data set
Armstrong et al. Leukemia data set

• 72 samples and 12,626 genes
• 3 leukemia sub-types: ALL (24), AML (28), MLL (20)
• Platform: Affymetrix U95A oligonucleotide array
• Normalisation: Variance Stabilizing Normalisation (Huber et al., 2002)
• Thresholding/filtering steps: see Armstrong et al. (2001, Nat. Genet.)
• Public access to data set: http://www.broadinstitute.org/cgi-bin/cancer/da

Heat map: 30 most differentially expressed genes (rows) vs. samples (columns)
Main data set

QMC breast cancer microarray data set

• 128 samples and 47,293 genes
• 3 tumour grades: 1 (33), 2 (52), 3 (43)
• Pre-normalized data (log-scale, min: 4.9, max: 13.3)
• Platform: Illumina Sentrix Human-6 BeadChips
• Probe level data analysis: Bioconductor beadarray package
• Public access to data set: http://www.ebi.ac.uk/microarray-as/ae (accession number: E-TABM-576)

Heat map: 30 most differentially expressed genes (rows) vs. samples (columns; grade 1 and grade 3)
Breast cancer data - difficulties

Breast cancer outcome is hard to predict:

Figure panels: van 't Veer et al., Alon et al., Golub et al.

Breast cancer microarray data show a large degree of class overlap, whereas leukemia decision boundaries are easy to find (Blazadonakis, 2009).
Data Fusion
Other biological data sources used:
Breast cancer microarray data:
• additional public data sets: Huang et al., van 't Veer et al.
• pre-processing: GC-RMA
Cellular pathway data:
• obtained from GO, BioCarta, Reactome, KEGG and InterPro
• total: approx. 3000 pathways (size > 10)
Protein interaction data:
• unweighted binary interactions (MIPS, DIP, BIND, HPRD, IntAct; human only)
• 9392 nodes, 38857 edges
Cancer gene sets:
• mutated genes in different human cancer types (breast, liver, ...)
• 30 gene sets of size > 10 genes
Methods overview: ArrayMining & TopoGSA
Part I
Tools for Automated Microarray
Data Analysis

Web-tool: ArrayMining.net
What is ArrayMining.net?
ArrayMining.net is an online microarray analysis tool set integrating multiple data sources and algorithms.

www.arraymining.org

Goal: A "Swiss army knife" for microarray analysis tasks

6 analysis modules:
Classical:
1. Gene selection
2. Sample clustering
3. Sample classification
New:
4. Gene Set Analysis
5. Gene Network Analysis
6. Cross-Study Normalization
ArrayMining.net: Gene selection
Gene selection module
• Applies supervised feature selection algorithms (CFS, eBayes, SAM, etc.)
• Compares multiple algorithms or combines them into an ensemble
• Example: ENSEMBLE feature selection for Armstrong et al. (2001) dataset:
Affymetrix ID | Gene symbol | Gene description | F-statistic
32847_at | MYLK | myosin, light polypeptide kinase | 159.59
1389_at | MME | membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase) | 137.53
35164_at | WFS1 | wolfram syndrome 1 (wolframin) | 128
36239_at | POU2AF1 | pou domain, class 2, associating factor 1 | 116.75
1325_at | SMAD1 | smad, mothers against dpp homolog 1 (drosophila) | 110.37
963_at | LIG4 | ligase iv, dna, atp-dependent | 89.77
34168_at | DNTT | deoxynucleotidyltransferase, terminal | 89.31
40570_at | FOXO1 | forkhead box o1a (rhabdomyosarcoma) | 86.89
33412_at | LGALS1 | lectin, galactoside-binding, soluble, 1 (galectin 1) | 81.31

Legend: markers distinguish genes previously identified by Armstrong et al. from newly identified genes.
ArrayMining.net: Gene selection
Gene selection module (2): Armstrong et al. dataset




• Automatic generation of box plots with gene and sample class annotations
• The first row shows the box plots for the two best-ranked newly identified genes in the Armstrong et al. dataset
• The second row shows two top-ranked previously identified genes
• The user can easily compare and combine the results from different selection methods
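The gene selection table reports an F-statistic per probe. As a minimal sketch of how such a score can rank genes across sample classes, the following computes a one-way ANOVA F-statistic in plain Python. It is not ArrayMining's ensemble selection; the function names and the toy expression values are assumptions made for illustration.

```python
from statistics import mean

def f_statistic(values, labels):
    """One-way ANOVA F-statistic for one gene across sample classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_genes(expression, labels):
    """Sort genes by decreasing F-statistic (hypothetical helper)."""
    scores = {gene: f_statistic(vals, labels) for gene, vals in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy data: three classes with two samples each (values are made up).
labels = ["ALL", "ALL", "AML", "AML", "MLL", "MLL"]
expression = {
    "gene_a": [1.0, 1.2, 5.0, 5.1, 9.0, 9.2],  # separates the classes well
    "gene_b": [3.0, 5.0, 4.0, 6.0, 3.5, 5.5],  # mostly within-class noise
}
ranking = rank_genes(expression, labels)
```

An ensemble in the sense of the slide would combine several such rankings (for example by average rank) rather than rely on one statistic.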




ArrayMining.net: Examples

Further examples: Gene selection and Clustering module

Automatic generation of heat maps (genes vs. samples) and PCA cluster plots (Armstrong et al. dataset)
ArrayMining.net: Examples
Further examples: 3D-ICA and Co-Expression analysis
3D Independent Component Analysis plot of the sample space (left; classes ALL, AML, MLL) and the largest connected components of a gene co-expression network in gene space (right) for the Armstrong et al. dataset
ArrayMining.net: In-house data
Apply the tools on new data: QMC Breast cancer data
Expression levels across 3 tumour grades:
Heat map: 50 most significant genes
Box plot: 4 most significant genes (STK6, KIF2C, MYBL2, AURKB)
ArrayMining.net: QMC dataset
QMC Breast cancer data set – selected genes
• all top-ranked genes are known or likely to be involved in breast cancer
• the selection is robust with regard to cross-validation cycles and algorithms

Gene name | PC (gene vs. outcome) | Fold change | Q-value (rank)
ESTROGEN RECEPTOR 1 | -0.75 | 0.16 | 1.6e-20 (1.)
RAS-LIKE, ESTROGEN-REGULATED, GROWTH INHIBITOR | -0.66 | 0.46 | 5.3e-14 (2.)
WD REPEAT DOMAIN 19 | -0.66 | 0.73 | 1.2e-13 (3.)
CARBONIC ANHYDRASE XII | -0.65 | 0.28 | 2.7e-13 (4.)
ARP3 ACTIN-RELATED PROTEIN 3 HOMOLOG (YEAST) | 0.64 | 1.37 | 9.6e-13 (5.)
TETRATRICOPEPTIDE REPEAT DOMAIN 8 | -0.63 | 0.82 | 2.2e-12 (6.)
BREAST CANCER MEMBRANE PROTEIN 11 | -0.62 | 0.24 | 7.1e-12 (7.)
ArrayMining.net: Example
ArrayMining - Class Discovery Analysis module:
• Motivation: Exploiting the synergies between partition-based and hierarchical clustering algorithms
• Approach: Consensus clustering based on the agreement of clustering results for pairs of objects (details on next slide).
- equivalent to the median partition problem (NP-complete)
- Simulated Annealing (SA) has been shown to provide good solutions
• Our solution:
- Compare SA (Aarts et al. cooling scheme) with thermodynamic SA (TSA) and fast SA (FSA) → FSA provides the fastest convergence
- Initialization: the input clustering with the highest agreement with the other inputs
ArrayMining.net: Consensus Clustering
ArrayMining's consensus clustering approach:
Clustering agreement := number of times a pair of samples is assigned to the same cluster across all input clusterings
Idea: Reward objects in the same cluster if they have a high agreement.

Agreement matrix: A_ij := number of agreements across all clusterings for samples i and j
Fitness function: uses the threshold α := (max(A) + min(A))/2
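The agreement matrix and the threshold α can be made concrete with a small sketch. The slide's exact fitness formula is not recoverable from the text, so the form used here (sum of A_ij − α over all pairs placed in the same cluster) is an assumption that follows the stated idea of rewarding co-clustered pairs with high agreement. Taking max and min over off-diagonal entries only is also an assumption.

```python
from itertools import combinations

def agreement_matrix(clusterings):
    """A[i][j] = number of input clusterings that put samples i and j together."""
    n = len(clusterings[0])
    matrix = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                matrix[i][j] += 1
                matrix[j][i] += 1
    return matrix

def fitness(candidate, matrix):
    """Assumed fitness: reward co-clustered pairs whose agreement exceeds alpha."""
    pairs = list(combinations(range(len(candidate)), 2))
    values = [matrix[i][j] for i, j in pairs]
    alpha = (max(values) + min(values)) / 2
    return sum(matrix[i][j] - alpha for i, j in pairs if candidate[i] == candidate[j])

# Three toy input clusterings of four samples (cluster labels are arbitrary).
inputs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
A = agreement_matrix(inputs)
score_majority = fitness([0, 0, 1, 1], A)
score_minority = fitness([0, 1, 1, 1], A)
score_single = fitness([0, 0, 0, 0], A)
```

A search procedure such as simulated annealing would then move samples between clusters and keep candidates with higher fitness.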
Clustering Methods and Validity Indices
• Sample clustering methods: 8 different methods considered:
- partition-based: k-Means, PAM, CLARA, SOM, SOTA
- hierarchical: AL-HCL, DIANA, AGNES
• Scoring & selection of the number of clusters: 5 validity indices / splitting rules used:
- Silhouette width, Calinski-Harabasz, Dunn, C-index, knn-Connectivity
- Example: Silhouette width s(i) = (b(i) − a(i)) / max(a(i), b(i)), where
a(i) = avg. distance of obj(i) to all others in the same cluster
b(i) = avg. distance of obj(i) to all others in the closest distinct cluster
- good validity indices should have: no multivariate normality assumptions (Gower, 1981), small or no bias (Milligan & Cooper, 1985)
• Standardization: classical (mean 0, stddev. 1) or median absolute deviation
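The silhouette width built from a(i) and b(i) can be sketched in a few lines. This is a plain-Python illustration using the standard definition s(i) = (b(i) − a(i)) / max(a(i), b(i)) with Euclidean distance; the function name and toy points are assumptions, and at least two clusters are required.

```python
from math import dist
from statistics import mean

def silhouette_widths(points, labels):
    """Silhouette width per object; needs at least two clusters."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        # a(i): average distance to the other members of the same cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        # b(i): average distance to the members of the closest other cluster.
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b))
    return widths

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_widths(points, [0, 0, 1, 1])  # compact, well separated
bad = silhouette_widths(points, [0, 1, 0, 1])   # clusters cut across the gap
```

Averaging the widths over all objects gives one score per clustering, which is how the index can be used to compare different numbers of clusters.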
Many Clustering Methods/Validity Indexes

Consensus Clustering: example
Example application: QMC breast cancer dataset
• Separate sub-classes in 84 luminal samples with consensus clustering
• Input algorithms: k-Means, SOM, SOTA, PAM, HCL, DIANA, HYBRID-HCL
• Result: low confidence (silhouette widths); best separation for two clusters
External Validation
Clustering results – external validation (tumour grades)
Measure similarity of clusterings with the Rand index R:
R = (a + b) / (a + b + c + d)
where a, b, c and d are the numbers of pairs of objects assigned to:
- the same cluster in both clusterings (a)
- different clusters in both clusterings (b)
- the same cluster in clustering 1/2 and different clusters in clustering 2/1 (c/d)
- Corrected for chance → adjusted Rand index

Figure legend: reference clustering: 3 tumour grades (low, medium, high); compared: random model (10000 random clusterings), single clustering, consensus
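The pair counts a, b, c and d translate directly into code. The sketch below computes the Rand index from those counts and the chance-corrected adjusted Rand index in its usual contingency-table form; the function names and the toy label vectors are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_index(x, y):
    """Rand index R = (a + b) / (a + b + c + d) over all object pairs."""
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):
        same_x, same_y = x[i] == x[j], y[i] == y[j]
        if same_x and same_y:
            a += 1          # together in both clusterings
        elif not same_x and not same_y:
            b += 1          # apart in both clusterings
        elif same_x:
            c += 1          # together in the first only
        else:
            d += 1          # together in the second only
    return (a + b) / (a + b + c + d)

def adjusted_rand_index(x, y):
    """Rand index corrected for chance agreement (contingency-table form)."""
    n = len(x)
    sum_cells = sum(comb(v, 2) for v in Counter(zip(x, y)).values())
    sum_rows = sum(comb(v, 2) for v in Counter(x).values())
    sum_cols = sum(comb(v, 2) for v in Counter(y).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings have one cluster
    return (sum_cells - expected) / (maximum - expected)

reference = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same partition, different label names
crossed = [0, 1, 0, 1]      # disagrees on every within-cluster pair
```

The adjusted index is the one to compare against a random model, since the plain Rand index is already well above zero for unrelated clusterings.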
ArrayMining.net: Gene Set Analysis
Extension: Gene set analysis
• Expression levels for a single gene are often unreliable
• Similar genes might contain complementary information
• We want to integrate functional annotation data

Gene Set Analysis (GSA):
1) Identify sets of functionally similar genes (GO, KEGG, etc.)
2) Summarize gene sets to "meta"-genes (PCA, MDS, etc.)
3) Apply statistical analysis
(example: Van Andel Institute cancer gene sets)

Figure: heat map of pathways vs. samples

Subramanian et al., "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles", PNAS 102(43):15545–15550, October 25, 2005.
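Step 2 of the GSA recipe (summarizing a gene set to a "meta"-gene) can be illustrated with a first principal component. The sketch below uses power iteration on the gene-by-gene covariance matrix so that it runs without external libraries. It is one possible realisation of the PCA option named on the slide, not ArrayMining's code, and the function name and toy values are assumptions.

```python
from statistics import mean

def meta_gene(profiles, iterations=200):
    """Project samples onto the first principal component of a gene set.

    profiles: one list of expression values (one per sample) for each gene.
    Returns one meta-gene value per sample.
    """
    centered = []
    for row in profiles:
        row_mean = mean(row)
        centered.append([value - row_mean for value in row])
    genes, samples = len(centered), len(centered[0])
    # Gene-by-gene covariance matrix of the centered profiles.
    cov = [[sum(a * b for a, b in zip(centered[p], centered[q])) / (samples - 1)
            for q in range(genes)] for p in range(genes)]
    # Power iteration converges to the dominant eigenvector of cov.
    weights = [1.0 / genes ** 0.5] * genes
    for _ in range(iterations):
        nxt = [sum(cov[p][q] * weights[q] for q in range(genes)) for p in range(genes)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break  # start vector orthogonal to all variance; keep last weights
        weights = [x / norm for x in nxt]
    return [sum(weights[p] * centered[p][s] for p in range(genes))
            for s in range(samples)]

# Toy pathway with three co-regulated genes; samples 0-2 low, samples 3-5 high.
pathway = [
    [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],
    [2.0, 2.2, 1.9, 4.1, 4.0, 4.2],
    [0.5, 0.4, 0.6, 2.5, 2.4, 2.6],
]
scores = meta_gene(pathway)
```

One such score vector per pathway yields the pathway-by-sample matrix to which the statistical analysis in step 3 is applied.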
ArrayMining.net: Examples
Gene Set Analysis module – example analysis
• We apply the Gene Set Analysis module to the Armstrong et al. dataset
• With known cancer gene sets the class separation is better than for single genes

Heat map for the Armstrong et al. dataset based on pathway meta-genes
Consensus Clustering: example (2)
Combine consensus clustering with gene set analysis
• Map genes onto Gene Ontology (GO), reduce dimensionality (MDS)
• Apply same consensus clustering as before on GO-based "meta-genes"
• Result: ~3 times higher confidence; better separation
External Validation
Figure legend: single clustering, consensus clustering, consensus (PAM+SOTA), 10000 random clusterings
Interim Summary
ArrayMining Integrative Clustering - Summary

• Consensus clustering (CC) results tend to be similar to or slightly better than the best single clusterings in terms of adj. Rand index and validity indices (but longer runtime)
• The input clusterings should include diverse methods and exclude similar methods
• Using gene sets (GS) representing cellular pathways instead of single genes results in better cluster separation, adj. Rand indices and validity indices (annotation data required)

→ GS & CC provide improved results, but: longer runtimes + annotation data required
Part II
Tools for Networks Analysis

Networks Biology
Describe network strategies to identify and prioritize gene sets or functional associations (arising in high throughput experiments) between a gene/protein set of interest (target set) and annotated gene/protein sets (reference sets)

I will describe some of the functionalities and methods in tools freely accessible at:
http://www.ncl.ac.uk/csbb/research/resources/

In collaboration with:
Dr. Pawel Widera (Newcastle University)
Dr. Enrico Glaab (University of Luxembourg)
Dr. Anais Baudot (Université d'Aix-Marseille)
Prof. Reinhard Schneider (University of Luxembourg)
Prof. Alfonso Valencia (CNIO)
TopoGSA: Network topological
analysis of gene sets
What is TopoGSA?
TopoGSA is a web-application mapping
gene sets onto a comprehensive human
protein interaction network and analysing
their network topological properties.

Two types of analysis:
1. Compare genes within a gene set:
e.g. up- vs. down-regulated genes
2. Compare a gene set against a
database of known gene sets
(e.g. KEGG, BioCarta, GO)
TopoGSA - Methods
TopoGSA computes topological properties for an uploaded gene set and matched-size random gene sets:
• the degree of each node in the gene set
• the local clustering coefficient C_i for each node v_i in the gene set:
C_i = 2 |{e_jk : v_j, v_k ∈ N_i}| / (k_i (k_i − 1))
where k_i is the degree of v_i, N_i is the set of neighbours of v_i and e_jk is the edge between v_j and v_k
• the shortest path length between pairs of nodes v_i and v_j in the gene set
• the node betweenness B(v) for each node v in the gene set:
B(v) = Σ_{s≠v≠t} σ_st(v)
where σ_st(v) is the number of shortest paths from s to t passing through v
• the eigenvector centrality for each node in the gene set
Human PPI network → MIPS, DIP, BIND, HPRD and IntAct
9392 proteins and 38857 interactions
Also for yeast, fly, worm and Arabidopsis
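Three of the listed properties (degree, local clustering coefficient and shortest path length) can be sketched for a gene set mapped onto a small interaction graph. This is a plain-Python illustration and not TopoGSA's implementation; the graph, the gene names and the helper names are assumptions, and betweenness and eigenvector centrality are omitted to keep the example short.

```python
from collections import deque
from itertools import combinations
from statistics import mean

def build_graph(edges):
    """Undirected, unweighted graph as a dict of neighbour sets."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def clustering_coefficient(graph, node):
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))

def shortest_path_length(graph, source, target):
    """Breadth-first search; returns None if the nodes are not connected."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None

def gene_set_summary(graph, gene_set):
    """Mean degree, clustering coefficient and pairwise path length of a set."""
    members = [g for g in gene_set if g in graph]  # drop unmapped identifiers
    lengths = []
    for a, b in combinations(members, 2):
        length = shortest_path_length(graph, a, b)
        if length is not None:
            lengths.append(length)
    return {
        "mean_degree": mean(len(graph[g]) for g in members),
        "mean_clustering": mean(clustering_coefficient(graph, g) for g in members),
        "mean_shortest_path": mean(lengths) if lengths else None,
    }

toy_network = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
summary = gene_set_summary(toy_network, ["A", "B", "C", "X"])  # "X" is not in the network
```

Repeating the summary for randomly drawn gene sets of the same size gives the matched-size background against which the uploaded set is compared.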

Figure: 3D plot of gene sets by mean node betweenness, mean clustering coefficient and mean shortest path length

LEGEND:
• Cellular processes
• Environmental information processing
• Genetic information processing
• Human diseases
• Metabolism
• Cancer genes

General results:
• Metabolic pathways have high shortest path lengths and low betweenness
• Disease pathways and cancer gene sets tend to have high betweenness and small shortest path lengths
ArrayMining → TopoGSA
Topological analysis of Selected Genes
• Results of within-gene-set comparison:
Estrogen receptor 1 gene and apoptosis regulator Bcl2, both up-regulated
in luminal samples, have outstanding network topological properties (higher
betweenness, higher degree, higher centrality) in comparison to other genes.
• Results of comparison against reference databases:
- Metabolic KEGG pathways are most similar to the uploaded gene set in
terms of network topological properties.
- Most similar BioCarta pathways: Cytokine, Differentiation and inflammatory
pathways.
Real-world application of tool sets
ArrayMining identifies RERG as a tumour marker
• RERG (Ras-related and oestrogen-regulated growth-inhibitor) was identified as a new candidate marker of the ER-positive luminal-like breast cancer subtype
• Validation using immunohistochemistry on tissue microarrays (TMAs) containing 1,140 invasive breast cancers confirmed RERG's utility as a marker gene

Figure: TMAs of invasive breast cancer show strong RERG expression
RERG Protein Expression vs. BCSS & DMFI
H.O. Habashy, D.G. Powe, E. Glaab, N. Krasnogor, J.M. Garibaldi, E.A. Rakha, G. Ball, A.R. Green, and I.O. Ellis. RERG (Ras-related and oestrogen-regulated growth-inhibitor) expression in breast cancer: A marker of ER-positive luminal-like subtype. Breast Cancer Research and Treatment, (online first):1-12, 2010.
Figure: Kaplan-Meier plots of cumulative survival for positive vs. negative RERG expression.
Panels: RERG protein expression with respect to BCSS in the ER+ ∪ ER- cohort; RERG protein expression with respect to BCSS in ER+ only; without adjuvant treatment; without Tamoxifen treatment.
Axes: cumulative survival vs. BCSS or DMFI in months (0 to 250).
Reported p-values: p=0.002, p=0.007, p=0.027.
PathExpand: Expanding
pathways and cellular processes

• Utilize same PPI network as in TopoGSA; derived from
– MIPS, DIP, MINT, HPRD and IntAct
– only experimental evidence of binary PPI
– final protein interaction network contained 9392 proteins
(nodes) and 38857 interactions (edges)
• Process mapping:
– KEGG, BioCarta and Reactome pathways were mapped onto the network
– Only 60% of pathway members existed in the PPI network
PPI-based pathway-enlargement
Extension Procedure

Idea: Enlarge pathways by adding genes that are "strongly connected" to the pathway nodes or increase the pathway "compactness"
• Our procedure extended 159 pathways from BioCarta, 90 from KEGG and 52 from Reactome.
• The pathway sizes increased on average to between 113% and 126% of the original size.

Validation shows:
1) The added proteins are well connected and central in the protein interaction network
2) The added proteins display Gene Ontology annotations that match the original cellular pathway/process annotations better than those of random proteins
3) The added proteins are enriched in processes known to be related to cellular signalling
4) Our method is able to recover known cellular pathway/process proteins in a cross-validation experiment

Example case: BioCarta BTG family proteins and cell cycle regulation
Black: original pathway nodes; green: nodes added based on connectivity; an added cancer gene is highlighted
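The idea of adding "strongly connected" candidates to a pathway can be sketched with a simple rule. The criterion below (a minimum number of links into the pathway and a minimum fraction of the candidate's neighbours inside the pathway) is a hypothetical stand-in for illustration; the slides do not spell out PathExpand's actual conditions, and the thresholds, function name and toy graph are assumptions.

```python
def expand_pathway(graph, pathway, min_links=2, min_fraction=0.5):
    """Return candidate nodes that are strongly connected to the pathway.

    graph: dict mapping each node to the set of its neighbours.
    A candidate qualifies (assumed rule) if it has at least `min_links`
    neighbours in the pathway and at least `min_fraction` of all its
    neighbours are pathway members.
    """
    members = set(pathway)
    added = set()
    for node, neighbours in graph.items():
        if node in members or not neighbours:
            continue
        links = len(neighbours & members)
        if links >= min_links and links / len(neighbours) >= min_fraction:
            added.add(node)
    return added

# Toy network: X interacts only with pathway members; Y is a hub whose
# interactions lie mostly outside the pathway.
toy_graph = {
    "P1": {"P2", "X", "Y"},
    "P2": {"P1", "X", "Y"},
    "X": {"P1", "P2"},
    "Y": {"P1", "P2", "Z1", "Z2", "Z3"},
    "Z1": {"Y"},
    "Z2": {"Y"},
    "Z3": {"Y"},
}
new_members = expand_pathway(toy_graph, {"P1", "P2"})
```

The fraction threshold is what keeps promiscuous hubs such as Y out of the extended pathway even though they touch several pathway nodes.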
Pathway enlargement – Example 1
Example: Alzheimer disease pathway
• More than 20 proteins annotated in our PPIN
• 5 proteins added by the extension process
• 3 with known disease associations
• 2 candidates: METTL2B, TMED10
Pathway enlargement – Example 2
Example: Interleukin signaling pathways
• Complex signaling system, sharing intracellular cascades
• New regulators
• New crosstalk proteins
Network-based Functional
Association Ranking
Describe network strategies to identify and prioritize functional associations (arising in high throughput experiments) between a gene/protein set of interest (target set) and annotated gene/protein sets (reference sets)

Workflow diagram: 1) Produce → 2) Feed → 3) Filter/Compare/Overlap/etc. (target set vs. reference sets 1 to n) → 4) Transfer → 5) New hypotheses/experiments/insights

Multiple approaches for "Enrichment Analysis":
• Over-representation analysis (ORA)
• Gene set enrichment analysis (GSEA)
• Integrative and modular enrichment analysis (MEA)

Huang, D. et al., Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res., 37 (1), 2009
Key limitations of previous
“Enrichment Analysis”:
•ORA techniques often have low discriminative power with scores varying
widely with small changes in the overlap size.
•Functional information captured in the graph structure of a molecular
interaction network connecting the gene/protein sets of interest is
disregarded.
•Genes and proteins in the network neighborhood, in particular those with
missing annotations, are not taken into account.
•The recognition of tissue-specific gene/protein set associations is often
statistically infeasible.
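The first limitation (ORA scores depend only on overlap size) is easy to see in a small sketch of over-representation analysis. The function below computes the one-sided hypergeometric tail probability, which is the p-value of a one-sided Fisher's exact test on the overlap; the function name and the set sizes in the example are assumptions chosen for illustration.

```python
from math import comb

def ora_p_value(universe_size, reference_size, target_size, overlap):
    """P(overlap >= observed) when the target set is drawn at random."""
    total = comb(universe_size, target_size)
    upper = min(reference_size, target_size)
    tail = sum(
        comb(reference_size, k) * comb(universe_size - reference_size, target_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# A 100-gene target set tested against a 50-gene pathway in a
# 20,000-gene universe, for different overlap sizes.
p_values = {k: ora_p_value(20000, 50, 100, k) for k in range(0, 5)}
```

With zero overlap the p-value is 1 whatever the network distance between the two sets, and each additional shared gene changes the score sharply, which is the behaviour the network-based score is meant to complement.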
Perturbations in molecular networks disrupt biological pathways and result in human
diseases.

Wang X et al. Briefings in Functional Genomics 2011;10:280-293.
© The Author 2011. Published by Oxford University Press. All rights reserved
Key idea behind EnrichNet
Diagram: a target set (Gene1 ... GeneN) is related to reference sets 1 to n (gene lists Gene1 ... GeneN or protein lists Protein1 ... ProteinM); the set operators ∩ and ∪ appear between the target and reference sets.
What does it do?
Input:
– 10 or more human gene or protein identifiers
– Selection of reference database, e.g., KEGG, BioCarta, WikiPathways, Reactome, PID, InterPro, GO
Processing:
– Maps target and reference sets to genome-scale molecular interaction networks
• Two default networks available
• Users can provide their own
– Random Walk with Restart (RWR) to calculate the distance between target and reference sets mapped onto the large network
– Comparison of these scores against a background model
Output:
– Ranking table of reference data sets
• cellular pathways, processes and complexes
– Network-specific and tissue-specific (60 tissues) scores
– For each pathway → interactive visualization of its embedding network
– Zoom, search, highlight, retrieve annotation/topological data
How does it work?
Connected human interactome graph derived from the STRING 9.0 database
Edges weighted by the STRING combined confidence score (normalized to range [0,1])

Distances (~F(Pt)) from target nodes to reference nodes are calculated by a Random Walk with Restart (RWR) procedure.
These distances are then linked to a background model…
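A Random Walk with Restart on a weighted graph can be sketched as a fixed-point iteration. The update used here, p ← (1 − r)·W·p + r·p0 with W the weight-normalised transition matrix and p0 uniform over the target nodes, is the textbook form; the restart probability, tolerance, function name and toy weights are assumptions rather than EnrichNet's settings.

```python
def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-12, max_iter=10000):
    """Steady-state visiting probabilities of a walker restarting at the seeds.

    weights: dict node -> dict neighbour -> edge weight (symmetric).
    seeds: set of target nodes, all of which must be present in `weights`.
    """
    nodes = sorted(weights)
    out_total = {u: sum(weights[u].values()) for u in nodes}
    start = {u: (1 / len(seeds) if u in seeds else 0.0) for u in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {u: restart * start[u] for u in nodes}
        for u in nodes:
            if out_total[u] == 0:
                continue  # isolated node: nothing to propagate
            for v, w in weights[u].items():
                # Move along each edge in proportion to its weight.
                nxt[v] += (1 - restart) * prob[u] * w / out_total[u]
        change = sum(abs(nxt[u] - prob[u]) for u in nodes)
        prob = nxt
        if change < tol:
            break
    return prob

# Toy chain A - B - C - D with confidence-like weights in [0, 1].
chain = {
    "A": {"B": 0.9},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"B": 0.8, "D": 0.7},
    "D": {"C": 0.7},
}
visit = random_walk_with_restart(chain, {"A"})
```

Nodes that are many weak edges away from the target set receive small probabilities, so the vector can be turned into a distance-like score for each reference node.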
How does it work?
Distances are related to a background model:

• The distance-dependent weighting factor also accounts for the supposition that an over-representation of small distance scores is more likely to reflect strong associations than an over-representation of large distance scores.
• Classical statistical tests for comparing differences in the centre or shape of two distributions, e.g. the Mann–Whitney U-test or the Kolmogorov–Smirnov test, are not applicable as they lack a distance-dependent weighting.
• Randomly chosen matched-size gene sets do not provide an adequate background model, since their members can only have similar connectivity properties to pathway-representing gene sets if they are allowed to overlap significantly with real pathways in the network.
• Tissue-specific scores are easily computable by only considering distance scores to nodes labeled with the tissue in question
Results: EnrichNet scores compared to over-representation
analysis scores
• Assessed EnrichNet on two gene sets representing different tumour types (genes mutated in bladder and gastric cancer) and on genes associated with Parkinson's disease
• Compared the results with a conventional ORA on all pathway databases.
• Absolute Pearson correlations between 0.50 and 0.95 for the different datasets compared
• Gene set pairs with equal ORA scores can be differentiated using their Xd-distances
• Datasets that share none of their genes/proteins with a pathway of interest (hence cannot be scored with ORA) and those with large overlap sizes (hence very similar ORA scores) mostly receive different Xd-scores

→ This enables a more sensitive and comprehensive ranking of gene set pairs
Results: Xd-score ranking for top 20 functional associations between genes
mutated in gastric cancer and pathways in the BioCarta database

Results: Comparative validation on benchmark gene
expression data

EnrichNet and ORA scores (Fisher's exact test) across five (reference) microarray gene expression datasets × two (target) gene set collections
Results: Identification of novel functional associations
The network-based Xd-score ranking identifies several new associations
missed by the classical approach (ORA)

We use example dataset pairs such that:

• target gene sets are all mutated in different diseases, without additionally available expression level data → they cannot be analyzed with "expression-aware" gene set enrichment analysis techniques

• overlap size is zero or insignificant → they receive low ORA scores (Q-values > 0.05) but Xd-scores above the significance threshold obtained from the linear regression fit

• these results point to functional associations that reflect dense networks of
Results: Identification of novel functional associations
• Largest connected component for the network structure obtained when comparing the gastric cancer mutated gene set against the pathway "Role of Erk5 (Extracellular signal-related kinase 5) in Neuronal Survival" (h_erk5Pathway) from the BioCarta database, describing a signalling cascade which induces transcriptional events promoting neuronal survival.
• These datasets have an intersection of only three genes (HRAS, NRAS & KRAS) and would therefore not have been considered as significantly associated by ORA.
• The network-based Xd-score (0.26, which exceeds the significance threshold, whereas ORA with Fisher's exact test yields a Q-value of only 0.08) highlights functional associations.
• These reflect the abundance of molecular interactions between the corresponding proteins for these gene sets and their shared network neighborhoods.
• This dense network of interactions corroborates previous findings linking extracellular signal-related kinases (ERKs) to gastric cancer via an induction of the putative tumor suppressor gene DMBT1 (deleted in
Results: Identification of novel functional associations
• Largest connected component for the network structure obtained when comparing two datasets: bladder cancer mutated genes and the genes for the Gene Ontology (GO) term ‘tyrosine phosphorylation of Stat3’
• These share only a single gene (NF2) → no association can be inferred from ORA
• The high Xd-score for this gene set pair (0.80) points to a functional association via multiple connecting molecular interactions, which is confirmed by the visualization.
• This is in agreement with the previous observation that the down-regulation of STAT3 phosphorylation by means of silencing the Rho GTPase CDC42 is linked to the suppression of tumour growth in bladder cancer.
• Rho GTPases like CDC42 are known to frequently participate in carcinogenic processes
• Their involvement in bladder cancer is also reflected by a high Xd-score of 0.71 for the GO biological process ‘regulation of Rho GTPase activity’ (GO:0032319), which also shares only one
Results: Identification of novel functional associations
• Largest connected component for the network structure obtained when comparing two datasets: genes implicated in Parkinson’s disease (PD) and the ‘regulation of interleukin-6 biosynthetic process’ from the Gene Ontology database
• A significant Xd-score (0.77, significance threshold: 0.73) even when sharing only one gene (IL1B)
• The corresponding sub-network reveals a dense cluster of interactions that interlink the PD gene set with the interleukin-6 pathway.
• This corroborates previously identified links between PD and inflammation and reports of elevated levels of interleukin-6 in the cerebrospinal fluid of PD patients
Results: Evaluation of the tissue specificity of gene set
associations
The network-based Xd-score is capable of computing tissue-specific
association scores

• Tissue-specific analyses can in principle be realized with enrichment analysis techniques other than EnrichNet
• In practice, however, this is often infeasible for conventional ORA methods → the subset of genes with available tissue-specific annotations is too small to obtain reliable over-representation statistics.
• EnrichNet alleviates this limitation of ORA approaches by taking tissue specificity annotations into account from all non-overlapping gene/protein pairs that are connected through paths of interactions in a molecular network.
• Example: we took a set of genes with known implications in PD and measured the tissue-specific associations with the high-scoring KEGG ‘Neurodegenerative Diseases’ (hsa01510) pathway → high Xd-scores were over-represented in the group of brain tissues, whereas the centre of the Xd-score
Part III
Tools for Networks Visualisation &
Exploration

FruitNet
Main question: what is the regulation mechanism?
- measure gene expression over time
- correlate expression profiles
- connect genes if ρ > threshold
The Tomato Genome Consortium, "The tomato
genome sequence provides insights into fleshy fruit
evolution", Nature 485, 635-641 (31 May 2012),
doi:10.1038/nature11119
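The three FruitNet steps (measure expression over time, correlate profiles, connect genes if ρ > threshold) can be sketched as follows. The Pearson correlation is used for ρ, which is an assumption, as are the threshold value, the function names and the toy time courses.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return covariance / (var_x * var_y) ** 0.5

def coexpression_edges(profiles, threshold=0.9):
    """Connect two genes if the correlation of their profiles exceeds the threshold."""
    return [
        (g, h)
        for g, h in combinations(sorted(profiles), 2)
        if pearson(profiles[g], profiles[h]) > threshold
    ]

# Toy time courses over four time points (values are made up).
time_courses = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],   # rises together with g1
    "g3": [4.0, 3.0, 2.0, 1.0],   # falls while g1 rises
}
edges = coexpression_edges(time_courses)
```

Using the absolute value of ρ instead would also connect anti-correlated pairs such as g1 and g3, which is a design choice that depends on whether repression should count as an interaction.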

FruitNet Nodes

FruitNet Edges

• FruitNet [in prep]: work in progress, interactions in tomato fruit development
• SeedNet [1]: interactions in dormant and germinating Arabidopsis seeds
• SCoPNet [2]: predicted network of interactions in germinating and dormant Arabidopsis seeds generated with BioHEL
• EndoNet [3]: interactions in the micropylar endosperm of germinating Arabidopsis seeds
• RadNet [3]: interactions in the lower axis of germinating Arabidopsis seeds

[1] G.W. Bassel, H. Lan, E. Glaab, D.J. Gibbs, T. Gerjets, N. Krasnogor, A.J. Bonner, M.J. Holdsworth, N.J. Provart,
"Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions",
PNAS, 108(23):9709-9714, June 2011 doi:10.1073/pnas.1100958108
[2] G.W. Bassel, E. Glaab, J. Marquez, M.J. Holdsworth, J. Bacardit, "Functional Network Construction in Arabidopsis Using
Rule-Based Machine Learning on Large-Scale Data Sets", The Plant Cell, 23(9):3101-3116, September 2011
doi:10.1105/tpc.111.088153
[3] B.J.W. Dekkers, S. Pearce, R.P. van Bolderen-Veldkamp, A. Marshall, P. Widera, J. Gilbert, H. Drost, G.W. Bassel, K.
Muller, J.R. King, A.T.A. Wood, I. Grosse, M. Quint, N. Krasnogor, G. Leubner-Metzger, M.J. Holdsworth, L. Bentsink,
"Transcriptional Dynamics of Two Seed Compartments with Opposing Roles in Arabidopsis Seed Germination”, PLANT
PHYSIOLOGY, 163(1):205-215, September 2013 doi:10.1104/pp.113.223511
Conclusions
• Combining algorithms in a sequential and/or parallel fashion can provide performance improvements and new biological insights
• Microarray and gene set analysis tasks can be interlinked flexibly in an (almost) completely automated process [www.ArrayMining.Net]
• New analysis types like network-based topology analysis and co-expression analysis complement existing tools [www.{TopoGSA,PathExpand,EnrichNet}.net]
• In the case of breast cancer (BC), this allowed us to identify candidate genes to characterise ER+ luminal-like BC.
– The RERG gene is a key marker of the luminal BC class
– It can be used to separate distinct prognostic subgroups
Conclusions (I): Feature comparison with similar tools

ArrayMining & TopoGSA
Pre-processing: Image analysis, single- and cross-study normalization, dimensionality reduction, gene name normalization, covariance-based filtering
Analysis: Classification, Clustering, Gene selection, GSEA, PCA, ICA, Co-expression analysis, PPI-topology analysis, Ensembles/Consensus
Usability/features: PDF-reports, sortable ranking tables, data annotation, 2D/3D plots, e-mail notification, video tutorials

GEPAS (Tarraga et al.)
Pre-processing: Image analysis, missing value imputation, multiple single study normalization methods, dimensionality reduction, ID converter
Analysis: Classification, Clustering, Gene selection, GSEA, PCA, CGH arrays, Tissue mining, Text mining, TF-binding site prediction
Usability/features: special tree visualization (Caat, SotaTree, Newick Trees), 2D plots, data annotation (Babelomics)

Expression Profiler (Kapushesky et al.)
Pre-processing: Image analysis, single study normalization, missing value imputation, dimensionality reduction, advanced data selection
Analysis: Clustering, Gene selection, PCA, Co-expression analysis (different from ArrayMining), COA, Similarity search
Usability/features: Excel export, XML queries, 2D plots, data annotation (GO, chromosome location)
Conclusions
• Example applications of the (simple) network-based scoring methodology illustrate the utility of the approach for identifying novel functional associations between gene/protein sets

• The scores reflect known direct and indirect molecular interactions between set members rather than only the size of their overlap

• Intelligent visualisation of network communities is possible
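
The contrast between overlap-based and network-based scoring can be made concrete with a toy example. The sketch below is not the EnrichNet statistic; as a simple stand-in it scores two sets by the mean of 1 / (1 + d) over all cross-set pairs, where d is the shortest-path distance in the interaction network. The network, the set names and the scoring function are all hypothetical.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def overlap_score(set_a, set_b):
    """Jaccard index: depends only on the members the two sets share."""
    return len(set_a & set_b) / len(set_a | set_b)


def network_proximity(adjacency, set_a, set_b):
    """Mean of 1 / (1 + d) over all cross-set pairs; unreachable pairs add 0."""
    total = 0.0
    for a in set_a:
        distances = shortest_path_lengths(adjacency, a)
        for b in set_b:
            if b in distances:
                total += 1.0 / (1.0 + distances[b])
    return total / (len(set_a) * len(set_b))


# Hypothetical undirected interaction network as an adjacency map.
adjacency = {
    "g1": {"g2", "p1"},
    "g2": {"g1", "p2"},
    "p1": {"g1", "p2"},
    "p2": {"g2", "p1"},
    "x1": {"x2"},
    "x2": {"x1"},
}

gene_set = {"g1", "g2"}
pathway_near = {"p1", "p2"}  # no shared members, but directly linked to gene_set
pathway_far = {"x1", "x2"}   # no shared members, separate network component

for name, pathway in (("near", pathway_near), ("far", pathway_far)):
    print(name,
          "overlap:", overlap_score(gene_set, pathway),
          "proximity:", round(network_proximity(adjacency, gene_set, pathway), 3))
```

Both pathways share no members with the gene set, so an overlap-based score cannot tell them apart, whereas the network-based score separates the directly interacting pathway from the disconnected one.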

B.J.W. Dekkers, S. Pearce, R.P. van Bolderen-Veldkamp, A. Marshall, P. Widera, J. Gilbert, H.G. Drost, G.W. Bassel, K. Muller, J.R. King,
A.T. Wood, I. Grosse, M. Quint, N. Krasnogor, G. Leubner-Metzger, M.J. Holdsworth, and L. Bentsink. Transcriptional dynamics of two seed
compartments with opposing roles in Arabidopsis seed germination. Plant Physiology, 163(1):205-215, 2013.
E. Glaab, J. Bacardit, J.M. Garibaldi, and N. Krasnogor. Using rule-based machine learning for candidate disease gene prioritization and sample
classification of cancer gene expression data. PLoS One, 7(7):e39932, 2012.
E. Glaab, A. Baudot, N. Krasnogor, R. Schneider, and A. Valencia. EnrichNet: network-based gene set enrichment analysis. Bioinformatics,
2012. This paper was also accepted as a full oral presentation at the 2012 European Conference on Computational Biology.
G.W. Bassel, H. Lan, E. Glaab, D.J. Gibbs, T. Gerjets, N. Krasnogor, A.J. Bonner, M.J. Holdsworth, and N.J. Provart. Genome-wide
network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions. Proceedings of the National
Academy of Sciences of the United States of America (PNAS), 2011.
E. Glaab, A. Baudot, N. Krasnogor, and A. Valencia. TopoGSA: network topological gene set analysis. Bioinformatics, 26(9):1271, March
2010.
E. Glaab, A. Baudot, N. Krasnogor, and A. Valencia. Extending pathways and processes using molecular interaction networks to analyse
cancer genome data. BMC Bioinformatics, 11(597), 2010.
H.O. Habashy, D.G. Powe, E. Glaab, N. Krasnogor, J.M. Garibaldi, E.A. Rakha, G. Ball, A.R. Green, and I.O. Ellis. RERG (Ras-related
and oestrogen-regulated growth-inhibitor) expression in breast cancer: a marker of ER-positive luminal-like subtype. Breast Cancer
Research and Treatment, (online first):1-12, 2010.
E. Glaab, J. Garibaldi, and N. Krasnogor. ArrayMining: a modular web-application for microarray analysis combining ensemble and
consensus methods with cross-study normalization. BMC Bioinformatics, 10(1):358, 2009.
Availability

Freely accessible at: http://www.ncl.ac.uk/csbb/research/resources/
This work was supported by the UK’s Biotechnology and Biological Sciences Research Council [BB/F01855X/1] and the Engineering and
Physical Sciences Research Council [EP/J004111/1].
Acknowledgements
Dr. Pawel Widera (Newcastle University, UK)
Dr. Jaume Bacardit (Newcastle University, UK)
Dr. Enrico Glaab (University of Luxembourg, Luxembourg)
Dr. Anaïs Baudot (Université d’Aix-Marseille, France)
Prof. Reinhard Schneider (University of Luxembourg, Luxembourg)
Prof. Alfonso Valencia (CNIO, Spain )
Prof. Mike Holdsworth (University of Nottingham, UK)
Prof. Graham Seymour (University of Nottingham, UK)
Prof. Doron Lancet (Weizmann Institute of Science, Israel)
Dr. Marilyn Safran (Weizmann Institute of Science, Israel)

Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxRosabel UA
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 

Kürzlich hochgeladen (20)

Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 

Integrative Networks Centric Bioinformatics

  • 1. Integrative & Network Analysis of Transcriptomics and Proteomics Data. Prof. N. Krasnogor, Biology, Neuroscience and Computing (BNC) Research Group, School of Computing Science, Centre for Synthetic Biology and Bioexploitation, Newcastle University. E-mail: Natalio.Krasnogor@Newcastle.ac.uk Twitter: @Nkrasnogor Skype: Natalio.Krasnogor
  • 2. Outline • Introduction: goals and data sets • Part I: ArrayMining.net: tool set for microarray analysis • Part II: TopoGSA: topological analysis of gene/protein networks; PathExpand: inference of missing pathway links; EnrichNet: network-based enrichment • Part III: Visualisation & Exploration: network surfing. Gibson G (2003) Microarray Analysis. PLoS Biol 1(1): e15. doi:10.1371/journal.pbio.0000015
  • 3. Introduction • Typical problem in biosciences: How to make effective use of multiple, large-scale data sources? • Typical problem in computer science: How to exploit the strengths of different algorithms? → GOAL: Develop new (& existing) methods combining diverse data sources and algorithms
  • 4. Reference data set: Armstrong et al. leukemia data set • 72 samples and 12,626 genes • 3 leukemia sub-types: ALL (24), AML (28), MLL (20) • Normalisation: Variance Stabilizing Normalisation (Huber et al., 2002) • Platform: Affymetrix U95A oligonucleotide array • Thresholding/Filtering steps: see Armstrong et al. (2001, Nat. Genet.) • Public access to data set: http://www.broadinstitute.org/cgi-bin/cancer/da • Heat map: 30 most differentially expressed genes vs. samples
  • 5. Main data set: QMC breast cancer microarray data set • Pre-normalized data (log-scale, min: 4.9, max: 13.3) • 128 samples and 47,293 genes • 3 tumour grades: 1 (33), 2 (52), 3 (43) • Platform: Illumina Sentrix Human-6 BeadChips • Probe level data analysis: Bioconductor beadarray package • Public access to data set: http://www.ebi.ac.uk/microarray-as/ae accession number: E-TABM-576 • Heat map: 30 most differentially expressed genes vs. samples (grade 1 and grade 3)
  • 6. Breast cancer data: difficulties. Breast cancer outcome is hard to predict (figure panels: van 't Veer et al.; Alon et al.; Golub et al.). There is a large degree of class overlap in breast cancer microarray data, whereas leukemia decision boundaries are easy to find (Blazadonakis, 2009).
  • 7. Data Fusion Other biological data sources used: Breast cancer microarray data: • additional public data sets: Huang et al., Veer et al. • pre-processing: GC-RMA Cellular pathway data: • obtained from GO, BioCarta, Reactome, KEGG and InterPro • total: approx. 3000 pathways (size > 10) Protein interaction data: • unweighted binary interactions (MIPS, DIP, BIND, HPRD, IntAct - only human) • 9392 nodes, 38857 edges Cancer gene sets: • mutated genes in different human cancer types (Breast, Liver,...) • 30 gene sets of size > 10 genes
  • 8. Methods overview: ArrayMining & TopoGSA
  • 9. Part I: Tools for Automated Microarray Data Analysis
  • 10. Web-tool: ArrayMining.net. What is ArrayMining.net? ArrayMining.net is an online microarray analysis tool set integrating multiple data sources and algorithms. www.arraymining.org Goal: A “Swiss army knife” for microarray analysis tasks. 6 analysis modules: classical: 1. Gene selection 2. Sample clustering 3. Sample classification; new: 4. Gene Set Analysis 5. Gene Network Analysis 6. Cross-Study Normalization
  • 11. Methods overview: ArrayMining & TopoGSA
  • 12. ArrayMining.net: Gene selection Gene selection module • Applies supervised feature selection algorithms (CFS, eBayes, SAM, etc.) • Compares multiple algorithms or combines them into an ensemble • Example: ENSEMBLE feature selection for Armstrong et al. (2001) dataset: Affymetrix ID Gene symbol Gene descriptions – source: F-statistic 32847_at  MYLK myosin, light polypeptide kinase 159.59 1389_at  MME membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase) 137.53 35164_at  WFS1 wolfram syndrome 1 (wolframin) 36239_at  POU2AF1 pou domain, class 2, associating factor 1 116.75 1325_at  SMAD1 smad, mothers against dpp homolog 1 (drosophila) 110.37 963_at  LIG4 ligase iv, dna, atp-dependent 89.77 34168_at  DNTT deoxynucleotidyltransferase, terminal 89.31 40570_at  FOXO1 forkhead box o1a (rhabdomyosarcoma) 86.89 33412_at  LGALS1 lectin, galactoside-binding, soluble, 1 (galectin 1) 81.31  previously identified by Armstrong et al. 128  newly identified
  • 13. ArrayMining.net: Gene selection. Gene selection module (2): Armstrong et al. dataset • Automatic generation of box plots with gene and sample class annotations • The first row shows the box plots for the two best-ranked newly identified genes in the Armstrong et al. dataset • The second row shows two top-ranked previously identified genes • The user can easily compare and combine the results from different selection methods
  • 14. ArrayMining.net: Examples. Further examples: Gene selection and Clustering module. Automatic generation of heat maps (genes vs. samples) and PCA cluster plots (Armstrong et al. dataset)
  • 15. ArrayMining.net: Examples. Further examples: 3D-ICA and Co-Expression analysis. 3D Independent Component Analysis plot of the sample space (left; classes ALL, AML, MLL) and the largest connected components from a gene co-expression network in gene space (right) for the Armstrong et al. dataset
  • 16. ArrayMining.net: In-house data. Apply the tools on new data: QMC breast cancer data. Heat map: 50 most significant genes. Box plots: expression levels across 3 tumour grades for the 4 most significant genes (STK6, KIF2C, MYBL2, AURKB)
  • 17. ArrayMining.net: QMC dataset. QMC breast cancer data set, selected genes • all top-ranked genes are known or likely to be involved in breast cancer • the selection is robust with regard to cross-validation cycles and algorithms • Columns: gene name | PC (gene vs. outcome) | fold change | Q-value (rank)
      ESTROGEN RECEPTOR 1 | -0.75 | 0.16 | 1.6e-20 (1.)
      RAS-LIKE, ESTROGEN-REGULATED, GROWTH INHIBITOR | -0.66 | 0.46 | 5.3e-14 (2.)
      WD REPEAT DOMAIN 19 | -0.66 | 0.73 | 1.2e-13 (3.)
      CARBONIC ANHYDRASE XII | -0.65 | 0.28 | 2.7e-13 (4.)
      ARP3 ACTIN-RELATED PROTEIN 3 HOMOLOG (YEAST) | 0.64 | 1.37 | 9.6e-13 (5.)
      TETRATRICOPEPTIDE REPEAT DOMAIN 8 | -0.63 | 0.82 | 2.2e-12 (6.)
      BREAST CANCER MEMBRANE PROTEIN 11 | -0.62 | 0.24 | 7.1e-12 (7.)
  • 18. Methods overview: ArrayMining & TopoGSA
  • 19. ArrayMining.net: Example. ArrayMining Class Discovery Analysis module: • Motivation: Exploiting the synergies between partition-based and hierarchical clustering algorithms • Approach: Consensus clustering based on the agreement of clustering results for pairs of objects (details on next slide). - equivalent to the median partition problem (NP-complete) - Simulated Annealing (SA) has been shown to provide good solutions • Our solution: - Compare SA (Aarts et al. cooling scheme) with thermodynamic SA (TSA) and fast SA (FSA) → FSA provides fastest convergence - Initialization: Input clustering with highest agreement to other inputs
  • 20. ArrayMining.net: Consensus Clustering. ArrayMining's consensus clustering approach: Clustering agreement := number of times pairs of samples are assigned to the same cluster across all input clusterings. Idea: Reward objects in the same cluster if they have a high agreement. Agreement matrix: A_ij := number of agreements across all clusterings for samples i and j. Fitness function: uses α := (max(A) + min(A))/2
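The agreement matrix described on this slide can be sketched in a few lines of Python. The slide's fitness formula did not survive in the transcript, so the `fitness` function below is only an assumption: it sums A_ij - α over all pairs that a candidate clustering places in the same cluster, which matches the stated idea of rewarding high-agreement pairs. Taking max(A) and min(A) over off-diagonal entries is likewise an assumption.

```python
from itertools import combinations

def agreement_matrix(clusterings):
    """A[i][j] = number of input clusterings that put samples i and j together."""
    n = len(clusterings[0])
    A = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                A[i][j] += 1
                A[j][i] += 1
    return A

def fitness(candidate, A):
    """ASSUMED fitness: sum of (A_ij - alpha) over same-cluster pairs."""
    pairs = list(combinations(range(len(candidate)), 2))
    off_diag = [A[i][j] for i, j in pairs]
    alpha = (max(off_diag) + min(off_diag)) / 2  # alpha as defined on the slide
    return sum(A[i][j] - alpha for i, j in pairs if candidate[i] == candidate[j])

# Three hypothetical input clusterings of four samples.
clusterings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
A = agreement_matrix(clusterings)
best = max(clusterings, key=lambda c: fitness(c, A))
```

Here the search is just a maximum over the inputs; the deck instead uses simulated annealing to search the space of partitions.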
  • 21. Clustering Methods and Validity Indices • Sample clustering methods: 8 different methods considered: - partition-based: k-Means, PAM, CLARA, SOM, SOTA - hierarchical: AL-HCL, DIANA, AGNES • Scoring & number of clusters selection: 5 validity indices / splitting rules used: - Silhouette width, Calinski-Harabasz, Dunn, C-index, knn-Connectivity. Example: Silhouette width s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) = avg. distance of obj(i) to all others in the same cluster and b(i) = avg. distance of obj(i) to all objects in the closest distinct cluster - good validity indices should have: no multivariate normality assumptions (Gower, 1981), small or no bias (Milligan & Cooper, 1985) • Standardization: classical (mean 0, stddev. 1) or median absolute deviation
  • 23. Consensus Clustering: example. Example application: QMC breast cancer dataset • Separate sub-classes in 84 luminal samples with consensus clustering • Input algorithms: k-Means, SOM, SOTA, PAM, HCL, DIANA, HYBRID-HCL • Result: low confidence (silhouette widths); best separation for two clusters
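As a worked illustration of the silhouette width, the following minimal sketch computes the standard quantity s(i) = (b(i) - a(i)) / max(a(i), b(i)) from a precomputed distance matrix. The function name, the toy data, and the convention of scoring singleton clusters as 0 are assumptions for illustration; it also assumes at least two clusters and non-identical points.

```python
def silhouette_widths(dist, labels):
    """Per-object silhouette width from a symmetric distance matrix."""
    n = len(labels)
    widths = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            widths.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(same) / len(same)  # avg. distance within own cluster
        b = min(                   # avg. distance to the closest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append((b - a) / max(a, b))
    return widths

# Four hypothetical samples on a line, forming two well-separated clusters.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
labels = [0, 0, 1, 1]
widths = silhouette_widths(dist, labels)
average_width = sum(widths) / len(widths)
```

Values close to 1 indicate well-separated clusters; the average width over all objects is the figure typically compared across candidate numbers of clusters.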
  • 24. External Validation. Clustering results: external validation (tumour grades). Measure similarity of clusterings with the Rand index $R = \frac{a + b}{a + b + c + d}$, where a, b, c and d are the numbers of pairs of objects assigned to: - the same cluster in both clusterings (a) - different clusters in both clusterings (b) - the same cluster in clustering 1/2 and different clusters in clustering 2/1 (c/d) - Corrected for chance: adjusted Rand index. Reference clustering: 3 tumour grades (low, medium, high). Figure legend: random model (10000 random clusterings); single clustering; consensus
  • 25. Methods overview: ArrayMining & TopoGSA
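The pair counts a, b, c and d translate directly into code. This is a minimal sketch of the plain (unadjusted) Rand index, (a + b) / (a + b + c + d); the chance correction used for the adjusted Rand index is not implemented here, and the example labels are hypothetical.

```python
from itertools import combinations

def rand_index(labels1, labels2):
    """Fraction of object pairs on which two clusterings agree."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            a += 1  # together in both clusterings
        elif not same1 and not same2:
            b += 1  # separated in both clusterings
        elif same1:
            c += 1  # together in clustering 1 only
        else:
            d += 1  # together in clustering 2 only
    return (a + b) / (a + b + c + d)

reference = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]  # same partition, different label names
split = [0, 0, 1, 2]       # one reference cluster split in two
r_same = rand_index(reference, relabelled)
r_split = rand_index(reference, split)
```

Because the index only looks at pairs, it is invariant to how the clusters are named, which is why it suits a comparison against tumour grades.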
  • 26. ArrayMining.net: Gene Set Analysis. Extension: Gene set analysis • Expression levels for a single gene are often unreliable • Similar genes might contain complementary information • We want to integrate functional annotation data (samples → pathways). Gene Set Analysis (GSA): 1) Identify sets of functionally similar genes (GO, KEGG, etc.) 2) Summarize gene sets to “meta”-genes (PCA, MDS, etc.) 3) Apply statistical analysis (example: Van Andel Institute cancer gene sets). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Subramanian et al. PNAS October 25, 2005 vol. 102 no. 43 15545–15550
  • 27. ArrayMining.net: Examples. Gene Set Analysis module, example analysis • We apply the Gene Set Analysis module to the Armstrong et al. dataset • With known cancer gene sets the class separation is better than for single genes • Heat map for the Armstrong et al. dataset based on pathway meta-genes
  • 28. Consensus Clustering: example (2). Combine consensus clustering with gene set analysis • Map genes onto Gene Ontology (GO), reduce dimensionality (MDS) • Apply same consensus clustering as before on GO-based “meta-genes” • Result: ~3 times higher confidence, better separation
  • 29. External Validation. Figure legend: single clustering; consensus clustering; consensus (PAM+SOTA); 10000 random clusterings
  • 30. Interim Summary: ArrayMining integrative clustering • Consensus clustering (CC) results tend to be similar to or slightly better than the best single clusterings in terms of adj. Rand index and validity indices (but longer runtime) • The input clusterings should include diverse methods and exclude similar methods • Using gene sets (GS) representing cellular pathways instead of single genes results in better cluster separation, adj. Rand indices and validity indices (annotation data required) → GS & CC provide improved results, but: longer runtimes + annotation data required
  • 31. Part II: Tools for Networks Analysis
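Step 2 of the GSA recipe, summarising a gene set into a single meta-gene, can be sketched with a first-principal-component projection. This dependency-free version uses power iteration on the gene-by-gene covariance matrix; the function name and the toy expression values are hypothetical, and the sketch assumes the genes in the set are not perfectly anti-correlated with the all-ones starting vector.

```python
def metagene_scores(expr, iterations=200):
    """expr: one row per gene, one column per sample.
    Returns one score per sample: the projection onto the first principal component."""
    n_genes, n_samples = len(expr), len(expr[0])
    # Centre each gene across samples.
    centred = []
    for row in expr:
        mean = sum(row) / n_samples
        centred.append([x - mean for x in row])
    # Gene-by-gene sample covariance matrix.
    cov = [[sum(centred[g][s] * centred[h][s] for s in range(n_samples)) / (n_samples - 1)
            for h in range(n_genes)] for g in range(n_genes)]
    # Power iteration converges to the leading eigenvector (the PC1 loadings).
    w = [1.0] * n_genes
    for _ in range(iterations):
        w = [sum(cov[g][h] * w[h] for h in range(n_genes)) for g in range(n_genes)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return [sum(w[g] * centred[g][s] for g in range(n_genes)) for s in range(n_samples)]

# Hypothetical gene set: three co-regulated genes measured in four samples.
gene_set = [
    [1.0, 1.2, 3.0, 3.1],
    [0.9, 1.1, 2.8, 3.3],
    [1.1, 0.8, 3.2, 2.9],
]
scores = metagene_scores(gene_set)
```

The resulting per-sample scores can then be fed into the same clustering or classification steps that would otherwise run on single genes.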
  • 32. Networks Biology. Describe network strategies to identify and prioritize gene sets or functional associations (arising in high throughput experiments) between a genes/proteins set of interest (target set) and annotated genes/proteins sets (reference set). I will describe some of the functionalities and methods in tools freely accessible at: http://www.ncl.ac.uk/csbb/research/resources/ In collaboration with: Dr. Pawel Widera (Newcastle University), Dr. Enrico Glaab (University of Luxembourg), Dr. Anais Baudot (Université d'Aix-Marseille), Prof. Reinhard Schneider (University of Luxembourg), Prof. Alfonso Valencia (CNIO)
  • 33. Methods overview: ArrayMining & TopoGSA
  • 34. TopoGSA: Network topological analysis of gene sets What is TopoGSA? TopoGSA is a web-application mapping gene sets onto a comprehensive human protein interaction network and analysing their network topological properties. Two types of analysis: 1. Compare genes within a gene set: e.g. up- vs. down-regulated genes 2. Compare a gene set against a database of known gene sets (e.g. KEGG, BioCarta, GO)
  • 35. TopoGSA: Methods. TopoGSA computes topological properties for an uploaded gene set and matched-size random gene sets: • the degree of each node in the gene set • the local clustering coefficient $C_i$ for each node $v_i$ in the gene set: $C_i = \frac{2\,|\{e_{jk}\}|}{k_i (k_i - 1)}$, where $k_i$ is the degree of $v_i$ and $e_{jk}$ is the edge between $v_j$ and $v_k$, both neighbours of $v_i$ • the shortest path length between pairs of nodes $v_i$ and $v_j$ in the gene set • the node betweenness $B(v)$ for each node $v$ in the gene set: $B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ is the number of those passing through $v$ • the eigenvector centrality for each node in the gene set
  • 36. Human PPI network: MIPS, DIP, BIND, HPRD and IntAct; 9392 proteins and 38857 interactions. Also for yeast, fly, worm and arabidopsis. Plot axes: mean node betweenness, mean clustering coefficient, mean shortest path length. LEGEND: • Cellular processes • Environmental information processing • Genetic information processing • Human diseases • Metabolism • Cancer genes. General results: • Metabolic pathways have high shortest path lengths and low betweenness • Disease pathways and cancer gene sets tend to have high betweenness and small shortest path lengths
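Three of the listed properties are simple enough to sketch without a graph library: the degree, the local clustering coefficient (the fraction of possible links among a node's neighbours that are actually present), and the shortest path length via breadth-first search. The toy network and its node names are hypothetical; betweenness and eigenvector centrality are omitted to keep the sketch short.

```python
from collections import deque

def degree(graph, v):
    return len(graph[v])

def clustering_coefficient(graph, v):
    """2 * (links among neighbours of v) / (k * (k - 1)), with k the degree of v."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2 * links / (k * (k - 1))

def shortest_path_length(graph, source, target):
    """Number of edges on a shortest path (unweighted BFS); None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None

# Hypothetical unweighted interaction network: a triangle A-B-C with D attached to C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
```

Averaging such per-node values over a gene set, and over matched-size random gene sets, gives the kind of comparison the slide describes.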
  • 38. ArrayMining.net: In-house data. Apply the tools on new data: QMC breast cancer data (www.arraymining.net, www.topogsa.net). Heat map: 50 most significant genes. Box plots: expression levels across 3 tumour grades for the 4 most significant genes (STK6, KIF2C, MYBL2, AURKB)
  • 40. ArrayMining → TopoGSA: Topological analysis of selected genes • Results of within-gene-set comparison: Estrogen receptor 1 gene and apoptosis regulator Bcl2, both up-regulated in luminal samples, have outstanding network topological properties (higher betweenness, higher degree, higher centrality) in comparison to other genes. • Results of comparison against reference databases: - Metabolic KEGG pathways are most similar to the uploaded gene set in terms of network topological properties. - Most similar BioCarta pathways: Cytokine, Differentiation and inflammatory pathways.
  • 41. Real-world application of tool sets. ArrayMining identifies RERG as a tumour marker • RERG (Ras-related and oestrogen-regulated growth-inhibitor) was identified as a new candidate marker of ER-positive luminal-like breast cancer subtype • Validation using immunohistochemistry on Tissue Microarrays (TMAs) containing 1,140 invasive breast cancers confirmed RERG's utility as a marker gene. Figure: TMAs of invasive breast cancer show strong RERG expression
  • 42. RERG Protein Expression vs. BCSS & DMFI. H.O. Habashy, D.G. Powe, E. Glaab, N. Krasnogor, J.M. Garibaldi, E.A. Rakha, G. Ball, A.R. Green, and I.O. Ellis. RERG (Ras-related and oestrogen-regulated growth-inhibitor) expression in breast cancer: A marker of ER-positive luminal-like subtype. Breast Cancer Research and Treatment, (online first):1-12, 2010. Figure: Kaplan-Meier plots of cumulative survival for positive vs. negative RERG expression (x-axis: BCSS or DMFI in months). Panel captions: RERG protein expression with respect to BCSS in the ER+ ∪ ER- cohort; RERG protein expression with respect to BCSS in ER+ only; without adjuvant treatment; without Tamoxifen treatment. Reported p-values: 0.002, 0.007, 0.027
  • 43. PathExpand: Expanding pathways and cellular processes • Utilizes the same PPI network as in TopoGSA; derived from: - MIPS, DIP, MINT, HPRD and IntAct - only experimental evidence of binary PPI - final protein interaction network contained 9392 proteins (nodes) and 38857 interactions (edges) • Process mapping: - KEGG, BioCarta and Reactome were mapped - Only 60% of pathway members existed in the PPI network
  • 44. PPI-based pathway enlargement. Extension procedure idea: Enlarge pathways by adding genes that are “strongly connected” to the pathway nodes or increase the pathway “compactness”
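The extension idea can be illustrated with a deliberately simple rule. The slide does not spell out what counts as “strongly connected”, so the two thresholds below (`min_links` and `min_fraction`) are hypothetical stand-ins rather than PathExpand's actual criteria: a candidate is added when it has enough links into the pathway and those links make up a large enough share of its interactions.

```python
def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def extend_pathway(graph, pathway, min_links=2, min_fraction=0.5):
    """Return non-pathway nodes passing the ASSUMED connectivity thresholds."""
    pathway = set(pathway)
    added = set()
    for node, nbrs in graph.items():
        if node in pathway or not nbrs:
            continue
        links = len(nbrs & pathway)  # interactions with pathway members
        if links >= min_links and links / len(nbrs) >= min_fraction:
            added.add(node)
    return added

# Hypothetical network: X touches both pathway proteins, Z only one of them.
graph = build_graph([
    ("P1", "P2"), ("P1", "X"), ("P2", "X"), ("X", "Y"),
    ("P1", "Z"), ("Z", "Q"), ("Z", "R"),
])
new_members = extend_pathway(graph, {"P1", "P2"})
```

With these toy thresholds only X is added: two of its three interactions point into the pathway, whereas Z has a single pathway link.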
  • 45. • Our procedure extended 159 pathways from BioCarta, 90 from KEGG and 52 from Reactome. • The pathway sizes increased on average to between 113% and 126% of the original size. Validation shows: 1) The proteins added are well connected and central in the protein interaction network 2) The added proteins display gene ontology annotations matching the original cellular pathway/process annotations better than random proteins 3) The added proteins are enriched in processes known to be related to cellular signalling 4) Our method is able to recover known cellular pathway/process proteins in a cross-validation experiment. Example case: BioCarta BTG family proteins and cell cycle regulation (figure label: added cancer gene). Black: original pathway nodes; green: nodes added based on connectivity
  • 46. Pathway enlargement: Example 1. Example: Alzheimer disease pathway • More than 20 proteins annotated in our PPIN • 5 proteins added by the extension process • 3 with known disease associations • 2 candidates: METTL2B, TMED10
  • 47. Pathway enlargement: Example 2. Example: Interleukin signaling pathways • Complex signaling system, sharing intracellular cascades • New regulators • New crosstalk proteins
  • 48. Network-based Functional Association Ranking. Describe network strategies to identify and prioritize functional associations (arising in high throughput experiments) between a genes/proteins set of interest (target set) and annotated genes/proteins sets (reference set)
  • 49. Workflow: 1) Produce target set 2) Feed 3) Filter/compare/overlap/etc. against reference sets 1 … n 4) Transfer 5) New hypotheses/experiments/insights. Multiple approaches for “Enrichment Analysis”: • Over-representation analysis (ORA) • Gene set enrichment analysis (GSEA) • Integrative and modular enrichment analysis (MEA). Huang, D. et al., Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res., 37 (1), 2009
  • 50. Key limitations of previous “Enrichment Analysis”: •ORA techniques often have low discriminative power with scores varying widely with small changes in the overlap size. •Functional information captured in the graph structure of a molecular interaction network connecting the gene/protein sets of interest is disregarded. •Genes and proteins in the network neighborhood, in particular those with missing annotations, are not taken into account. •The recognition of tissue-specific gene/protein set associations is often statistically infeasible.
  • 51. Perturbations in molecular networks disrupt biological pathways and result in human diseases. Wang X et al. Briefings in Functional Genomics 2011;10:280-293. © The Author 2011. Published by Oxford University Press. All rights reserved
  • 52. Key idea behind EnrichNet. Diagram: a target set (Gene1 … GeneN) is related to reference sets 1 … n (Gene1 … GeneN; Protein1 … ProteinM) via set intersection (∩) and union (∪)
  • 53. What does it do? Input: - 10 or more human gene or protein identifiers - Selection of reference database, e.g., KEGG, BioCarta, WikiPathways, Reactome, PID, InterPro, GO. Processing: - Maps target and reference sets to genome-scale molecular interaction networks • Two default ones available • Users can provide their own - Random Walk with Restart (RWR) to calculate distance between target and reference sets mapped into the large network - Comparison of these scores against a background model. Output: - Ranking table of reference dataset • cellular pathways, processes and complexes - Network and tissue (60) specific scores - For each pathway → interactive visualization of its embedding network - Zoom, search, highlight, retrieve annotation/topological data
  • 54. How does it work? Connected human interactome graph derived from STRING 9.0 database. Edges weighted by the STRING combined confidence score (normalized to range [0,1]). Distances (~F(Pt)) from target nodes to reference ones are calculated by a Random Walk with Restart (RWR) procedure. These distances are then linked to a background model…
  • 55. How does it work? Distances are related to a background model via a score with a distance-dependent weighting factor: • The distance-dependent weighting factor also accounts for the supposition that an over-representation of small distance scores is more likely to reflect strong associations than an over-representation of large distance scores. • Classical statistical tests for comparing differences in the centre or shape of two distributions, e.g. the Mann–Whitney U-test or the Kolmogorov–Smirnov test, are not applicable as they lack a distance-dependent weighting. • Randomly matched-size gene sets do not provide an adequate background model, since their members can only have similar connectivity properties as pathway-representing gene sets if they are allowed to significantly overlap with real pathways in the network. • Tissue-specific scores are easily computable by only considering distance scores to nodes labeled with the tissue in question
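A Random Walk with Restart on a weighted graph can be sketched as a fixed-point iteration: at each step the walker either jumps back to a target (seed) node or moves to a neighbour with probability proportional to the edge weight. The restart probability of 0.9, the convergence settings, and the toy edge weights below are all assumptions for illustration, not values taken from the slides.

```python
def random_walk_with_restart(weights, seeds, restart=0.9, tol=1e-10, max_iter=1000):
    """weights: symmetric dict of dicts, weights[u][v] = edge confidence in [0, 1].
    Returns the stationary visiting probability of every node."""
    nodes = sorted(weights)
    strength = {u: sum(weights[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for v in nodes:
            # Probability mass flowing into v from each neighbour u.
            inflow = sum(p[u] * w / strength[u] for u, w in weights[v].items())
            nxt[v] = (1 - restart) * inflow + restart * p0[v]
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical chain A - B - C - D with confidence-weighted edges.
weights = {
    "A": {"B": 0.9},
    "B": {"A": 0.9, "C": 0.5},
    "C": {"B": 0.5, "D": 0.8},
    "D": {"C": 0.8},
}
scores = random_walk_with_restart(weights, seeds={"A"})
```

Nodes closer to the seed set receive higher visiting probabilities, so the resulting vector can serve as a network proximity profile between a target set and the members of a reference set.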
  • 56. Results: EnrichNet scores compared to over-representation analysis (ORA) scores • Assessed EnrichNet on disease gene sets: • genes mutated in two tumour types (bladder and gastric cancer) • genes associated with Parkinson’s disease • Compared the results with a conventional ORA on all pathway databases. • Absolute Pearson correlations between 0.50 and 0.95 for the different datasets compared. • Gene set pairs with equal ORA scores can be differentiated using their Xd-distances. • Datasets that share none of their genes/proteins with a pathway of interest (hence cannot be scored with ORA) and those with large overlap sizes (hence very similar ORA scores) mostly receive different Xd-scores. • The Xd-score thus enables a more sensitive and comprehensive ranking of gene set pairs.
  • 57. Results: Xd-score ranking for the top 20 functional associations between genes mutated in gastric cancer and pathways in the BioCarta database. Datasets that share none of their genes/proteins with a pathway of interest (hence cannot be scored with ORA) and those with large overlap sizes (hence very similar ORA scores) mostly receive different Xd-scores. The Xd-score thus enables a more sensitive and comprehensive ranking of gene set pairs.
  • 58. Results: Comparative validation on benchmark gene expression data. EnrichNet and ORA scores (Fisher’s exact test) across five (reference) microarray gene expression datasets × two (target) gene set collections
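For readers unfamiliar with the ORA baseline, the one-sided Fisher's exact test on the overlap of two gene sets can be sketched with the hypergeometric tail. This is a generic illustration, not the code used for the benchmark: the function name `ora_p_value` and all set sizes are hypothetical, and the multiple-testing adjustment that produces Q-values is omitted.

```python
from math import comb


def ora_p_value(overlap, target_size, pathway_size, universe_size):
    """One-sided Fisher's exact test: P(overlap >= observed) under random draws.

    A target set of `target_size` genes is drawn without replacement from a
    universe of `universe_size` genes, of which `pathway_size` are in the pathway.
    """
    total = comb(universe_size, target_size)
    upper = min(target_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, target_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical sizes: 100-gene universe, 10-gene pathway, 5-gene target set.
p_no_overlap = ora_p_value(0, 5, 10, 100)
p_one_shared = ora_p_value(1, 5, 10, 100)
p_three_shared = ora_p_value(3, 5, 10, 100)
```

A gene set pair with no shared genes always receives a p-value of 1 here, which illustrates why ORA cannot rank such pairs and why a network-based score can add information.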
  • 59. Results: Identification of novel functional associations. The network-based Xd-score ranking identifies several new associations missed by the classical approach (ORA). We use example dataset pairs such that: • the target gene sets are all mutated in different diseases, without additionally available expression level data, so they cannot be analyzed with “expression-aware” gene set enrichment analysis techniques • they have zero or insignificant overlap size, so they receive low ORA scores (Q-values > 0.05) but Xd-scores above the significance threshold obtained from the linear regression fit • these results point to functional associations that reflect dense networks of
  • 60. Results: Identification of novel functional associations • Largest connected component of the network structure obtained when comparing the gastric cancer mutated gene set against the pathway ‘Role of Erk5 (extracellular signal-regulated kinase 5) in Neuronal Survival’ (h_erk5Pathway) from the BioCarta database, which describes a signalling cascade that induces transcriptional events promoting neuronal survival. • These datasets have an intersection of only three genes (HRAS, NRAS & KRAS) and would therefore not have been considered significantly associated by ORA. • The network-based Xd-score (0.26, reported as significant, versus a non-significant Q-value of 0.08 for ORA with Fisher's exact test) highlights functional associations. • These reflect the abundance of molecular interactions between the corresponding proteins for these gene sets and their shared network neighborhoods. • This dense network of interactions corroborates previous findings linking extracellular signal-regulated kinases (ERKs) to gastric cancer via an induction of the putative tumor suppressor gene DMBT1 (deleted in
  • 61. Results: Identification of novel functional associations • Largest connected component of the network structure obtained when comparing two datasets: bladder cancer mutated genes and the genes for the Gene Ontology (GO) term ‘tyrosine phosphorylation of Stat3’ • These share only a single gene (NF2), so no association can be inferred from ORA • The high Xd-score for this gene set pair (0.80) points to a functional association via multiple connecting molecular interactions, which is confirmed by the visualization. • This agrees with the previous observation that the downregulation of STAT3 phosphorylation by silencing the Rho GTPase CDC42 is linked to the suppression of tumour growth in bladder cancer. • Rho GTPases like CDC42 are known to participate frequently in carcinogenic processes • Their involvement in bladder cancer is also reflected by a high Xd-score of 0.71 for the GO biological process ‘regulation of Rho GTPase activity’ (GO:0032319), which also shares only one
  • 62. Results: Identification of novel functional associations • Largest connected component of the network structure obtained when comparing two datasets: genes implicated in Parkinson’s disease (PD) and the ‘regulation of interleukin-6 biosynthetic process’ from the Gene Ontology database • A significant Xd-score (0.77, significance threshold: 0.73) even when sharing only one gene (IL1B) • The corresponding sub-network reveals a dense cluster of interactions that interlink the PD gene set with the interleukin-6 pathway. • This corroborates previously identified links between PD and inflammation and reports of elevated levels of interleukin-6 in the cerebrospinal fluid of PD patients
  • 63. Results: Evaluation of the tissue specificity of gene set associations. The network-based Xd-score is capable of computing tissue-specific association scores • Tissue-specific analyses can in principle be realized with enrichment analysis techniques other than EnrichNet • In practice, however, this is often infeasible for conventional ORA methods, because the subset of genes with available tissue-specific annotations is too small to obtain reliable over-representation statistics. • EnrichNet alleviates this limitation of ORA approaches by taking tissue specificity annotations into account from all non-overlapping gene/protein pairs that are connected through paths of interactions in a molecular network. • Example: we took a set of genes with known implications in PD and measured the tissue-specific associations with the high-scoring KEGG ‘Neurodegenerative Diseases’ (hsa01510) pathway: high Xd-scores were over-represented in the group of brain tissues, whereas the centre of the Xd-score
  • 64. Part III: Tools for Networks Visualisation & Exploration
  • 65. FruitNet. Main question: what is the regulation mechanism? – measure gene expression over time – correlate expression profiles – connect genes if ρ > threshold. The Tomato Genome Consortium, "The tomato genome sequence provides insights into fleshy fruit evolution", Nature 485, 635-641 (31 May 2012), doi:10.1038/nature11119
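The three steps on the FruitNet slide (measure, correlate, connect if ρ > threshold) can be sketched as follows. This is a minimal illustration assuming Pearson correlation and a threshold of 0.9. The gene names, expression values, and function names are hypothetical, and the slide does not state which correlation measure or threshold FruitNet actually uses.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Connect two genes if the correlation of their profiles exceeds the threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        rho = pearson(profiles[gene_a], profiles[gene_b])
        if rho > threshold:
            edges.append((gene_a, gene_b, rho))
    return edges


# Hypothetical expression time series (five time points per gene).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
}
edges = coexpression_edges(profiles, threshold=0.9)
```

Under the rule ρ > threshold as written, strongly anti-correlated genes such as geneA and geneC are not connected. Using |ρ| instead would be a different design choice.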
  • 68. • FruitNet [in prep]: work in progress, interactions in tomato fruit development • SeedNet [1]: interactions in dormant and germinating Arabidopsis seeds • SCoPNet [2]: predicted network of interactions in germinating and dormant Arabidopsis seeds generated with BioHELL • EndoNet [3]: interactions in the micropylar endosperm of germinating Arabidopsis seeds • RadNet [3]: interactions in the lower axis of germinating Arabidopsis seeds
    [1] G.W. Bassel, H. Lan, E. Glaab, D.J. Gibbs, T. Gerjets, N. Krasnogor, A.J. Bonner, M.J. Holdsworth, N.J. Provart, "Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions", PNAS, 108(23):9709-9714, June 2011, doi:10.1073/pnas.1100958108
    [2] G.W. Bassel, E. Glaab, J. Marquez, M.J. Holdsworth, J. Bacardit, "Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets", The Plant Cell, 23(9):3101-3116, September 2011, doi:10.1105/tpc.111.088153
    [3] B.J.W. Dekkers, S. Pearce, R.P. van Bolderen-Veldkamp, A. Marshall, P. Widera, J. Gilbert, H. Drost, G.W. Bassel, K. Muller, J.R. King, A.T.A. Wood, I. Grosse, M. Quint, N. Krasnogor, G. Leubner-Metzger, M.J. Holdsworth, L. Bentsink, "Transcriptional Dynamics of Two Seed Compartments with Opposing Roles in Arabidopsis Seed Germination", Plant Physiology, 163(1):205-215, September 2013, doi:10.1104/pp.113.223511
  • 69. Conclusions • Combining algorithms in a sequential and/or parallel fashion can provide performance improvements and new biological insights • Microarray and gene set analysis tasks can be interlinked flexibly in an (almost) completely automated process [www.ArrayMining.Net] • New analysis types like network-based topology analysis and co-expression analysis complement existing tools [www.{TopoGSA,PathExpand,EnrichNet}.net] • In the case of breast cancer (BC), this allowed us to identify candidate genes to characterise ER+ luminal-like BC. – The RERG gene is a key marker of the luminal BC class – It can be used to separate distinct prognostic subgroups
  • 70. Conclusions (I): Feature comparison with similar tools
    ArrayMining & TopoGSA
      Pre-processing: Image analysis, single- and cross-study normalization, dimensionality reduction, gene name normalization, covariance-based filtering
      Analysis: Classification, Clustering, Gene selection, GSEA, PCA, ICA, Co-expression analysis, PPI-topology analysis, Ensembles/Cons.
      Usability/features: PDF reports, sortable ranking tables, data annotation, 2D/3D plots, e-mail notification, video tutorials
    GEPAS (Tarraga et al.)
      Pre-processing: Image analysis, missing value imputation, multiple single-study normalization methods, dimensionality reduction, ID converter
      Analysis: Classification, Clustering, Gene selection, GSEA, PCA, CGH arrays, Tissue mining, Text mining, TF-binding site prediction
      Usability/features: special tree visualization (Caat, SotaTree, Newick Trees), 2D plots, data annotation (Babelomics)
    Expression Profiler (Kapushesky et al.)
      Pre-processing: Image analysis, single-study normalization, missing value imputation, dimensionality reduction, advanced data selection
      Analysis: Clustering, Gene selection, PCA, Co-expression analysis (different from ArrayMining), COA, Similarity search
      Usability/features: Excel export, XML queries, 2D plots, data annotation (GO, chromosome location)
  • 71. Conclusions • Example applications of a (simple) network-based scoring methodology • They illustrate the utility of the approach for identifying novel functional associations between gene/protein sets • The associations reflect known direct and indirect molecular interactions between their members rather than only the size of their overlap. • Intelligent visualisation of network communities is possible
    B.J.W. Dekkers, S. Pearce, R.P. van Bolderen-Veldkamp, A. Marshall, P. Widera, J. Gilbert, H.G. Drost, G.W. Bassel, K. Muller, J.R. King, A.T. Wood, I. Grosse, M. Quint, N. Krasnogor, G. Leubner-Metzger, M.J. Holdsworth, and L. Bentsink. Transcriptional dynamics of two seed compartments with opposing roles in Arabidopsis seed germination. Plant Physiology, 163(1):205-215, 2013.
    E. Glaab, J. Bacardit, J.M. Garibaldi, and N. Krasnogor. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS One, 7(7):e39932, 2012.
    E. Glaab, A. Baudot, N. Krasnogor, R. Schneider, and A. Valencia. EnrichNet: network-based gene set enrichment analysis. Bioinformatics, 2012. This paper was also accepted as a full oral presentation at the 2012 European Conference on Computational Biology.
    G.W. Bassel, H. Lan, E. Glaab, D.J. Gibbs, T. Gerjets, N. Krasnogor, A.J. Bonner, M.J. Holdsworth, and N.J. Provart. Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2011.
    E. Glaab, A. Baudot, N. Krasnogor, and A. Valencia. TopoGSA: network topological gene set analysis. Bioinformatics, 26(9):1271, March 2010.
    E. Glaab, A. Baudot, N. Krasnogor, and A. Valencia. Extending pathways and processes using molecular interaction networks to analyse cancer genome data. BMC Bioinformatics, 11(597), 2010.
    H.O. Habashy, D.G. Powe, E. Glaab, N. Krasnogor, J.M. Garibaldi, E.A. Rakha, G. Ball, A.R. Green, and I.O. Ellis. RERG (Ras-related and oestrogen-regulated growth-inhibitor) expression in breast cancer: a marker of ER-positive luminal-like subtype. Breast Cancer Research and Treatment, (online first):1-12, 2010.
    E. Glaab, J. Garibaldi, and N. Krasnogor. ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization. BMC Bioinformatics, 10(1):358, 2009.
  • 72. Availability. Freely accessible at: http://www.ncl.ac.uk/csbb/research/resources/ This work was supported by the UK’s Biotechnology and Biological Sciences Research Council [BB/F01855X/1] and the Engineering and Physical Sciences Research Council [EP/J004111/1].
  • 73. Acknowledgements: Dr. Pawel Widera (Newcastle University, UK), Dr. Jaume Bacardit (Newcastle University, UK), Dr. Enrico Glaab (University of Luxembourg, Luxembourg), Dr. Anais Baudot (Université d’Aix-Marseille, France), Prof. Reinhard Schneider (University of Luxembourg, Luxembourg), Prof. Alfonso Valencia (CNIO, Spain), Prof. Mike Holdsworth (University of Nottingham, UK), Prof. Graham Seymour (University of Nottingham, UK), Prof. Doron Lancet (Weizmann Institute of Science, Israel), Dr. Marilyn Safran (Weizmann Institute of Science, Israel)

Editor's notes

  1. .
  2. Red labels represent features that are unique to only one of the tools (however, the other features are also implemented differently). More importantly, ArrayMining combines algorithms using ensembles and consensus clustering. Abbreviations: GSEA = gene set enrichment analysis; ICA = independent component analysis; CGH = comparative genomic hybridization; COA = correspondence analysis; TF = transcription factor; CAAT = a hierarchical cluster analyser (abbreviation not explained); Similarity search = tool for identifying co-expressed genes, given a set of genes as a query