Michelangelo Ceci – Data mining techniques for the characterization of biological entities
1. Department of Computer Science
Data mining techniques for the characterization of biological entities
Michelangelo Ceci, Corrado Loglisci, Gianvito Pio,
Fabio Fumarola, Pasqua Fabiana Lanotte,
Donato Malerba
BiP-Day 2013
2. RL1: Discovery of Frequent Syntactic Structures
“....Presenilin mutations have been hypothesised to cause Alzheimer disease either by altering
amyloid precursor protein metabolism or ...”
“... PS mutations cause the same functional consequence as mutations on amyloid precursor
protein ...”
A frequent syntactic structure:
mesh_vb_mesh(T, M1, cause, M2), mesh_vb_mesh(T, M1, cause, M3),
is_a(M1, mutat), is_a(M2, protein), is_a(M3, amyloid). [frequency: 80%]
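The frequency of such a structure can be read as the fraction of sentences in which the pattern is satisfied by some binding of its variables. The sketch below is only an illustration under assumptions, not the authors' mining algorithm: each sentence is represented by hypothetical (subject, verb, object) triples plus is_a facts, the two toy sentences are made up, and M2 and M3 are allowed to bind to the same entity.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject entity, verb, object entity)


def matches(triples: Set[Triple], is_a: Dict[str, Set[str]]) -> bool:
    """True if some binding of M1, M2, M3 satisfies the pattern in one sentence."""
    entities = {e for s, _, o in triples for e in (s, o)}
    # Assumption: variables may bind to the same entity, hence product().
    for m1, m2, m3 in product(entities, repeat=3):
        if ((m1, "cause", m2) in triples
                and (m1, "cause", m3) in triples
                and "mutat" in is_a.get(m1, set())
                and "protein" in is_a.get(m2, set())
                and "amyloid" in is_a.get(m3, set())):
            return True
    return False


def pattern_frequency(sentences: List[dict]) -> float:
    """Fraction of sentences in which the pattern holds."""
    return sum(matches(s["triples"], s["is_a"]) for s in sentences) / len(sentences)


# Toy, made-up sentence representations (not data from the slides).
SENTENCES = [
    {"triples": {("PS mutation", "cause", "APP")},
     "is_a": {"PS mutation": {"mutat"}, "APP": {"protein", "amyloid"}}},
    {"triples": {("helicase", "catalyse", "disruption")},
     "is_a": {"helicase": {"protein"}}},
]

frequency = pattern_frequency(SENTENCES)
```

Only the first toy sentence satisfies the pattern, so the frequency here is one half.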
Application: integration of the syntactic structures into the PubMed search engine
A. Appice, M. Ceci, C. Loglisci: Discovering Informative Syntactic Relationships between Named Entities in Biomedical Literature. DBKDA 2010: 120-125
3. RL1: Discovery of Temporal Links
[Figure: timeline from 1983 (Migraine), through 1984, ... (MeSH terms), to 1997 (Magnesium Deficiency)]
Association rules mined from the literature published in a time interval:
Migraine → TermA, TermB
TermB, TermC → TermD
TermD, Term, → TermF, TermG
TermF, TermG → Magnesium Deficiency
Application: generation of hypotheses on biological associations that develop over time
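The idea of linking two concepts through rules from successive time intervals can be sketched as a simple search. This is a minimal sketch, not the authors' algorithm: it assumes that rules from consecutive intervals are linked when the consequent of the earlier rule shares at least one term with the antecedent of the later one. The name find_temporal_chain and the toy rules (including a simplified third rule) are illustrative.

```python
from typing import FrozenSet, List, Optional, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def find_temporal_chain(rules_by_interval: List[List[Rule]],
                        source: str, target: str) -> Optional[List[Rule]]:
    """Return one chain with a rule per interval linking source to target, or None."""
    if not rules_by_interval:
        return None
    # Start from rules of the first interval that mention the source term.
    chains = [[rule] for rule in rules_by_interval[0] if source in rule[0]]
    for rules in rules_by_interval[1:]:
        extended = []
        for chain in chains:
            last_consequent = chain[-1][1]
            for rule in rules:
                # Assumed linking criterion: at least one shared term.
                if last_consequent & rule[0]:
                    extended.append(chain + [rule])
        chains = extended
    for chain in chains:
        if target in chain[-1][1]:
            return chain
    return None


def rule(antecedent, consequent) -> Rule:
    return frozenset(antecedent), frozenset(consequent)


# Toy rules, one list per time interval (illustrative only).
RULES = [
    [rule({"Migraine"}, {"TermA", "TermB"})],
    [rule({"TermB", "TermC"}, {"TermD"})],
    [rule({"TermD"}, {"TermF", "TermG"})],
    [rule({"TermF", "TermG"}, {"Magnesium Deficiency"})],
]

chain = find_temporal_chain(RULES, "Migraine", "Magnesium Deficiency")
```

A returned chain is a candidate hypothesis to be inspected by an expert, not evidence of a biological association in itself.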
C. Loglisci. Time-based Discovery in Biomedical Literature: Mining Temporal Links. International Journal of Data Analysis Techniques and Strategies (IJDATS), Vol. 5, No. 2, 2013
C. Loglisci, M. Ceci: Discovering Temporal Bisociations for Linking Concepts over Time. ECML/PKDD 2011:358-373
4. RL1: Extraction of Bio-molecular Events
Bio-molecular events are processes that involve and transform biological entities.
They can be formalized as conceptual frames.
A frame is characterized by entities, each associated with the specific role it plays in the event. For instance, for the event catalyse:
catalyse
roles: catalyst, reaction being catalysed
Application: extraction from the literature of the entities involved in an event and classification of their roles:
“Helicases not only catalyse the disruption of hydrogen bonding between complementary regions of nucleic acids, but also move along nucleic acid strands in a polar fashion.”
catalyse
catalyst: Helicases
reaction being catalysed: the disruption of hydrogen bonding between complementary regions of nucleic acids
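A conceptual frame of this kind maps naturally onto a small data structure. The sketch below shows only a possible representation of the extracted result; the class EventFrame, its unfilled method, and the constant CATALYSE_ROLES are hypothetical names, and the extraction step itself is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class EventFrame:
    """An event name plus the text spans filling each of its roles."""
    event: str
    roles: Dict[str, str] = field(default_factory=dict)

    def unfilled(self, expected: Sequence[str]) -> List[str]:
        """Roles expected for this event that no entity fills yet."""
        return [role for role in expected if role not in self.roles]


# Roles of the "catalyse" event, as listed on the slide.
CATALYSE_ROLES = ("catalyst", "reaction being catalysed")

frame = EventFrame(
    event="catalyse",
    roles={
        "catalyst": "Helicases",
        "reaction being catalysed": (
            "the disruption of hydrogen bonding between "
            "complementary regions of nucleic acids"
        ),
    },
)

partial = EventFrame(event="catalyse", roles={"catalyst": "Helicases"})
```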
C. Loglisci, A. Appice, M. Ceci, D. Malerba, F. Esposito: MBlab: Molecular Biodiversity Laboratory. IRCDL 2011:132-135
C. Loglisci, M. Ceci, A. Consiglio, D. D'Elia, G. Grillo, F. Licciulli, D. Malerba, S. Liuni: Functional Analysis and annotation of noncoding RNAs: a Text Mining approach, Decimo Meeting Annuale della Società Italiana di Bioinformatica 2013
5. RL2: Discovering miRNA-mRNA interaction networks
microRNAs (miRNAs) are small noncoding RNAs acting as post-transcriptional regulators of gene expression.
[Figure: miRNAs, mRNAs, and miRNA-mRNA networks]
Goal: identification of miRNA-mRNA interaction networks through biclustering/co-clustering approaches
• discovery of miRNA regulatory modules/networks
• identification of unknown functional properties
6. HOCCLUS2: Hierarchical and Overlapping Co-CLUStering 2
1) Bottom-up approach, starting from single miRNA-mRNA interactions
2) Discovery of (possibly) overlapping biclusters
3) Hierarchical organization of the discovered biclusters
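The three properties above can be illustrated with a toy procedure. The sketch below is not HOCCLUS2: it is a simplified stand-in that assumes a made-up merge criterion (two biclusters are merged when the fraction of known interactions in their union reaches min_density). The names density and build_hierarchy, the threshold, and the toy interactions are all illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Bicluster = Tuple[FrozenSet[str], FrozenSet[str]]  # (miRNAs, mRNAs)


def density(mirnas, mrnas, interactions: Set[Tuple[str, str]]) -> float:
    """Fraction of miRNA-mRNA pairs in the bicluster that are known interactions."""
    pairs = [(a, b) for a in mirnas for b in mrnas]
    return sum(p in interactions for p in pairs) / len(pairs)


def build_hierarchy(interactions: Set[Tuple[str, str]],
                    min_density: float = 1.0) -> List[Set[Bicluster]]:
    """Bottom-up: level 0 holds single interactions, higher levels hold merges."""
    level = {(frozenset([a]), frozenset([b])) for a, b in interactions}
    hierarchy = [level]
    while True:
        merged = set()
        for (r1, c1), (r2, c2) in combinations(level, 2):
            candidate = (r1 | r2, c1 | c2)
            # Keep only dense merges that are new with respect to this level.
            if candidate not in level and density(*candidate, interactions) >= min_density:
                merged.add(candidate)
        if not merged:
            return hierarchy
        hierarchy.append(merged)
        level = merged


# Toy interactions (illustrative only).
INTERACTIONS = {("m1", "g1"), ("m1", "g2"), ("m2", "g1"), ("m2", "g2"), ("m3", "g3")}
hierarchy = build_hierarchy(INTERACTIONS)
```

Biclusters on the same level may share miRNAs or mRNAs, which is the overlapping property; each level is built from merges of the previous one, which gives the hierarchy.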
G. Pio, M. Ceci, D. D'Elia, C. Loglisci, D. Malerba, A Novel Biclustering Algorithm for the Discovery of Meaningful Biological Correlations between microRNAs and their Target Genes, BMC Bioinformatics 14 (Suppl 7), S8 (2013)
7. HOCCLUS2 - Input data
Predicted interactions (mirDIP)
o Large datasets
o High level of noise (false positives)
Verified interactions (miRTarBase)
o Small datasets
o Context-specific
A semi-supervised ensemble learning approach that learns to combine the scores of different prediction algorithms
G. Pio, M. Ceci, D. D'Elia, D. Malerba, Integrating microRNA target predictions for the discovery of gene regulatory networks: a semi-supervised ensemble learning approach, BMC Bioinformatics (in press)
8. RL3: Gene Function Hierarchical Multi-label Classification
• Instances to be classified may belong to multiple classes at the same time.
• The classes are organized hierarchically (hierarchical constraint).
Gene function prediction (e.g. FUN or GO)
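The hierarchical constraint says that an instance assigned to a class also belongs to every ancestor of that class. The sketch below illustrates this for a tree-shaped hierarchy, assuming dotted class codes in the style of FUN (for example "01.01.03"); the codes used are placeholders, and a DAG such as GO would need an explicit parent map instead of string prefixes.

```python
from typing import List, Set


def ancestors(label: str) -> List[str]:
    """Proper ancestors of a dotted class code, from the root downwards."""
    parts = label.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def close_under_hierarchy(labels: Set[str]) -> Set[str]:
    """Add every ancestor so that the label set respects the hierarchy."""
    closed = set(labels)
    for label in labels:
        closed.update(ancestors(label))
    return closed


def satisfies_constraint(labels: Set[str]) -> bool:
    """True if the label set already contains all ancestors of its members."""
    return close_under_hierarchy(labels) == set(labels)


# Placeholder class codes for one gene (illustrative only).
predicted = {"01.01.03", "02"}
closed = close_under_hierarchy(predicted)
```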
9. DIP Yeast network (PPI network)
[Figure panels: (a) non-connected examples arranged along the border; (b) examples randomly arranged along the border; (c) examples grouped according to the 1st level of FUN; (d) examples grouped according to the 2nd level of FUN]
10. The Basic Idea
• We develop a tree-based algorithm, NHMC (Network Hierarchical Multi-label Classification), which takes network autocorrelation into account in the setting of Hierarchical Multi-label Classification (HMC).
• It learns Predictive Clustering Trees (PCTs) for HMC; the network is used as background knowledge during training.
• Clustering is based on autocorrelation: each cluster should contain highly autocorrelated entities for the considered level.
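Network autocorrelation can be quantified in several ways. As an illustration only, the sketch below computes Moran's I, a standard autocorrelation index, for one numeric value per node on an undirected, unweighted network; the measure actually used by NHMC for hierarchical label vectors may differ. The function name morans_i and the toy network are assumptions.

```python
from typing import Dict, List, Tuple


def morans_i(values: Dict[str, float], edges: List[Tuple[str, str]]) -> float:
    """Moran's I with weight 1 for each undirected edge (counted in both directions).

    Undefined (division by zero) if there are no edges or all values are equal.
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {node: v - mean for node, v in values.items()}
    # Each undirected edge contributes twice: once as (a, b) and once as (b, a).
    cross = 2.0 * sum(dev[a] * dev[b] for a, b in edges)
    total_weight = 2.0 * len(edges)
    squared = sum(d * d for d in dev.values())
    return (n / total_weight) * (cross / squared)


# Toy network: 1.0 means "has the function", 0.0 means "does not".
VALUES = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
same_label_links = morans_i(VALUES, [("a", "b"), ("c", "d")])
mixed_label_links = morans_i(VALUES, [("a", "c"), ("b", "d")])
```

Values near 1 indicate that linked nodes tend to share the same value, and values near -1 indicate that linked nodes tend to differ.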
Daniela Stojanova, Michelangelo Ceci, Donato Malerba, Saso Dzeroski: Using PPI network autocorrelation in hierarchical multilabel classification trees for gene function prediction. BMC Bioinformatics 14: 285 (2013)
Daniela Stojanova, Michelangelo Ceci, Donato Malerba, Saso Dzeroski: Learning Hierarchical Multi-label Classification Trees from Network Data. Discovery Science 2013: 233-248
11. RL4: IS-BioBank project
• A framework for:
– enabling interoperability among different biological data sources, and
– supporting expert users in the complex process of studying cancer microenvironments.
• This framework is obtained by extending Connectivity Map with databases, data repositories, and ontologies.
Michelangelo Ceci, Pietro Hiram Guzzi, Elio Masciari, Mauro Coluccia, Federica Mandreoli, Massimo Mecella, Fabio Fumarola,
Riccardo Martoglia, Wilma Penzo: The IS-BioBank project: a framework for biological data normalization, interoperability, and
mining for cancer microenvironment analysis. SIGHIT Record 2(2): 16-21 (2012)
12. IS-BioBank project
• Our goal is to develop a Web delivery system which:
– enables interoperability among queryable data sources,
– captures the different kinds of relationships that exist among them,
– reinforces the cooperation of heterogeneous and distributed data bank sources,
– supports users in the complex process of extracting, navigating, and visualizing the knowledge base.