INDUS: A System for Information Integration and Knowledge Acquisition from Autonomous, Distributed and Semantically Heterogeneous Data Sources
1. Jie Bao, Doina Caragea, Jyotishman Pathak, Adrian Silvescu, Carson Andorf, Changhui Yan, Drena Dobbs and Vasant Honavar. June 28, 2005
5. Semantically Heterogeneous Data Sources

D1:
Protein ID | Protein Name | Protein Sequence | Prosite Motifs | EC Number
Q12797 | Aspartyl/asparaginyl beta-hydroxylase | MAQRKNAKSS GNSSSSGSGS … | TPR, TPR_REGION, TPR | 1.14.11.16 Peptide-aspartate beta-dioxygenase
P35626 | Beta-adrenergic receptor kinase 2 | MADLEAVLAD VSYLMAMEKS … | RGS, PROT_KIN_DOM, PH_DOMAIN | 2.7.1.126 Beta-adrenergic receptor kinase

D2:
Accession Number AN | Gene | AA Sequence | Length | Pfam Domains | MIPS Funcat
P07278 | BCY1 | VSSLPKESQA ELQLFQNEIN … | 415 | RIIa | 16.19.01 cyclic nucleotide binding (cAMP, cGMP, etc.)
P32589 | SSE1 | STPFGLDLGN NNSVLAVARN … | 692 | HSP70 | 16.01 protein binding
13. Semantically Heterogeneous Data: Data sources need to be made self-describing by specifying the relevant metadata (illustrated on the same D1 and D2 tables as slide 5).
15. Attribute value hierarchy: An attribute value hierarchy (AVH) is an ontology that imposes a partial order on the values of a data attribute. Example: the MIPS Funcat hierarchy.
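To make the partial order concrete, here is a minimal sketch of an AVH over MIPS Funcat codes. It assumes (this is not stated on the slide) that a code's parent is obtained by dropping its last dot-separated component, so "16.19.01" generalizes to "16.19" and then to "16". The helper names `parent`, `ancestors`, `is_a` and `least_common_ancestor` are hypothetical.

```python
from typing import List, Optional


def parent(code: str) -> Optional[str]:
    """Immediate generalization of a Funcat-style code (assumed dotted-prefix rule); None at a root."""
    parts = code.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None


def ancestors(code: str) -> List[str]:
    """All strict generalizations of `code`, nearest first."""
    result = []
    p = parent(code)
    while p is not None:
        result.append(p)
        p = parent(p)
    return result


def is_a(a: str, b: str) -> bool:
    """The AVH partial order: a <= b iff b equals a or generalizes a."""
    return a == b or b in ancestors(a)


def least_common_ancestor(a: str, b: str) -> Optional[str]:
    """Most specific value that generalizes both a and b (None if they share no root)."""
    generalizations_of_b = set([b] + ancestors(b))
    for candidate in [a] + ancestors(a):
        if candidate in generalizations_of_b:
            return candidate
    return None


if __name__ == "__main__":
    # Codes taken from the D2 example rows (BCY1 -> 16.19.01, SSE1 -> 16.01).
    print(ancestors("16.19.01"))                       # ['16.19', '16']
    print(is_a("16.19.01", "16"))                      # True
    print(is_a("16.01", "16.19"))                      # False
    print(least_common_ancestor("16.19.01", "16.01"))  # 16
```

Because "16.01" and "16.19" are incomparable, the order is only partial; moving both values up to their common ancestor "16" is one way a learner could look at the data at a coarser level of abstraction.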
16. Making data sources self-describing: ontology-extended data source = Data + Schema + Ontology

Schema of D2 (attribute: type):
Accession Number: MIPS ID
Gene: Gene ID
Length: Positive Integer
Pfam Domains: Motifs
MIPS Funcat: MIPS Hierarchy

Data:
Accession Number | Gene | AA Sequence | Length | Pfam Domains | MIPS Funcat
P07278 | BCY1 | VSSLPKESQA ELQLFQNEIN … | 415 | RIIa | 16.19.01 cyclic nucleotide binding (cAMP, cGMP, etc.)
P32589 | SSE1 | STPFGLDLGN NNSVLAVARN … | 692 | HSP70 | 16.01 protein binding
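The "Data + Schema + Ontology" idea can be sketched as a small container in which every value is checked against the type its schema declares, and values of hierarchical types must be nodes of the attached ontology. This is an illustrative sketch, not INDUS's own representation: the `OntologyExtendedSource` class, the checks in `PRIMITIVE_CHECKS`, and the three-node Funcat fragment are all assumptions, and the sequences are the truncated fragments shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Illustrative checks for the primitive types named in the schema (assumed, not INDUS's own).
PRIMITIVE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "MIPS ID": lambda v: isinstance(v, str) and v != "",
    "Gene ID": lambda v: isinstance(v, str) and v != "",
    "AA String": lambda v: isinstance(v, str) and v != "" and set(v) <= AMINO_ACIDS,
    "Positive Integer": lambda v: isinstance(v, int) and v > 0,
    "Motifs": lambda v: isinstance(v, list) and all(isinstance(m, str) for m in v),
}


@dataclass
class OntologyExtendedSource:
    """Data + schema + ontology: each value is interpreted through its declared type."""
    schema: Dict[str, str]               # attribute -> type name
    ontology: Dict[str, Dict[str, str]]  # hierarchical type -> {value: parent value}
    rows: List[Dict[str, Any]] = field(default_factory=list)

    def conforms(self, attr: str, value: Any) -> bool:
        type_name = self.schema[attr]
        if type_name in self.ontology:
            # Hierarchical type: the value must be a node of the declared hierarchy.
            hierarchy = self.ontology[type_name]
            return value in hierarchy or value in hierarchy.values()
        return PRIMITIVE_CHECKS[type_name](value)

    def add_row(self, row: Dict[str, Any]) -> None:
        unknown = set(row) - set(self.schema)
        if unknown:
            raise KeyError(f"attributes not in schema: {sorted(unknown)}")
        bad = [a for a, v in row.items() if not self.conforms(a, v)]
        if bad:
            raise ValueError(f"values do not match declared types: {bad}")
        self.rows.append(row)


d2 = OntologyExtendedSource(
    schema={"Accession Number": "MIPS ID", "Gene": "Gene ID", "AA Sequence": "AA String",
            "Length": "Positive Integer", "Pfam Domains": "Motifs", "MIPS Funcat": "MIPS Hierarchy"},
    # Tiny fragment of the Funcat hierarchy, assumed for illustration.
    ontology={"MIPS Hierarchy": {"16.19.01": "16.19", "16.19": "16", "16.01": "16"}},
)
d2.add_row({"Accession Number": "P07278", "Gene": "BCY1", "AA Sequence": "VSSLPKESQAELQLFQNEIN",
            "Length": 415, "Pfam Domains": ["RIIa"], "MIPS Funcat": "16.19.01"})
d2.add_row({"Accession Number": "P32589", "Gene": "SSE1", "AA Sequence": "STPFGLDLGNNNSVLAVARN",
            "Length": 692, "Pfam Domains": ["HSP70"], "MIPS Funcat": "16.01"})
```

A row whose Length were 0, or whose MIPS Funcat code were absent from the hierarchy, would be rejected with a `ValueError`; the point is that the metadata, not out-of-band knowledge, tells a consumer what each value means.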
22. Mappings at schema level

D1:
Protein ID: Swissprot ID
Protein Name: String
Protein Sequence: AA String
Prosite Motifs: AA String
EC Number: EC Hierarchy

D2:
Accession No AN: MIPS ID
Gene: Gene ID
AA Sequence: AA String
Length: Pos Integer
MIPS Funcat: MIPS Hierarchy
Pfam Motifs: Motifs

DU:
PID: Swissprot ID
Protein: AA String
GO Function: GO Hierarchy
Source: Species String
25. Mappings at schema level
Protein ID : D1 ≡ PID : DU
Accession Number AN : D2 ≡ PID : DU
Protein Sequence : D1 ≡ AA Composition : DU
AA Sequence : D2 ≡ AA Composition : DU
EC Number : D1 ≡ GO Function : DU
MIPS Funcat : D2 ≡ GO Function : DU
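Applying these correspondences means renaming attributes and, where the types differ, converting values. The sketch below is a minimal illustration under stated assumptions: "AA Composition" is read as the relative frequency of each amino acid, the EC/Funcat to GO step uses a hypothetical placeholder table (`TERM_MAP`, with made-up term labels, since no real mapping is given here), and source attributes with no DU counterpart are simply dropped. `to_user_view` and `aa_composition` are illustrative names.

```python
from collections import Counter
from typing import Any, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Schema-level correspondences from the slide: source attribute -> DU attribute.
ATTRIBUTE_MAP: Dict[str, Dict[str, str]] = {
    "D1": {"Protein ID": "PID", "Protein Sequence": "AA Composition", "EC Number": "GO Function"},
    "D2": {"Accession Number AN": "PID", "AA Sequence": "AA Composition", "MIPS Funcat": "GO Function"},
}


def aa_composition(seq: str) -> Dict[str, float]:
    """Assumed reading of 'AA Composition': relative frequency of each amino acid present."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS if counts[aa]}


def to_user_view(source: str, row: Dict[str, Any], term_map: Dict[str, str]) -> Dict[str, Any]:
    """Translate one D1 or D2 row into DU attributes; attributes without a mapping are dropped."""
    out: Dict[str, Any] = {}
    for src_attr, du_attr in ATTRIBUTE_MAP[source].items():
        value = row[src_attr]
        if du_attr == "AA Composition":
            value = aa_composition(value)  # type conversion: AA String -> composition
        elif du_attr == "GO Function":
            value = term_map[value]        # value-level mapping into the GO hierarchy
        out[du_attr] = value
    return out


# Hypothetical placeholder terms, NOT real GO identifiers.
TERM_MAP = {"1.14.11.16": "go-term-A", "16.19.01": "go-term-B"}

d1_row = {"Protein ID": "Q12797", "Protein Name": "Aspartyl/asparaginyl beta-hydroxylase",
          "Protein Sequence": "MAQRKNAKSS", "EC Number": "1.14.11.16"}
d2_row = {"Accession Number AN": "P07278", "Gene": "BCY1",
          "AA Sequence": "VSSLPKESQA", "MIPS Funcat": "16.19.01"}

print(to_user_view("D1", d1_row, TERM_MAP))
print(to_user_view("D2", d2_row, TERM_MAP))
```

After translation, rows from both sources share the same DU attributes, so a single query over DU can be answered from either source.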
INDUS is a federated, query-centric approach to knowledge acquisition from distributed, semantically heterogeneous, autonomous data sources. Learning algorithms that can be decomposed into an information-gathering component (answered by queries) and a hypothesis-generation component can be easily linked to INDUS. INDUS enables scientists or institutions working on related problems (e.g., in bioinformatics) to exchange data and findings.
The design is tailored to building predictive models with machine learning algorithms from distributed, semantically heterogeneous, autonomous data sources.
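The decomposition of a learner into information gathering and hypothesis generation can be illustrated with naive Bayes, whose sufficient statistics are plain counts. The sketch assumes the data are horizontally split across sources that already share one schema, and that each source answers only count queries; `DataSource`, `gather_statistics` and `classify` are hypothetical names, not INDUS's query interface, and the toy rows are made up.

```python
import math
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


class DataSource:
    """Hypothetical autonomous source: answers count queries but never ships its rows."""

    def __init__(self, rows: List[Row]):
        self._rows = rows

    def count(self, conditions: Dict[str, str]) -> int:
        return sum(all(r.get(a) == v for a, v in conditions.items()) for r in self._rows)


def gather_statistics(sources, class_attr, class_values, attr_values) -> Counter:
    """Information gathering: naive Bayes sufficient statistics, summed over sources."""
    stats: Counter = Counter()
    for src in sources:
        for c in class_values:
            stats[(c,)] += src.count({class_attr: c})
            for a, values in attr_values.items():
                for v in values:
                    stats[(c, a, v)] += src.count({class_attr: c, a: v})
    return stats


def classify(stats, class_values, attr_values, example: Row) -> str:
    """Hypothesis generation: Laplace-smoothed naive Bayes computed from the counts alone."""
    total = sum(stats[(c,)] for c in class_values)
    best, best_score = None, -math.inf
    for c in class_values:
        n_c = stats[(c,)]
        score = math.log((n_c + 1) / (total + len(class_values)))
        for a, values in attr_values.items():
            score += math.log((stats[(c, a, example[a])] + 1) / (n_c + len(values)))
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data (made up for illustration), split across two sources.
s1 = DataSource([{"f1": "1", "f2": "0", "y": "pos"}, {"f1": "1", "f2": "1", "y": "pos"}])
s2 = DataSource([{"f1": "0", "f2": "1", "y": "neg"}, {"f1": "0", "f2": "0", "y": "neg"},
                 {"f1": "1", "f2": "1", "y": "pos"}])
CLASSES = ["pos", "neg"]
ATTRS = {"f1": ["0", "1"], "f2": ["0", "1"]}

stats = gather_statistics([s1, s2], "y", CLASSES, ATTRS)
print(classify(stats, CLASSES, ATTRS, {"f1": "1", "f2": "1"}))  # pos
print(classify(stats, CLASSES, ATTRS, {"f1": "0", "f2": "0"}))  # neg
```

Because counts add across sources, the statistics (and hence the classifier) are identical to those obtained by pooling all rows in one place, while no source ever has to release its raw data.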