iBioSearch: The Integrated Biological Database Search
Ritu Khare and Yuan An
PROBLEM
The presence of a very large number of biological Web databases, each
with its own interface, makes it difficult for biologists to search for
any biological entity (see Fig. 1). Currently, the only option biologists
have is to search each of these numerous interfaces individually.

WI Metamodel: We observe that all input Web Interfaces (WIs) share an
underlying global model. We created this global model manually and call
it the "WI Metamodel" (see Fig. 2).
WI: Every Web Interface (WI) can be represented as an instance of the
metamodel.
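
To make the idea of "a WI as an instance of the metamodel" concrete, here is a minimal Python sketch. The poster does not list the metamodel's elements, so the sketch assumes only what the methodology states: a WI is a collection of search entities, each with a set of labels (search criteria). The class names, the `source` field, and the example database are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchEntity:
    """A searchable entity exposed by a form, e.g. 'Gene' (assumed shape)."""
    name: str
    labels: List[str] = field(default_factory=list)  # search criteria


@dataclass
class WebInterface:
    """One WI, stored as an instance of the assumed metamodel."""
    source: str  # hypothetical identifier for the underlying database
    entities: List[SearchEntity] = field(default_factory=list)

    def criteria_for(self, entity_name: str) -> List[str]:
        """Return the labels offered for the named entity, or [] if absent."""
        for entity in self.entities:
            if entity.name.lower() == entity_name.lower():
                return list(entity.labels)
        return []


# Hypothetical example: one search form of an imaginary gene database.
wi = WebInterface(
    source="example-gene-db",
    entities=[SearchEntity("Gene", ["Gene Name", "Organism", "Chromosome"])],
)
print(wi.criteria_for("gene"))
```

Keeping every interface in the same two-level shape (entity, then labels) is what would later allow instances from different databases to be compared and merged.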

Fig. 1: Problem - biologist searching for an entity

FUTURE WORK

In the future, we intend to dynamically update the repository of
biological databases, maintain semantic mappings as the base
databases evolve, translate user queries, and consolidate,
reconcile, and rank query results using data-cleansing and
relevance-computing algorithms. We also plan to run usability
tests of the iBioSearch system with the help of biologists.

OUR SOLUTION
We aim to provide a unified search interface capable of searching
multiple (1000+) biological databases. This interface would be a
representation of the biological search interface ontology. To find
this global search ontology, we take a novel approach: we reverse
engineer each individual search interface into a conceptual model,
and then find an integrated model that is consistent with all the
interfaces up to a level of significance.

HYPOTHESIS & ASSUMPTIONS
Fig. 2: WI Metamodel

www.ischool.drexel.edu

CURRENT AND PREDICTED RESULTS
The GBWS (Global Biological WI Schema), or ontology, could be
presented as a meta-search interface in which biologists can search
for most biological entities using the search criteria available
across different databases.
Eventually, we aim to answer further research questions concerning:
1. Differences between commercial and biological databases.
2. Automatic identification of biological search interfaces.
3. Reverse engineering of a WI into an ER diagram.
4. Integration of multiple ER diagrams.
5. Extraction of relationships between biological search entities.

Questions facing the biologist (cf. Fig. 1):
Which interface to search?
Which database to access?
Which search criteria are available?
How many sources to consider?

METHODOLOGY
Fig. 3: Methodology
1. Web Interface (WI) Collection: Collect WIs to biological databases.
2. Information Extraction: For each WI, extract the attributes corresponding
to the WI metamodel. Broadly, a WI can be represented as a collection of
search entities and their respective labels (search criteria).
3. Mapping WI to Metamodel: Map each WI to the WI metamodel to generate
instances of the metamodel. This yields a list of search entities and
their respective criteria (labels). For a given search entity S_i, there is a
label set (l_i1, l_i2, l_i3, ..., l_im).
4. Clustering: Find non-overlapping classes of search entities that represent
synonyms and, for each class, find a list of non-redundant labels.
5. Generation of GBWS: Finally, we generate another conceptual model,
which we call the "Global Biological WI Schema" (GBWS). It would represent
all possible input WIs in a non-redundant manner and capture the matchings
between individual instances of the WI metamodel.
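
Steps 3 to 5 can be illustrated with a small Python sketch. The poster does not name a clustering algorithm, so this is only a stand-in: a hand-written synonym table (`SYNONYMS`, hypothetical) groups entity names into classes, and labels are deduplicated by exact match after simple normalization. The function names and the input shape (one dict per WI, mapping entity name to its label list) are also assumptions.

```python
# Hypothetical synonym table: entity name -> class name (assumption, not from the poster).
SYNONYMS = {
    "gene": "gene",
    "locus": "gene",
    "protein": "protein",
    "polypeptide": "protein",
}


def normalize(text):
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def build_global_schema(instances):
    """Merge metamodel instances into one non-redundant schema.

    instances: list of dicts, one per WI, mapping entity name -> list of labels.
    Returns a dict mapping entity class -> ordered list of unique labels.
    """
    schema = {}
    for instance in instances:
        for entity, labels in instance.items():
            key = normalize(entity)
            entity_class = SYNONYMS.get(key, key)  # unknown names form their own class
            merged = schema.setdefault(entity_class, [])
            for label in labels:
                norm = normalize(label)
                if norm not in merged:  # skip labels already present in this class
                    merged.append(norm)
    return schema


# Two hypothetical interfaces that use different names for the same entity.
instances = [
    {"Gene": ["Gene Name", "Organism"]},
    {"Locus": ["organism", "Chromosome"], "Protein": ["Sequence"]},
]
gbws = build_global_schema(instances)
print(gbws)
```

A real system would need a matcher that also recognizes labels with different wording but the same meaning; exact matching is used here only to keep the sketch short.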

