How do you solve a problem like a
biological database?
(BNF 216 - Database Modeling and Design for Bioinformatics)
Arjei Balandra
Software Developer
National Telehealth Center
University of the Philippines – Manila
http://bumblebest.net
Database
• A database is a set of data that has a regular
structure and that is organized in such a way
that a computer can easily find the
desired information.
– The Linux Information Project
(http://www.linfo.org/database.html)
Biological Database
• Biological databases are libraries of life
sciences information collected from scientific
experiments, published literature, high-
throughput experiment technology, and
computational analyses.
– Wikipedia (en.wikipedia.org/wiki/Biological_database)
NCBI - GenBank
European Nucleotide Archive –
EMBL-EBI
DDBJ – DNA Data Bank Of Japan
Why Database?
• Data-intensive techniques such as high-
throughput screening and gene expression
experiments demand methods to correlate
large and diverse datasets.
• Databases integrate information from a
variety of sources allowing faster and more
powerful searches.
CREATE A “GOOD” DATABASE DESIGN
Tip #1:
Good Database Design
• Provides easy access to previous results.
• Supports both expert- and machine-guided
searches for novel correlations in data.
Bad Database Design
• Obfuscates the correlations for which the user
is searching.
• Makes it difficult for biologists to fit their data
into the database or to find previously stored
data, resulting in user contempt.
• Is ‘brittle’.
LEARN FROM EXISTING LITERATURE
Tip #2:
• Generalizations
• Incorporate existing schema into the database
design
• Use existing structures for common data
Generalizations
aMAZE (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC308873/figure/gkh139f2/)
RESPECT THE UNIQUE NEEDS OF
BIOLOGISTS (AND USERS)
Tip #3:
Business rules
• constraints
– based on data derived from the real-world
entities
– specific to the needs of the organization.
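A business rule of this kind can often be written directly as a database constraint. The sketch below is one minimal way to do that with Python's standard-library sqlite3 module. The table `sample`, its columns, and the rule that a concentration must be positive are hypothetical, chosen only for illustration.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory database with a hypothetical 'sample' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sample (
            sample_id      INTEGER PRIMARY KEY,
            organism       TEXT NOT NULL,
            -- hypothetical business rule: concentration must be positive
            conc_ng_per_ul REAL NOT NULL CHECK (conc_ng_per_ul > 0)
        )
        """
    )
    return conn


def try_insert(conn, organism, conc):
    """Return True if the row satisfies the constraints, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sample (organism, conc_ng_per_ul) VALUES (?, ?)",
                (organism, conc),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite for CHECK and NOT NULL violations.
        return False
```

With the rule stored in the schema, every application that writes to the table is held to it, not only the one that happens to validate its input.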
What do they need?
– Use free-text Comments
– Create user-specific
categories
Dealing with Business Rules
User-Specific Categories
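One possible way to support free-text comments and user-specific categories is to keep them in their own tables, linked to the core data. The sketch below uses Python's standard-library sqlite3 module. All table and column names (`gene`, `user_category`, `gene_category`, `gene_comment`) are assumptions made for illustration, not a schema taken from the slides.

```python
import sqlite3

# Hypothetical schema: core data (gene) is kept apart from what users add.
SCHEMA = """
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE
);
CREATE TABLE user_category (
    category_id INTEGER PRIMARY KEY,
    owner       TEXT NOT NULL,
    name        TEXT NOT NULL,
    UNIQUE (owner, name)          -- each user names their own categories
);
CREATE TABLE gene_category (
    gene_id     INTEGER NOT NULL REFERENCES gene(gene_id),
    category_id INTEGER NOT NULL REFERENCES user_category(category_id),
    PRIMARY KEY (gene_id, category_id)
);
CREATE TABLE gene_comment (
    comment_id INTEGER PRIMARY KEY,
    gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
    author     TEXT NOT NULL,
    body       TEXT NOT NULL      -- free-text comment
);
"""


def build_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def genes_in_category(conn, owner, name):
    """Return the gene symbols one user has placed in one of their categories."""
    rows = conn.execute(
        """
        SELECT g.symbol
        FROM gene g
        JOIN gene_category gc ON gc.gene_id = g.gene_id
        JOIN user_category uc ON uc.category_id = gc.category_id
        WHERE uc.owner = ? AND uc.name = ?
        ORDER BY g.symbol
        """,
        (owner, name),
    ).fetchall()
    return [row[0] for row in rows]
```

Because categories belong to an owner, two users can group the same genes differently without altering the shared tables.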
DESIGN THE DATABASE BEFORE
BUILDING IT
Tip #4:
USE THE DATABASE TO ENFORCE
DATA INTEGRITY
Tip #5:
Normalization
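As a rough illustration of normalization combined with database-enforced integrity, the sketch below stores each organism once and lets protein rows refer to it through a foreign key. It uses Python's standard-library sqlite3 module. The `organism` and `protein` tables and the accession values are hypothetical examples, not part of the original slides.

```python
import sqlite3


def build_normalized_db():
    """Create a hypothetical two-table schema: organism and protein."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE organism (
            organism_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE
        );
        CREATE TABLE protein (
            protein_id  INTEGER PRIMARY KEY,
            accession   TEXT NOT NULL UNIQUE,
            organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
        );
        """
    )
    return conn


def add_protein(conn, accession, organism_id):
    """Insert a protein; return False if the database rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO protein (accession, organism_id) VALUES (?, ?)",
                (accession, organism_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Unknown organism_id or duplicate accession.
        return False
```

The organism name lives in exactly one row, so it cannot drift into inconsistent spellings across protein records, and a protein cannot point at an organism that does not exist.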
KEEP THE DATABASE SCOPE
MANAGEABLE
Tip #6:
• In Biology, one size does not fit all
• Focus on a subset of Biology (e.g., genes,
proteins)
• For large subsets, tackle one part at a time
• Inclusive
LISTEN TO THE PEOPLE WHO HAVE TO
WRITE AND USE THE INTERFACE
Tip #7:
• Databases are successful only when people
use them
Users know what they want and need
+ Developers know what they can do
+ Designers know what must be done
---------------------------------------------------------
= Collaborative approach to develop a
successful database
TEST THE DESIGN WITH REALISTIC
DATA
Tip #8:
MAKE THE DATABASE STRUCTURE
UNDERSTANDABLE AND
EASY TO MAINTAIN
Tip #9:
THANK YOU!
REPLACE(quote, "pagmamahal", "data");
quote
References
• The Linux Information Project
(http://www.linfo.org/database.html)
• Nelson, M.R., Reisinger, S.J., Henry, S. (2003). Designing
databases to store biological information. BIOSILICO,
1(4).
• Wikipedia (en.wikipedia.org/wiki/Biological_database)
• Lemer, C., Antezana, E., Couche, F., Fays, F., Santolaria,
X., Janky, R., … Wodak, S. J. (2004). The aMAZE
LightBench: a web interface to a relational database
of cellular processes. Nucleic Acids
Research, 32(Database issue), D443–D448.
doi:10.1093/nar/gkh139

Designing Biological Databases