Bioinformatics Resources and Tools
on the Web: A Primer
Joel H. Graber
Center for Advanced Biotechnology
Boston University
Outline
• Introduction: What is bioinformatics?
• The basics
– The five sites that all biologists should know
• Some examples
– Using the tools in a somewhat less-than-naïve manner
• Questions/comments are welcome at all points
• Much of this material comes from the Boston
University course: BF527 Bioinformatic
Applications (http://matrix.bu.edu/BF527/)
What is bioinformatics?
Examples of Bioinformatics
• Database interfaces
– Genbank/EMBL/DDBJ, Medline, SwissProt, PDB, …
• Sequence alignment
– BLAST, FASTA
• Multiple sequence alignment
– Clustal, MultAlin, DiAlign
• Gene finding
– Genscan, GenomeScan, GeneMark, GRAIL
• Protein Domain analysis and identification
– Pfam, BLOCKS, ProDom, …
• Pattern Identification/Characterization
– Gibbs Sampler, AlignACE, MEME
• Protein Folding prediction
– PredictProtein, SWISS-MODEL
Things to know and remember about
using web server-based tools
• You are using someone else’s computer
• You are (probably) getting a reduced set of
options or capacity
• Servers are great for sporadic or proof-of-principle
work, but for intensive work, the
software should be obtained and run locally
Five websites that all biologists
should know
• NCBI (The National Center for Biotechnology Information)
– http://www.ncbi.nlm.nih.gov/
• EBI (The European Bioinformatics Institute)
– http://www.ebi.ac.uk/
• The Canadian Bioinformatics Resource
– http://www.cbr.nrc.ca/
• SwissProt/ExPASy (Swiss Bioinformatics Resource)
– http://expasy.cbr.nrc.ca/sprot/
• PDB (The Protein Data Bank)
– http://www.rcsb.org/PDB/
NCBI (http://www.ncbi.nlm.nih.gov/)
• Entrez interface to databases
– Medline/OMIM
– Genbank/Genpept/Structures
• BLAST server(s)
– Five-plus flavors of BLAST
• Draft Human Genome
• Much, much more…
EBI (http://www.ebi.ac.uk/)
• SRS database interface
– EMBL, SwissProt, and many more
• Many server-based tools
– ClustalW, DALI, …
SwissProt (http://expasy.cbr.nrc.ca/sprot/)
• Curation!!!
– Error rate in the information is greatly reduced in
comparison to most other databases.
• Extensive cross-linking to other data sources
• SwissProt is the ‘gold-standard’ by which
other databases can be measured, and is the
best place to start if you have a specific
protein to investigate
A few more resources to be aware of
• Human Genome Working Draft
– http://genome.ucsc.edu/
• TIGR (The Institute for Genomic Research)
– http://www.tigr.org/
• Celera
– http://www.celera.com/
• (Model) Organism specific information:
– Yeast: http://genome-www.stanford.edu/Saccharomyces/
– Arabidopsis: http://www.tair.org/
– Mouse: http://www.jax.org/
– Fruitfly: http://www.fruitfly.org/
– Nematode: http://www.wormbase.org/
• Nucleic Acids Research Database Issue
– http://nar.oupjournals.org/ (First issue every year)
Example 1: Searching a new
genome for a specific protein
• Specific problem: We want to find the closest
match in C. elegans of D. melanogaster protein
NTF1, a transcription factor
• First: understanding the different forms of BLAST
The different versions of BLAST
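As a quick reference, the five standard BLAST programs differ in the type of query and the type of database they compare. The sketch below tabulates them; the dictionary layout and the `pick_blast_program` helper are illustrative names made up for this example, not part of any BLAST distribution.

```python
# Query type and database type for the five standard BLAST programs.
# "translated nucleotide" means the nucleotide sequence is translated
# in all six reading frames and compared at the protein level.
BLAST_PROGRAMS = {
    "blastn":  ("nucleotide", "nucleotide"),
    "blastp":  ("protein", "protein"),
    "blastx":  ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def pick_blast_program(query, database, translate=False):
    """Hypothetical helper: choose a program from the sequence types.

    query, database: "protein" or "nucleotide".
    translate: for nucleotide vs. nucleotide, compare at the protein level.
    """
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "protein" and database == "nucleotide":
        return "tblastn"
    if query == "nucleotide" and database == "protein":
        return "blastx"
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if translate else "blastn"
    raise ValueError("sequence types must be 'protein' or 'nucleotide'")


if __name__ == "__main__":
    # The two searches used in Example 1: a protein query against
    # predicted proteins, then against the nucleotide sequence.
    print(pick_blast_program("protein", "protein"))
    print(pick_blast_program("protein", "nucleotide"))
```

In Example 1 the query (NTF1) is always a protein, so the choice between blastp and tblastn comes down to whether the database holds predicted proteins or raw nucleotide sequence.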
1st Step: Search the proteins
• blastp is used to search for C. elegans
proteins that are similar to NTF1
• Two reasonable hits are found, but the hits
have suspicious characteristics
– besides the fact that they weren’t included in the
complete genome!
2nd Step: Search the nucleotides
• tblastn is used to search for translations of C.
elegans nucleotide sequences that are similar to NTF1
• Now we have only one hit
– How are they related?
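To make "translations of nucleotide sequences" concrete: tblastn compares the protein query against all six reading-frame translations of each database sequence, so it does not depend on anyone's gene predictions. The following is a minimal sketch of six-frame translation using the standard genetic code; the function names are illustrative, and this is not how BLAST itself is implemented.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order
# (first base varies slowest, third base fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Because every frame is searched, a single tblastn hit can run across a region that the annotation has split between two predicted proteins.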
Conclusion: Incorrect gene
prediction/annotation
• The two predicted proteins have essentially
identical annotation
• The protein-protein alignments are disjoint
and consecutive on the protein
• The protein-nucleotide alignment includes
both protein-protein alignments in the proper
order
• Why/how does this happen?
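The reasoning on this slide can be written as a small interval check. The sketch below is a simplification that looks only at coordinates on the query protein; the function names, the `max_gap` threshold, and all coordinates are hypothetical and are not taken from the actual NTF1 search. A fuller check would also compare the order of the hits on the genomic sequence.

```python
def overlap(a, b):
    """Number of shared positions between two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def looks_like_split_gene(hit_a, hit_b, genomic_hit, max_gap=30):
    """Return True if two protein hits look like halves of one gene.

    hit_a, hit_b: (start, end) on the query for the two blastp hits.
    genomic_hit:  (start, end) on the query for the single tblastn hit.
    max_gap:      largest allowed gap between the two blastp hits
                  (an arbitrary illustrative threshold).
    """
    first, second = sorted([hit_a, hit_b])
    disjoint = overlap(first, second) == 0
    gap = second[0] - first[1] - 1
    consecutive = disjoint and gap <= max_gap
    spans_both = genomic_hit[0] <= first[0] and second[1] <= genomic_hit[1]
    return consecutive and spans_both


if __name__ == "__main__":
    # Hypothetical query coordinates for illustration only.
    print(looks_like_split_gene((1, 300), (305, 620), (1, 620)))
    print(looks_like_split_gene((1, 300), (250, 620), (1, 620)))
```

When both conditions hold, the simplest explanation is that the gene predictor cut one gene into two adjacent predicted proteins.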
Final(?) Check: Gene prediction
• Genscan is the best available ab initio gene
predictor
– http://genes.mit.edu/GENSCAN.html
• Genscan’s prediction spans both protein-protein
alignments, reinforcing our conclusion
of a bad prediction
Ab initio vs. similarity vs. hybrid
models for gene finding
• Ab initio: The gene looks like the average of
many genes
– Genscan, GeneMark, GRAIL…
• Similarity: The gene looks like a specific
known gene
– Procrustes,…
• Hybrid: A combination of both
– GenomeScan (http://genes.mit.edu/genomescan/)
A similar example: Fruitfly homolog
of mRNA localization protein VERA
• Similar procedure as just described
– tblastn search with BLOSUM45 produces an unexpected exon
• Conclusion: Incomplete (as opposed to incorrect)
annotation
– We have verified the existence of the rare isoform through RT-PCR
Another example: Find all genes with
PDZ domains
• Multiple methods are possible
• The ‘best’ method will depend on many things
– How much do you know about the domain?
– Do you know the exact extent of the domain?
– How many examples do you expect to find?
Some possible methods if the domain
is a known domain:
• SwissProt
– text search capabilities
– good annotation of known domains
– crosslinks to other databases (domains)
• Databases of known domains:
– BLOCKS (http://blocks.fhcrc.org/)
– Pfam (http://pfam.wustl.edu/)
– Others (ProDom, ProSite, DOMO,…)
Determination of the nature of
conservation in a domain
• For new domains, multiple alignment is your
best option
– Global: clustalw
– Local: DiAlign
– Hidden Markov Model: HMMER
• For known domains, this work has largely
been done for you
– BLOCKS
– Pfam
If you have a protein, and want to
search it against known domains
• Search/Analysis tools
– Pfam
– BLOCKS
– PredictProtein
(http://cubic.bioc.columbia.edu/predictprotein/predictprotein.html)
Different representations of
conserved domains
• BLOCKS
– Gapless regions
– Often multiple blocks for one domain
• Pfam
– Statistical model, based on HMM
– Since gaps are allowed, most domains have only
one Pfam model
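To illustrate what a gapless representation means in practice, the sketch below turns a small set of equal-length segments into a position-specific scoring matrix and slides it along a sequence with no gaps allowed. It is a toy under stated assumptions: a uniform amino-acid background, a simple pseudocount, and made-up segments that are not taken from the BLOCKS database. It is not the scoring scheme used by BLOCKS or Pfam.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0):
    """Log-odds scores per column, assuming a uniform background (1/20)."""
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("a gapless block needs equal-length segments")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in block)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Slide the block along the sequence without gaps.

    Returns (score, start) for the highest-scoring window. The sequence
    must be at least as long as the block and use only the 20 standard
    amino-acid letters.
    """
    width = len(pssm)
    scores = [
        (sum(pssm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)


if __name__ == "__main__":
    toy_block = ["GLGF", "GLGF", "GIGF", "GLGY"]  # made-up segments
    score, start = best_window(build_pssm(toy_block), "MKTAGLGFSIV")
    print(start, round(score, 2))
```

Because each column is scored independently and no insertions or deletions are possible, a domain with variable-length loops needs several separate blocks, whereas a single HMM with gap states can cover it.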
Conclusions
• We have only touched small parts of the
elephant
• Trial and error (intelligently) is often your best
tool
• Keep up with the main five sites, and you’ll
have a pretty good idea of what is happening
and available
