BioBIKE: a web-based environment for integration and analysis of biological knowledge
Biniam Abebe
Scope
Introduction
What is BioBIKE?
How can we use BioBIKE to solve some of the questions we have in our research?
High-level insights
Unfiltered output
Basic insights
Resources
We need… biologists… and programmers
What is BioBIKE?
biobike.csbc.vcu.edu
BioBIKE INSTANCES AND THEIR KNOWLEDGE AND DATA BASES
A BioBIKE instance provides a framework for all available information needed by a given research community, including:
Sets of genomic sequences
Gene annotations
Functional descriptions
Formal categories (e.g. COG)
Hierarchical groupings of metabolic reactions linked with genes (from KEGG)
More…
Current BioBIKEs
42 - Cyanobacteria
Phantome/BioBIKE
6 - Archaeal virus, 758 - Bacteriophage, 754 - Eubacteria, 1 - Eukaryotic virus
StreptoBIKE
StaphyloBIKE
ViroBIKE
Blast - for sequence searches
Clustal - for multiple sequence alignments
Meme - for motif discovery
RNAz - for discovery of conserved RNA sequences
Phylip - for construction of phylogenetic trees
All are accessed through the same interface.
Why BioBIKE?
Computability of results and nesting
Small working vocabulary
Implied iteration
Extensibility
The BioBIKE environment is divided into three areas, as shown: the function palette, the workspace, and the results window. You'll bring functions down from the function palette to the workspace, execute them, and note the results in the results window.
Two very important buttons on the function palette: HELP! (general on-line help) and PROBLEM (Something went wrong? Tell us!).
Two very important buttons in the workspace: Undo (return to the workspace as it was before the last action) and Redo (get back the workspace you undid).
A COUNT-OF function box is now in the workspace.  Before continuing with the problem, let's consider what function boxes mean.
General Syntax of BioBIKE
The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags. A function may be thought of as a black box: you feed it information, and it produces a product.
(Diagram labels: Function-name, Argument (object), Keyword (object), Flag)
Function boxes contain the following elements:
Argument: Required, acted on by the function
Keyword clause: Optional, more information
Flag: Optional, more (yes/no) information
… and icons to help you work with functions:
Action icon: Brings up a menu enabling you to execute a function, copy and paste, get information, get help, etc.
Clear/Delete icon: Removes information you entered or removes the box entirely
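To make the parts of a function box concrete, here is a minimal Python sketch that models a box as a name, required arguments, optional keyword clauses, and optional flags. The class `FunctionBox`, its `describe` method, and the parenthesized text rendering are hypothetical illustrations, not BioBIKE's internal representation; the value 50 for LINE-LENGTH is an arbitrary example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class FunctionBox:
    """Hypothetical model of a function box (not BioBIKE's own data structure)."""
    name: str                                                # function name
    arguments: List[Any]                                     # required, acted on by the function
    keywords: Dict[str, Any] = field(default_factory=dict)   # optional, more information
    flags: Set[str] = field(default_factory=set)             # optional yes/no switches

    def describe(self) -> str:
        """Render the box as one line of text (an illustrative format only)."""
        parts = [self.name] + [str(a) for a in self.arguments]
        parts += [f"{key} {value}" for key, value in self.keywords.items()]
        parts += sorted(self.flags)
        return "(" + " ".join(parts) + ")"


# A box with one argument, one keyword clause, and one flag.
box = FunctionBox("DISPLAY-SEQUENCE-OF", ["C60790"], {"LINE-LENGTH": 50}, {"FASTA"})
print(box.describe())
```

Keeping keywords and flags separate from arguments mirrors the distinction on the slide: arguments are required, while the other two only modify how the function behaves.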
Functions: Sin
Angle → Sin → Sin (angle)

Functions: Length
Entity → Length → result
"icahLnlna bormA" → 14
"Abraham Lincoln" → 14 (a literal, in quotes)
Abraham Lincoln → 192 (a variable, without quotes)
US-presidents → 44 (a list rather than a single value)
US-presidents → (188 170 189 163 …) (the function iterated over the list rather than applied once)
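The difference between applying a function once to a list and iterating it over the list can be sketched in Python. The sequences below are hypothetical toy data, and Python's `len` stands in for a length function; BioBIKE's own functions may behave differently.

```python
from typing import List

# Hypothetical toy data: a list of three short DNA sequences.
sequences: List[str] = ["ATGC", "ATGCCG", "AT"]

# Single value: the length of one sequence.
length_of_first = len(sequences[0])

# Single application to a list: the number of elements in the list.
length_of_list = len(sequences)

# Iteration: the function applied to each element in turn.
length_of_each = [len(s) for s in sequences]

print(length_of_first, length_of_list, length_of_each)
```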
Nested functions (for example, Arcsin applied to Sin (angle))
Evaluated from the inside out
A box is replaced by its value
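Inside-out evaluation can be illustrated with Python's standard `math` module. This is only an analogy to nested boxes: the inner call is evaluated first, and its value then takes the place of the inner box. The angle 0.5 (in radians) is an arbitrary example value.

```python
import math

angle = 0.5  # radians; an arbitrary example value

# Step 1: the inner box is evaluated first.
inner_value = math.sin(angle)

# Step 2: the inner box is replaced by its value, then the outer box runs.
outer_value = math.asin(inner_value)

# The nested form gives the same result in one expression.
nested_value = math.asin(math.sin(angle))

print(inner_value, outer_value, nested_value)
```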
Functions: Gene (npf0076) → "transposase"
Nested functions, Gene (npf0076): Evaluated from the inside out. A box is replaced by its value.
Options, Gene (npf0076): Modify the characteristics of the function they govern.
Pitfalls (the most common error in the language), Gene (npf0076): CLOSE BOXES BEFORE EXECUTING. White is incompatible with execution.
Distinction between a result and a display
              Demo
BioBIKE
Tour of BioBIKE: Integration of sequences across organisms & human insight
We are interested in a highly conserved hypothetical protein: asr1156
Very strange: it starts in different places in different cyanobacteria! Is the start wrong? How can we find out? Collect the nucleotide (NT) sequence including the upstream region. Translate it into an amino acid (AA) sequence. Repeat X times. Make an alignment.
STEP I Find orthologs in other cyanobacteria
STEP II Align the proteins of the previous result
Align the protein sequences extended upstream
A function may be applied directly to another function
The start is wrong!
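The idea behind the check, extending each gene upstream and translating the longer sequence, can be sketched in Python. Everything here is an assumption for illustration: the function names `upstream_extended` and `translate`, the toy genome, and the coordinates are hypothetical, the gene is assumed to lie on the forward strand with 1-based inclusive coordinates, and the standard genetic code is used. BioBIKE does this with its own functions.

```python
# Standard genetic code, codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def upstream_extended(genome: str, start: int, end: int, upstream: int = 0) -> str:
    """Return the gene (1-based, inclusive, forward strand) plus `upstream` extra bases."""
    if upstream % 3 != 0:
        raise ValueError("upstream must be a multiple of 3 to stay in frame")
    begin = start - 1 - upstream
    if begin < 0:
        raise ValueError("extension runs off the start of the sequence")
    return genome[begin:end]


# Hypothetical toy genome: the annotated start (position 12) is an internal ATG,
# and an in-frame ATG lies 9 bases upstream.
toy_genome = "TTATGGCTAAAATGCCCTAAGG"
print(translate(upstream_extended(toy_genome, 12, 20)))              # annotated gene
print(translate(upstream_extended(toy_genome, 12, 20, upstream=9)))  # extended upstream
```

Aligning the extended translations from several organisms, as in STEP II, is what reveals whether conservation continues upstream of the annotated start.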
Tour of BioBIKE: integration of metabolism information, bioinformatic tools & human knowledge
How to find a regulatory motif? Example: GlnA
Mission impossible!!!
Find GlnA in the cyanobacterial genomes
Collect the sequences upstream
Search for a conserved motif among these sequences using MEME
OR
We have found a potential NtcA binding site! GT9NTAC
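Once a candidate motif is in hand, scanning other upstream sequences for it is straightforward. The sketch below assumes that GT9NTAC means GT, then any nine nucleotides, then TAC; that reading, the function name `find_sites`, and the toy sequence are assumptions, not something the slides state.

```python
import re
from typing import List, Tuple

# Assumed reading of GT9NTAC: "GT", any 9 nucleotides, then "TAC".
# The lookahead lets overlapping sites be reported as well.
SITE = re.compile(r"(?=(GT[ACGT]{9}TAC))")


def find_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every site in the sequence."""
    return [(m.start() + 1, m.group(1)) for m in SITE.finditer(sequence.upper())]


# Hypothetical toy upstream sequence containing one site.
toy_upstream = "AA" + "GT" + "A" * 9 + "TAC" + "CC"
print(find_sites(toy_upstream))
```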
              Demo
Tour of BioBIKE II
In this tour, you'll see how to:
Find the average contig size in a metagenome
Find the average GC content within a metagenome
Visualize the distribution of GC content amongst the contigs of a metagenome
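The quantities in this tour are simple to state in code. The sketch below is in Python rather than BioBIKE, uses a hypothetical three-contig "metagenome", and assumes that average GC content means the GC fraction of all contigs pooled together; the function names are illustrative only.

```python
from typing import Dict, List


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C in a sequence (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def average_contig_size(contigs: Dict[str, str]) -> float:
    """Mean contig length in nucleotides."""
    return sum(len(s) for s in contigs.values()) / len(contigs)


def overall_gc(contigs: Dict[str, str]) -> float:
    """GC fraction of all contigs pooled (one possible meaning of 'average')."""
    return gc_fraction("".join(contigs.values()))


def gc_histogram(contigs: Dict[str, str], bins: int = 4) -> List[int]:
    """Count contigs per GC bin; the last bin includes a GC fraction of 1.0."""
    counts = [0] * bins
    for sequence in contigs.values():
        counts[min(int(gc_fraction(sequence) * bins), bins - 1)] += 1
    return counts


# Hypothetical toy metagenome with three contigs.
toy_contigs = {"c1": "ATGC", "c2": "GGCC", "c3": "ATATATAT"}
print(len(toy_contigs), average_contig_size(toy_contigs))
print(overall_gc(toy_contigs), gc_histogram(toy_contigs))
```

The histogram is a text-only stand-in for the visualization step; plotting the per-contig values would show the same distribution.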
Back to our story… we wanted to count the number of contiguous sequences (contigs) in our favorite metagenome, P-Arct. Click on the gray argument box to activate it for entry, either from the keyboard or by insertion.
Tour of BioBIKE III: Sequence comparison
In this tour, you'll see how to:
Find similar sequences amongst metagenomes
Find similar sequences amongst known viruses
Find similar sequences amongst everything in GenBank
Make a sequence alignment
Make a phylogenetic tree
Save your work session
Clicking on any palette button brings down choices of functions or data to bring into the workspace. Click the function DISPLAY-SEQUENCE-OF.
A DISPLAY-SEQUENCE-OF function box is now in the workspace.  Before continuing with the problem, let's consider what function boxes mean.
Back to our story… we were displaying the sequence of our favorite metagenome contig, C60790.  Click on the gray argument box to activate it for entry, either from the keyboard or by insertion.
Now that the box is open, type in the name of the contig, C60790. Upper/lower case doesn't matter. When you're done, close the box by pressing Enter or Tab. If you forget to close the box, the function will not work.
Set the length of the lines to be displayed by mousing over the Options icon and clicking LINE-LENGTH. Actually, the default line length is perfectly OK; I did this just to show you an option in action.
Enter a value into the option entry box in the same way you entered a value into the argument box: Click on the box, type, then close the box by pressing Enter or Tab.
The default format for sequences is lines preceded by coordinates. If you want the sequence in FastA format, mouse over the Options icon and click FastA. (An example of a flag in action.)
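The two display formats differ only in how each line is decorated. Here is a small Python sketch of both, with the line length as a parameter; the function names, the default of 60 characters per line, and the coordinate column width are assumptions for illustration, not ViroBIKE's actual defaults.

```python
from typing import List


def split_lines(sequence: str, line_length: int) -> List[str]:
    """Cut a sequence into chunks of at most line_length characters."""
    if line_length < 1:
        raise ValueError("line_length must be at least 1")
    return [sequence[i:i + line_length] for i in range(0, len(sequence), line_length)]


def with_coordinates(sequence: str, line_length: int = 60) -> str:
    """Each line is preceded by the 1-based coordinate of its first character."""
    lines = split_lines(sequence, line_length)
    return "\n".join(
        f"{i * line_length + 1:>6} {line}" for i, line in enumerate(lines)
    )


def to_fasta(name: str, sequence: str, line_length: int = 60) -> str:
    """FastA format: a '>' header line followed by the sequence lines."""
    return "\n".join([f">{name}"] + split_lines(sequence, line_length))


print(with_coordinates("ACGTACGTAC", line_length=4))
print(to_fasta("C60790", "ACGTACGTAC", line_length=4))
```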
The function is now complete. To execute it, mouse over the Action icon and click Execute.
Displayed results appear in popup windows, which you can copy or save. When you're done with one, click the red X in the upper right-hand corner to get rid of it. Firefox has an upper limit on popup windows, so it's a good idea to clean up as you go.
Is the DNA sequence similar to any other metagenome sequence? To find out, mouse over the STRINGS-SEQUENCES menu and click SEQUENCE-SIMILAR-TO. This function allows you to search for similarity by pattern, by mismatches, or by Blast (default).
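Of the three search modes, search by mismatches is the easiest to picture. The Python sketch below slides the query along the target and keeps every window that differs in at most a given number of positions. It illustrates the idea only: the function name and toy sequences are hypothetical, and this is not how SEQUENCE-SIMILAR-TO is implemented.

```python
from typing import List, Tuple


def find_with_mismatches(query: str, target: str, max_mismatches: int) -> List[Tuple[int, str, int]]:
    """Return (1-based start, matched window, mismatch count) for each hit."""
    if not query:
        raise ValueError("query must not be empty")
    hits = []
    for i in range(len(target) - len(query) + 1):
        window = target[i:i + len(query)]
        mismatches = sum(a != b for a, b in zip(query, window))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


# Hypothetical toy data: one exact hit and one hit with a single mismatch.
print(find_with_mismatches("ACGT", "TTACGTTTACCT", max_mismatches=1))
```

This exhaustive scan is fine for short sequences; Blast, the default, uses heuristics that scale to whole databases and also tolerates gaps.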
The function asks for two arguments: the query sequence and the target sequences against which the query will be compared. The query is c60790, of course. We could enter it by typing, as before, but it is more interesting to copy and paste what you already typed. To do this, mouse over the Action icon of the box containing c60790.
Click Copy.
To paste, mouse over the Action icon of the box into which you're pasting and click Paste.
Now to enter the target sequences – the set of all metagenome sequences. Click on the target box to open it for entry. Once the box is open, you could specify by typing that you want to search metagenomic sequences… if you knew what to type.
If you don't know, then mouse over the DATA button, then Organisms, then Metagenomes. Clicking on Metagenomes transfers it to the open target box.
Execute the completed function as before, mousing over the Action icon of the function and clicking Execute. Doing so starts Blast, which may take several seconds to complete execution.
You might expect that your sequence from P-Arct would find other sequences from the same metagenome. It does, but interestingly, after itself, the next 10 best hits are from the P-BBC metagenome. Use browser controls to save the box, if you like, then X out of it.
Of course the metagenome sequences are not annotated. Perhaps you can learn more about your sequence by comparing it to sequences from known viruses. To do this, clear the target box, open it up again by clicking on it…
…and bring down Known Viruses into the box.
Protein searches will find more sequences, so mouse over the Options icon and specify that your DNA sequence is to be translated and compared to viral proteins.
Execute the completed function. Again, execution may take several seconds.
Only one hit, and a very poor one at that!  This is typical, because while ViroBIKE has virtually all known viral genomes, those that are known cover only a tiny fraction of viruses that exist in nature. X out of the window and clear known viruses so that we can try another approach.
There is a good deal more variety in organismal genomes than viral genomes, so let's search them. ViroBIKE does not keep organismal genomes locally, so we need to go out to GenBank. Click on the DATA button again.
…and this time click GenBank.
Execute the function as usual. This time we will be at the mercy of NCBI, and depending on the time of day and the phase of the moon, execution may take a minute or longer. By default, ViroBIKE times out execution at 40 seconds. If this occurs, you'll get a message like…
*** TIMEOUT ! TIMEOUT ! TIMEOUT ***
*** COMPUTATION ABORTED AFTER 40 SECONDS ***
*** YOU CAN:
*** - contact support for help: BioLinguaSupport@lists.Stanford.EDU
*** - use the TOOLS -> PREFS menu or the SET-TIMELIMIT function to extend your timeout up to 1 hour
*** - use RUNJOB to run your code in a separate process
*** - type (explain-timeout) at the weblistener for detailed info.
You can change the time limit, but let's say that fate is with us and you get your result.
Interesting! Many highly significant hits from various bacteria…
…at different regions of your sequence.  At NCBI, that would be the end of the story. In ViroBIKE, it's the beginning, since you can work with your Blast results. First, we'll want to give the result a name.
To name a result, mouse over the DEFINITION menu and click DEFINE.
The DEFINE function asks for two arguments: the name of the variable and the value that will be assigned to it. Click on the variable entry box.
You can name the result anything you like, so long as the name does not contain spaces (hyphens and underscores are OK). I chose c67090-vs-NR. Press Tab after typing a name.
Tabbing opens up the next argument, the value box. The value to be assigned is the Blast table. There are many ways to retrieve that  result. One way is to recognize that it is the result of the previous function. Click the OTHER-COMMAND button...
…and click Previous-Result.
Executing the function will cause the variable you named to spring into existence, accessible through a new button. Watch for it!
We'll be using that VARIABLES button in a moment. For now, mouse over STRINGS-SEQUENCES, then SEARCH/COMPARE, and…
Click on BLAST-VALUE.  This function allows you to extract values from the Blast table.
What values do we want to extract? Recall…
7 of the top 27 hits came from the same region of your sequence, from coordinates 15 to 503. Notice also that the reading frame is the same in all cases, negative, indicating that the match is on the complementary strand. Let's extract the 7 sequences that matched. First specify the blast-table from which you'll extract data.
After opening up the blast-table entry box, mouse over the VARIABLES button and click the name of the variable you just created.
This brings the variable into the open box. Now specify the cells you want, by row numbers (lines) and column. Click to open the line box.
Type the lines you want into the open box as a set: (2 6 10 14 17 20 23) In BioBIKE, elements of sets are separated by spaces, not commas. After typing in the list in parentheses, press TAB to move to the column box.
You can enter any column shown in the Blast table plus several other fields that are normally not displayed. One of these fields is the sequence of the target ("T-SEQ"). Type this into the column box and press Enter.
Executing the function will get you the seven bacterial target sequences matching the coordinate 15 – 503 region of your sequence.
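Picking cells out of a table by line numbers and column name can be sketched in Python. The table layout (a list of dictionaries), the function name `blast_value`, and the toy rows are hypothetical; only the column name T-SEQ and the use of 1-based line numbers come from the steps above.

```python
from typing import Dict, List, Sequence

BlastRow = Dict[str, str]


def blast_value(table: List[BlastRow], lines: Sequence[int], column: str) -> List[str]:
    """Return the value of `column` for each requested 1-based line number."""
    values = []
    for line in lines:
        if not 1 <= line <= len(table):
            raise IndexError(f"line {line} is outside the table")
        values.append(table[line - 1][column])
    return values


# Hypothetical toy Blast table with three hits.
toy_table: List[BlastRow] = [
    {"TARGET": "hit-1", "T-SEQ": "MKV"},
    {"TARGET": "hit-2", "T-SEQ": "MKL"},
    {"TARGET": "hit-3", "T-SEQ": "MRV"},
]
print(blast_value(toy_table, [1, 3], "T-SEQ"))
```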
We'd like to compare these bacterial sequences with the region from your sequence. But that region is a DNA sequence, so we'll need to translate it. To do this, click on the GENES-PROTEINS button.
Mouse over TRANSLATION and click the TRANSLATION-OF function.
Open the argument box of TRANSLATION-OF for input. We want to put into this box your sequence, but just the portion from 15 to 503, and on the complementary strand. Mouse over the GENES-PROTEINS button to get a function that will extract what you want.
Click the SEQUENCE-OF function.
Then paste your sequence into the argument box of SEQUENCE-OF. Executing now would translate the entire sequence, but we want only part of it.
So mouse over the Options icon and click the FROM option.
And do the same thing to get the TO option.
Now type into the FROM entry box the beginning coordinate, 15, and press TAB.
And type into the TO entry box the end coordinate, 503, and press ENTER.
The sequence needs to be inverted (read from the complementary strand), so choose that option.
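What the FROM, TO, and inversion options accomplish together can be sketched in Python: take a 1-based inclusive slice, then reverse-complement it. The function name `sequence_of` and its parameters are hypothetical stand-ins that only mirror the options described above, and the toy sequences are made up.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def sequence_of(sequence: str, start: int = 1, end: Optional[int] = None, invert: bool = False) -> str:
    """Return sequence[start..end] (1-based, inclusive), optionally reverse-complemented."""
    end = len(sequence) if end is None else end
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates are outside the sequence")
    region = sequence[start - 1:end]
    return region.translate(_COMPLEMENT)[::-1] if invert else region


print(sequence_of("AACCGGTT", 2, 5))               # plain sub-sequence
print(sequence_of("AACCGGTT", 2, 5, invert=True))  # read from the complementary strand

# Coordinates 15 to 503 span 489 nucleotides, that is, 163 complete codons.
print(len(sequence_of("A" * 600, 15, 503)))
```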
And finally, we want to give the sequence a name so we can keep track of it during sequence comparisons. Uh-oh… The WITH-LABEL option is off screen. One way to handle this is to make space by clearing a now unnecessary box.

Weitere ähnliche Inhalte

Andere mochten auch

Tim coulson send strategy launch
Tim coulson   send strategy launchTim coulson   send strategy launch
Tim coulson send strategy launchessexwebcontentteam
 
Kirkwood-Buff Theory of Solutions and the Development of Atomistic and Coarse...
Kirkwood-Buff Theory of Solutions and the Development of Atomistic and Coarse...Kirkwood-Buff Theory of Solutions and the Development of Atomistic and Coarse...
Kirkwood-Buff Theory of Solutions and the Development of Atomistic and Coarse...Nikos Bentenitis
 
Local offer parent engagement sessions
Local offer parent engagement sessionsLocal offer parent engagement sessions
Local offer parent engagement sessionsessexwebcontentteam
 
Supporting Independence: Prevention and Early Intervention - Sharon Longworth...
Supporting Independence: Prevention and Early Intervention - Sharon Longworth...Supporting Independence: Prevention and Early Intervention - Sharon Longworth...
Supporting Independence: Prevention and Early Intervention - Sharon Longworth...essexwebcontentteam
 
광주 국민디자인단
광주 국민디자인단광주 국민디자인단
광주 국민디자인단Young Choi
 
상명대학교
상명대학교상명대학교
상명대학교Young Choi
 

Andere mochten auch (9)

Send engagement day_june_2013
Send engagement day_june_2013Send engagement day_june_2013
Send engagement day_june_2013
 
Tim coulson send strategy launch
Tim coulson   send strategy launchTim coulson   send strategy launch
Tim coulson send strategy launch
 
Ppt com
Ppt comPpt com
Ppt com
 
Amalesh Resume
Amalesh ResumeAmalesh Resume
Amalesh Resume
 
Kirkwood-Buff Theory of Solutions and the Development of Atomistic and Coarse...
Kirkwood-Buff Theory of Solutions and the Development of Atomistic and Coarse...Kirkwood-Buff Theory of Solutions and the Development of Atomistic and Coarse...
Kirkwood-Buff Theory of Solutions and the Development of Atomistic and Coarse...
 
Local offer parent engagement sessions
Local offer parent engagement sessionsLocal offer parent engagement sessions
Local offer parent engagement sessions
 
Supporting Independence: Prevention and Early Intervention - Sharon Longworth...
Supporting Independence: Prevention and Early Intervention - Sharon Longworth...Supporting Independence: Prevention and Early Intervention - Sharon Longworth...
Supporting Independence: Prevention and Early Intervention - Sharon Longworth...
 
광주 국민디자인단
광주 국민디자인단광주 국민디자인단
광주 국민디자인단
 
상명대학교
상명대학교상명대학교
상명대학교
 

Ähnlich wie BioBike: Integrate and Analyze Biological Knowledge

1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docxfelicidaddinwoodie
 
Advanced BLAST (BlastP, PSI-BLAST)
Advanced BLAST (BlastP, PSI-BLAST)Advanced BLAST (BlastP, PSI-BLAST)
Advanced BLAST (BlastP, PSI-BLAST)Syed Lokman
 
Exercise 7B PostlabInstructions· Below is a list of the resou
Exercise 7B PostlabInstructions· Below is a list of the resouExercise 7B PostlabInstructions· Below is a list of the resou
Exercise 7B PostlabInstructions· Below is a list of the resouBetseyCalderon89
 
Recent Developments in SBML
Recent Developments in SBMLRecent Developments in SBML
Recent Developments in SBMLMike Hucka
 
Biopython programming workshop at UGA
Biopython programming workshop at UGABiopython programming workshop at UGA
Biopython programming workshop at UGAEric Talevich
 
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016 Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016 Alexander Lisachenko
 
Decoupling shared code with state that needs to cleared in between uses
Decoupling shared code with state that needs to cleared in between usesDecoupling shared code with state that needs to cleared in between uses
Decoupling shared code with state that needs to cleared in between usesMichael Fons
 
APIdays Paris 2019 Backend is the new frontend by Antoine Cheron
APIdays Paris 2019 Backend is the new frontend by Antoine CheronAPIdays Paris 2019 Backend is the new frontend by Antoine Cheron
APIdays Paris 2019 Backend is the new frontend by Antoine Cheronapidays
 
one complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfone complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfstudy help
 
one complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfone complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfstudy help
 
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEINTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEIPutuAdiPratama
 
BIOGLYPHICS: A 2014 GENSPACE IGEM TEAM PROJECT
BIOGLYPHICS: A 2014 GENSPACE IGEM TEAM PROJECTBIOGLYPHICS: A 2014 GENSPACE IGEM TEAM PROJECT
BIOGLYPHICS: A 2014 GENSPACE IGEM TEAM PROJECTEric Fernandez
 
Functional Reactive Programming / Compositional Event Systems
Functional Reactive Programming / Compositional Event SystemsFunctional Reactive Programming / Compositional Event Systems
Functional Reactive Programming / Compositional Event SystemsLeonardo Borges
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.docbutest
 

Ähnlich wie BioBike: Integrate and Analyze Biological Knowledge (20)

1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
 
Advanced BLAST (BlastP, PSI-BLAST)
Advanced BLAST (BlastP, PSI-BLAST)Advanced BLAST (BlastP, PSI-BLAST)
Advanced BLAST (BlastP, PSI-BLAST)
 
Exercise 7B PostlabInstructions· Below is a list of the resou
Exercise 7B PostlabInstructions· Below is a list of the resouExercise 7B PostlabInstructions· Below is a list of the resou
Exercise 7B PostlabInstructions· Below is a list of the resou
 
The Infobiotics workbench
The Infobiotics workbenchThe Infobiotics workbench
The Infobiotics workbench
 
Recent Developments in SBML
Recent Developments in SBMLRecent Developments in SBML
Recent Developments in SBML
 
Biopython programming workshop at UGA
Biopython programming workshop at UGABiopython programming workshop at UGA
Biopython programming workshop at UGA
 
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016 Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
 
Neo4j and bioinformatics
Neo4j and bioinformaticsNeo4j and bioinformatics
Neo4j and bioinformatics
 
Bio4j
Bio4jBio4j
Bio4j
 
Decoupling shared code with state that needs to cleared in between uses
Decoupling shared code with state that needs to cleared in between usesDecoupling shared code with state that needs to cleared in between uses
Decoupling shared code with state that needs to cleared in between uses
 
Bio java
Bio javaBio java
Bio java
 
APIdays Paris 2019 Backend is the new frontend by Antoine Cheron
APIdays Paris 2019 Backend is the new frontend by Antoine CheronAPIdays Paris 2019 Backend is the new frontend by Antoine Cheron
APIdays Paris 2019 Backend is the new frontend by Antoine Cheron
 
one complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfone complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdf
 
one complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfone complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdf
 
HPC For Bioinformatics
HPC For BioinformaticsHPC For Bioinformatics
HPC For Bioinformatics
 
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEINTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
 
BIOGLYPHICS: A 2014 GENSPACE IGEM TEAM PROJECT
BIOGLYPHICS: A 2014 GENSPACE IGEM TEAM PROJECTBIOGLYPHICS: A 2014 GENSPACE IGEM TEAM PROJECT
BIOGLYPHICS: A 2014 GENSPACE IGEM TEAM PROJECT
 
Functional Reactive Programming / Compositional Event Systems
Functional Reactive Programming / Compositional Event SystemsFunctional Reactive Programming / Compositional Event Systems
Functional Reactive Programming / Compositional Event Systems
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.doc
 

BioBike: Integrate and Analyze Biological Knowledge

  • 1. BioBike: a web-based environment for integration and analysis of Biological knowledge Biniam Abebe
  • 2.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. We need… Biologists . . . . . . and Programmers
  • 13. BioBIKE INSTANCES AND THEIR KNOWLEDGE AND DATA BASES A BioBIKE instance provides a framework for all available information needed by a given research community Including Sets of genomic sequences Gene annotations Functional descriptions Formal categories (e.g. COG) hierarchical groupings of metabolic reactions linked with genes (from KEGG More………….
  • 14.
  • 15. 42 – Cyanobacteria
  • 17. 6 - Archeal virus , 758 – Bacteriophage, 754- Eubacteria, 1 - Eukaryotic Virus
  • 20.
  • 21. Blast - for sequence searches
  • 22. Clustal - for multiple sequence alignments
  • 23. Meme - for motif discovery;
  • 24. RNAz - for discovery of conserved RNA sequences;
  • 25. Phylip - for construction of phylogenetic trees. All are accessed through the same interface,
  • 26.
  • 27. Computability of results and nesting
  • 28. Small working vocabulary
  • 30.
  • 31. Function palette Workspace The BioBIKE environment is divided into three areas as shown. You'll bring functions down from the function palette to the workspace, execute them, and note the results in the results window Results window
  • 32. Construct the code you want to execute here! For a visual guide to the VPL, click here HELP! PROBLEM Two very important buttons on the function palette: On-line help (general) Something went wrong? Tell us!
  • 33. Construct the code you want to execute here! For a visual guide to the VPL, click here Two very important buttons in the workspace: Undo (return to workspace before last action) Redo (Get back the workspace you undid)
  • 34.
  • 35. Construct the code you want to execute here! For a visual guide to the VPL, click here
  • 36. Construct the code you want to execute here! For a visual guide to the VPL, click here
  • 37. A COUNT-OF function box is now in the workspace. Before continuing with the problem, let's consider what function boxes mean.
  • 38. A COUNT-OF function box is now in the workspace. Before continuing with the problem, let's consider what function boxes mean.
  • 39. Argument(object) Function-name Flag Keyword object General Syntax of BioBIKE The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags. A function may be thought of as a black box: you feed it information, it produces a product.
  • 40.
  • 41. Argument: Required, acted on by function
  • 42. Keyword clause: Optional, more information
  • 43.
  • 44. Action icon: Brings up a menu enabling you to execute a function, copy and paste, information, get help, etc
  • 45. Clear/Delete icon: Removes information you entered or removes box entirelyGeneral Syntax of BioBIKE … and icons to help you work with functions:
  • 46. Sin Functions Sin (angle) Angle
  • 48. Functions Length Entity "icahLnlna bormA" 14 Abraham Lincoln 192 14 "Abraham Lincoln" variable vs literal
  • 49. Functions Length Entity "icahLnlna bormA" 14 Abraham Lincoln 192 14 "Abraham Lincoln" US-presidents 44 list vs single value
  • 50. Functions Length Entity "icahLnlna bormA" 14 Abraham Lincoln 192 14 "Abraham Lincoln" US-presidents 44 (188 170 189 163 …) single application of a function vs iteration of a function
  • 51. Sin Arcsin Functions Angle Angle
  • 52. Arcsin Functions Angle Sin (angle) Nested functionsEvaluated from the inside outA box is replaced by its value
  • 54. Nested functions Gene (npf0076) Evaluated from the inside outA box is replaced by its value
  • 55. Functions Gene (npf0076) OptionsModify the characteristics of the function they govern
  • 56. Pitfalls(the most common error in the language) Gene (npf0076) CLOSE BOXES BEFORE EXECUTINGWhite is incompatible with execution
  • 57. Distinction betweenaresultand a display display result
  • 58.
  • 59.
  • 60. Demo
  • 62. Tour of BioBIKE : Integration of sequences across organisms & human insight We are interested in a highly conserved hypothetical protein: asr1156
  • 63.
  • 64. Very strange it start in different place different cyanobacteria! Is the start Wrong ? Collect the NT sequence including the upstream region. HOW ??? Translate into AA sequence Repeat X times Make an alignment
  • 65. STEP I Find orthologs in other cyanobacteria
  • 66. STEP I Find orthologs in other cyanobacteria
  • 67. STEP I Find orthologs in other cyanobacteria
  • 68. STEP I Find orthologs in other cyanobacteria
  • 69. STEP I Find orthologs in other cyanobacteria
  • 70. STEP I Find orthologs in other cyanobacteria
  • 71. STEP I Find orthologs in other cyanobacteria
  • 72. STEP II Align the proteins of the previous result
  • 73. STEP II Align the proteins of the previous result
  • 74. STEP II Align the proteins of the previous result
  • 75. STEP II Align the proteins of the previous result
  • 76. STEP II Align the proteins of the previous result
  • 77.
  • 78.
  • 79.
  • 80.
  • 81.
  • 82. STEP II Align the proteins of the previous result Align the proteinsequences extended uspstream
  • 83. STEP II Align the proteins of the previous result Align the proteinsequences extended uspstream
  • 84. STEP II Align the proteins of the previous result Align the proteinsequences extended uspstream
  • 85. A function may directly be applied on another function STEP II Align the proteins of the previous result Align the proteinsequences extended uspstream
  • 95. The start is wrong!
  • 98. Tour of BioBIKE: integration of metabolism information, bioinformatic tools & human knowledge. How to find a regulatory motif? Example: GlnA
  • 100. Find GlnA in the cyanobacterial genomes
  • 104. Find GlnA in the cyanobacterial genomes. Collect the sequences upstream.
  • 108. Find GlnA in the cyanobacterial genomes. Collect the sequences upstream. Search for a conserved motif among these sequences using MEME.
  • 114. OR
  • 119. We have found a potential NtcA binding site! GT9NTAC
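
Once a candidate motif is in hand, you can scan any upstream region for it. The following is a minimal Python sketch, not BioBIKE code and not how MEME works. It assumes that "GT9NTAC" on the slide means GT, then nine arbitrary nucleotides, then TAC. The function name `find_motif` and the toy sequences are made up for illustration.

```python
import re

# Assumption: "GT9NTAC" is read as GT + nine arbitrary bases + TAC.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(GT[ACGT]{9}TAC))")

def find_motif(upstream: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every occurrence of the motif."""
    seq = upstream.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq)]

# Made-up upstream regions, for illustration only (not real genome data).
upstream_regions = {
    "toy-gene-1": "AATTGTAGCATTTTTTACCTAA",
    "toy-gene-2": "CCCCCCCCCCCC",
}

hits = {name: find_motif(seq) for name, seq in upstream_regions.items()}
for name, found in hits.items():
    print(name, found)
```

A plain pattern scan like this only confirms where an already known motif occurs; discovering the motif in the first place is the job of a tool such as MEME.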
  • 120. Demo
  • 122. Find the average contig size in a metagenome
  • 123. Find the average GC content within a metagenome
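
Outside BioBIKE, the two questions above come down to simple arithmetic over the contig sequences. Here is a minimal Python sketch of that arithmetic. The function names and the toy contigs are assumptions made for illustration; they are not BioBIKE functions or real metagenome data.

```python
def average_contig_size(contigs: dict[str, str]) -> float:
    """Mean length of the contigs, in nucleotides."""
    return sum(len(seq) for seq in contigs.values()) / len(contigs)

def gc_content(contigs: dict[str, str]) -> float:
    """Fraction of G and C over all nucleotides (length-weighted across contigs)."""
    total = sum(len(seq) for seq in contigs.values())
    gc = sum(seq.upper().count("G") + seq.upper().count("C")
             for seq in contigs.values())
    return gc / total

# Toy "metagenome": contig name -> sequence (made up for illustration).
toy_metagenome = {
    "contig-a": "ATGC",
    "contig-b": "GGCCAT",
    "contig-c": "AT",
}

print(average_contig_size(toy_metagenome))  # mean of 4, 6 and 2
print(gc_content(toy_metagenome))           # 6 G/C out of 12 bases
```

Note the design choice in `gc_content`: it pools all bases, so long contigs count more than short ones. Averaging the per-contig GC fractions instead would give a different number.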
  • 125. Construct the code you want to execute here! For a visual guide to the VPL, click here
  • 127. A COUNT-OF function box is now in the workspace. Before continuing with the problem, let's consider what function boxes mean.
  • 128. Back to our story… we wanted to count the number of contiguous sequences in our favorite metagenome, P-Arct. Click on the gray argument box to activate it for entry, either from the keyboard or by insertion.
  • 181. Find similar sequences amongst metagenomes
  • 182. Find similar sequences amongst known viruses
  • 183. Find similar sequences amongst everything in GenBank
  • 184. Make a sequence alignment
  • 185. Make a phylogenetic tree
  • 187. Clicking on any palette button brings down choices of functions or data to bring into the workspace. Click the function DISPLAY-SEQUENCE-OF.
  • 188. A DISPLAY-SEQUENCE-OF function box is now in the workspace. Before continuing with the problem, let's consider what function boxes mean.
  • 189. Back to our story… we were displaying the sequence of our favorite metagenome contig, C60790. Click on the gray argument box to activate it for entry, either from the keyboard or by insertion.
  • 190. Now that the box is open, type in the name of the contig, C60790. Upper/lower case doesn't matter. When you're done, close the box by pressing Enter or Tab. If you forget to close the box, the function will not work.
  • 191. Set the length of the lines to be displayed by mousing over the Options icon and clicking LINE-LENGTH. Actually, the default line length is perfectly OK. I did this just to show you an option in action.
  • 192. Enter a value into the option entry box in the same way you entered a value into the argument box: Click on the box, type, then close the box by pressing Enter or Tab.
  • 193. The default format for sequences is lines preceded by coordinates. If you want the sequence in FastA format, mouse over the Options icon and click FastA. (An example of a Flag in action)
  • 194. The function is now complete. To execute it, mouse over the Action icon and click Execute.
  • 195. Displayed results appear in popup windows, which you can copy or save. When you're done with a window, click the red X in the upper right-hand corner to get rid of it. Firefox has an upper limit on popup windows, so it's a good idea to clean up as you go.
  • 196. Is the DNA sequence similar to any other metagenome sequence? To find out, mouse over the STRINGS-SEQUENCES menu and click SEQUENCE-SIMILAR-TO. This function allows you to search for similarity by pattern, by mismatches, or by Blast (default).
  • 197. The function asks for two arguments: the query sequence and the target sequences against which the query will be compared. The query is c60790, of course. We could enter it by typing, as before, but it is more interesting to copy and paste what you already typed. To do this, mouse over the Action icon of the box containing c60790.
  • 199. To paste, mouse over the Action icon of the box into which you're pasting and click Paste.
  • 200. Now to enter the target sequences – the set of all metagenome sequences. Click on the target box to open it for entry. Once the box is open, you could specify by typing that you want to search metagenomic sequences… if you knew what to type.
  • 201. If you don't know, then mouse over the DATA button, then Organisms, then Metagenomes. Clicking on Metagenomes transfers it to the open target box.
  • 202. Execute the completed function as before, mousing over the Action icon of the function and clicking Execute. Doing so starts Blast, which may take several seconds to complete execution.
  • 203. You might expect that your sequence from P-Arct would find other sequences from the same metagenome. It does, but interestingly, after itself, the next 10 best hits are from the P-BBC metagenome. Use browser controls to save the box, if you like, then X out of it.
  • 204. Of course the metagenome sequences are not annotated. Perhaps you can learn more about your sequence by comparing it to sequences from known viruses. To do this, clear the target box, open it up again by clicking on it…
  • 205. …and bring down Known Viruses into the box.
  • 206. Protein searches will find more sequences, so mouse over the Options icon and specify that your DNA sequence is to be translated and compared to viral proteins.
  • 207. Execute the completed function. Again, execution may take several seconds.
  • 208. Only one hit, and a very poor one at that! This is typical, because while ViroBIKE has virtually all known viral genomes, those that are known cover only a tiny fraction of viruses that exist in nature. X out of the window and clear known viruses so that we can try another approach.
  • 209. There is a good deal more variety in organismal genomes than viral genomes, so let's search them. ViroBIKE does not keep organismal genomes locally, so we need to go out to GenBank. Click on the DATA button again.
  • 210. …and this time click GenBank.
  • 211. Execute the function as usual. This time we will be at the mercy of NCBI, and depending on the time of day and the phase of the moon, execution may take a minute or longer. By default, ViroBIKE times out execution at 40 seconds. If this occurs, you'll get a message like…
  • 212. *** TIMEOUT ! TIMEOUT ! TIMEOUT ***
    *** COMPUTATION ABORTED AFTER 40 SECONDS ***
    *** YOU CAN:
    *** - contact support for help: BioLinguaSupport@lists.Stanford.EDU
    *** - use the TOOLS -> PREFS menu or the SET-TIMELIMIT function to extend your timeout up to 1 hour
    *** - use RUNJOB to run your code in a separate process
    *** - type (explain-timeout) at the weblistener for detailed info.
    You can change the time limit, but let's say that fate is with us and you get your result.
  • 213. Interesting! Many highly significant hits from various bacteria…
  • 214. …at different regions of your sequence. At NCBI, that would be the end of the story. In ViroBIKE, it's the beginning, since you can work with your Blast results. First, we'll want to give the result a name.
  • 215. To name a result, mouse over the DEFINITION menu and click DEFINE.
  • 216. The DEFINE function asks for two arguments: the name of the variable and the value that will be assigned to it. Click on the variable entry box.
  • 217. You can name the result anything you like, so long as the name does not contain spaces (hyphens and underscores are OK). I chose c60790-vs-NR. Press Tab after typing a name.
  • 218. Tabbing opens up the next argument, the value box. The value to be assigned is the Blast table. There are many ways to retrieve that result. One way is to recognize that it is the result of the previous function. Click the OTHER-COMMAND button...
  • 220. Executing the function will cause the variable you named to spring into existence, accessible through a new button. Watch for it!
  • 221. We'll be using that VARIABLES button in a moment. For now, mouse over STRINGS-SEQUENCES, then SEARCH/COMPARE, and…
  • 222. Click on BLAST-VALUE. This function allows you to extract values from the Blast table.
  • 223. What values do we want to extract? Recall…
  • 224. 7 of the top 27 hits came from the same region of your sequence, from coordinates 15 to 503. Notice also that the reading frame is the same in all cases, negative, indicating that the match is on the complementary strand. Let's extract the 7 sequences that matched. First specify the blast-table from which you'll extract data.
  • 225. After opening up the blast-table entry box, mouse over the VARIABLES button and click the name of the variable you just created.
  • 226. This brings the variable into the open box. Now specify the cells you want, by row numbers (lines) and column. Click to open the line box
  • 227. Type the lines you want into the open box as a set: (2 6 10 14 17 20 23). In BioBIKE, elements of sets are separated by spaces, not commas. After typing in the list in parentheses, press TAB to move to the column box.
  • 228. You can enter any column shown in the Blast table plus several other fields that are normally not displayed. One of these fields is the sequence of the target ("T-SEQ"). Type this into the column box and press Enter.
  • 229. Executing the function will get you the seven bacterial target sequences matching the coordinate 15 – 503 region of your sequence.
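
If it helps to see the idea behind this extraction step outside the visual language, here is a minimal Python sketch of picking cells from a table by line numbers and column name. It is not the implementation of BLAST-VALUE. The helper names `parse_set` and `table_values`, the column "E-VALUE", and the toy table contents are all assumptions made for illustration; only the set syntax and the "T-SEQ" column name come from the slides.

```python
def parse_set(text: str) -> list[int]:
    """Parse a space-separated set such as "(2 6 10 14 17 20 23)"."""
    return [int(token) for token in text.strip("() ").split()]

def table_values(table: list[dict[str, str]], lines: list[int], column: str) -> list[str]:
    """Return the requested column from the requested lines (1-based, as on screen)."""
    return [table[line - 1][column] for line in lines]

# Toy stand-in for a Blast table; the real table has more rows and columns.
toy_blast_table = [
    {"T-SEQ": "MKLV", "E-VALUE": "1e-50"},
    {"T-SEQ": "MRLV", "E-VALUE": "1e-40"},
    {"T-SEQ": "MKIV", "E-VALUE": "1e-30"},
    {"T-SEQ": "MRII", "E-VALUE": "1e-20"},
]

wanted_lines = parse_set("(2 4)")
print(table_values(toy_blast_table, wanted_lines, "T-SEQ"))
```

The one detail worth noticing is the off-by-one: line numbers shown to the user start at 1, while Python lists start at 0.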
  • 230. We'd like to compare these bacterial sequences with the region from your sequence. But that region is a DNA sequence. We'll need to translate it. To do this, click on the GENES-PROTEINS button.
  • 231. Mouse over TRANSLATION and click the TRANSLATION-OF function.
  • 232. Open the argument box of TRANSLATION-OF for input. We want to put into this box your sequence, but just the portion from 15 to 503, and on the complementary strand. Mouse over the GENES-PROTEINS button to get a function that will extract what you want.
  • 233. Click the SEQUENCE-OF function.
  • 234. And paste it into the argument of SEQUENCE-OF. Executing now will translate the entire sequence. But we want only part of the sequence.
  • 235. So mouse over the Options icon and click the FROM option.
  • 236. And do the same thing to get the TO option.
  • 237. Now type into the FROM entry box the beginning coordinate, 15, and press TAB.
  • 238. And type into the TO entry box the end coordinate, 503, and press ENTER.
  • 239. The sequence needs to be inverted (read from the complementary strand), so choose that option.
  • 240. And finally, we want to give the sequence a name so we can keep track of it during sequence comparisons. Uh-oh… The WITH-LABEL option is off screen. One way to handle this is to make space by clearing a now unnecessary box.
  • 241. Better. Now click on the Options icon
  • 242. And this time the WITH-LABEL option appears. Click on it.
  • 243. And fill in its entry box with a descriptive name. I chose "c60790-15-503R", indicating the contig, coordinates, and orientation. Note that the name must be in quotes.
  • 244. Executing the function should give an amino acid sequence resulting from the translation of the desired region of your sequence.
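
What the nested SEQUENCE-OF and TRANSLATION-OF boxes do can be spelled out in a few lines. The following is a minimal Python sketch, assuming that the inversion option means taking the reverse complement of the selected region and that translation uses the standard genetic code. The function names and the toy contig are made up; this is not BioBIKE's implementation.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))

def subsequence(seq: str, start: int, end: int) -> str:
    """Region from start to end, 1-based and inclusive (like FROM and TO)."""
    return seq[start - 1:end]

def reverse_complement(seq: str) -> str:
    """Read the region from the complementary strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq: str) -> str:
    """Translate every complete codon; '*' marks a stop codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

# Toy contig (made up): positions 3 to 11 encode M-K-stop on the other strand.
toy_contig = "GGTTATTTCATCC"
region = reverse_complement(subsequence(toy_contig, 3, 11))
protein = translate(region)
print(region, protein)
```

As a sanity check on the real coordinates: the region from 15 to 503 is 489 nucleotides long, which is exactly 163 codons.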
  • 245. We now have all the relevant sequences, ready to be joined together into a single list and compared. To join the sequences, mouse over the LISTS-TABLES button, then LIST-PRODUCTION, and click on the JOIN function.
  • 246. We could define names for the bacterial sequence and the translated sequence, but… too much bother. Instead, cut and paste. Click on the Action icon of the function that produced the bacterial sequences…
  • 247. Cut the function box and paste it into the first argument box of JOIN.
  • 248. Then cut the TRANSLATION function…
  • 249. …and paste it into the second argument box of JOIN.
  • 250. Again, we could name the joined sequences and then align them, but it is easier simply to surround the JOIN function with the function that will do the aligning. Click on Surround with, from the Action icon menu.
  • 251. Then select ALIGNMENT-OF from the STRINGS-SEQUENCES menu, BIOINFORMATIC-TOOLS submenu.
  • 252. It was a bit of work, but we finally have what we want: a single list consisting of the region of your sequence that is similar to the collection of bacterial sequences, all ready to be aligned. Go to the Action icon to execute.
  • 253. This is another function that usually requires several seconds.
  • 254. The alignment in the popup window shows us which regions are conserved in the putative open reading frame in your sequence. By including more divergent proteins, we can assess whether the putative ORF retains motifs typical of this class of protein. From the alignment we can also generate a phylogenetic tree. X out of the window.
  • 255. And to save space, collapse the alignment box into a stub.
  • 256. The full function is still there, but it occupies less space on the screen. Now click on the Action icon of the ALIGNMENT-OF box to begin surrounding the function by a function that will create a phylogenetic tree.
  • 258. …and go to STRINGS-SEQUENCES, PHYLOGENETIC-TREE, TREE-OF to surround the alignment with the tree function.
  • 259. The function will store much tree-related information on disk, in case you want to modify the tree later. It needs to know the name of a new directory in which to put the information. I chose "c60790-orf1".
  • 260. There are many ways of constructing trees. I chose PARSIMONY -- estimating phylogenetic proximity by the number of steps it takes to go from one sequence to another.
  • 261. Execute. After several seconds, the function will give you the same alignment you saw before and a few seconds after that a tree.
  • 262. The three Sphingomonas proteins cluster together, as do the Erythrobacter proteins. Then there's yours.
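
To make the parsimony idea a little more concrete, here is a minimal Python sketch that counts the minimum number of changes a given tree needs to explain an alignment, using Fitch's small-parsimony counting. This is an illustration only, not what TREE-OF runs internally: a real tree-building program also searches over many candidate trees. The tree encoding (nested pairs of leaf names), the function names, and the toy alignment are assumptions made for the example.

```python
def fitch_site(tree, states):
    """For one alignment column, return (candidate ancestral states, minimum changes)."""
    if isinstance(tree, str):                # a leaf: its observed state, no change yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:                               # children agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[col] for name, seq in alignment.items()})[1]
        for col in range(length)
    )

# Toy aligned proteins (made up for illustration).
toy_alignment = {"a": "MKLV", "b": "MKLV", "c": "MRLI", "d": "MRLI"}

tree_1 = (("a", "b"), ("c", "d"))   # groups the identical sequences together
tree_2 = (("a", "c"), ("b", "d"))   # splits them apart

print(parsimony_score(tree_1, toy_alignment))
print(parsimony_score(tree_2, toy_alignment))
```

The tree that groups similar sequences together needs fewer changes, which is why parsimony prefers it.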
  • 263. If you want to return to this session or refer to it later, you can save it by mousing over the EDIT button and clicking Save user session.

Editor's notes

  1. BioBike is composed of three integrated components: (i) a biological knowledge base, (ii) a graphical programming interface and (iii) an extensible set of tools that can be combined in novel ways
  2. BioBIKE is available at biobike.csbc.vcu.edu, for now only on the Firefox platform. BioBIKE INSTANCES AND THEIR KNOWLEDGE AND DATA BASES: CyanoBIke, Phantome, Sterptobike, stephylobike, ViroBike
  3. BioBIKE INSTANCES AND THEIR KNOWLEDGE AND DATA BASES: A BioBIKE instance provides a framework for all available information needed by a given research community, including sets of genomic sequences, gene annotations, functional descriptions, formal categories (e.g. COG), hierarchical groupings of metabolic reactions linked with genes (from KEGG, 2) and internal tables of Blast scores to support rapid protein comparisons. In addition, an instance may be stocked with experimental data, such as results from microarray or proteomic experiments. Indeed, any data that can be put into a standardized form, such as a table or XML structure, can be
  4. Useful tools not already in the language that have Application Programming Interfaces (APIs), or that are capable of running within a Linux environment, can generally be added to BioBIKE on request with little difficulty, and thus be made accessible to BioBIKE users through the standard graphical programming interface.
  5. ----- Meeting Notes (6/7/11 14:47) ----- include the real result
  6. ----- Meeting Notes (6/7/11 14:47) ----- biology example
  7. ----- Meeting Notes (6/7/11 14:47) ----- present problem solved by BioBIKE