The BioExtract Server (bioextract.org), funded by the United States National Science Foundation, is a Web-based, workflow-enabling system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatics workflows. The BioExtract Server provides: 1) a flexible querying and retrieval interface to National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI) non-redundant nucleotide and protein databases, 2) the ability to filter query results and use them as input to analytic tools, 3) the facility to save query results as data extracts, which are automatically integrated into the system as searchable data sets, 4) access to analytic tools, including a large list of curated Web services such as EMBOSS (http://www.ebi.ac.uk/Tools/emboss/) and BioMart (http://www.biomart.org/) resources, 5) the ability to save a series of BioExtract Server tasks (e.g. query a data source, save a data extract and execute an analytic tool) as a workflow, and 6) the opportunity for researchers to share their data extracts, analytic tools and workflows with collaborators.
10. Uploaded iPlant Data File
Sample Upload File to iPlant Discovery Environment
File Size: ~2.27 GB
Queue Time: ~3 minutes
Transfer Time: ~36 minutes
7/24/2013
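The upload figures above imply an average transfer rate, which can be worked out with a short calculation. This is only a sketch: it assumes the reported file size is in decimal gigabytes (1 GB = 1000 MB) and that the transfer time excludes the queue time. The function name is illustrative.

```python
def transfer_rate_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (minutes * 60)


# Figures taken from the sample upload: ~2.27 GB in ~36 minutes.
rate = transfer_rate_mb_per_s(2.27, 36)
print(f"{rate:.2f} MB/s")  # prints "1.05 MB/s"
```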
25. Creating a Query
1. Select UniProtKB.
2. Use the Search Field drop-down menu and select Gene.
3. In the Search Term box, type myb*.
4. Click the Add Search Line button to add an additional
search expression. Boolean search expression options are
AND, OR, AND NOT. For our example, select AND.
5. From the Search Field drop-down menu, select Taxonomy.
6. In the Search Term box, type in Panicoideae.
7. Click the Submit Query button. Query results will display under
the “Extracts” tab.
31. Analytic Tool Availability
• Access the set of commonly used analytic tools that have
been incorporated into the system including iPlant tools
available through the iPlant Foundation API
32. Analyzing Data within the BioExtract Server
The basic steps for executing tools are:
Step 1. Select a tool
Step 2. Input some data
Step 3. Define parameters
Step 4. Click Execute and wait
Step 5. View tool results
34. Executing an Analytic Tool
1. Select blastn from the list of Similarity Search Tools in the Tools list
2. Select the Paste or type data into the text area radio button
3. Enter: XM_001061943
4. Click the Execute button
42. Use File Stored at iPlant as Input into Analytic Tool
43. BioExtract Server Workflows
• As the researcher works with the system, their
tasks or “steps” are saved in the background.
• At any time these steps can be saved as a
workflow simply by providing a name and
description.
• Once saved, these workflows can be executed
and/or modified.
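The record-then-save behaviour described here can be pictured with a small data structure. The sketch below is purely illustrative and is not the BioExtract Server's implementation: the `Session` and `Workflow` classes, their methods, and the step strings are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A named, described snapshot of recorded steps (hypothetical model)."""
    name: str
    description: str
    steps: list[str]


@dataclass
class Session:
    """Hypothetical user session that logs each task in the background."""
    steps: list[str] = field(default_factory=list)

    def record(self, step: str) -> None:
        self.steps.append(step)

    def save_workflow(self, name: str, description: str) -> Workflow:
        # Copy the list so later steps do not alter the saved workflow.
        return Workflow(name, description, list(self.steps))


session = Session()
session.record("query a data source")
session.record("save a data extract")
session.record("execute an analytic tool")
saved = session.save_workflow("demo", "Query, extract, analyze")

session.record("a later step")  # does not affect the saved workflow
print(saved.steps)
```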
45. Creating a Workflow
1. Find all protein sequences that relate to the FXN gene in humans,
excluding the full sequence.
2. Save the result set.
3. Execute the EMBOSS showalign analytic tool to align the resulting
sequence record.
4. Execute NCBI BLASTP to find similar sequences from the swissprot
database.
5. Execute TCoffee to perform a multiple sequence alignment of the similar
sequences.
6. Execute iPlant’s Muscle to perform another multiple sequence alignment
for comparison.
7. Finally, perform a multiple sequence alignment with iPlant’s Clustal
Omega, using the results from the TCoffee alignment.
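The seven steps above form a small dependency graph, since some tools consume the output of earlier ones. The sketch below models that with Python's standard `graphlib` module. The step labels are shorthand invented for this example, and the edges are an assumed reading of the list: only the Clustal Omega step is stated explicitly to use the TCoffee results.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it is assumed to consume.
DEPENDS_ON = {
    "query FXN proteins": set(),
    "save result set": {"query FXN proteins"},
    "EMBOSS showalign": {"save result set"},
    "NCBI BLASTP": {"save result set"},
    "TCoffee": {"NCBI BLASTP"},
    "iPlant Muscle": {"NCBI BLASTP"},
    "iPlant Clustal Omega": {"TCoffee"},
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for position, step in enumerate(order, start=1):
    print(position, step)
```

Any order produced this way is a valid execution sequence; steps with no dependency between them, such as TCoffee and Muscle, may appear in either order.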
46. Find all protein sequences that relate to the FXN gene in
humans, excluding the full sequence.
68. Modifying a Workflow
Modify the query to search for the wcaG gene in Salmonella Typhimurium:
common:gene=wcaG AND common:species=Salmonella AND common:defn='Typhimurium'
GUI Search Field    Common search field
All Text            common:all
Id                  common:id
Author              common:author
Title               common:title
Accession           common:accn
Definition          common:defn
Feature Key         common:fkey
Gene                common:gene
Keywords            common:keyword
Species             common:species
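A text query like the wcaG example can be assembled from the GUI field names using the mapping in the table above. The following is a minimal sketch, not the server's own code: the `build_query` helper is hypothetical, the field names are written without a space after the colon to match the example query, and the rule for when a value needs quotes is not stated in the source, so the caller chooses explicitly.

```python
# GUI search field -> query field, copied from the table above.
COMMON_FIELDS = {
    "All Text": "common:all",
    "Id": "common:id",
    "Author": "common:author",
    "Title": "common:title",
    "Accession": "common:accn",
    "Definition": "common:defn",
    "Feature Key": "common:fkey",
    "Gene": "common:gene",
    "Keywords": "common:keyword",
    "Species": "common:species",
}

# Boolean options offered by the GUI; reusing them here is an assumption.
OPERATORS = {"AND", "OR", "AND NOT"}


def build_query(terms, operator="AND"):
    """Join (gui_field, value, quoted) triples into one query string."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    parts = []
    for gui_field, value, quoted in terms:
        rendered = f"'{value}'" if quoted else value
        parts.append(f"{COMMON_FIELDS[gui_field]}={rendered}")
    return f" {operator} ".join(parts)


query = build_query([
    ("Gene", "wcaG", False),
    ("Species", "Salmonella", False),
    ("Definition", "Typhimurium", True),
])
print(query)
# prints: common:gene=wcaG AND common:species=Salmonella AND common:defn='Typhimurium'
```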
71. The myexperiment.org social web site, launched in November 2007
myExperiment is a collaborative environment where
scientists can safely publish their workflows and
experiment plans, share them with groups and find
those of others.
myExperiment is brought to you by a joint team from the
University of Southampton and The University of
Manchester in the UK, led by David De Roure and Carole
Goble.
76. Acknowledgements
Volker Brendel - Professor of Biology and Computer
Science at Indiana University in Bloomington.
Rion Dooley – Manager, Web & Cloud Services Group,
Texas Advanced Computing Center.
iPlant Collaborative - This work was created using
resources or cyberinfrastructure provided by the iPlant
Collaborative. The iPlant Collaborative is funded by a
grant from the National Science Foundation (#DBI-0735191).
URL: www.iplantcollaborative.org
Funded through NSF IOS-1126481 NSF Plant Genome Research
National Science Foundation