SlideShare ist ein Scribd-Unternehmen logo
1 von 58
Downloaden Sie, um offline zu lesen
UNIVERSITY OF
CALIFORNIA
An introduction to Web Apollo.
Manual Annotation Workshop at UIUC
Monica Munoz-Torres, PhD | @monimunozto
Berkeley Bioinformatics Open-Source Projects (BBOP)
Genomics Division, Lawrence Berkeley National Laboratory
11 June, 2014
UNIVERSITY OF
CALIFORNIA
Outline
1.  What is Web Apollo?:
• Definition & working concept.
2.  Our Experience With Community
Based Curation.
3.  The Thought Process Behind Manual
Annotation.
4.  Becoming Acquainted With Web
Apollo.
5.  Demonstration and Exercises.
An introduction to
Web Apollo.
Manual Annotation
Workshop at UIUC
Outline 3
During this workshop you will:
•  Learn to identify homologs of known genes of interest in
your newly sequenced genome.
•  Become familiar with the environment and functionality
of the Web Apollo genome annotation editing tool.
•  Learn how to corroborate and / or modify automatically
annotated gene models using all available evidence in
Web Apollo.
•  Understand the process of curation in the context of
genome annotation: from the assembled genome
sequence to manual curation via automated annotation,
among other things.
Footer 4
During this lecture I invite you to:
Observe and detail the figures
Listen to the explanations
Interrupt me at any time to ask questions
Use Twitter & share your thoughts: I am @monimunozto
A few suggested tags & users: #WebApollo #Annotation
#Curation #GMOD #HPCBio #HoneyBee #genome
@JBrowseGossip
Take brakes: LBL’s ergo safety team suggests I should not work
at the computer for >55 minutes without a break; neither
should you!
We will be here for 2.5 hours: please get up and stretch your
neck, arms, and legs as often as you need.
Footer 5
Let Us Get Started
What is Web Apollo?
•  Web Apollo is a web-based, collaborative genomic
annotation editing platform.
We	
  need	
  annota)on	
  edi)ng	
  tools	
  to	
  modify	
  and	
  refine	
  the	
  
precise	
  loca)on	
  and	
  structure	
  of	
  the	
  genome	
  elements	
  that	
  
predic)ve	
  algorithms	
  cannot	
  yet	
  resolve	
  automa)cally.
71. What is Web Apollo?
Find more about Web Apollo at
http://GenomeArchitect.org
and
Genome Biol 14:R93. (2013).
Brief history of Apollo*:
a. Desktop:
one person at a time editing a
specific region, annotations
saved in local files; slowed down
collaboration.
b. Java Web Start:
users saved annotations directly
to a centralized database;
potential issues with stale
annotation data remained.
1. What is Web Apollo? 8
Biologists could finally visualize computational analyses and
experimental evidence from genomic features and build
manually-curated consensus gene structures. Apollo became a
very popular, open source tool (insects, fish, mammals, birds, etc.).
*
Web Apollo
•  Browser-based tool integrated with JBrowse.
•  Two new tracks: “Annotation” and “DNA Sequence”
•  Allows for intuitive annotation creation and editing,
with gestures and pull-down menus to create and
modify transcripts and exons
structures, insert comments
(CV, freeform text), etc.
•  Customizable look & feel.
•  Edits in one client are
instantly pushed to all other
clients: Collaborative!
1. What is Web Apollo? 9
Web Apollo
1. What is Web Apollo? 10
Chado
UCSC (MySQL)
Ensembl (DAS)
BAM
BED
BigWig
GFF3
MAKER output
•  Provides dynamic access to genomic analysis results
from both UCSC and Chado databases as well as
database storage of user-created annotations. All
user-created sequence annotations are automatically
uploaded to a server, ensuring reliability.
- BAM
- BigWig
- GFF3
- VCF*
Trellis
Data Broker
(Java)
Static JSON
Generation Pipeline
(Perl)
Server-side Data Service Annotation Editing Engine (Java)
Berkeley DB
realtime store
User
Management
Data Sources
Analysis Pipelines
- BAM
- BED
- BigWig
- GFF3
- MAKER
output
Data Repositories
Chado
MySQL
DAS servers
e.g. Ensembl
Annotation Exports
Local DB.
e.g. Chado
- GFF3
- FASTA
Annotators
Web
Apollo
JBrowse
Apollo Edit Operations
& User Management
User Interface (JavaScript)
JSON
Web Apollo
Architecture
Working
Concept
In the context of gene manual annotation,
curation tries to find the best examples
and / or eliminate most errors.
To conduct manual annotation efforts:
Gather and evaluate all available evidence
using quality-control metrics to
corroborate or modify automated
annotation predictions.
Perform sequence similarity searches
(phylogenetic framework) and use
literature and public databases to:
• Predict functional assignments from
experimental data.
• Distinguish orthologs from paralogs,
and classify gene membership in
families and networks.
2. In our experience. 12
Automated gene models
Evidence:
cDNAs, HMM domain searches,
alignments with assemblies or
genes from other species.
Manual annotation & curation
Dispersed, community-based gene
manual annotation efforts.
We continuously train and support
hundreds of geographically dispersed
scientists from many research
communities, to perform biologically
supported manual annotations using
Web Apollo.
– Gate keepers and monitoring.
– Written tutorials.
– Training workshops and geneborees.
– Personalized user support.
2. In our experience. 13
What we have learned.
Harvesting expertise from dispersed researchers who
assigned functions to predicted and curated peptides
we have developed more interactive and
responsive tools, as well as better visualization,
editing, and analysis capabilities.
142. In our experience.
http://people.csail.mit.edu/fredo/PUBLI/Drawing/
Collaborative Efforts Improved
Automated Annotations*
In many cases, automated annotations have been
improved (e.g: Apis mellifera. Elsik et al. BMC Genomics 2014, 15:86).
Also, learned of the challenges of newer sequencing
technologies, e.g.:
– Frameshifts and indel errors
– Split genes across scaffolds
– Highly repetitive sequences
To face these challenges, we train annotators in
recovering coding sequences in agreement with all
available biological evidence.
152. In our experience.
It is helpful to work together.
Scientific community efforts bring together domain-
specific and natural history expertise that would
otherwise remain disconnected.
Breaking down large amounts of data into
manageable portions and mobilizing groups
of researchers to extract the most accurate
representation of the biology from all
available data distills invaluable
knowledge from genome analysis.
162. In our experience.
Understanding the evolution of sociality
Comparing the genomes of 7 species of ants
contributed to a better understanding of the
evolution and organization of insect societies
at the molecular level.
Insights drawn mainly from six core aspects of
ant biology:
1.  Alternative morphological castes
2.  Division of labor
3.  Chemical Communication
4.  Alternative social organization
5.  Social immunity
6.  Mutualism
17
Libbrecht et al. 2012. Genome Biology 2013, 14:212
2. In our experience.
Atta cephalotes (above) and Harpegnathos saltator.
©alexanderwild.com
Groups of
communities
continue to guide
our efforts.
A little training goes a long way!
With the right tools, wet lab scientists make exceptional
curators who can easily learn to maximize the
generation of accurate, biologically supported gene
models.
182. In our experience.
Manual
Annotation
How do we get there?
19
Assembly
Manual
annotation
Experimental
validation
Automated
Annotation
In a genome sequencing project…
Q-ratore
3. How do we get there?
Gene Prediction
Identification of protein-coding genes, tRNAs, rRNAs,
regulatory motifs, repetitive elements (masked), etc.
- Ab initio (DNA composition): Augustus, GENSCAN,
geneid, fgenesh
- Homology-based: E.g: SGP2, fgenesh++
20
Nucleic Acids 2003 vol. 31 no. 13 3738-3741
3. How do we get there?
Gene Annotation
Integration of data from prediction tools to generate a
consensus set of predictions or gene models.
•  Models may be organized using:
-  automatic integration of predicted sets; e.g: GLEAN
-  packaging necessary tools into pipeline; e.g: MAKER
•  All available biological evidence (e.g. transcriptomes) further
informs the annotation process.
213. How do we get there?
In some cases algorithms and metrics used to generate
consensus sets may actually reduce the accuracy of the
gene’s representation; in such cases it is usually better to
use an ab initio model to create a new annotation.
Manual Genome Annotation
•  Identifies elements that best represent the underlying
biology.
•  Eliminates elements that reflect the systemic errors of
automated genome analyses.
•  Determines functional roles through comparative
analysis of well-studied, phylogenetically similar
genome elements using literature, databases, and
the researcher’s experience.
223. How do we get there?
Curation Process is Necessary
1.  A computationally predicted consensus gene set is
generated using multiple lines of evidence.
2.  Manual annotation takes place.
3.  Ideally consensus computational predictions will be
integrated with manual annotations to produce an
updated Official Gene Set (OGS).
Otherwise, “incorrect and incomplete genome annotations
will poison every experiment that uses them”.
- M. Yandell.
233. How do we get there?
Web Apollo
Sort
Web Apollo
25
The Sequence Selection Window
4. Becoming Acquainted with Web Apollo.
25
Navigation tools:
pan and zoom Search box: go
to a scaffold or
a gene model.
Grey bar of coordinates
indicates location. You can
also select here in order to
zoom to a sub-region.
‘View’: change
color by CDS,
toggle strands,
set highlight.
‘File’:
Upload your own
evidence: GFF3,
BAM, BigWig, VCF*.
Add combination
and sequence
search tracks.
‘Tools’:
Use BLAT to query the
genome with a protein
or DNA sequence.
Available Tracks
Evidence Tracks Area
‘User-created Annotations’ Track
Login
Web Apollo
26
Graphical User Interface (GUI) for editing annotations
4. Becoming Acquainted with Web Apollo.
Web Apollo Functionality
In addition to protein-coding gene annotation that you know and love.
•  Non-coding genes: ncRNAs, miRNAs, repeat regions, and TEs
•  Sequence alterations (less coverage = more fragmentation)
•  Visualization of stage and cell-type specific transcription data as
coverage plots, heat maps, and alignments
27	
4. Becoming Acquainted with Web Apollo.
27
1.  Select a chromosomal region of interest, e.g. scaffold.
2.  Select appropriate evidence tracks.
3.  Determine whether a feature in an existing evidence track will
provide a reasonable gene model to start working.
-  If yes: select and drag the feature to the ‘User-created
Annotations’ area, creating an initial gene model. If necessary
use editing functions to adjust the gene model.
-  If not: let’s talk.
4.  Check your edited gene model for integrity and accuracy by
comparing it with available homologs.
4. Becoming Acquainted with Web Apollo
General Process of Curation
28 |
Always remember: when annotating gene models using Web
Apollo, you are looking at a ‘frozen’ version of the genome
assembly and you will not be able to modify the assembly itself.
28
Choose (click or drag) appropriate evidence tracks from the list on the left.
Click on an exon to select it. Double click on an exon or single click on an intron to select the
entire gene.
Select & drag any elements from an evidence track into the curation area: these are editable and
considered the curated version of the gene. Other options for elements in evidence tracks
available from right-click menu.
If you select an exon or a gene, then every track is automatically searched for exons with exactly
the same co-ordinates as what you selected. Matching edges are highlighted red.
Hovering over an annotation in progress brings up an information pop-up.
User Navigation
29 | 29	
4. Becoming Acquainted with Web Apollo.
Right-click menu:
•  With the exception of deleting a
model, all edits can be reversed with
‘Undo’ option. ‘Redo’ also available.
All changes are immediately saved
and available to all users in real time.
•  ‘Get sequence’ retrieves peptide,
cDNA, CDS, and genomic sequences.
•  You can select an exon and select
‘Delete’. You can create an intron, flip
the direction, change the start or split
the gene.
User Navigation
30 | 30	
4. Becoming Acquainted with Web Apollo.
Right-click menu:
•  If you select two gene models, you
can join them using ‘Merge’, and you
may also ‘Split’ a model.
•  You can select ‘Duplicate’, for
example to annotate isoforms.
•  Set translation start, annotate
selenocysteine-containing proteins,
match edges of annotation to those of
evidence tracks.
User Navigation
31 | 31	
4. Becoming Acquainted with Web Apollo.
User Navigation
32	
Annotations, annotation edits, and History: stored in a centralized database.
4. Becoming Acquainted with Web Apollo.
32
User Navigation
33	
The Annotation Information Editor
4. Becoming Acquainted with Web Apollo.
DBXRefs are database crossreferences: if
you have reason to believe that this gene
is linked to a gene in a public database
(including your own), then add it here.
33
User Navigation
34	
The Annotation Information Editor
4. Becoming Acquainted with Web Apollo.
•  Add PubMed IDs
•  Include GO terms as appropriate
from any of the three ontologies
•  Write comments stating how you
have validated each model.
34
User Navigation
35 |
•  ‘Zoom to base level’ option reveals
the DNA Track.
•  Change color of exons by CDS
from the ‘View’ menu.
•  The reference DNA sequence is
visible in both directions as are the
protein translations in all six
frames. You can toggle either
direction to display only 3 frames.
Zoom in/out with keyboard:
shift + arrow keys up/down
35	
4. Becoming Acquainted with Web Apollo.
Web Apollo User Guide
(Fragment)
http://genomearchitect.org/web_apollo_user_guide
In a “simple case” the predicted gene model is correct or
nearly correct, and this model is supported by evidence
that completely or mostly agrees with the prediction.
Evidence that extends beyond the predicted model is
assumed to be non-coding sequence.
The following sections describe simple modifications.
Annotating Simple Cases
37 | 37	
4. Becoming Acquainted with Web Apollo.
Select and drag the putative new exon from a track, and add it directly to an
annotated transcript in the ‘User-created Annotations’ area.
•  Click the exon, hold your finger on the mouse button, and drag the cursor
until it touches the receiving transcript. A dark green highlight indicates it
is okay to release the mouse button.
•  When released, the additional exon becomes attached to the receiving
transcript.
Adding exons
38 |
•  A confirmation box will
warn you if the receiving
transcript is not on the
same strand as the
feature where the new
exon originated.
38	
4. Becoming Acquainted with Web Apollo.
Each time you add an exon region, whether by extension or
adding an exon, Web Apollo recalculates the longest
ORF, identifying ‘Start’ and ‘Stop’ signals and allowing you
to determine whether a ‘Stop’ codon has been
incorporated after each editing step.
Adding exons
39 |
Web Apollo demands that an exon already exists as an evidence in one of the
tracks. You could provide a text file in GFF format and select File à Open. GFF
is a simple text file delimited by TABs, one line for each genomic ‘feature’:
column 1 is the name of the scaffold; then some text (irrelevant), then ‘exon’,
then start, stop, strand as + or -, a dot, another dot, and Name=some name
Example:
scaffold_88 	
  Qratore	
  exon	
  21 	
  2111	
  + 	
  . 	
  . 	
  Name=bob	
  
scaffold_88 	
  Qratore	
  exon	
  2201	
  5111	
  + 	
  . 	
  . 	
  Name=rad	
  
39	
4. Becoming Acquainted with Web Apollo.
Gene predictions may or may not include
UTRs. If transcript alignment data are
available and extend beyond your original
annotation, you may extend or add UTRs.
1.  Position the cursor at the beginning of
the exon that needs to be extended and
‘Zoom to base level’.
2.  Place the cursor over the edge of the
exon until it becomes a black arrow then
click and drag the edge of the exon to the
new coordinate position that includes the
UTR.
Adding UTRs
40 |
View zoomed to base level. The DNA track
and annotation track are visible. The DNA
track includes the sense strand (top) and anti-
sense strand (bottom). The six reading frames
flank the DNA track, with the three forward
frames above and the three reverse frames
below. The User-created Annotation track
shows the terminal end of an annotation. The
green rectangle highlights the location of the
nucleotide residues in the ‘Stop’ signal.
To add a new spliced UTR to an existing
annotation follow the procedure for
adding an exon.
40	
4. Becoming Acquainted with Web Apollo.
1.  Zoom in sufficiently to clearly resolve each exon as a distinct
rectangle.
2.  Two exons from different tracks sharing the same start and/or
end coordinates, will display a red bar to indicate the matching
edges.
3.  Selecting the whole annotation or one exon at a time, use this
‘edge-matching’ function and scroll along the length of the
annotation, verifying exon boundaries against available data.
Use square [ ] brackets to scroll from exon to exon.
4.  Note if there are cDNA / RNAseq that lack one or more of the
annotated exons or include additional exons.
Exon structure integrity
41 | 41	
4. Becoming Acquainted with Web Apollo.
To modify an exon boundary and
match data in the evidence tracks:
select both the offending exon
and the feature with the expected
boundary, then right click on the
annotation to select ‘Set 3’ end’ or
‘Set 5’ end’ as appropriate.
Exon structure integrity
42 |
In some cases all the data may disagree with the annotation, in other
cases some data support the annotation and some of the data support
one or more alternative transcripts. Try to annotate as many
alternative transcripts as are well supported by the data.
42	
4. Becoming Acquainted with Web Apollo.
Flags non-
canonical splice
sites.
Selection of features and
sub-features
Edge-matching
Evidence Tracks Area
‘User-created Annotations’ Track
The editing logic in the server:
§  selects longest ORF as CDS
§  flags non-canonical splice sites
43	
Editing Logic
4. Becoming Acquainted with Web Apollo.
Zoom to base level to review non-
canonical splice site warnings.
These do not necessarily need
to be corrected, but should be
flagged with the appropriate
comment.
Splice sites
44 |
Exon/intron junction possible error
Original model
Curated model
Non-canonical splices are indicated by an
orange circle with a white exclamation point
inside, placed over the edge of the offending
exon.
Most insects, have a valid non-canonical site
GC-AG. Other non-canonical splice sites are
unverified. Web Apollo flags GC splice donors
as non-canonical.
Canonical splice sites:
3’-…exon]GA / TG[exon…-5’
5’-…exon]GT / AG[exon…-3’
reverse strand, not reverse-complemented:
forward strand
44	
4. Becoming Acquainted with Web Apollo.
Some gene prediction algorithms do not
recognize GC splice sites, thus the intron/
exon junction may be incorrect. For example,
one such gene prediction algorithm may
ignore a true GC donor and select another
non-canonical splice site that is less
frequently observed in nature.
Therefore, if upon inspection you find a non-
canonical splice site that is rarely observed
in nature, you may wish to search the region
for a more frequent in-frame non-canonical
splice site, such as a GC donor. If there is an
in-frame site close that is more likely to be
the correct splice donor, you may make this
adjustment while zoomed at base level.
Splice sites (keep this in mind)
45 |
Exon/intron junction possible error
Original model
Curated model
Use RNA-Seq data to
make a decision.
Canonical splice sites:
3’-…exon]GA / TG[exon…-5’
5’-…exon]GT / AG[exon…-3’
reverse strand, not reverse-complemented:
forward strand
45	
4. Becoming Acquainted with Web Apollo.
Web Apollo calculates the longest possible
open reading frame (ORF) that includes
canonical ‘Start’ and ‘Stop’ signals within the
predicted exons.
If it appears to have calculated an incorrect
‘Start’ signal, you may modify it selecting an
in-frame ‘Start’ codon further up or
downstream, depending on evidence (protein
database, additional evidence tracks). An
upstream ‘Start’ codon may be present
outside the predicted gene model, within a
region supported by another evidence track.
‘Start’ and ‘Stop’ sites
46 | 46	
4. Becoming Acquainted with Web Apollo.
Note that the ‘Start’ codon may also be located in a non-predicted exon
further upstream. If you cannot identify that exon, add the appropriate
note in the transcript’s ‘Comments’ section.
In very rare cases, the actual ‘Start’ codon may be non-canonical (non-
ATG).
In some cases, a ‘Stop’ codon may not be automatically identified. Check
to see if there are data supporting a 3’ extension of the terminal exon
or additional 3’ exons with valid splice sites.
‘Start’ and ‘Stop’ sites (keep in mind)
47 | 47	
4. Becoming Acquainted with Web Apollo.
Evidence may support joining two or more different gene models. Warning: protein
alignments may have incorrect splice sites and lack non-conserved regions!
1.  Drag and drop each gene model to ‘User-created Annotations’ area. Shift click
to select an intron from each gene model and right click to select the ‘Merge’
option from the menu.
2.  Drag supporting evidence tracks over the candidate models to corroborate
overlap, or review edge matching and coverage across models.
3.  Check the resulting translation by querying a protein database e.g. UniProt.
Record the IDs of both starting gene models in ‘DBXref’ and add comments to
record that this annotation is the result of a merge.
Complex Cases:
Merge two gene predictions on the same scaffold
48 | 48	
Red lines around exons:
‘edge-matching’ allows annotators to confirm whether the
evidence is in agreement without examining each exon
at the base level.
4. Becoming Acquainted with Web Apollo.
One or more splits may be recommended when different
segments of the predicted protein align to two or more
different families of protein homologs, and the predicted
protein does not align to any known protein over its entire
length. Transcript data may support a split (if so, verify that
it is not a case of alternative transcripts).
Complex Cases: Split a gene prediction
49 | 49	
4. Becoming Acquainted with Web Apollo.
DNA Track
‘User-created Annotations’ Track
Complex Cases: Frameshifts, single-
base errors, and selenocysteines
50	
4. Becoming Acquainted with Web Apollo.
1.  Web Apollo allows annotators to make single base modifications or frameshifts that are
reflected in the sequence and structure of any transcripts overlapping the modification. Note
that these manipulations do NOT change the underlying genomic sequence.
2.  If you determine that you need to make one of these changes, zoom in to the nucleotide
level and right click over a single nucleotide on the genomic sequence to access a menu that
provides options for creating insertions, deletions or substitutions.
3.  The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of
nucleotide residues that will be inserted to the right of the cursor’s current location. The
‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting
with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature
asks for the string of nucleotide residues that will replace the ones on the DNA track.
4.  Once you have entered the modifications, Web Apollo will recalculate the corrected
transcript and protein sequences, which will appear when you use the right-click menu ‘Get
Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that
include the modified region you should alert the curators of your organisms database using
the ‘Comments’ section to report the CDS edits.
5.  In special cases such as selenocysteine containing proteins (read-throughs), right-click over
the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option
from the menu.
51 | 51	
Complex Cases: Frameshifts, single-
base errors, and selenocysteines
4. Becoming Acquainted with Web Apollo.
Follow our checklist until you are happy with the annotation!
Then:
–  Comment to validate your annotation, even if you made no
changes to an existing model. Your comments mean you
looked at the curated model and are happy with it; think of
it as a vote of confidence.
–  Or add a comment to inform the community of unresolved
issues you think this model may have.
Completing the annotation
52 | 52	
Always Remember: Web Apollo curation is a community effort so
please use comments to communicate the reasons for your
annotation (your comments will be visible to everyone).
4. Becoming Acquainted with Web Apollo.
1.  Can you add UTRs (e.g.: via RNA-Seq)?
2.  Check exon structures
3.  Check splice sites: most splice sites display
these residues …]5’-GT/AG-3’[…
4.  Check ‘Start’ and ‘Stop’ sites
5.  Check the predicted protein product(s)
–  Align it against relevant genes/gene family.
–  blastp against NCBI’s RefSeq or nr
6.  If the protein product still does not look correct
then check:
–  Are there gaps in the genome?
–  Merge of 2 gene predictions on the same
scaffold
–  Merge of 2 gene predictions from different
scaffolds
–  Split a gene prediction
–  Frameshifts
–  error in the genome assembly?
–  Selenocysteine, single-base errors, and other
inconvenient phenomena
Checklist for Accuracy and Integrity
53 | 53	
7.  Finalize annotation by adding:
–  Important project information in the form of canned
and/or customized comments
–  IDs from GenBank (via DBXRef), gene symbol(s),
common name(s), synonyms, top BLAST hits (with
GenBank IDs), orthologs with species names, and
everything else you can think of, because you are
the expert.
–  Whether your model replaces one or more models
from the official gene set (so it can be deleted).
–  The kinds of changes you made to the gene model
of interest, if any. E.g.: splits, merges, whether the
5’ or 3’ ends had to be modified to include ‘Start’ or
‘Stop’ codons, additional exons had to be added,
or non-canonical splice sites were accepted.
–  Any functional assignments that you think are of
interest to the community (e.g. via BLAST, RNA-
Seq data, literature searches, etc.)
4. Becoming Acquainted with Web Apollo.
Exercises
Live Demonstration using the Apis mellifera genome.
4. Becoming Acquainted with Web Apollo. 54
1. Evidence in support of protein coding
gene models.
1.1 Consensus Gene Sets:
Official Gene Set v3.2
Official Gene Set v1.0
1.2 Consensus Gene Sets comparison:
OGSv3.2 genes that merge OGSv1.0 and
RefSeq genes
OGSv3.2 genes that split OGSv1.0 and
RefSeq genes
1.3 Protein Coding Gene Predictions
Supported by Biological Evidence:
NCBI Gnomon
Fgenesh++ with RNASeq training data
Fgenesh++ without RNASeq training data
NCBI RefSeq Protein Coding Genes and Low
Quality Protein Coding Genes
1.4 Ab initio protein coding gene
predictions:
Augustus Set 12, Augustus Set 9, Fgenesh,
GeneID, N-SCAN, SGP2
1.5 Transcript Sequence Alignment:
NCBI ESTs, Apis cerana RNA-Seq, Forager
Bee Brain Illumina Contigs, Nurse Bee Brain
Illumina Contigs, Forager RNA-Seq reads,
Nurse RNA-Seq reads, Abdomen 454 Contigs,
Brain and Ovary 454 Contigs, Embryo 454
Contigs, Larvae 454 Contigs, Mixed Antennae
454 Contigs, Ovary 454 Contigs
Testes 454 Contigs, Forager RNA-Seq
HeatMap, Forager RNA-Seq XY Plot, Nurse
RNA-Seq HeatMap, Nurse RNA-Seq XY Plot
Exercises
Live Demonstration using the Apis mellifera genome.
4. Becoming Acquainted with Web Apollo. 55
1. Evidence in support of protein coding
gene models (Ctd).
1.6 Protein homolog alignment:
Acep_OGSv1.2
Aech_OGSv3.8
Cflo_OGSv3.3
Dmel_r5.42
Hsal_OGSv3.3
Lhum_OGSv1.2
Nvit_OGSv1.2
Nvit_OGSv2.0
Pbar_OGSv1.2
Sinv_OGSv2.2.3
Znev_OGSv2.1
Metazoa_Swissprot
2. Evidence in support of non protein
coding gene models
2.1 Non-protein coding gene predictions:
NCBI RefSeq Noncoding RNA
NCBI RefSeq miRNA
2.2 Pseudogene predictions:
NCBI RefSeq Pseudogene
Web Apollo Workshop
This Workshop’s Documentation found at https://uofi.box.com/webapollo
The Web Apollo Demo http://icebox.lbl.gov/WebApolloDemo/
Arthropodcentric Thanks!
AgriPest Base
FlyBase
Hymenoptera Genome Database
VectorBase
Acromyrmex echinatior
Acyrthosiphon pisum
Apis mellifera
Atta cephalotes
Bombus terrestris
Camponotus floridanus
Helicoverpa armigera
Linepithema humile
Manduca sexta
Mayetiola destructor
Nasonia vitripennis
Pogonomyrmex barbatus
Solenopsis invicta
Tribolium castaneum…and many more!
57	
57
Thanks!
•  Berkeley Bioinformatics Open-source Projects
(BBOP), Berkeley Lab: Web Apollo and Gene
Ontology teams. Suzanna E. Lewis (PI).
•  Christine G. Elsik (PI). § University of Missouri.
•  Ian Holmes (PI). * University of California Berkeley.
•  Arthropod genomics community: i5K Steering
Committee, Alexie Papanicolaou at CSIRO, Monica
Poelchau at USDA/NAL, fringy Richards at HGSC-
BCM, BGI, Oliver Niehuis at 1KITE
http://www.1kite.org/, and the Honey Bee Genome
Sequencing Consortium.
•  Web Apollo is supported by NIH grants
5R01GM080203 from NIGMS, and 5R01HG004483
from NHGRI, and by the Director, Office of Science,
Office of Basic Energy Sciences, of the U.S.
Department of Energy under Contract No. DE-
AC02-05CH11231.
•  Insect images used with permission:
http://AlexanderWild.com and O. Niehuis.
•  For your attention, thank you!
Thank you. 58
Web Apollo
Gregg Helt
Ed Lee
Colin Diesh §
Deepak Unni §
Rob Buels *
Gene Ontology
Chris Mungall
Seth Carbon
Heiko Dietze
BBOP
Web Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K

Weitere ähnliche Inhalte

Was ist angesagt?

Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesMonica Munoz-Torres
 
Apollo provides collaborative genome annotation editing with the power of jbr...
Apollo provides collaborative genome annotation editing with the power of jbr...Apollo provides collaborative genome annotation editing with the power of jbr...
Apollo provides collaborative genome annotation editing with the power of jbr...Nathan Dunn
 
Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Monica Munoz-Torres
 
Curation Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopCuration Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopMonica Munoz-Torres
 
Editing Functionality - Apollo Workshop
Editing Functionality - Apollo WorkshopEditing Functionality - Apollo Workshop
Editing Functionality - Apollo WorkshopMonica Munoz-Torres
 
Apollo Workshop AGS2017 Editing functionality
Apollo Workshop AGS2017 Editing functionalityApollo Workshop AGS2017 Editing functionality
Apollo Workshop AGS2017 Editing functionalityMonica Munoz-Torres
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Projectvaibhavdeoda
 
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
Genome resources at EMBL-EBI: Ensembl and Ensembl GenomesGenome resources at EMBL-EBI: Ensembl and Ensembl Genomes
Genome resources at EMBL-EBI: Ensembl and Ensembl GenomesEBI
 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesMelanie Courtot
 
Light Intro to the Gene Ontology
Light Intro to the Gene OntologyLight Intro to the Gene Ontology
Light Intro to the Gene Ontologynniiicc
 
Gene Ontology WormBase Workshop International Worm Meeting 2015
Gene Ontology WormBase Workshop International Worm Meeting 2015Gene Ontology WormBase Workshop International Worm Meeting 2015
Gene Ontology WormBase Workshop International Worm Meeting 2015raymond91105
 
Java Introductie
Java IntroductieJava Introductie
Java Introductiembruggen
 
Apollo Workshop AGS2017 Introduction
Apollo Workshop AGS2017 IntroductionApollo Workshop AGS2017 Introduction
Apollo Workshop AGS2017 IntroductionMonica Munoz-Torres
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesAnita de Waard
 
Biocurator2012.41.hu
Biocurator2012.41.huBiocurator2012.41.hu
Biocurator2012.41.hujimhutamu
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyChris Mungall
 
Introducing ProtAnnot - Araport workshop at PAG 2016
Introducing ProtAnnot - Araport workshop at PAG 2016Introducing ProtAnnot - Araport workshop at PAG 2016
Introducing ProtAnnot - Araport workshop at PAG 2016Ann Loraine
 

Was ist angesagt? (20)

Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
 
Apollo provides collaborative genome annotation editing with the power of jbr...
Apollo provides collaborative genome annotation editing with the power of jbr...Apollo provides collaborative genome annotation editing with the power of jbr...
Apollo provides collaborative genome annotation editing with the power of jbr...
 
Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing
 
Curation Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopCuration Introduction - Apollo Workshop
Curation Introduction - Apollo Workshop
 
The Crop Ontology - Harmonizing Semantics for Agricultural Field Data, by Eli...
The Crop Ontology - Harmonizing Semantics for Agricultural Field Data, by Eli...The Crop Ontology - Harmonizing Semantics for Agricultural Field Data, by Eli...
The Crop Ontology - Harmonizing Semantics for Agricultural Field Data, by Eli...
 
Editing Functionality - Apollo Workshop
Editing Functionality - Apollo WorkshopEditing Functionality - Apollo Workshop
Editing Functionality - Apollo Workshop
 
Apollo Workshop AGS2017 Editing functionality
Apollo Workshop AGS2017 Editing functionalityApollo Workshop AGS2017 Editing functionality
Apollo Workshop AGS2017 Editing functionality
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Project
 
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
Genome resources at EMBL-EBI: Ensembl and Ensembl GenomesGenome resources at EMBL-EBI: Ensembl and Ensembl Genomes
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resources
 
Light Intro to the Gene Ontology
Light Intro to the Gene OntologyLight Intro to the Gene Ontology
Light Intro to the Gene Ontology
 
Gene Ontology WormBase Workshop International Worm Meeting 2015
Gene Ontology WormBase Workshop International Worm Meeting 2015Gene Ontology WormBase Workshop International Worm Meeting 2015
Gene Ontology WormBase Workshop International Worm Meeting 2015
 
Java Introductie
Java IntroductieJava Introductie
Java Introductie
 
CV_10/17
CV_10/17CV_10/17
CV_10/17
 
Apollo Workshop AGS2017 Introduction
Apollo Workshop AGS2017 IntroductionApollo Workshop AGS2017 Introduction
Apollo Workshop AGS2017 Introduction
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
 
Biocurator2012.41.hu
Biocurator2012.41.huBiocurator2012.41.hu
Biocurator2012.41.hu
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene Ontology
 
Introducing ProtAnnot - Araport workshop at PAG 2016
Introducing ProtAnnot - Araport workshop at PAG 2016Introducing ProtAnnot - Araport workshop at PAG 2016
Introducing ProtAnnot - Araport workshop at PAG 2016
 
2014 bangkok-talk
2014 bangkok-talk2014 bangkok-talk
2014 bangkok-talk
 

Ähnlich wie Web Apollo Workshop UIUC

Munoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMunoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMonica Munoz-Torres
 
Web Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterWeb Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterMonica Munoz-Torres
 
Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Monica Munoz-Torres
 
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...Nathan Dunn
 
Introduction to Apollo - i5k Research Community – Calanoida (copepod)
Introduction to Apollo - i5k Research Community – Calanoida (copepod)Introduction to Apollo - i5k Research Community – Calanoida (copepod)
Introduction to Apollo - i5k Research Community – Calanoida (copepod)Monica Munoz-Torres
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysisDr. Naveen Gaurav srivastava
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyBarry Smith
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriMonica Munoz-Torres
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchEuropean Bioinformatics Institute
 
EVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - IntroductionEVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - IntroductionJonathan Eisen
 
Designing a community resource - Sandra Orchard
Designing a community resource - Sandra OrchardDesigning a community resource - Sandra Orchard
Designing a community resource - Sandra OrchardEMBL-ABR
 
Welch Wordifier Bosc2009
Welch Wordifier Bosc2009Welch Wordifier Bosc2009
Welch Wordifier Bosc2009bosc
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBenjamin Good
 
Introduction to bioinformatics
Introduction to bioinformaticsIntroduction to bioinformatics
Introduction to bioinformaticsmaulikchaudhary8
 
Ramil Mauleon: Galaxy: bioinformatics for rice scientists
Ramil Mauleon: Galaxy: bioinformatics for rice scientistsRamil Mauleon: Galaxy: bioinformatics for rice scientists
Ramil Mauleon: Galaxy: bioinformatics for rice scientistsGigaScience, BGI Hong Kong
 

Ähnlich wie Web Apollo Workshop UIUC (20)

Munoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMunoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ss
 
Web Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterWeb Apollo Workshop University of Exeter
Web Apollo Workshop University of Exeter
 
Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014
 
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
 
Introduction to Apollo - i5k Research Community – Calanoida (copepod)
Introduction to Apollo - i5k Research Community – Calanoida (copepod)Introduction to Apollo - i5k Research Community – Calanoida (copepod)
Introduction to Apollo - i5k Research Community – Calanoida (copepod)
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysis
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental Biology
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
 
rheumatoid arthritis
rheumatoid arthritisrheumatoid arthritis
rheumatoid arthritis
 
EVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - IntroductionEVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - Introduction
 
B.3.5
B.3.5B.3.5
B.3.5
 
Designing a community resource - Sandra Orchard
Designing a community resource - Sandra OrchardDesigning a community resource - Sandra Orchard
Designing a community resource - Sandra Orchard
 
Welch Wordifier Bosc2009
Welch Wordifier Bosc2009Welch Wordifier Bosc2009
Welch Wordifier Bosc2009
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiers
 
Introduction to bioinformatics
Introduction to bioinformaticsIntroduction to bioinformatics
Introduction to bioinformatics
 
PhDc exam presentation
PhDc exam presentationPhDc exam presentation
PhDc exam presentation
 
Presentationonline
PresentationonlinePresentationonline
Presentationonline
 
Ramil Mauleon: Galaxy: bioinformatics for rice scientists
Ramil Mauleon: Galaxy: bioinformatics for rice scientistsRamil Mauleon: Galaxy: bioinformatics for rice scientists
Ramil Mauleon: Galaxy: bioinformatics for rice scientists
 

Mehr von Monica Munoz-Torres

Apollo Exercises Kansas State University 2015
Apollo Exercises Kansas State University 2015Apollo Exercises Kansas State University 2015
Apollo Exercises Kansas State University 2015Monica Munoz-Torres
 
JBrowse & Apollo Overview - for AGR
JBrowse & Apollo Overview - for AGRJBrowse & Apollo Overview - for AGR
JBrowse & Apollo Overview - for AGRMonica Munoz-Torres
 
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...Monica Munoz-Torres
 
Gene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunityGene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunityMonica Munoz-Torres
 
Essential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation ToolsEssential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation ToolsMonica Munoz-Torres
 
Genome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKGenome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKMonica Munoz-Torres
 
Introduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisIntroduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisMonica Munoz-Torres
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityMonica Munoz-Torres
 
Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Monica Munoz-Torres
 
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcionalCONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcionalMonica Munoz-Torres
 
Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityMonica Munoz-Torres
 
Apollo : A workshop for the Manakin Research Coordination Network
Apollo: A workshop for the Manakin Research Coordination NetworkApollo: A workshop for the Manakin Research Coordination Network
Apollo : A workshop for the Manakin Research Coordination NetworkMonica Munoz-Torres
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityMonica Munoz-Torres
 
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015Monica Munoz-Torres
 
Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015Monica Munoz-Torres
 
Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05Monica Munoz-Torres
 

Mehr von Monica Munoz-Torres (19)

Apollo Exercises Kansas State University 2015
Apollo Exercises Kansas State University 2015Apollo Exercises Kansas State University 2015
Apollo Exercises Kansas State University 2015
 
JBrowse & Apollo Overview - for AGR
JBrowse & Apollo Overview - for AGRJBrowse & Apollo Overview - for AGR
JBrowse & Apollo Overview - for AGR
 
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
 
Gene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunityGene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunity
 
Essential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation ToolsEssential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation Tools
 
Genome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKGenome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTK
 
Introduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisIntroduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinis
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research Community
 
Genome Curation using Apollo
Genome Curation using ApolloGenome Curation using Apollo
Genome Curation using Apollo
 
Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07
 
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcionalCONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
 
Apolo Taller en BIOS
Apolo Taller en BIOS Apolo Taller en BIOS
Apolo Taller en BIOS
 
Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research Community
 
Apollo : A workshop for the Manakin Research Coordination Network
Apollo: A workshop for the Manakin Research Coordination NetworkApollo: A workshop for the Manakin Research Coordination Network
Apollo : A workshop for the Manakin Research Coordination Network
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research community
 
PAINT Family PTHR13451-MUS81
PAINT Family PTHR13451-MUS81PAINT Family PTHR13451-MUS81
PAINT Family PTHR13451-MUS81
 
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
 
Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015
 
Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05
 

Kürzlich hochgeladen

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Kürzlich hochgeladen (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Web Apollo Workshop UIUC

  • 2. An introduction to Web Apollo. Manual Annotation Workshop at UIUC Monica Munoz-Torres, PhD | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP) Genomics Division, Lawrence Berkeley National Laboratory 11 June, 2014 UNIVERSITY OF CALIFORNIA
  • 3. Outline 1.  What is Web Apollo?: • Definition & working concept. 2.  Our Experience With Community Based Curation. 3.  The Thought Process Behind Manual Annotation. 4.  Becoming Acquainted With Web Apollo. 5.  Demonstration and Exercises. An introduction to Web Apollo. Manual Annotation Workshop at UIUC Outline 3
  • 4. During this workshop you will: •  Learn to identify homologs of known genes of interest in your newly sequenced genome. •  Become familiar with the environment and functionality of the Web Apollo genome annotation editing tool. •  Learn how to corroborate and / or modify automatically annotated gene models using all available evidence in Web Apollo. •  Understand the process of curation in the context of genome annotation: from the assembled genome sequence to manual curation via automated annotation, among other things. Footer 4
  • 5. During this lecture I invite you to: Observe and detail the figures Listen to the explanations Interrupt me at any time to ask questions Use Twitter & share your thoughts: I am @monimunozto A few suggested tags & users: #WebApollo #Annotation #Curation #GMOD #HPCBio #HoneyBee #genome @JBrowseGossip Take brakes: LBL’s ergo safety team suggests I should not work at the computer for >55 minutes without a break; neither should you! We will be here for 2.5 hours: please get up and stretch your neck, arms, and legs as often as you need. Footer 5
  • 6. Let Us Get Started
  • 7. What is Web Apollo? •  Web Apollo is a web-based, collaborative genomic annotation editing platform. We  need  annota)on  edi)ng  tools  to  modify  and  refine  the   precise  loca)on  and  structure  of  the  genome  elements  that   predic)ve  algorithms  cannot  yet  resolve  automa)cally. 71. What is Web Apollo? Find more about Web Apollo at http://GenomeArchitect.org and Genome Biol 14:R93. (2013).
  • 8. Brief history of Apollo*: a. Desktop: one person at a time editing a specific region, annotations saved in local files; slowed down collaboration. b. Java Web Start: users saved annotations directly to a centralized database; potential issues with stale annotation data remained. 1. What is Web Apollo? 8 Biologists could finally visualize computational analyses and experimental evidence from genomic features and build manually-curated consensus gene structures. Apollo became a very popular, open source tool (insects, fish, mammals, birds, etc.). *
  • 9. Web Apollo •  Browser-based tool integrated with JBrowse. •  Two new tracks: “Annotation” and “DNA Sequence” •  Allows for intuitive annotation creation and editing, with gestures and pull-down menus to create and modify transcripts and exons structures, insert comments (CV, freeform text), etc. •  Customizable look & feel. •  Edits in one client are instantly pushed to all other clients: Collaborative! 1. What is Web Apollo? 9
  • 10. Web Apollo 1. What is Web Apollo? 10 Chado UCSC (MySQL) Ensembl (DAS) BAM BED BigWig GFF3 MAKER output •  Provides dynamic access to genomic analysis results from both UCSC and Chado databases as well as database storage of user-created annotations. All user-created sequence annotations are automatically uploaded to a server, ensuring reliability.
  • 11. - BAM - BigWig - GFF3 - VCF* Trellis Data Broker (Java) Static JSON Generation Pipeline (Perl) Server-side Data Service Annotation Editing Engine (Java) Berkeley DB realtime store User Management Data Sources Analysis Pipelines - BAM - BED - BigWig - GFF3 - MAKER output Data Repositories Chado MySQL DAS servers e.g. Ensembl Annotation Exports Local DB. e.g. Chado - GFF3 - FASTA Annotators Web Apollo JBrowse Apollo Edit Operations & User Management User Interface (JavaScript) JSON Web Apollo Architecture
  • 12. Working Concept In the context of gene manual annotation, curation tries to find the best examples and / or eliminate most errors. To conduct manual annotation efforts: Gather and evaluate all available evidence using quality-control metrics to corroborate or modify automated annotation predictions. Perform sequence similarity searches (phylogenetic framework) and use literature and public databases to: • Predict functional assignments from experimental data. • Distinguish orthologs from paralogs, and classify gene membership in families and networks. 2. In our experience. 12 Automated gene models Evidence: cDNAs, HMM domain searches, alignments with assemblies or genes from other species. Manual annotation & curation
  • 13. Dispersed, community-based gene manual annotation efforts. We continuously train and support hundreds of geographically dispersed scientists from many research communities, to perform biologically supported manual annotations using Web Apollo. – Gate keepers and monitoring. – Written tutorials. – Training workshops and geneborees. – Personalized user support. 2. In our experience. 13
  • 14. What we have learned. Harvesting expertise from dispersed researchers who assigned functions to predicted and curated peptides we have developed more interactive and responsive tools, as well as better visualization, editing, and analysis capabilities. 142. In our experience. http://people.csail.mit.edu/fredo/PUBLI/Drawing/
  • 15. Collaborative Efforts Improved Automated Annotations* In many cases, automated annotations have been improved (e.g: Apis mellifera. Elsik et al. BMC Genomics 2014, 15:86). Also, learned of the challenges of newer sequencing technologies, e.g.: – Frameshifts and indel errors – Split genes across scaffolds – Highly repetitive sequences To face these challenges, we train annotators in recovering coding sequences in agreement with all available biological evidence. 152. In our experience.
  • 16. It is helpful to work together. Scientific community efforts bring together domain- specific and natural history expertise that would otherwise remain disconnected. Breaking down large amounts of data into manageable portions and mobilizing groups of researchers to extract the most accurate representation of the biology from all available data distills invaluable knowledge from genome analysis. 162. In our experience.
  • 17. Understanding the evolution of sociality Comparing the genomes of 7 species of ants contributed to a better understanding of the evolution and organization of insect societies at the molecular level. Insights drawn mainly from six core aspects of ant biology: 1.  Alternative morphological castes 2.  Division of labor 3.  Chemical Communication 4.  Alternative social organization 5.  Social immunity 6.  Mutualism 17 Libbrecht et al. 2012. Genome Biology 2013, 14:212 2. In our experience. Atta cephalotes (above) and Harpegnathos saltator. ©alexanderwild.com Groups of communities continue to guide our efforts.
  • 18. A little training goes a long way! With the right tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models. 182. In our experience.
  • 19. Manual Annotation How do we get there? 19 Assembly Manual annotation Experimental validation Automated Annotation In a genome sequencing project… Q-ratore 3. How do we get there?
  • 20. Gene Prediction Identification of protein-coding genes, tRNAs, rRNAs, regulatory motifs, repetitive elements (masked), etc. - Ab initio (DNA composition): Augustus, GENSCAN, geneid, fgenesh - Homology-based: E.g: SGP2, fgenesh++ 20 Nucleic Acids 2003 vol. 31 no. 13 3738-3741 3. How do we get there?
  • 21. Gene Annotation Integration of data from prediction tools to generate a consensus set of predictions or gene models. •  Models may be organized using: -  automatic integration of predicted sets; e.g: GLEAN -  packaging necessary tools into pipeline; e.g: MAKER •  All available biological evidence (e.g. transcriptomes) further informs the annotation process. 213. How do we get there? In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation; in such cases it is usually better to use an ab initio model to create a new annotation.
  • 22. Manual Genome Annotation •  Identifies elements that best represent the underlying biology. •  Eliminates elements that reflect the systemic errors of automated genome analyses. •  Determines functional roles through comparative analysis of well-studied, phylogenetically similar genome elements using literature, databases, and the researcher’s experience. 223. How do we get there?
  • 23. Curation Process is Necessary 1.  A computationally predicted consensus gene set is generated using multiple lines of evidence. 2.  Manual annotation takes place. 3.  Ideally consensus computational predictions will be integrated with manual annotations to produce an updated Official Gene Set (OGS). Otherwise, “incorrect and incomplete genome annotations will poison every experiment that uses them”. - M. Yandell. 233. How do we get there?
  • 25. Sort Web Apollo 25 The Sequence Selection Window 4. Becoming Acquainted with Web Apollo. 25
  • 26. Navigation tools: pan and zoom Search box: go to a scaffold or a gene model. Grey bar of coordinates indicates location. You can also select here in order to zoom to a sub-region. ‘View’: change color by CDS, toggle strands, set highlight. ‘File’: Upload your own evidence: GFF3, BAM, BigWig, VCF*. Add combination and sequence search tracks. ‘Tools’: Use BLAT to query the genome with a protein or DNA sequence. Available Tracks Evidence Tracks Area ‘User-created Annotations’ Track Login Web Apollo 26 Graphical User Interface (GUI) for editing annotations 4. Becoming Acquainted with Web Apollo.
  • 27. Web Apollo Functionality In addition to protein-coding gene annotation that you know and love. •  Non-coding genes: ncRNAs, miRNAs, repeat regions, and TEs •  Sequence alterations (less coverage = more fragmentation) •  Visualization of stage and cell-type specific transcription data as coverage plots, heat maps, and alignments 27 4. Becoming Acquainted with Web Apollo. 27
  • 28. 1.  Select a chromosomal region of interest, e.g. scaffold. 2.  Select appropriate evidence tracks. 3.  Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working. -  If yes: select and drag the feature to the ‘User-created Annotations’ area, creating an initial gene model. If necessary use editing functions to adjust the gene model. -  If not: let’s talk. 4.  Check your edited gene model for integrity and accuracy by comparing it with available homologs. 4. Becoming Acquainted with Web Apollo General Process of Curation 28 | Always remember: when annotating gene models using Web Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself. 28
  • 29. Choose (click or drag) appropriate evidence tracks from the list on the left. Click on an exon to select it. Double click on an exon or single click on an intron to select the entire gene. Select & drag any elements from an evidence track into the curation area: these are editable and considered the curated version of the gene. Other options for elements in evidence tracks available from right-click menu. If you select an exon or a gene, then every track is automatically searched for exons with exactly the same co-ordinates as what you selected. Matching edges are highlighted red. Hovering over an annotation in progress brings up an information pop-up. User Navigation 29 | 29 4. Becoming Acquainted with Web Apollo.
  • 30. Right-click menu: •  With the exception of deleting a model, all edits can be reversed with ‘Undo’ option. ‘Redo’ also available. All changes are immediately saved and available to all users in real time. •  ‘Get sequence’ retrieves peptide, cDNA, CDS, and genomic sequences. •  You can select an exon and select ‘Delete’. You can create an intron, flip the direction, change the start or split the gene. User Navigation 30 | 30 4. Becoming Acquainted with Web Apollo.
  • 31. Right-click menu: •  If you select two gene models, you can join them using ‘Merge’, and you may also ‘Split’ a model. •  You can select ‘Duplicate’, for example to annotate isoforms. •  Set translation start, annotate selenocysteine-containing proteins, match edges of annotation to those of evidence tracks. User Navigation 31 | 31 4. Becoming Acquainted with Web Apollo.
  • 32. User Navigation 32 Annotations, annotation edits, and History: stored in a centralized database. 4. Becoming Acquainted with Web Apollo. 32
  • 33. User Navigation 33 The Annotation Information Editor 4. Becoming Acquainted with Web Apollo. DBXRefs are database crossreferences: if you have reason to believe that this gene is linked to a gene in a public database (including your own), then add it here. 33
  • 34. User Navigation 34 The Annotation Information Editor 4. Becoming Acquainted with Web Apollo. •  Add PubMed IDs •  Include GO terms as appropriate from any of the three ontologies •  Write comments stating how you have validated each model. 34
  • 35. User Navigation 35 | •  ‘Zoom to base level’ option reveals the DNA Track. •  Change color of exons by CDS from the ‘View’ menu. •  The reference DNA sequence is visible in both directions as are the protein translations in all six frames. You can toggle either direction to display only 3 frames. Zoom in/out with keyboard: shift + arrow keys up/down 35 4. Becoming Acquainted with Web Apollo.
  • 36. Web Apollo User Guide (Fragment) http://genomearchitect.org/web_apollo_user_guide
  • 37. In a “simple case” the predicted gene model is correct or nearly correct, and this model is supported by evidence that completely or mostly agrees with the prediction. Evidence that extends beyond the predicted model is assumed to be non-coding sequence. The following sections describe simple modifications. Annotating Simple Cases 37 | 37 4. Becoming Acquainted with Web Apollo.
  • 38. Select and drag the putative new exon from a track, and add it directly to an annotated transcript in the ‘User-created Annotations’ area. •  Click the exon, hold your finger on the mouse button, and drag the cursor until it touches the receiving transcript. A dark green highlight indicates it is okay to release the mouse button. •  When released, the additional exon becomes attached to the receiving transcript. Adding exons 38 | •  A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated. 38 4. Becoming Acquainted with Web Apollo.
  • 39. Each time you add an exon region, whether by extension or adding an exon, Web Apollo recalculates the longest ORF, identifying ‘Start’ and ‘Stop’ signals and allowing you to determine whether a ‘Stop’ codon has been incorporated after each editing step. Adding exons 39 | Web Apollo demands that an exon already exists as an evidence in one of the tracks. You could provide a text file in GFF format and select File à Open. GFF is a simple text file delimited by TABs, one line for each genomic ‘feature’: column 1 is the name of the scaffold; then some text (irrelevant), then ‘exon’, then start, stop, strand as + or -, a dot, another dot, and Name=some name Example: scaffold_88  Qratore  exon  21  2111  +  .  .  Name=bob   scaffold_88  Qratore  exon  2201  5111  +  .  .  Name=rad   39 4. Becoming Acquainted with Web Apollo.
  • 40. Gene predictions may or may not include UTRs. If transcript alignment data are available and extend beyond your original annotation, you may extend or add UTRs. 1.  Position the cursor at the beginning of the exon that needs to be extended and ‘Zoom to base level’. 2.  Place the cursor over the edge of the exon until it becomes a black arrow then click and drag the edge of the exon to the new coordinate position that includes the UTR. Adding UTRs 40 | View zoomed to base level. The DNA track and annotation track are visible. The DNA track includes the sense strand (top) and anti- sense strand (bottom). The six reading frames flank the DNA track, with the three forward frames above and the three reverse frames below. The User-created Annotation track shows the terminal end of an annotation. The green rectangle highlights the location of the nucleotide residues in the ‘Stop’ signal. To add a new spliced UTR to an existing annotation follow the procedure for adding an exon. 40 4. Becoming Acquainted with Web Apollo.
  • 41. 1.  Zoom in sufficiently to clearly resolve each exon as a distinct rectangle. 2.  Two exons from different tracks sharing the same start and/or end coordinates, will display a red bar to indicate the matching edges. 3.  Selecting the whole annotation or one exon at a time, use this ‘edge-matching’ function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon. 4.  Note if there are cDNA / RNAseq that lack one or more of the annotated exons or include additional exons. Exon structure integrity 41 | 41 4. Becoming Acquainted with Web Apollo.
  • 42. To modify an exon boundary and match data in the evidence tracks: select both the offending exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate. Exon structure integrity 42 | In some cases all the data may disagree with the annotation, in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data. 42 4. Becoming Acquainted with Web Apollo.
  • 43. Flags non- canonical splice sites. Selection of features and sub-features Edge-matching Evidence Tracks Area ‘User-created Annotations’ Track The editing logic in the server: §  selects longest ORF as CDS §  flags non-canonical splice sites 43 Editing Logic 4. Becoming Acquainted with Web Apollo.
  • 44. Zoom to base level to review non- canonical splice site warnings. These do not necessarily need to be corrected, but should be flagged with the appropriate comment. Splice sites 44 | Exon/intron junction possible error Original model Curated model Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon. Most insects, have a valid non-canonical site GC-AG. Other non-canonical splice sites are unverified. Web Apollo flags GC splice donors as non-canonical. Canonical splice sites: 3’-…exon]GA / TG[exon…-5’ 5’-…exon]GT / AG[exon…-3’ reverse strand, not reverse-complemented: forward strand 44 4. Becoming Acquainted with Web Apollo.
  • 45. Some gene prediction algorithms do not recognize GC splice sites, thus the intron/ exon junction may be incorrect. For example, one such gene prediction algorithm may ignore a true GC donor and select another non-canonical splice site that is less frequently observed in nature. Therefore, if upon inspection you find a non- canonical splice site that is rarely observed in nature, you may wish to search the region for a more frequent in-frame non-canonical splice site, such as a GC donor. If there is an in-frame site close that is more likely to be the correct splice donor, you may make this adjustment while zoomed at base level. Splice sites (keep this in mind) 45 | Exon/intron junction possible error Original model Curated model Use RNA-Seq data to make a decision. Canonical splice sites: 3’-…exon]GA / TG[exon…-5’ 5’-…exon]GT / AG[exon…-3’ reverse strand, not reverse-complemented: forward strand 45 4. Becoming Acquainted with Web Apollo.
  • 46. Web Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons. If it appears to have calculated an incorrect ‘Start’ signal, you may modify it selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (protein database, additional evidence tracks). An upstream ‘Start’ codon may be present outside the predicted gene model, within a region supported by another evidence track. ‘Start’ and ‘Stop’ sites 46 | 46 4. Becoming Acquainted with Web Apollo.
  • 47. Note that the ‘Start’ codon may also be located in a non-predicted exon further upstream. If you cannot identify that exon, add the appropriate note in the transcript’s ‘Comments’ section. In very rare cases, the actual ‘Start’ codon may be non-canonical (non- ATG). In some cases, a ‘Stop’ codon may not be automatically identified. Check to see if there are data supporting a 3’ extension of the terminal exon or additional 3’ exons with valid splice sites. ‘Start’ and ‘Stop’ sites (keep in mind) 47 | 47 4. Becoming Acquainted with Web Apollo.
  • 48. Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions! 1.  Drag and drop each gene model to ‘User-created Annotations’ area. Shift click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu. 2.  Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models. 3.  Check the resulting translation by querying a protein database e.g. UniProt. Record the IDs of both starting gene models in ‘DBXref’ and add comments to record that this annotation is the result of a merge. Complex Cases: Merge two gene predictions on the same scaffold 48 | 48 Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level. 4. Becoming Acquainted with Web Apollo.
  • 49. One or more splits may be recommended when different segments of the predicted protein align to two or more different families of protein homologs, and the predicted protein does not align to any known protein over its entire length. Transcript data may support a split (if so, verify that it is not a case of alternative transcripts). Complex Cases: Split a gene prediction 49 | 49 4. Becoming Acquainted with Web Apollo.
  • 50. DNA Track ‘User-created Annotations’ Track Complex Cases: Frameshifts, single- base errors, and selenocysteines 50 4. Becoming Acquainted with Web Apollo.
  • 51. 1.  Web Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. Note that these manipulations do NOT change the underlying genomic sequence. 2.  If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions. 3.  The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track. 4.  Once you have entered the modifications, Web Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region you should alert the curators of your organisms database using the ‘Comments’ section to report the CDS edits. 5.  In special cases such as selenocysteine containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu. 51 | 51 Complex Cases: Frameshifts, single- base errors, and selenocysteines 4. Becoming Acquainted with Web Apollo.
  • 52. Follow our checklist until you are happy with the annotation! Then: –  Comment to validate your annotation, even if you made no changes to an existing model. Your comments mean you looked at the curated model and are happy with it; think of it as a vote of confidence. –  Or add a comment to inform the community of unresolved issues you think this model may have. Completing the annotation 52 | 52 Always Remember: Web Apollo curation is a community effort so please use comments to communicate the reasons for your annotation (your comments will be visible to everyone). 4. Becoming Acquainted with Web Apollo.
  • 53. 1.  Can you add UTRs (e.g.: via RNA-Seq)? 2.  Check exon structures 3.  Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[… 4.  Check ‘Start’ and ‘Stop’ sites 5.  Check the predicted protein product(s) –  Align it against relevant genes/gene family. –  blastp against NCBI’s RefSeq or nr 6.  If the protein product still does not look correct then check: –  Are there gaps in the genome? –  Merge of 2 gene predictions on the same scaffold –  Merge of 2 gene predictions from different scaffolds –  Split a gene prediction –  Frameshifts –  error in the genome assembly? –  Selenocysteine, single-base errors, and other inconvenient phenomena Checklist for Accuracy and Integrity 53 | 53 7.  Finalize annotation by adding: –  Important project information in the form of canned and/or customized comments –  IDs from GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits (with GenBank IDs), orthologs with species names, and everything else you can think of, because you are the expert. –  Whether your model replaces one or more models from the official gene set (so it can be deleted). –  The kinds of changes you made to the gene model of interest, if any. E.g.: splits, merges, whether the 5’ or 3’ ends had to be modified to include ‘Start’ or ‘Stop’ codons, additional exons had to be added, or non-canonical splice sites were accepted. –  Any functional assignments that you think are of interest to the community (e.g. via BLAST, RNA- Seq data, literature searches, etc.) 4. Becoming Acquainted with Web Apollo.
  • 54. Exercises Live Demonstration using the Apis mellifera genome. 4. Becoming Acquainted with Web Apollo. 54 1. Evidence in support of protein coding gene models. 1.1 Consensus Gene Sets: Official Gene Set v3.2 Official Gene Set v1.0 1.2 Consensus Gene Sets comparison: OGSv3.2 genes that merge OGSv1.0 and RefSeq genes OGSv3.2 genes that split OGSv1.0 and RefSeq genes 1.3 Protein Coding Gene Predictions Supported by Biological Evidence: NCBI Gnomon Fgenesh++ with RNASeq training data Fgenesh++ without RNASeq training data NCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes 1.4 Ab initio protein coding gene predictions: Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2 1.5 Transcript Sequence Alignment: NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot
  • 55. Exercises Live Demonstration using the Apis mellifera genome. 4. Becoming Acquainted with Web Apollo. 55 1. Evidence in support of protein coding gene models (Ctd). 1.6 Protein homolog alignment: Acep_OGSv1.2 Aech_OGSv3.8 Cflo_OGSv3.3 Dmel_r5.42 Hsal_OGSv3.3 Lhum_OGSv1.2 Nvit_OGSv1.2 Nvit_OGSv2.0 Pbar_OGSv1.2 Sinv_OGSv2.2.3 Znev_OGSv2.1 Metazoa_Swissprot 2. Evidence in support of non protein coding gene models 2.1 Non-protein coding gene predictions: NCBI RefSeq Noncoding RNA NCBI RefSeq miRNA 2.2 Pseudogene predictions: NCBI RefSeq Pseudogene
  • 56. Web Apollo Workshop This Workshop’s Documentation found at https://uofi.box.com/webapollo The Web Apollo Demo http://icebox.lbl.gov/WebApolloDemo/
  • 57. Arthropodcentric Thanks! AgriPest Base FlyBase Hymenoptera Genome Database VectorBase Acromyrmex echinatior Acyrthosiphon pisum Apis mellifera Atta cephalotes Bombus terrestris Camponotus floridanus Helicoverpa armigera Linepithema humile Manduca sexta Mayetiola destructor Nasonia vitripennis Pogonomyrmex barbatus Solenopsis invicta Tribolium castaneum…and many more! 57 57
  • 58. Thanks! •  Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI). •  Christine G. Elsik (PI). § University of Missouri. •  Ian Holmes (PI). * University of California Berkeley. •  Arthropod genomics community: i5K Steering Committee, Alexie Papanicolaou at CSIRO, Monica Poelchau at USDA/NAL, fringy Richards at HGSC- BCM, BGI, Oliver Niehuis at 1KITE http://www.1kite.org/, and the Honey Bee Genome Sequencing Consortium. •  Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE- AC02-05CH11231. •  Insect images used with permission: http://AlexanderWild.com and O. Niehuis. •  For your attention, thank you! Thank you. 58 Web Apollo Gregg Helt Ed Lee Colin Diesh § Deepak Unni § Rob Buels * Gene Ontology Chris Mungall Seth Carbon Heiko Dietze BBOP Web Apollo: http://GenomeArchitect.org GO: http://GeneOntology.org i5K: http://arthropodgenomes.org/wiki/i5K