BG7, a new system for bacterial genome annotation designed for NGS data
1. BG7
A new system for bacterial genome
annotation designed for NGS data
www.ohnosequences.com www.era7bioinformatics.com
2. Motivation
- The need for a system specially designed for NGS data annotation, with a pipeline unbiased by existing annotation systems designed for Sanger sequences
- The need for a versatile system able to annotate genes even at the preliminary assembly step of the genome
- Special focus is given to the detection of “unexpected proteins” without orthologs in close genomes (horizontally acquired genes, phage genes, plasmid genes…)
- A fast, automated and scalable process to face the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies
3. Features
1. A new approach
2. It’s tolerant to NGS errors
3. It’s based on cloud computing
4. It uses bio4j
4. Features: Approach
ORF prediction is based on protein similarity
5. Features: Approach
Use as much information as you can (not just start/stop signals)
[Figure: the sequence below, with regions labelled A B C D E]
TGGATGTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAAAGGCTGA
6. Features: Approach
Standard                      BG7
Sequence                      Sequence
ORF prediction (Glimmer)      Protein searching (Blast)
Function prediction (Blast)   CDS prediction
                              RNA searching (Blast)
7. Features: NGS errors
Issue                                    Technology
Genomes in several contigs               All
Sequencing errors in start/stop codons   Illumina substitutions, 454 indels
Frameshifts                              454 indels
Horizontal gene transfer                 None
The BG7 system is tolerant to all these issues
8. Features: Cloud computing
AWS (Amazon Web Services)
- Completely scalable
- On demand
- Fast
- Cheap
Useful in tracking outbreaks
1 genome in ~2 hours
100 genomes in ~2 hours once you’ve got the reference proteins
9. Features: bio4j
It uses bio4j
Much richer annotations
www.bio4j.com
10. How it works?
11. How it works?
Step 1: Expert manual selection of reference sequences
Step 2: Protein search
• Blast
Step 3: CDS definition
• HSPs merge
• Extension of the similarity region searching for start/stop signals
Step 4: Solving conflicts
• Solving duplicates
• Solving overlaps
Step 5: RNA search
• Blast
Step 6: Incorporation of RNA genes
• Definition of RNA genes
• Conflicts with previously annotated protein coding genes are solved
12. Step 2: Protein search with tBlastn
Reference proteins (aa) are searched in the contig sequences
[Figure: reference proteins A, B and C matched against the input contigs (aa)]
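As a rough illustration of this step, the Python sketch below builds a tblastn command line for NCBI BLAST+ and parses one line of its tabular output (`-outfmt 6`, default columns). The file names, the e-value cutoff and the `Hsp` record are assumptions made for the example; the slides do not state which options BG7 actually uses.

```python
from typing import List, NamedTuple


class Hsp(NamedTuple):
    """One row of BLAST+ tabular output (-outfmt 6, default columns)."""
    protein_id: str   # qseqid: reference protein
    contig_id: str    # sseqid: input contig
    identity: float   # pident
    q_start: int      # qstart, aa coordinates on the protein
    q_end: int        # qend
    s_start: int      # sstart, nt coordinates on the contig
    s_end: int        # send
    evalue: float
    bitscore: float


def tblastn_command(proteins_fasta: str, contigs_fasta: str,
                    evalue: float = 1e-5) -> List[str]:
    """Build a tblastn call (hypothetical paths and cutoff)."""
    return [
        "tblastn",
        "-query", proteins_fasta,     # reference proteins (aa)
        "-subject", contigs_fasta,    # contigs (nt), translated by tblastn
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def parse_outfmt6(line: str) -> Hsp:
    """Parse one tab-separated line of -outfmt 6 output."""
    f = line.rstrip("\n").split("\t")
    return Hsp(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
               int(f[8]), int(f[9]), float(f[10]), float(f[11]))
```

In tblastn output a hit on the reverse strand is reported with `sstart` greater than `send`, so strand can be derived from the parsed coordinates.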
13. Step 3: CDS definition
Merging HSPs
[Figure: several HSPs between a reference protein and the input contigs (aa)]
14. Step 3: CDS definition
Merging HSPs
[Figure: several HSPs between a reference protein and the input contigs (aa)]
We merge the HSPs to form a single similarity region
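A minimal Python sketch of HSP merging, under stated assumptions: HSPs are grouped by reference protein, contig and strand, and each group is collapsed into the region spanning its smallest start and largest end. The grouping key, the tuple layout and the absence of a maximum gap between HSPs are illustrative choices, not details given in the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (protein_id, contig_id, start, end, strand)
# 1-based inclusive nt coordinates on the contig, with start <= end.
Hsp = Tuple[str, str, int, int, str]


def merge_hsps(hsps: Iterable[Hsp]) -> List[Hsp]:
    """Collapse the HSPs of each (protein, contig, strand) into one region."""
    groups: Dict[Tuple[str, str, str], List[Tuple[int, int]]] = defaultdict(list)
    for protein, contig, start, end, strand in hsps:
        groups[(protein, contig, strand)].append((start, end))

    regions: List[Hsp] = []
    for (protein, contig, strand), spans in groups.items():
        region_start = min(s for s, _ in spans)
        region_end = max(e for _, e in spans)
        regions.append((protein, contig, region_start, region_end, strand))

    # Sort by contig, then by position, for stable downstream processing.
    return sorted(regions, key=lambda r: (r[1], r[2]))
```

For example, two HSPs of the same protein at 100-250 and 240-400 on the same contig and strand become one similarity region at 100-400.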
15. Step 3: CDS definition
Search for start/stop signals
We then search for start/stop signals upstream and downstream of the region with high similarity to the protein
16. Step 3: CDS definition
Even if we don’t find a start/stop codon for a given CDS, we keep it
We just mark it accordingly
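The extension and "keep it, but mark it" behaviour can be sketched in Python as follows. This is only an illustration under assumptions: it handles the forward strand, takes an in-frame region in 0-based half-open coordinates, uses ATG/GTG/TTG as start codons, and stops the upstream walk at an in-frame stop codon. The `Cds` record and `extend_region` function are hypothetical names, not part of BG7.

```python
from typing import NamedTuple

START_CODONS = {"ATG", "GTG", "TTG"}   # common bacterial start codons (assumption)
STOP_CODONS = {"TAA", "TAG", "TGA"}


class Cds(NamedTuple):
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    has_start: bool   # False marks a CDS without a start codon
    has_stop: bool    # False marks a CDS without a stop codon


def extend_region(seq: str, start: int, end: int) -> Cds:
    """Extend an in-frame, forward-strand similarity region [start, end)."""
    seq = seq.upper()

    # Downstream: step codon by codon from the region end until a stop codon.
    new_end, has_stop = end, False
    pos = end
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            new_end, has_stop = pos + 3, True
            break
        pos += 3

    # Upstream: step back codon by codon until a start codon is found;
    # give up if an in-frame stop codon is reached first.
    new_start, has_start = start, False
    pos = start
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in START_CODONS:
            new_start, has_start = pos, True
            break
        if codon in STOP_CODONS:
            break
        pos -= 3

    # The CDS is always returned; missing signals are only flagged.
    return Cds(new_start, new_end, has_start, has_stop)
```

A region that runs into the end of a contig before any signal is found is still returned, with `has_start` or `has_stop` set to `False`, which is what makes the approach usable on genomes split into several contigs.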
17. Step 4: Solving conflicts
Duplicates
18. Step 4: Solving conflicts
Duplicates
19. Step 4: Solving conflicts
Overlapping CDS
20. Step 5: RNA search
Blastn
Reference RNAs (nt) are searched in the contigs
[Figure: input contigs (nt)]
21. Step 6: Incorporation of RNA genes
Definition of RNA genes
[Figure: input contigs (nt)]
22. Step 6: Incorporation of RNA genes
Conflicts with protein coding genes are solved
If we find both a protein coding gene and an RNA gene in a particular region, the RNA gene is selected over the protein coding one
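The rule that an RNA gene wins over a protein coding gene in the same region could be sketched in Python as below. The `Gene` record, the function names and the definition of a conflict (any shared base on the same contig) are assumptions for illustration; the slides do not define how a conflicting region is delimited.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    name: str
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    kind: str    # "protein" or "rna"


def overlaps(a: Gene, b: Gene) -> bool:
    """True if both genes lie on the same contig and share at least one base."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def resolve_rna_conflicts(proteins: Sequence[Gene],
                          rnas: Sequence[Gene]) -> Tuple[List[Gene], List[Gene]]:
    """Return (selected, dismissed); RNA genes always win a conflict."""
    selected: List[Gene] = list(rnas)
    dismissed: List[Gene] = []
    for protein in proteins:
        if any(overlaps(protein, rna) for rna in rnas):
            dismissed.append(protein)   # conflicting protein gene is dropped
        else:
            selected.append(protein)
    return selected, dismissed
```

Returning the dismissed genes separately, rather than discarding them, keeps them available for later inspection.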
23. Finally
TGGATGTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAAAGGCTGA
A B C D E
24. Comparisons
We’ve compared the NCBI annotations for Escherichia coli str. K-12 substr. MG1655 (RefSeq ID NC_000913) with the BG7 annotations
25. Comparisons
The results we got were:
Feature                NCBI   BG7
Protein coding genes   4145   4370 (1), 4951 (2)
RNA                    175    156
(1) Selected genes
(2) All detected genes: selected + dismissed
26. Comparisons
27. Comparisons
Conclusions
Even in a disadvantageous situation (not an NGS project, and a very well annotated genome), a single annotation round gave us:
- ~95% of the NCBI protein coding genes
- ~89% of the NCBI RNA genes
- 419 new proteins detected
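The percentages above are consistent with the counts in the results table, as the short Python check below shows. It rests on two assumptions that the slides do not state explicitly: that the BG7 protein figure to use is the 4370 selected genes, with the 419 new proteins counted among them, and that all 156 BG7 RNA genes correspond to NCBI RNA genes.

```python
def recovered_fraction(shared: int, reference_total: int) -> float:
    """Fraction of the reference annotation that was recovered."""
    return shared / reference_total


# Counts from the comparison slide (E. coli K-12 MG1655, NC_000913).
ncbi_proteins, bg7_selected, bg7_new = 4145, 4370, 419
ncbi_rnas, bg7_rnas = 175, 156

# Assumption: selected BG7 genes minus the new ones are shared with NCBI.
shared_proteins = bg7_selected - bg7_new   # 3951

protein_fraction = recovered_fraction(shared_proteins, ncbi_proteins)
rna_fraction = recovered_fraction(bg7_rnas, ncbi_rnas)

print(f"proteins: {protein_fraction:.1%}, RNA: {rna_fraction:.1%}")
# prints "proteins: 95.3%, RNA: 89.1%"
```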
28. Upcoming features
Improvements now focused on:
- Overlapping solving phase
- Detection of very small proteins
And any new need we find using it
29. Thanks:
Oh no sequences! team
Raquel Tobes: Bioinformatician, main advisor
Pablo Pareja: Main developer
Eduardo Pareja: Scientific advisor
Eduardo Pareja-Tobes: Mathematician, advisor
Carmen Torrecillas: Junior Bioinformatician
Marina Manrique: Bioinformatician
30. Thanks for your attention!