Assembly and finishing
2. A typical microbial (and not only microbial) project
Pipeline: Sequencing -> Draft assembly -> FINISHING -> Annotation -> Public release
Goals of finishing: completely restore the genome; produce a high-quality consensus
9. Draft assembly: what we get
Assembly: a set of contigs; ordered sets of contigs (scaffolds)
Clone walk (Sanger libraries)
PCR, then sequence the PCR product (primers pri1 and pri2)
New technologies: there are no clones to walk off, even if you can scaffold contigs with paired ends (PE); bPCR is a new approach to gap closing
10. Primer walking
Clone walk (captured gaps): Clone A
PCR, then sequence (uncaptured gaps): template is gDNA; the PCR product is sequenced
13. High-GC sequencing problems: small hairpins (inverted repeat sequences) in the DNA re-anneal during either sequencing or electrophoresis, resulting in failed sequencing reactions or unreadable electrophoresis results. (Adding modifiers to the reaction, sequencing smaller clones, and running gels at higher temperatures in the presence of stronger denaturants can help.)
15. 454 (pyrosequencing) and low-GC genomes: Thermotoga lettingae TMO
Sanger-based draft assembly:
- 55 total contigs; 41 contigs >2 kb
- 38% GC
- biased Sanger libraries
Draft assembly + 454:
- 2 total contigs; 1 contig >2 kb
- 454: no cloning
- average gap length: 166 bp
16. 454 and high-GC projects: Xylanimonas cellulosilytica DSM 15894 (3.8 Mb; 72.1% GC)
PGA-8kb: PGA assembly, 9x of 8 kb
PGA-8kb+454: PGA assembly, 9x of 8 kb + 454
Assembly      Total contigs  Major contigs  Scaffolds  Misassemblies*  N50
PGA-8kb       210            166            4          165             41,048
PGA-8kb+454   33             23             2          14              288,369
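The N50 values reported for the two assemblies can be reproduced from a list of contig lengths. A minimal sketch, assuming the common definition of N50 (the contig length at which the running total of the length-sorted contigs first reaches half of the assembled bases); the function name and the example lengths are illustrative and are not taken from the slide:

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point rounding at the halfway point.
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths, not data from the project above.
print(n50([100, 80, 60, 40, 20]))  # prints 80
```

A higher N50 with fewer contigs, as in the PGA-8kb+454 row, means that more of the genome sits in long contiguous pieces.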
17. NextGen high-quality drafts at JGI (multiple sequencing platforms)
Step 1: Pyrosequence (454) and Sanger reads, with fosmid ends* and 454 paired ends (PE), give the main ordered and oriented part of the assembly (Newbler assembler).
Step 2: GapResolution (in-house tool) closes some gaps (up to 40%) using unassembled 454 data (PGA or Newbler assemblers).
Step 3: Solexa reads are used to detect and correct errors in the consensus (in-house tool, the Polisher) and to close gaps (Velvet).
* Fosmid ends are not used for microbes.
18. Solving gaps: the GapResolution tool
Gaps between contigs are often due to repeats.
Step 1: For each gap, identify read pairs from contigs found on different scaffolds (read pairs found in contigs outside of this scaffold).
Step 2: Assemble the reads in the contigs adjacent to the gap together with the reads obtained from contigs outside the scaffold. Sometimes an assembler other than Newbler (PGA) is used for sub-assemblies. The consensus from the sub-assembly fills the gap.
19. Solving gaps: the GapResolution tool (II)
Step 3: If the gap is not closed, the tool designs primers for sequencing reactions to close the gap.
Step 4: Iterate as necessary (in sub-assemblies).
http://www.jgi.doe.gov/
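Step 1 of the gap-closing procedure (finding read pairs that link a gap to contigs on other scaffolds) can be sketched as follows. This is not the GapResolution code; the function name, the dictionary-based inputs, and the read and contig identifiers are all hypothetical and chosen only to illustrate the selection rule:

```python
def pairs_leaving_scaffold(gap_contigs, read_to_contig, contig_to_scaffold, pairs):
    """Select read pairs with one mate in a contig flanking the gap and
    the other mate in a contig that belongs to a different scaffold.

    gap_contigs        -- set of contig names adjacent to the gap
    read_to_contig     -- dict: read name -> contig it was assembled into
    contig_to_scaffold -- dict: contig name -> scaffold name
    pairs              -- list of (read1, read2) mate pairs
    """
    home_scaffolds = {contig_to_scaffold[c] for c in gap_contigs}
    selected = []
    for read1, read2 in pairs:
        c1 = read_to_contig.get(read1)
        c2 = read_to_contig.get(read2)
        if c1 is None or c2 is None:
            continue  # one mate was never assembled; skip in this sketch
        for inside, outside in ((c1, c2), (c2, c1)):
            if inside in gap_contigs and contig_to_scaffold[outside] not in home_scaffolds:
                selected.append((read1, read2))
                break
    return selected


# Hypothetical toy data: the gap lies between c1 and c2 on scaffold s1.
contig_to_scaffold = {"c1": "s1", "c2": "s1", "c3": "s1", "c9": "s2"}
read_to_contig = {"a/1": "c1", "a/2": "c9", "b/1": "c2", "b/2": "c3",
                  "d/1": "c9", "d/2": "c2", "e/1": "c1"}
pairs = [("a/1", "a/2"), ("b/1", "b/2"), ("d/1", "d/2"), ("e/1", "e/2")]
print(pairs_leaving_scaffold({"c1", "c2"}, read_to_contig, contig_to_scaffold, pairs))
```

The reads selected this way, together with the reads already in the flanking contigs, would then go into the sub-assembly described in Step 2.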
21. Low-quality areas: areas of potential frameshifts. Assemblies contain low-quality regions (red tags).
22. Homopolymer-related frameshifts: homopolymers (n >= 3)
Frameshift 1: AAAAA, should be AAAA
Frameshift 2: CCCC, should be CCC
Modified from N. Ivanova (JGI)
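Because these frameshifts occur in homopolymers of three or more bases, a simple scan of the consensus can flag the positions worth checking. A minimal sketch; the function name and the example sequence are illustrative only:

```python
import re


def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for every run of a single base
    that is at least min_len long (0-based start positions)."""
    # A base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


# Hypothetical consensus fragment containing an A run and a C run.
print(homopolymer_runs("CTTGAAAAACCCCG"))  # prints [(4, 'A', 5), (9, 'C', 4)]
```

The scan only reports where runs are; deciding whether a run has the wrong length still needs independent evidence such as reads from another platform.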
23. Polisher: software for consensus quality improvement
Step 1: Align Illumina data to the 454-only or Sanger/454 hybrid assembly.
Step 2: Analyze and correct consensus errors.
- Corrections: Illumina coverage >= 10X and at least 70% of Illumina bases disagree with the reference base.
- Unsupported: (a) Illumina coverage < 10X, or (b) Illumina coverage >= 10X and <70% of Illumina bases agree with the reference base.
Step 3: Design sequencing reactions for low-quality Sanger/454 areas and for areas unsupported by Illumina.
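The Step 2 thresholds can be written as a small decision rule over one alignment column. This is a sketch of the rule as stated on the slide, not the Polisher itself. Two things are assumptions: the "supported" label for columns that meet both thresholds, and replacing the reference base with the most frequent non-reference base when a correction is called. All names are hypothetical.

```python
from collections import Counter

MIN_COVERAGE = 10  # minimum Illumina depth, from the slide
MIN_PERCENT = 70   # agreement/disagreement threshold in percent, from the slide


def classify_position(ref_base, illumina_bases):
    """Classify one consensus position from the Illumina bases aligned to it.

    Returns (label, base) where label is 'correction', 'supported'
    or 'unsupported', and base is the proposed consensus base.
    """
    coverage = len(illumina_bases)
    if coverage < MIN_COVERAGE:
        return "unsupported", ref_base
    counts = Counter(illumina_bases)
    agree = counts[ref_base]
    disagree = coverage - agree
    # Integer arithmetic avoids floating-point edge cases at exactly 70%.
    if disagree * 100 >= MIN_PERCENT * coverage:
        # Assumption: correct to the most common non-reference base.
        alt = max((b for b in counts if b != ref_base), key=counts.get)
        return "correction", alt
    if agree * 100 >= MIN_PERCENT * coverage:
        return "supported", ref_base
    return "unsupported", ref_base


print(classify_position("A", "GGGGGGGGAA"))  # prints ('correction', 'G')
print(classify_position("A", "AAAAAGGGGG"))  # prints ('unsupported', 'A')
```

Positions that come back unsupported are the ones Step 3 would target with additional sequencing reactions.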
28. QC: Annotation of poor-quality sequence
To avoid this:
- make sure you use high-quality sequence
- choose a proper assembler
Source: A Bioinformatician's Guide to Metagenomics. Microbiol Mol Biol Rev. 2008 December; 72(4): 557-578.
29. Assembly mistakes
Source: A Bioinformatician's Guide to Metagenomics. Microbiol Mol Biol Rev. 2008 December; 72(4): 557-578.
31. Metagenomic finishing: approach
Binning: which DNA fragment derived from which phylotype? (BLAST; GC%; read depth)
Example: Candidatus Accumulibacter phosphatis (CAP), ~45%. Reads are separated into CAP reads and non-CAP reads; Lucy/PGA; target: complete genome of Candidatus Accumulibacter phosphatis.
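Of the three binning signals listed (BLAST, GC%, read depth), GC% is the simplest to illustrate. A minimal sketch, assuming contigs are held in a dictionary and that a bin is defined by a target GC% plus a tolerance; the function names, the tolerance value, and the toy sequences are hypothetical, and a real binning step would combine GC% with the other signals:

```python
def gc_percent(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def bin_by_gc(contigs, target_gc, tolerance=5.0):
    """Split {name: sequence} into names inside and outside the GC window."""
    in_bin, out_of_bin = [], []
    for name, seq in contigs.items():
        if abs(gc_percent(seq) - target_gc) <= tolerance:
            in_bin.append(name)
        else:
            out_of_bin.append(name)
    return in_bin, out_of_bin


# Hypothetical toy contigs; real contigs are thousands of bases long.
contigs = {"contig_a": "GGGCGCAT", "contig_b": "ATATATAT"}
print(bin_by_gc(contigs, target_gc=75.0))  # prints (['contig_a'], ['contig_b'])
```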
1 and 3: assemble reads of the same nature; particular assemblers can be used. 2: reads from different platforms -> which assembler? -> shredding is an option. 454/Solexa co-assembly? Newbler is OK with longer Solexa reads.
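The shredding option can be read as cutting the consensus from one platform's assembly into overlapping pseudo-reads, so that a single assembler can take them together with reads from another platform. A minimal sketch under that reading; the function name, read length, and overlap are illustrative assumptions rather than the values used at JGI:

```python
def shred(sequence, read_len=1000, overlap=500):
    """Cut a sequence into overlapping pseudo-reads of at most read_len bases."""
    if overlap >= read_len:
        raise ValueError("overlap must be smaller than read_len")
    if not sequence:
        return []
    step = read_len - overlap
    reads = []
    start = 0
    while True:
        reads.append(sequence[start:start + read_len])
        if start + read_len >= len(sequence):
            break  # the last pseudo-read reached the end of the sequence
        start += step
    return reads


# Toy example with a short string so the overlap is easy to see.
print(shred("ABCDEFGHIJ", read_len=4, overlap=2))
# prints ['ABCD', 'CDEF', 'EFGH', 'GHIJ']
```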
We need to order our contigs and build scaffolds
Primers1: standard primers to check the template
How do you break a broom? It jumps over.
You still have gaps after that...
454 reads when aligned cover the entire genome
Still in use to this day!
Results vary from project to project, depending on GC%
Quality of the sequence you work with is even more important than for single genomes