SL4.0 and ITAG4.0

Solanum lycopersicum Heinz 1706 genome assembly and annotation SL4.0 and ITAG4.0

Published in category: Science

  1. Solanum lycopersicum Heinz 1706 genome assembly and annotation: SL4.0 and ITAG4.0. Sol Genomics Network, https://solgenomics.net/
  2. SL4.0 assembly
     ● 80X PacBio coverage from RS II and Sequel (read N50 13 kb)
     ● Canu assembly (contig N50 5.5 Mb)
     ● Hi-C scaffolding (12 chromosomes plus unplaced contigs)
     ● Corrected with Illumina DNA-seq reads (60X coverage)
     ● Mitochondrial and chloroplast contigs filtered out
     ● Validated with Bionano optical maps and 10X linked reads
  3. Comparison with previous assemblies

     | Genome assembly version                      | SL4.0       | SL3.0       | SL2.5       |
     |----------------------------------------------|-------------|-------------|-------------|
     | Assembly size (bp)                           | 782,520,133 | 828,076,956 | 823,944,041 |
     | Non-N bases (bp)                             | 782,475,302 | 746,357,470 | 737,636,348 |
     | N bases (bp)                                 | 44,831      | 81,719,486  | 86,307,693  |
     | Chr00 / unplaced contig size (bp)            | 9,643,350   | 20,852,292  | 21,805,821  |
     | Number of Chr00 contigs                      | 152         | 3,141       | 4,410       |
     | Repeat content (RepeatModeler/RepeatMasker)  | 64.19%      | 56.39%      | 56.34%      |
     | Repeat content (REPET)                       | 71.77%      | 61.55%      | 60.94%      |
     | Assembly completeness (k-mer based estimate) | 99.24%      | 98.96%      | 98.83%      |
  4. SL3.0 vs. SL4.0: genome assembly colinearity
  5. Input data for genome annotation
     - Full-length cDNA sequenced with PacBio Iso-Seq (breaker and mature green fruit stages)
     - Illumina RNA-seq data from >1,300 libraries (>14 billion reads)
     - Disease resistance data (Martin and Jones labs)
     - 3' and 5' UTR-enriched data (Giovannoni, Aharoni and Sinha labs)
     - Public data from NCBI SRA
     - NCBI EST sequences (~300 K)
     - Full-length cDNA sequences (~13 K) from Micro-Tom (Aoki et al., 2010)
  6. Annotation of protein-coding gene models

     | Metric                           | ITAG4.0 | ITAG2.4 |
     |----------------------------------|---------|---------|
     | Number of protein-coding genes   | 34,075  | 34,725  |
     | Average transcript length        | 1,303   | 1,209   |
     | Average number of exons per gene | 4.74    | 4.61    |
     | Fraction of genes with 5' UTR    | 0.49    | 0.34    |
     | Fraction of genes with 3' UTR    | 0.58    | 0.41    |

     Long non-coding RNAs in ITAG4.0: 5,874 lncRNAs with 6,694 alternatively spliced isoforms
  7. Annotation Edit Distance (AED)
     AED measures how well each annotation agrees with the supporting evidence, ranging from 0 (complete agreement) to 1 (no evidence support). The cumulative AED plot shows an improvement in ITAG4.0 over ITAG2.4.
  8. Novel protein-coding genes in ITAG4.0
     Novel genes in ITAG4.0 are enriched in stress-response genes. GO terms enriched among the novel genes are shown with their fold enrichment and the −log10 of their P-values.
  9. Thank you! Submit annotation corrections using the Tomato Apollo annotation editor; contact SGN for an account: https://solgenomics.net/contact/form
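Slide 2 quotes a read N50 (13 kb) and a contig N50 (5.5 Mb). N50 is the length L such that sequences of length L or longer together contain at least half of all bases. A minimal Python sketch of that computation, using toy lengths rather than the SL4.0 data:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all bases."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical toy contig lengths in bp (not SL4.0 data).
    contigs = [5, 4, 3, 2, 1]
    print(n50(contigs))  # total 15; 5 + 4 = 9 >= 7.5, so this prints 4
```

The same function applies to reads or contigs; only the input lengths differ.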
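In the slide 3 table, assembly size is the sum of non-N bases and N (gap) bases; for SL4.0, 782,475,302 + 44,831 = 782,520,133. A minimal sketch of how those counts could be taken from a FASTA file, assuming lowercase `n` (soft-masked) should also count as a gap base; the function name `base_composition` is hypothetical:

```python
from collections.abc import Iterable


def base_composition(fasta_lines: Iterable[str]) -> dict[str, int]:
    """Count total, N and non-N bases across all records of a FASTA file."""
    total = 0
    n_count = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and record headers
        total += len(line)
        # Assumption: soft-masked assemblies may use lowercase, so count 'n' too.
        n_count += line.count("N") + line.count("n")
    return {"size": total, "n_bases": n_count, "non_n_bases": total - n_count}


if __name__ == "__main__":
    toy = [">chr_toy", "ACGTNNNNac", "gtNNAC", ">unplaced_1", "ACGT"]
    print(base_composition(toy))  # size 20, 6 N bases, 14 non-N bases
    # The SL4.0 column of the table satisfies: non-N + N = assembly size.
    assert 782_475_302 + 44_831 == 782_520_133
```

For a real file, pass an open file handle, e.g. `with open("assembly.fa") as fh: base_composition(fh)`, where the filename is only a placeholder.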
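The gene-model metrics on slide 6 are averages and fractions over the annotated genes. A minimal sketch, assuming a hypothetical `GeneModel` record with one representative transcript per gene; real values would be parsed from the ITAG GFF3 file, and the slide does not state whether transcript length is averaged per gene or over all isoforms:

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical minimal gene record (one representative transcript per gene)."""
    gene_id: str
    exon_count: int
    transcript_length: int  # length of the representative transcript
    has_5utr: bool
    has_3utr: bool


def summarize(genes: list[GeneModel]) -> dict[str, float]:
    """Compute slide-6-style summary metrics over a set of gene models."""
    n = len(genes)
    if n == 0:
        raise ValueError("no gene models")
    return {
        "genes": n,
        "mean_transcript_length": sum(g.transcript_length for g in genes) / n,
        "mean_exons_per_gene": sum(g.exon_count for g in genes) / n,
        # bools sum as 0/1, giving the count of genes with each UTR
        "frac_with_5utr": sum(g.has_5utr for g in genes) / n,
        "frac_with_3utr": sum(g.has_3utr for g in genes) / n,
    }


if __name__ == "__main__":
    toy = [
        GeneModel("g1", 5, 1500, True, True),
        GeneModel("g2", 3, 900, False, True),
        GeneModel("g3", 6, 1500, True, False),
        GeneModel("g4", 4, 1300, False, False),
    ]
    print(summarize(toy))
```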
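Slide 7 uses AED. In the MAKER pipeline, AED is defined as 1 − C, where the congruence C is the mean of sensitivity (fraction of evidence bases covered by the model) and specificity (fraction of model bases supported by evidence). A simplified base-level sketch, assuming exons are given as 1-based inclusive `(start, end)` intervals on one strand and all evidence is merged into one set; the slides do not say which tool computed the ITAG values:

```python
def covered_bases(intervals: list[tuple[int, int]]) -> set[int]:
    """Expand 1-based, inclusive (start, end) exon intervals into genomic positions."""
    bases: set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(prediction: list[tuple[int, int]], evidence: list[tuple[int, int]]) -> float:
    """Simplified base-level AED = 1 - (sensitivity + specificity) / 2."""
    pred = covered_bases(prediction)
    evid = covered_bases(evidence)
    if not pred or not evid:
        return 1.0  # nothing to compare: treat the model as unsupported
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)  # share of evidence bases the model covers
    specificity = overlap / len(pred)  # share of model bases backed by evidence
    return 1.0 - (sensitivity + specificity) / 2


def cumulative_fraction(aeds: list[float], threshold: float) -> float:
    """Fraction of gene models with AED <= threshold (one point of a cumulative AED curve)."""
    if not aeds:
        raise ValueError("no AED values")
    return sum(a <= threshold for a in aeds) / len(aeds)


if __name__ == "__main__":
    print(aed([(1, 100)], [(1, 100)]))    # identical spans: 0.0
    print(aed([(1, 100)], [(51, 150)]))   # half overlap: 0.5
    print(aed([(1, 100)], [(201, 300)]))  # no overlap: 1.0
```

A cumulative AED plot evaluates `cumulative_fraction` over a grid of thresholds; a curve that rises earlier means more gene models with strong evidence support.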
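Slide 8 reports GO-term fold enrichment and −log10 P-values for the novel genes. Fold enrichment compares the term's frequency among novel genes, k/n, with its frequency in the background, K/N. The slides do not name the statistical test; the sketch below assumes a one-sided hypergeometric test, a common choice for GO enrichment, and uses hypothetical toy counts:

```python
import math


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    total = math.comb(N, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total  # exact integer ratio, rounded once to float


def go_enrichment(k: int, n: int, K: int, N: int) -> tuple[float, float]:
    """Return (fold enrichment, -log10 P) for one GO term.

    k: novel genes with the term       n: novel genes in total
    K: background genes with the term  N: background genes in total
    """
    if not (0 <= k <= min(K, n) and 0 < K <= N and 0 < n <= N):
        raise ValueError("inconsistent counts")
    fold = (k / n) / (K / N)
    p = hypergeom_sf(k, N, K, n)
    score = math.inf if p == 0.0 else -math.log10(p)  # p may underflow for huge effects
    return fold, score


if __name__ == "__main__":
    # Hypothetical counts: 3 of 4 novel genes vs 5 of 20 background genes carry the term.
    fold, score = go_enrichment(k=3, n=4, K=5, N=20)
    print(f"fold enrichment = {fold:.1f}, -log10(P) = {score:.2f}")
```

With many GO terms tested at once, the P-values would normally also be corrected for multiple testing before plotting.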
