Although sequencing technology and price performance per base pair sequenced continue to advance at an impressive rate, finished whole-genome sequencing projects are still costly and lengthy endeavors. Next-gen sequencing technology (and even next-next-gen technology) does not address some of the common issues in creating a “Finished” quality genome, namely contig placement, gap closure and validation. Addressing these issues takes several months and a substantial share of the budget in a sequencing project.
This webinar will discuss how using Whole Genome Mapping technology, also called Optical Mapping, can significantly reduce the length and cost of sequencing projects.
Whole Genome (Optical) Mapping is a de novo process that generates ordered, whole-genome restriction maps with no requirement for prior sequence information, and it provides a comprehensive view of genomic architecture.
Thanks for joining; introduce myself, Allen, Charles
MIRAIBIO Group of HISAL
Logistics of webinar
  Ask questions during the webinar using the GTM control panel
  Answer some of the good ones at the end
  Webinar is being recorded for viewing later
Who is the MiraiBio Group of Hitachi Solutions America, Ltd.?
MBI group has been serving the life science community since 1991
Part of the Hitachi family of companies
Subsidiary of Hitachi Solutions in Japan, which is a 13k+, multi-billion dollar company
Diverse product portfolio for the life sciences
  Wet/dry lab services, including MapIt, which we talk about today
  Consumables
  Instrumentation
  Software: DNASIS (first life science software, released in 1983), MasterPlex suite, moving to the cloud
I am going to start with the answer to the question “How can you reduce the cost and length of your sequencing project by 50% or more?”
The answer lies in a technology called “Whole Genome Mapping”, sometimes called Optical Mapping.
This will not be a comprehensive discussion of either sequence finishing or Whole Genome Mapping. In our limited time today I will probably create more questions than answers for you. My goal is to introduce WGM and pique your interest enough to want to learn more.
In this presentation I will:
  Explain what Whole Genome Mapping is
  Explain why whole genomes matter
  Show how WGM can save you 50% or more AND improve the quality of your results
  Look at an example of how one lab was able to reduce the number of PCR and sequencing reactions
  Look at a resequencing example that really shows the value of using a WGM
  Look at the results of a study that shows why sequence read quality doesn’t necessarily translate into assembly quality
  Finally, discuss how to get your own WGMs through the MapIt service being offered by Hitachi Solutions
What is Whole Genome Mapping? (Read definition.)
Here is an example of a map. You can see that the genome or chromosome was digested by a 6-base cutter.
As we will see in a second, the process of generating a map locates the restriction sites and measures the distance between them.
These patterns are highly specific to individual organisms, even individual isolates.
Using fluorescence microscopy and sophisticated software, the fragments are identified and sized. Please note that the order of the fragments is always maintained.
Applications: Whole Genome Maps are compared to perform high-resolution epidemiology, discover genetic variation, and accelerate sequence assembly.
We are going to focus on the sequence assembly application today, but before we do, let’s talk about when whole genomes matter.
The top of this slide shows the various stages in a sequencing project. As you would imagine, the further you get into your sequence finishing project, the higher the quality of your sequence. The figure below describes the criteria that must be achieved at each stage.
For reference, shotgun sequencing produces only the standard draft, which is the first stage.
Not everybody has the resources to generate finished whole genomes of the organisms they are studying. Many of you might be thinking that you don’t need to achieve some of the later stages of quality to answer your biological questions. Or perhaps the value of the added information is not justified by the cost. Or your lab is resource constrained. That last one may be the main reason why you are listening to me today.
Using WGM in sequence finishing projects reduces the cost significantly, increasing the accessibility of finished whole genomes.
When is whole-genome sequence finishing important?
Gene dosage
  Unfinished genomes often call duplications as single regions, and having two sets of genes instead of one can result in 2X as many gene products.
Genome variation
  Repetitive regions may be incomplete or uncharacterized, and these may make one organism different from another. Location of genes across the genome can be important.
Operon structure and gene regulation
  Genes near each other may work in concert. Knowing whether genes are near each other or across the genome from each other can be important.
  Genes closer to the origin of replication will have different expression than genes further away.
  Genomic rearrangements like inversions can be missed, and these events can lead to different gene expression because they change the order of genes.
Accuracy and quality
  You need high-quality, accurate sequence data to have the confidence to identify missing genes or sequences. Is it really missing, or is it an artifact of a sequencing error?
  Some of the most interesting biology is in missing sequence and/or missing genes. Could you be missing some of these important discoveries?
We will see later how even some so-called “finished” sequences still have accuracy and quality issues, and how WGM can help improve both.
Let’s look again at the sequencing project workflow. This time we’ll add more detail, including the estimated cost and length of each stage.
Sequencing technology is rapidly advancing
  12 years first human genome
  Small genomes can be sequenced using shotgun methods in a couple of days
  New advances are certainly improving the price performance
  Most of the advances are focused on the first stage, shotgun sequencing
The bottleneck of generating base pairs has been moved downstream.
Bottom line: it is fairly easy to get lots of base pairs these days, and it is only getting cheaper. But that leaves you with the lowest-quality draft.
After the initial assembly the hard part starts: closing gaps between your contigs, navigating regions with a high number of repeats, resequencing for validation, etc. These tasks can represent over 50% of the length of a sequencing project and over 50% of the cost!
With Whole Genome Mapping our workflow now looks like this.
In the next couple of slides you will see how having a WGM allows you to easily order and orient your contigs, identify any misassemblies and identify gaps that need to be sequenced across. A WGM also acts as a sequence-independent source of validation when resequencing.
So what you are actually doing is building a scaffold that is generated de novo and, most importantly, IN ORDER.
Using software you can convert your contigs to WGMs in silico and align them to your Whole Genome Map, and that is where the fun starts.
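To make the in silico conversion step concrete, here is a minimal Python sketch of digesting a contig into an ordered list of fragment sizes. It is an illustration only, not the mapping software's actual method: the function name `in_silico_digest` is hypothetical, and the enzyme is an assumption (BamHI, a 6-base cutter recognizing GGATCC and cutting after the first G), since these notes do not name the enzyme used.

```python
def in_silico_digest(sequence, site="GGATCC", cut_offset=1):
    """Return ordered fragment lengths (bp) for a linear sequence.

    Assumption: the enzyme cuts `cut_offset` bases into every occurrence
    of `site` (BamHI cuts G^GATCC, so the offset is 1).
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        # Step forward one base so overlapping sites would also be found.
        start = sequence.find(site, start + 1)
    # Fragment boundaries: sequence start, every cut, sequence end.
    boundaries = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Toy contig with two BamHI sites; real contigs are thousands of bases long.
toy_contig = "AAAAGGATCCTTTTTTTTGGATCCCC"
print(in_silico_digest(toy_contig))  # [5, 14, 7]
```

The fragment order is preserved, which is what makes the result comparable to a Whole Genome Map. Note that the first and last fragments end at the contig boundary rather than at a real restriction site, so only the internal fragments are reliable for alignment.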
Zoom in to get a better understanding of how this works
Order and orient your contigs
Remember, your WGM is whole and in order!
Identify gaps and overlaps
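As a rough sketch of what "order and orient" means computationally, the Python below slides a contig's fragment-size pattern along the Whole Genome Map in both orientations and keeps the best fit. Everything here is an assumption for illustration: the function name `place_contig`, the 10% sizing tolerance, and the example fragment sizes are all made up, and real alignment software also handles missing or extra cut sites, which this sketch does not.

```python
def place_contig(genome_map, contig_map, tolerance=0.10):
    """Find where a contig's fragment pattern best fits the genome map.

    Both maps are ordered lists of positive fragment sizes in bp.
    Returns (start_index, orientation, score), or None when no window
    matches every fragment within `tolerance` (relative size error).
    """
    if not contig_map:
        return None
    best = None
    candidates = (("forward", contig_map), ("reverse", contig_map[::-1]))
    for orientation, frags in candidates:
        for start in range(len(genome_map) - len(frags) + 1):
            window = genome_map[start:start + len(frags)]
            errors = [abs(w - f) / w for w, f in zip(window, frags)]
            if max(errors) > tolerance:
                continue  # at least one fragment is too far off
            score = sum(errors)
            if best is None or score < best[2]:
                best = (start, orientation, score)
    return best


# Hypothetical maps: fragment sizes are invented for this example.
genome_map = [12000, 4500, 30000, 8000, 15500, 2200, 9700]
contig_a = [4400, 30500, 8100]   # internal fragments of one contig
contig_b = [9600, 2250, 15000]   # another contig, assembled in reverse

print(place_contig(genome_map, contig_a)[:2])  # (1, 'forward')
print(place_contig(genome_map, contig_b)[:2])  # (4, 'reverse')
```

Once each contig has a start index and an orientation, sorting by start index gives the contig order, and any stretch of the map left uncovered between two neighbours is a gap to sequence across.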
Let’s take a look at how accurately identifying your gaps by using a Whole Genome Map can lead to real-life cost and time savings.
I actually wrote a blog post that heavily referenced this paper, which was published in BMC Genomics last year.
By knowing exactly where the gaps were, this team was able to reduce the number of PCR experiments to just 43, whereas working with the original assembly could have required anywhere from 600 to 3000 additional PCR experiments.
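The arithmetic behind that saving can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, assuming the scaling quoted from Nagarajan et al. in these notes (roughly n squared experiments without contig order, roughly n with it); the function names are hypothetical, and the paper rounds its own figure for 59 large contigs to about 3000.

```python
def pcr_experiments_without_map(num_contigs):
    """Contig order unknown: any contig could neighbour any other,
    so the number of candidate PCR experiments grows as roughly n^2."""
    return num_contigs ** 2


def pcr_experiments_with_map(num_contigs):
    """Contig order and orientation known from the map: roughly one
    PCR experiment per junction between neighbouring contigs, about n."""
    return num_contigs


n = 59  # number of large contigs in the original assembly, per the paper
print(pcr_experiments_without_map(n))  # 3481
print(pcr_experiments_with_map(n))     # 59
```

The team's actual count of 43 PCR experiments is in line with the linear estimate rather than the quadratic one.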
Remember this slide from earlier? We were talking about the amount of data (bps) we can generate, how it is increasing, and how price performance is improving dramatically. Another facet that further drives home the point that bioinformatics is still the bottleneck is the concept of sequencing read quality. That is also improving with the technology. However, as the next example shows, sequencing read quality does not equal assembly quality.
While the quality of sequencing reads for some platforms can be as high as 99%, that doesn’t mean that the quality of the assembly is 99%.
OpGen presented data at the Sequence Finishing in the Future Meeting demonstrating that 62% of these GenBank sequences, which may be used as reference genomes for further resequencing projects, contain significant misassembly errors.
The question this raises: “What would these types of errors mean for your resequencing projects?”
Transition: A recent publication has highlighted the concern of interpreting unvalidated sequences and using them in downstream applications….
Sequence only the really important strains
  NMRC story: they have optically mapped over 60 isolates and found 2 isolates that are extremely novel (only 1 chromosome when there should be 2). They are now sequencing the really important ones! Where would they be if they had tried to sequence all of these? What if the most interesting isolates are not in the first 10 you sequence?
Improve your sequence assembly pipeline
  Spend less time doing paired-end and mate-pair libraries
Finishing genomes with limited resources: lessons from an ensemble of microbial genomes. Nagarajan et al. BMC Genomics 2010, 11:242
Optical Mapping reduces the number of PCRs needed
  “From a finishing perspective, these scaffolds are particularly useful, as for a set of n contigs, they help reduce the number of PCR experiments needed from roughly n² to n [23].” P 7
  Example: “Using only 43 PCR experiments and 26 sequencing reactions 33 of the gaps were closed, leaving only 7 gaps to close. In contrast, working with the original assembly (59 large contigs) could have necessitated on the order of 59² ≈ 3000 PCR experiments (see Table 1).” P 4
Optical Mapping is the preferred method: new paper…..