
Complementing Computation with Visualization in Genomics

1,552 views

A look at genome assembly visualization with ABySS-Explorer, and at complementing genome browsing with clustering and interactive data exploration.

Published in: Technology

Complementing Computation with Visualization in Genomics

  1. British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, British Columbia, Canada. Complementing Computation with Visualization in Genomics. March 11, 2010. EBI Interfaces Interest Forum. Cydney Nielsen
  2-3. Discovery path: Biological Sample, Genomic Data, Scientific Insight
  4. Components of Data Analysis: Automation, Analysis, Genomic Data, Scientific Insight, Human Judgment
  5-6. Outline: (1) Genome Assembly Visualization: ABySS-Explorer; (2) Complement to genome browsing: using clustering and interactive data exploration
  7. Genome Sequencing (shotgun approach): cell population, extracted DNA, sheared DNA, sequencing reads. Example reads: AGCGGATTGCATGACAGT, GTACAGCCTGACAGAAGC, GCGCTACGATCAGATCAA, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG
  8-9. ABySS (Assembly By Short Sequences), Simpson et al., Genome Res 2009. Sequencing read set (read length = 7 nt): GGACATC, GGACAGA. Corresponding de Bruijn graph (k = 5 nt): [figure not captured in transcript]. ABySS merges unambiguously connected vertices to form contigs.
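The two reads on this slide are small enough to work through by hand. The following is a minimal Python sketch, not ABySS's actual implementation: it treats each 5-mer as a vertex, links consecutive k-mers from the same read, and merges a pair of vertices only when the first has exactly one successor and the second exactly one predecessor. The function names are hypothetical, and reverse complements, sequencing errors, and isolated cycles are deliberately ignored.

```python
from collections import defaultdict

def kmers(read, k):
    """Return all k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def build_graph(reads, k):
    """Vertices are k-mers; an edge joins consecutive k-mers of a read."""
    nodes, succ, pred = set(), defaultdict(set), defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        nodes.update(ks)
        for a, b in zip(ks, ks[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred

def merge_unambiguous(nodes, succ, pred):
    """Merge chains of vertices joined by unambiguous edges into contigs."""
    contigs = []
    for node in sorted(nodes):
        # Skip vertices that will be absorbed into a predecessor's contig.
        if len(pred[node]) == 1:
            (p,) = pred[node]
            if len(succ[p]) == 1:
                continue
        seq, cur = node, node
        # Extend while the next edge is unambiguous in both directions.
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1:
                break
            seq += nxt[-1]
            cur = nxt
        contigs.append(seq)
    return contigs

reads = ["GGACATC", "GGACAGA"]
contigs = merge_unambiguous(*build_graph(reads, k=5))
print(contigs)
```

In this toy case the shared vertex GGACA has two successors, so it stays a contig of its own and each branch is merged separately.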
  10-11. Assembly Ambiguities. True genome sequence: GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG. Assembled sequence and de Bruijn graph representation: [figures not captured in transcript]
  12. Starting Point: Shaun Jackman
  13-14. Example of existing tools: Consed
  18. Properties of DNA
  19. Capture sequence strand: AAAAAT, 2+, 1+
  20. Capture sequence strand: AAAAAT, 2+, 1+, TTTTTA, 2-, 1-
  21. Capture sequence strand: AAAAAT, 1+, 2+, TTTTTA
  22. Capture sequence strand: AAAAAT, 1-, 2-, TTTTTA
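The slides show AAAAAT paired with TTTTTA, which is the complementary strand written in the same left-to-right alignment. Read in its own 5'-to-3' direction, that strand is the reverse complement, ATTTTT. The helper below is an illustrative sketch with a hypothetical name, limited to the four unambiguous bases.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand of seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

plus_strand = "AAAAAT"
aligned_complement = plus_strand.translate(COMPLEMENT)  # as drawn on the slide
minus_strand = reverse_complement(plus_strand)          # read 5' to 3'
print(aligned_complement, minus_strand)
```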
  24. Capture sequence length: one oscillation = 100 nt
  25. Genome Sequencing with read pair information: cell population, extracted DNA, sheared DNA, dsDNA fragment (known size) flanked by two reads, sequencing reads (typically produce millions). Example reads: AGCGGATTGCATGACAGT, GTACAGCCTGACAGAAGC, GCGCTACGATCAGATCAA, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG
  26. Capture read pair information: After building the initial single-end (SE) contigs from k-mer sequences, ABySS uses paired-end reads to resolve ambiguities.
  27. Capture read pair information: Paired-end read information is used to construct paired-end (PE) contigs, for example … 13+ 44- 46+ 4+ 79+ 70+ … Legend: blue gradient = paired-end contig; orange = selected single-end contig.
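The path shown on this slide lists single-end contig ids, each followed by a strand sign. As a rough sketch, such a path can be split into (id, strand) pairs as shown below. The function name is hypothetical, only the visible part of the path is used, and the token format is inferred from the slide rather than from the ABySS file specification.

```python
def parse_contig_path(path):
    """Split a path such as '13+ 44-' into (contig_id, strand) pairs."""
    steps = []
    for token in path.split():
        contig_id, strand = token[:-1], token[-1]
        if strand not in ("+", "-") or not contig_id.isdigit():
            raise ValueError(f"unexpected path token: {token!r}")
        steps.append((int(contig_id), strand))
    return steps

# Visible portion of the paired-end contig path from the slide.
steps = parse_contig_path("13+ 44- 46+ 4+ 79+ 70+")
print(steps)
```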
  28-36. ABySS-Explorer
      • Visual representation of:
          • contig adjacency information
          • contig strand
          • contig length
          • paired-end relationships
          • paired-end contigs
      • Implemented using the Java Universal Network/Graph Framework (JUNG)
      • Applied the Kamada-Kawai layout algorithm (JUNG implementation)
      • Uses ABySS files as input (version 1.1.0 and higher)
  37. http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer
  38-43. Part 1: Conclusions and Future Work
      • Graph encoding provides an integrated display of genome assemblies and associated metadata
      • This representation is particularly powerful for revealing high-level genome assembly structure, which is not readily viewable in any other interactive tool
      • Future work includes:
          • support for other assembly algorithm outputs
          • flexible annotation display
          • integration with existing assembly editing tools
      Outline (repeated): Genome Assembly Visualization: ABySS-Explorer; Complement to genome browsing: using clustering and interactive data exploration
  44-45. Genome Sequencing: cell population, extracted DNA, sheared DNA, sequencing reads (typically produce millions). Example reads: AGCGGATTGCATGACAGT, GTACAGCCTGACAGAAGC, GCGCTACGATCAGATCAA, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG
  46. Genome Sequencing: Chromatin Immunoprecipitation and Sequencing (ChIP-Seq): cell population, extracted DNA, selection, sheared DNA, sequencing reads (typically produce millions). Example reads: AGCGGATTGCATGACAGT, GTACAGCCTGACAGAAGC, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG, TTCAGAATGGTACAGCAG
  47. Align sequences to the genome. Reads: CCGAGTACAGCCTGACAGA, GCATGACAGTCCGAGTAC, TTGCATGACAGTCCGAGT, AGCGGATTGCATGACAGT, AGCGGATTGCATGACAGT, AGCGGATTGCATGACAGT. Reference Genome: AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA. [Plot: read coverage against genomic coordinate]
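The coverage track on this slide can be reproduced from the reads and reference it lists. The sketch below is an illustration only: it places each read by exact string matching at its first occurrence, which works for this toy example but is not how a real aligner handles mismatches, repeats, or the reverse strand. The function name is hypothetical.

```python
def read_coverage(reference, reads):
    """Count, for each reference position, how many reads overlap it."""
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)  # exact match, first occurrence only
        if start == -1:
            continue  # unaligned reads are ignored in this sketch
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth

reference = "AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA"
reads = [
    "CCGAGTACAGCCTGACAGA",
    "GCATGACAGTCCGAGTAC",
    "TTGCATGACAGTCCGAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
]
depth = read_coverage(reference, reads)
print(depth)
```

Here the three identical reads stack up at the start of the reference, so coverage is highest where they overlap the two reads that begin further along.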
  48. Genome browser can reveal local patterns. Tracks: H3K4me3, H3K36me3, H3K27me3, H3K9me3, H3K9Ac, MRE
  49. Difficult to get global overview
  50-52. Focus on regions of interest
      1. For example, transcriptional start sites (TSS +/- 3000 nt). Tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, MeDIP, MRE
      2. Extract data matrices. Normalization for bin i, sample h: [formula not captured in transcript]
      3. Cluster matrices (k-means clustering with Euclidean distance)
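Step 3 names k-means clustering with Euclidean distance. The following is a minimal pure-Python sketch of that technique on made-up toy rows. It is not the talk's pipeline: the normalization formula is missing from the transcript and is not reproduced, the data values are invented for illustration, and the initial centroids are simply the first k rows so that the run is deterministic.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(rows, k, iterations=100):
    """Plain k-means: assign rows to nearest centroid, then recompute means."""
    centroids = [list(r) for r in rows[:k]]  # assumption: first k rows as seeds
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical signal profiles: one row per region, one column per bin.
rows = [[9, 8, 1, 0], [0, 1, 8, 9], [8, 9, 0, 1], [1, 0, 9, 8]]
labels, centroids = kmeans(rows, k=2)
print(labels, centroids)
```

In practice a library implementation with several random restarts would usually be preferable, since k-means results depend on the starting centroids.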
  53-55. Enable interactive exploration
      4. Interactive cluster visualization (data from H1 cells). Panels: cluster size indicator (total n = 15,618); cluster (average values displayed); individual TSS (example: HOXC12 gene); scroll bar to explore all cluster members. Tracks in each panel: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE, mRNA
      5. Link-out to UCSC genome browser
  57-62. Part 2: Conclusions and Future Work
      • Clustering reveals patterns that were not obvious using a genome browser
      • Access to both global and detailed views is valuable
      • Future work includes:
          • search functionality (e.g. by region id)
          • integration with other clustering tools
          • richer analysis functionality (e.g. interactive clustering)
      Acknowledgements (in slide order): NIH Epigenomics Roadmap; ABySS-Explorer; Joe Costello, UCSF; Peggy Farnham, UC Davis; Thea Tlsty, UCSF; Marco Marra; Martin Hirst; Yongjun Zhao; Nina Thiessen; Richard Varhol; Shaun Jackman; İnanç Birol; Jason Chang; Lymphoma Project Analyst; Karen Mungall; Supervisor; Primary Data Generation; Steven Jones; Lymphoma Genomics Team
  65. Complementing Computation with Visualization in Genomics. March 11, 2010. Cydney Nielsen, BC Cancer Agency, Genome Sciences Centre, Vancouver, Canada
