VISUALISING ERRORS IN
ANIMAL PEDIGREE
GENOTYPE DATA




Martin Graham, Jessie Kennedy, Trevor Paterson & Andy
Law
Edinburgh Napier University & The Roslin Institute, Univ of
Edinburgh, UK
Pedigrees
   Animal pedigrees are their family trees – who’s
    whose father, mother etc




   In animal breeding these pedigrees are strictly
    controlled to maximise traits of value or
    suppress unwanted ones
Pedigree Genotypes
   A genotype is the genetic make-up of an
    animal
   Pedigree + genotype = pedigree genotype

   Example individual:
       Marker   Values
       M1       C|T
       M2       A|A
       M3       A|G
       ...      ...




   Not the whole genotype, use sets of markers
   Marker type: SNP (Single Nucleotide
    Polymorphism)
But...
   However, most large datasets have errors
     Errors when recording pedigree
     Technical errors e.g. wrongly detected marker

     Misassigned samples

     Also incomplete data



   These errors make the data genetically
    inconsistent
     This makes them unusable for most downstream
     analyses
Example

                   Mum              Dad
                   A|A              G|G

                          Junior
                           A|C

   Various possibilities here
     Dad is Junior’s father but the genotyping is
      incorrect
     Dad isn’t Junior’s father and the genotypes are
      correct
   Need to find/isolate/clean such data
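
The inconsistency in the example above can be detected mechanically. The following is a minimal sketch, not the authors' implementation: it assumes genotypes are written as `X|Y` strings, that `?` marks a missing call and is compatible with anything, and that the function names `parse` and `consistent` are purely illustrative.

```python
def parse(genotype):
    """Split a genotype string such as 'A|C' into its two alleles."""
    first, second = genotype.split("|")
    return first.strip(), second.strip()


def consistent(child, mum, dad):
    """Return True if the child could have inherited one allele from each parent.

    Assumption: '?' is a missing call and never causes an inconsistency.
    """
    c1, c2 = parse(child)
    mum_alleles, dad_alleles = parse(mum), parse(dad)

    def can_give(parent_alleles, allele):
        return allele == "?" or "?" in parent_alleles or allele in parent_alleles

    # Either allele of the child may have come from either parent.
    return (can_give(mum_alleles, c1) and can_give(dad_alleles, c2)) or (
        can_give(mum_alleles, c2) and can_give(dad_alleles, c1)
    )


# The trio from the slide: Junior's C allele cannot come from a G|G father.
print(consistent("A|C", "A|A", "G|G"))
```

A check like this only flags that a trio is inconsistent. It cannot say whether the recorded parentage or the genotyping is at fault, which is why the pedigree context matters.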
Table Viewer




   Current table-based viewer
       Grid of markers x individuals; genotype values in
        cells
       Universally ‘bad’ markers or individuals stand out
Table Viewer




   Expert biologists are needed to pinpoint the
    source of reported errors
   But without a pedigree context to anchor the
    errors in, it’s impossible to do this
Previous Work
   Multitude of pedigree viewers, but all have
    issues with scalability or handling extra
    (genotype) data
Voyage of Discovery
   Mainly discovering representations that didn’t
    work
   Iterated through a number of different
    representation styles that failed for various
    reasons
Node-Link View




   Can see that the pedigree clusters around a few
    males
   But hard to follow edges/directions, loss of
    generational context
Hierarchical Node-Link View




   Regain visual generation structure of pedigree
   But plagued with more edge crossings than
    before
Matrix View




   Matrices are the main alternative to drawing node-
    link diagrams for relational information
   We rejected having one overall matrix due to
    sparsity
Matrix View




   One matrix per generation ‘gap’ (parent → offspring)
       Rather than sources v sinks: sires v dams, with offspring
        in cells
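
A per-generation matrix of this kind can be sketched as follows. This is an illustrative assumption about the data layout, not the tool's code: each record is a hypothetical `(offspring, sire, dam)` tuple for one generation gap, and `mating_matrix` is an invented name.

```python
from collections import defaultdict


def mating_matrix(records):
    """Build a sires x dams matrix whose cells hold the shared offspring.

    records: iterable of (offspring, sire, dam) tuples for one generation gap.
    Returns (sorted sire labels, sorted dam labels, {(sire, dam): [offspring]}).
    """
    cells = defaultdict(list)
    for child, sire, dam in records:
        cells[(sire, dam)].append(child)
    sires = sorted({sire for sire, _ in cells})
    dams = sorted({dam for _, dam in cells})
    return sires, dams, dict(cells)


records = [
    ("o1", "s1", "d1"),
    ("o2", "s1", "d1"),
    ("o3", "s1", "d2"),
    ("o4", "s2", "d3"),
]
sires, dams, cells = mating_matrix(records)
# Fraction of cells that are filled: a rough measure of how sparse the matrix is.
filled = len(cells) / (len(sires) * len(dams))
print(sires, dams, filled)
```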
Sandwich View

   Realised that in these matrices, either the rows
    or columns will only have one filled cell each if
    one of the parent genders is monogamous
   In animal experiments this tends to be the
    case, a female breeds with only one male per
    generation
   Each matrix can thus be replaced with a
    compressed view
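
The compression step can be sketched as below, under stated assumptions: records are hypothetical `(offspring, sire, dam)` tuples, and a dam with more than one sire is simply rejected here, whereas the slides handle that case with connectors to repeated nodes. The name `sandwich_columns` is illustrative.

```python
def sandwich_columns(records):
    """Compress one generation gap into (sire, offspring list, dam) columns.

    Relies on each dam breeding with a single sire, so every dam owns
    exactly one column and a sire spans the columns of all its dams.
    """
    sire_of = {}
    families = {}
    for child, sire, dam in records:
        # setdefault records the first sire seen for this dam.
        if sire_of.setdefault(dam, sire) != sire:
            raise ValueError(f"dam {dam} has more than one sire")
        families.setdefault((sire, dam), []).append(child)
    # Sorting by (sire, dam) keeps each sire's columns adjacent.
    return [(sire, kids, dam) for (sire, dam), kids in sorted(families.items())]


records = [
    ("o1", "s1", "d1"),
    ("o2", "s1", "d1"),
    ("o3", "s1", "d2"),
    ("o4", "s2", "d3"),
]
for sire, kids, dam in sandwich_columns(records):
    print(sire, kids, dam)
```

Reading each column top to bottom gives sire, offspring, dam, which is the layering the sandwich view draws.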
Sandwich View
   The sandwich view is a specialised view of the
    bipartite graph between two generations
   The top (parent) layer is split into males and females, and the
    females are pushed beneath the bottom (offspring) layer

   [Figure: a two-layer Parents / Offspring graph redrawn as three
    layers, Sires / Offspring / Dams, with connectors to repeated
    node representations if necessary]
Sandwich View

   Sandwich view of the relationships between
    two adjacent generations
   [Figure: Sires (Male Parents) on top, Offspring in the middle,
    Dams (Female Parents) below; annotation: 1 male has children
    with multiple females]

   All the other pedigree views of full generations
    involved tracing paths between
    parents/offspring
Sandwich View
Error Information

   Colour is used to convey an individual’s error
    status over all the markers in a data set
   More errors = higher saturation
   Parent – coloured by overall error count
   Offspring drawn as hexagonal glyphs
     ‘Up’ triangle – incompatibilities with sire
     ‘Down’ triangle – incompatibilities with dam

     Middle portion – markers exist that are not present
      in either parent
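
The three parts of an offspring glyph could be driven by counts like those in the sketch below. The exact error definitions used by the tool are not given in the slides, so these are assumptions: an incompatibility with a parent means the offspring shares no allele with that parent at a marker, and the middle portion counts markers where the offspring carries an allele found in neither parent. The names `glyph_counts` and `saturation` are illustrative.

```python
def glyph_counts(child, sire, dam):
    """Count errors for the up, down and middle parts of an offspring glyph.

    Each argument maps marker name -> (allele, allele); a marker absent
    from a parent's mapping is treated as missing and is not counted.
    """
    up = down = middle = 0
    for marker, alleles in child.items():
        s, d = sire.get(marker), dam.get(marker)
        if s is not None and not set(alleles) & set(s):
            up += 1  # no allele shared with the sire
        if d is not None and not set(alleles) & set(d):
            down += 1  # no allele shared with the dam
        if s is not None and d is not None and any(
            a not in s and a not in d for a in alleles
        ):
            middle += 1  # an allele seen in neither parent
    return up, down, middle


def saturation(count, n_markers):
    """Map an error count to a colour saturation in [0, 1] (more errors = higher)."""
    return 0.0 if n_markers == 0 else min(1.0, count / n_markers)


child = {"M1": ("A", "C"), "M2": ("A", "A"), "M3": ("T", "T")}
sire = {"M1": ("G", "G"), "M2": ("A", "G"), "M3": ("T", "C")}
dam = {"M1": ("A", "A"), "M2": ("A", "A"), "M3": ("C", "C")}
print(glyph_counts(child, sire, dam))
```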
Error Information
   Aggregating offspring




   Groups of siblings who share the same
    parents can be aggregated under one glyph
     Colouring now represents errors in all markers
     over a group of individuals
   Troublesome families & parents can be clearly
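
Aggregation of siblings can be sketched as a group-by on the parent pair. The speaker notes mention using either the average or the maximum error metric within a family; the record layout `(id, sire, dam, error_count)` and the name `aggregate_families` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_families(offspring, how="mean"):
    """Collapse siblings sharing both parents into one value per family.

    offspring: iterable of (id, sire, dam, error_count) tuples.
    how: 'mean' for the average error count, 'max' for the worst sibling.
    """
    groups = defaultdict(list)
    for _id, sire, dam, errors in offspring:
        groups[(sire, dam)].append(errors)
    reducer = {"mean": mean, "max": max}[how]
    return {family: reducer(counts) for family, counts in groups.items()}


offspring = [
    ("o1", "s1", "d1", 0),
    ("o2", "s1", "d1", 4),
    ("o3", "s1", "d2", 1),
]
print(aggregate_families(offspring, how="mean"))
print(aggregate_families(offspring, how="max"))
```

The maximum highlights a single bad individual inside an otherwise clean family, while the mean highlights families that are bad throughout.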
Filtering
   Error Filtering
     The table view clearly showed
      rogue markers and individuals, and these can be
      filtered by a user in that application
     To the sandwich view we add two complementary
      histograms that perform the same purpose
Filtering
   Error Filtering
     Each histogram shows number of errors along the X
      axis
     Number of individuals/markers with that number of
      errors on the Y axis
     Typical pattern: A few individuals / markers have lots
      of errors, and the majority have a few or no errors
        Mantra is to discard bad markers and look at bad
          individuals
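
The data behind such a histogram, and a threshold filter over it, might look like the following sketch. It assumes error counts are already available as a mapping from marker (or individual) name to count; `error_histogram`, `split_by_threshold` and the threshold value are hypothetical.

```python
from collections import Counter


def error_histogram(error_counts):
    """Return {number of errors: how many markers/individuals have that many}."""
    return dict(sorted(Counter(error_counts.values()).items()))


def split_by_threshold(error_counts, threshold):
    """Split names into (kept, flagged) according to an error threshold."""
    kept = {name for name, errors in error_counts.items() if errors <= threshold}
    flagged = set(error_counts) - kept
    return kept, flagged


# Typical pattern: most markers have few or no errors, a few have many.
marker_errors = {"M1": 0, "M2": 0, "M3": 1, "M4": 25}
print(error_histogram(marker_errors))
kept, flagged = split_by_threshold(marker_errors, threshold=5)
print(sorted(kept), sorted(flagged))
```

Following the mantra above, flagged markers would be discarded, while flagged individuals would be inspected in their pedigree context.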
Sandwich view
   Pic/Vid of full view (To Do)
Video
Conclusion
   Developed new style of pedigree visualisation

     Shows detailed errors at a family level

     Shows overview of errors in an entire pedigree

     Keeps offspring close to their parents for a
     family-centric view
Future Work
   Single marker views of errors

   Making the sandwich into a club sandwich
     Split the middle layer into multiple layers
     e.g. by gender, to spot sex-related marker errors
Acknowledgements
   Reviewers
   BBSRC funded project

Editor's Notes

  1. By controlled, i.e. controlling which animal mates with which other animals
  2. This data is the basis for studying genetic inheritance and mapping genes of interest. SNPs are places along chromosomes where there is variation in a population’s genotypes. Typically 1000s of markers and 1000s of individuals. A restricted graph with multivariate data at each node. In a perfect world this would be the end of the presentation
  3. Incomplete data isn’t bad or erroneous though – it’s just missing
  4. Good for spotting bad markers and bad individuals (i.e. obviously wrong individuals)
  5. Is the father bad? Are groups of offspring from the same mating reported bad? Etc etc
  6. Issues with handling multivariate data (genotyping) or easily associating family groups (offspring drawn distant from parents, parents not shown together). Individual-centric views not appropriate. A lot of the issues we repeated with our prototypes...
  7. Traditional force-directed view is rubbish. Hierarchical data needs a hierarchy-preserving representation
  8. More edge crossings as placement is more restricted. There are methods for alleviating edge crossings (our DAG drawing)
  9. Matrices avoid edge crossings. Also allows sorting of parents by properties. Still very sparse
  10. Matrices avoid edge crossings. Also allows sorting of parents by properties. Still very sparse
  11. In no way general purpose, works only because offspring have 2 links, one to a female parent, one to a male parent
  12. Males span several columns at a time. The visualisation is just an adapted JTable at heart
  13. Larger scale view of the sandwich
  14. Can go by average error metric across individuals or max error metric of any individual in a family