SlideShare a Scribd company logo
1 of 8
Download to read offline
IBM Software                                                                                                               Case Study




                                                           Large Gene Interaction
                                                           Analytics at University
                                                           at Buffalo, SUNY
                                                           Giving researchers the ability to speed computations
                                                           and increase data sets


                                                           The State University of New York (SUNY) at Buffalo is home to one of
               Overview                                    the leading multiple sclerosis (MS) research centers in the world. MS is
                                                           a devastating, chronic neurological disease that affects nearly one million
               The need
                                                           people worldwide. The disease causes physical and cognitive disabilities in
               Researchers required the ability to
               quickly build models using a range          individuals and is characterized by inflammation and neuro-degeneration
               of variable types and run them on a         of the brain and spinal cord. From the beginning, the genetics of MS were
               high-performing environment on huge         known to be complex and it was apparent that no single gene was likely
               data sets.
                                                           causative for the disease.
               The solution
               The solution helped the SUNY Buffalo        Since 2007, the SUNY team has been looking at data obtained from
               researchers consolidate all reporting and   scanned genomes of MS patients to identify genes whose variations could
               analysis in one location to improve the
                                                           contribute to the risk of developing MS. New technologies now enable
               efficiency, sophistication and impact of
               their research.                             hundreds of thousands of genetic variations, called single nucleotide
                                                           polymorphisms (SNPs), to be obtained from single samples. According
               The benefit                                 to research lead, Dr. Murali Ramanathan, a critical fact in the study of
               The SUNY researchers were able to           MS is that “gene products work by interacting with both other gene
               reduce the time required to conduct
               analysis from 27.2 hours without the        products and environmental factors.”
               IBM Netezza data warehouse appliance
               to 11.7 minutes with it.
IBM Software                                                                                                                Case Study




                                                            “Because of this, researchers have postulated that multiple SNPs—
                                                            combined with environmental variables—would better explain the risk of
         “There are lots of good                            developing MS. Finding these combinations is analogous to the prover-
          reasons to use Revolution                         bial search for a needle in a haystack,” says Dr. Ramanathan. Identifying a
                                                            candidate’s environmental factors that could be used to prevent the
          Analytics in an analytics                         disease from progressing in patients was of great importance. Examples
          appliance super computer                          of this include sun exposure and vitamin D levels, Epstein-Barr virus
          like Netezza. It’s faster                         infection and smoking.

          and easier to program.                            The researchers needed to create algorithms that would efficiently
          It will speed up our                              identify interactions and attempt ‘parsimony’—an extreme efficiency.
          computation. And, we                              The team was looking for the fewest number of SNPs, environmental
                                                            and phenotypic variable combinations that would help explain the
          have more data sets                               presence of MS.
          available to us because
                                                            The researchers at the State University of New York developed an
          of how flexible and quick                         approach they called AMBIENCE that is distinctive in its use of new
          Revolution Analytics                              information theoretic methods along with its versatility and scalability.
          makes it to add and                               The theoretic underpinnings of AMBIENCE enable the detection of
                                                            both linear and non-linear dependencies in the data. The AMBIENCE
          delete variables in our                           algorithm is capable of conducting an efficient search of the large
          model.”                                           combinatorial space because of the unique nature of the information
                                                            theoretic metrics, which allows for greedy search identification of the
           —Dr. Murali Ramanathan, University at Buffalo,
                                                            most promising combinations.
            State University of New York




                                                                  2
IBM Software                                                                                                                  Case Study




          Solution components
          ●●
               IBM Netezza® 1000-24
          ●●
               IBM Netezza Analytics Appliance

          IBM Business Partner
          ●●
               Revolution R Enterprise for
               IBM Netezza




                                                 Figure 1: A single nucleotide polymorphism (SNP) is a DNA sequence variation occur-
                                                 ring when a single nucleotide—A, C, G or T—in the shared genome differs in an individual.
                                                 If many individuals with a disease (or ‘phenotype’) have the same polymorphisms that
                                                 healthy individuals do not, this may be a clue to finding out genetic clues to what causes
                                                 a disease (Wikipedia).



                                                 The need
                                                 The data sets used in this type of multi-variable research are very large
                                                 and the analysis is computationally very demanding because the research-
                                                 ers are looking for significant interactions between thousands of genetic
                                                 and environmental factors. There are two issues to overcome: crunching
                                                 through the immense data set and building analytic models that allow the
                                                 team to look at more than simply first order interactions. The researchers
                                                 want to see not only which variable is significant, but also which pairs of
                                                 variables or which three variables are significant. This requires the ability
                                                 to quickly build models using a range of variable types and run them on a
                                                 high-performing environment on huge data sets. It also requires the abil-
                                                 ity to include an almost limitless variety of dependent variable types.




                                                        3
IBM Software                                                                      Case Study




               The computational challenge in gene-environmental interaction analyses
               is due to a phenomenon called ‘combinatorial explosion.’ Considering
               that there are thousands of SNPs, the number of combinations of SNPs
               that have to be assessed for uncovering potential interaction becomes
               incredibly large.

               Before they could run the analysis using all of the variables presented in
               the data set, the researchers needed an analytic framework that would
               allow them to add and remove variables from the model quickly and
               easily, without having to write hundreds of lines of code. The sheer
               number of SNPs combined with environmental variables and phenotype
               values mean that the amount of computations necessary for data mining
               could number in the quintillions (18 zeros).



                 Individual       SNiP1     SNiP2      SNiP3          SNiPn   Phenotype

                 Patient 1        A         C          C              C       No Disease

                 Patient 2        T         G          T              A       Disease

                 Patient 3        A         A          T              T       Disease




                 Patient N        C         A                         C       No Disease




               Figure 2: A heuristic example of the SUNY data table


               The solution
               SUNY Buffalo is using Revolution R Enterprise for IBM Netezza® in
               conjunction with the IBM Netezza Analytics Appliance to dramatically
               simplify and speed up very complex analysis on large data sets.

               The organization’s researchers knew they needed the level of processing
               only available through a high-performance computer (HPC). They also
               needed capabilities found in relational databases—not often included




                      4
IBM Software                                                                    Case Study




               in HPC platforms. HPCs typically run their processes only in parallel,
               breaking up calculations to run simultaneously across multiple processors.
               In addition, they also often used field-programmable gate array (FPGA)
               architectures for additional speed.

               The IBM Netezza data warehouse appliance offered the features and
               performance the researchers needed and much more. From an analytics
               perspective, the SUNY Buffalo researchers were able to write a version of
               their software tools in Revolution R Enterprise for IBM Netezza, so that
               all their reporting and analysis were consolidated in one location. This
               prevented the need to move very large amounts of data in and out of the
               IBM Netezza data warehouse appliance, which would cause further
               delays. They were also able to use a wider variety of data sets.

               Due to the nature of SUNY Buffalo’s research work, there is immense
               value in now being able to use a wide variety of data sets to study the
               interactions among a greater range of variables. The solution improved
               the efficiency, sophistication and impact of the research. SUNY’s team
               adopted R—the core of Revolution R Enterprise for IBM Netezza—
               because it offered the flexibility to include diverse types of variables
               such as categorical, discrete Poisson-dependent, or continuous normally-
               distributed variables by simply adding a few lines of code. In the past, the
               SUNY team would have to rewrite entire algorithms, requiring a great
               deal of staff time. Now, scientists can change the algorithm themselves by
               adding a new entropy function, and focus less on writing algorithms and
               more on the science.

               The benefit
               Once SUNY Buffalo deployed the IBM Netezza data warehouse
               appliance as its research analytics infrastructure along with Revolution R
               Enterprise for IBM Netezza, and the genetic data were assembled, the
               environmental and phenotype data were combined, and the algorithms
               were customized, the researchers were empowered to look for potential
               factors contributing to the risk of developing MS.




                     5
IBM Software                                                                       Case Study




               The SUNY researchers were able to:

               ●● ●
                      Use the new algorithms and add multiple variables that before were
                      nearly impossible to achieve.
               ●● ●
                      Reduce the time required to conduct analysis from 27.2 hours without
                      the IBM Netezza data warehouse appliance to 11.7 minutes with it.
               ●● ●
                      Carry out their research with little to no database administration.
                      (Unlike other HPC platforms or databases available, the IBM Netezza
                      data warehouse appliance was designed to require a minimum amount
                      of maintenance.)
               ●● ●
                      Publish multiple articles in scientific journals, with more in process.
               ●● ●
                      Proceed with studies based on ‘vector phenotypes’—a more complex
                      variable that will further push the IBM Netezza data warehouse
                      appliance platform.




               Figure 3: One of the articles published by SUNY researchers




                         6
IBM Software                                                                       Case Study




               For more information
               To learn more about the IBM Netezza data warehouse appliances,
               please contact your IBM representative or IBM Business Partner,
               or visit the following website:
               ibm.com/software/data/netezza

               For more information about Revolution Analytics, visit
               www.revolutionanalytics.com

               For more information on SUNY Buffalo, visit: www.buffalo.edu/, or for
               more information on SUNY’s research, consult the following sample
               publications:

               AMBIENCE: a novel approach and efficient algorithm for identifying informa-
               tive genetic and environmental associations with complex phenotypes. Chanda P,
               Sucheston L, Zhang A, Brazeau D, Freudenheim JL, Ambrosone C,
               Ramanathan M. Genetics. 2008 Oct; 180(2): 1191-210. Epub 2008 Sep 9.
               PMID: 18780753 [PubMed - indexed for MEDLINE]

               Information-theoretic gene-gene and gene-environment interaction analysis of
               quantitative traits. Chanda P, Sucheston L, Liu S, Zhang A, Ramanathan
               M. BMC Genomics. 2009 Nov 4; 10:509. PMID: 19889230.

               Comparison of information-theoretic to statistical methods for gene-gene
               interactions in the presence of genetic heterogeneity. Sucheston L, Chanda P,
               Zhang A, Tritchler D, Ramanathan M. BMC Genomics. 2010 Sep 3;
               11:487. PMID: 20815886.

               The interaction index, a novel information-theoretic metric for prioritizing
               interacting genetic variations and environmental factors. Chanda P, Sucheston
               L, Zhang A, Ramanathan M. Eur J Hum Genet. 2009 Oct;17(10):
               1274-86. Epub 2009 Mar 18. PMID: 19293841.




                     7
Please Recycle




                 IMC14675-USEN-01

More Related Content

What's hot

Stephen Friend Institute for Cancer Research 2011-11-01
Stephen Friend Institute for Cancer Research 2011-11-01Stephen Friend Institute for Cancer Research 2011-11-01
Stephen Friend Institute for Cancer Research 2011-11-01Sage Base
 
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...eventi-ITBbari
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Elia Brodsky
 
Bioinformatics group presentation
Bioinformatics group presentationBioinformatics group presentation
Bioinformatics group presentationNaeem Ahmed
 
Paper Biology 280 S Minireview Advances In Cancer Detection And Therapeutics
Paper Biology 280 S Minireview Advances In Cancer Detection And TherapeuticsPaper Biology 280 S Minireview Advances In Cancer Detection And Therapeutics
Paper Biology 280 S Minireview Advances In Cancer Detection And TherapeuticsJoshua Mendoza-Elias
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p collegeSKUASTKashmir
 
Bioinformatics resources and search tools - report on summer training proj...
Bioinformatics   resources and search tools -  report on summer training proj...Bioinformatics   resources and search tools -  report on summer training proj...
Bioinformatics resources and search tools - report on summer training proj...Sapan Anand
 
Intro to metagenomic binning
Intro to metagenomic binningIntro to metagenomic binning
Intro to metagenomic binningA. Murat Eren
 
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)IJCSEA Journal
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance
 
Introduction to Bioinformatics Slides
Introduction to Bioinformatics SlidesIntroduction to Bioinformatics Slides
Introduction to Bioinformatics SlidesSaide OER Africa
 

What's hot (18)

Itbi
ItbiItbi
Itbi
 
Stephen Friend Institute for Cancer Research 2011-11-01
Stephen Friend Institute for Cancer Research 2011-11-01Stephen Friend Institute for Cancer Research 2011-11-01
Stephen Friend Institute for Cancer Research 2011-11-01
 
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
 
nm0915-965-2
nm0915-965-2nm0915-965-2
nm0915-965-2
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
 
Bioinformatics group presentation
Bioinformatics group presentationBioinformatics group presentation
Bioinformatics group presentation
 
Bioinformatics on internet
Bioinformatics on internetBioinformatics on internet
Bioinformatics on internet
 
WiML Poster
WiML PosterWiML Poster
WiML Poster
 
Paper Biology 280 S Minireview Advances In Cancer Detection And Therapeutics
Paper Biology 280 S Minireview Advances In Cancer Detection And TherapeuticsPaper Biology 280 S Minireview Advances In Cancer Detection And Therapeutics
Paper Biology 280 S Minireview Advances In Cancer Detection And Therapeutics
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p college
 
Bioinformatics resources and search tools - report on summer training proj...
Bioinformatics   resources and search tools -  report on summer training proj...Bioinformatics   resources and search tools -  report on summer training proj...
Bioinformatics resources and search tools - report on summer training proj...
 
Intro to metagenomic binning
Intro to metagenomic binningIntro to metagenomic binning
Intro to metagenomic binning
 
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
 
Bio Informatics
Bio InformaticsBio Informatics
Bio Informatics
 
Introduction to Bioinformatics Slides
Introduction to Bioinformatics SlidesIntroduction to Bioinformatics Slides
Introduction to Bioinformatics Slides
 
Kirmitzoglou_PhD_Final
Kirmitzoglou_PhD_FinalKirmitzoglou_PhD_Final
Kirmitzoglou_PhD_Final
 

Similar to Large Gene Interaction Analytics at University at Buffalo, SUNY Giving researchers the ability to speed computations and increase data sets

GENE-GENE INTERACTION ANALYSIS IN ALZHEIMER
GENE-GENE INTERACTION ANALYSIS IN ALZHEIMERGENE-GENE INTERACTION ANALYSIS IN ALZHEIMER
GENE-GENE INTERACTION ANALYSIS IN ALZHEIMERijcsit
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei LinChien-Wei Lin
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingIncedo
 
Friend harvard 2013-01-30
Friend harvard 2013-01-30Friend harvard 2013-01-30
Friend harvard 2013-01-30Sage Base
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBenjamin Good
 
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...rahulmonikasharma
 
CHAVEZ_SESSION23_ACADEMICPAPER.docx
CHAVEZ_SESSION23_ACADEMICPAPER.docxCHAVEZ_SESSION23_ACADEMICPAPER.docx
CHAVEZ_SESSION23_ACADEMICPAPER.docxArvieChavez1
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancerpaperpublications3
 
Application of Microarray Technology and softcomputing in cancer Biology
Application of Microarray Technology and softcomputing in cancer BiologyApplication of Microarray Technology and softcomputing in cancer Biology
Application of Microarray Technology and softcomputing in cancer BiologyCSCJournals
 
Introducción a la bioinformatica
Introducción a la bioinformaticaIntroducción a la bioinformatica
Introducción a la bioinformaticaMartín Arrieta
 
Personalized medicine through wes and big data analytics
Personalized medicine through wes and big data analyticsPersonalized medicine through wes and big data analytics
Personalized medicine through wes and big data analyticsJunaidAKG
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsmikaelhuss
 
BREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS –A SURVEY
BREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS –A SURVEYBREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS –A SURVEY
BREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS –A SURVEYijdpsjournal
 
enabling transparent, reproducible research
enabling transparent, reproducible researchenabling transparent, reproducible research
enabling transparent, reproducible researchBrian Bot
 

Similar to Large Gene Interaction Analytics at University at Buffalo, SUNY Giving researchers the ability to speed computations and increase data sets (20)

eScience-School-Oct2012-Campinas-Brazil
eScience-School-Oct2012-Campinas-BrazileScience-School-Oct2012-Campinas-Brazil
eScience-School-Oct2012-Campinas-Brazil
 
GENE-GENE INTERACTION ANALYSIS IN ALZHEIMER
GENE-GENE INTERACTION ANALYSIS IN ALZHEIMERGENE-GENE INTERACTION ANALYSIS IN ALZHEIMER
GENE-GENE INTERACTION ANALYSIS IN ALZHEIMER
 
GENE-GENE INTERACTION ANALYSIS IN ALZHEIMER
GENE-GENE INTERACTION ANALYSIS IN ALZHEIMERGENE-GENE INTERACTION ANALYSIS IN ALZHEIMER
GENE-GENE INTERACTION ANALYSIS IN ALZHEIMER
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei Lin
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Thesis ppt
Thesis pptThesis ppt
Thesis ppt
 
rheumatoid arthritis
rheumatoid arthritisrheumatoid arthritis
rheumatoid arthritis
 
Friend harvard 2013-01-30
Friend harvard 2013-01-30Friend harvard 2013-01-30
Friend harvard 2013-01-30
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiers
 
C0344023028
C0344023028C0344023028
C0344023028
 
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...
 
CHAVEZ_SESSION23_ACADEMICPAPER.docx
CHAVEZ_SESSION23_ACADEMICPAPER.docxCHAVEZ_SESSION23_ACADEMICPAPER.docx
CHAVEZ_SESSION23_ACADEMICPAPER.docx
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancer
 
Application of Microarray Technology and softcomputing in cancer Biology
Application of Microarray Technology and softcomputing in cancer BiologyApplication of Microarray Technology and softcomputing in cancer Biology
Application of Microarray Technology and softcomputing in cancer Biology
 
Introducción a la bioinformatica
Introducción a la bioinformaticaIntroducción a la bioinformatica
Introducción a la bioinformatica
 
Personalized medicine through wes and big data analytics
Personalized medicine through wes and big data analyticsPersonalized medicine through wes and big data analytics
Personalized medicine through wes and big data analytics
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomics
 
BREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS –A SURVEY
BREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS –A SURVEYBREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS –A SURVEY
BREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS –A SURVEY
 
UNMSymposium2014
UNMSymposium2014UNMSymposium2014
UNMSymposium2014
 
enabling transparent, reproducible research
enabling transparent, reproducible researchenabling transparent, reproducible research
enabling transparent, reproducible research
 

More from IBM India Smarter Computing

Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments IBM India Smarter Computing
 
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...IBM India Smarter Computing
 
A Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceA Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceIBM India Smarter Computing
 
IBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM India Smarter Computing
 

More from IBM India Smarter Computing (20)

Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments
 
All-flash Needs End to End Storage Efficiency
All-flash Needs End to End Storage EfficiencyAll-flash Needs End to End Storage Efficiency
All-flash Needs End to End Storage Efficiency
 
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
 
IBM FlashSystem 840 Product Guide
IBM FlashSystem 840 Product GuideIBM FlashSystem 840 Product Guide
IBM FlashSystem 840 Product Guide
 
IBM System x3250 M5
IBM System x3250 M5IBM System x3250 M5
IBM System x3250 M5
 
IBM NeXtScale nx360 M4
IBM NeXtScale nx360 M4IBM NeXtScale nx360 M4
IBM NeXtScale nx360 M4
 
IBM System x3650 M4 HD
IBM System x3650 M4 HDIBM System x3650 M4 HD
IBM System x3650 M4 HD
 
IBM System x3300 M4
IBM System x3300 M4IBM System x3300 M4
IBM System x3300 M4
 
IBM System x iDataPlex dx360 M4
IBM System x iDataPlex dx360 M4IBM System x iDataPlex dx360 M4
IBM System x iDataPlex dx360 M4
 
IBM System x3500 M4
IBM System x3500 M4IBM System x3500 M4
IBM System x3500 M4
 
IBM System x3550 M4
IBM System x3550 M4IBM System x3550 M4
IBM System x3550 M4
 
IBM System x3650 M4
IBM System x3650 M4IBM System x3650 M4
IBM System x3650 M4
 
IBM System x3500 M3
IBM System x3500 M3IBM System x3500 M3
IBM System x3500 M3
 
IBM System x3400 M3
IBM System x3400 M3IBM System x3400 M3
IBM System x3400 M3
 
IBM System x3250 M3
IBM System x3250 M3IBM System x3250 M3
IBM System x3250 M3
 
IBM System x3200 M3
IBM System x3200 M3IBM System x3200 M3
IBM System x3200 M3
 
IBM PowerVC Introduction and Configuration
IBM PowerVC Introduction and ConfigurationIBM PowerVC Introduction and Configuration
IBM PowerVC Introduction and Configuration
 
A Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceA Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization Performance
 
IBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architecture
 
X6: The sixth generation of EXA Technology
X6: The sixth generation of EXA TechnologyX6: The sixth generation of EXA Technology
X6: The sixth generation of EXA Technology
 

Recently uploaded

TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024Adnet Communications
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Peter Ward
 
1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdfShaun Heinrichs
 
The McKinsey 7S Framework: A Holistic Approach to Harmonizing All Parts of th...
The McKinsey 7S Framework: A Holistic Approach to Harmonizing All Parts of th...The McKinsey 7S Framework: A Holistic Approach to Harmonizing All Parts of th...
The McKinsey 7S Framework: A Holistic Approach to Harmonizing All Parts of th...Operational Excellence Consulting
 
Pitch deck sample detail for New Business Proposal
Pitch deck sample detail for New Business ProposalPitch deck sample detail for New Business Proposal
Pitch deck sample detail for New Business ProposalEvelina300651
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
business environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxbusiness environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxShruti Mittal
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy Verified Accounts
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfRbc Rbcua
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCRashishs7044
 
Horngren’s Financial & Managerial Accounting, 7th edition by Miller-Nobles so...
Horngren’s Financial & Managerial Accounting, 7th edition by Miller-Nobles so...Horngren’s Financial & Managerial Accounting, 7th edition by Miller-Nobles so...
Horngren’s Financial & Managerial Accounting, 7th edition by Miller-Nobles so...ssuserf63bd7
 
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...ssuserf63bd7
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaoncallgirls2057
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdfKhaled Al Awadi
 

Recently uploaded (20)

TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...
 
1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf
 
The McKinsey 7S Framework: A Holistic Approach to Harmonizing All Parts of th...
The McKinsey 7S Framework: A Holistic Approach to Harmonizing All Parts of th...The McKinsey 7S Framework: A Holistic Approach to Harmonizing All Parts of th...
The McKinsey 7S Framework: A Holistic Approach to Harmonizing All Parts of th...
 
Pitch deck sample detail for New Business Proposal
Pitch deck sample detail for New Business ProposalPitch deck sample detail for New Business Proposal
Pitch deck sample detail for New Business Proposal
 
Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
business environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxbusiness environment micro environment macro environment.pptx
business environment micro environment macro environment.pptx
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail Accounts
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdf
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 
Call Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North GoaCall Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North Goa
 
Horngren’s Financial & Managerial Accounting, 7th edition by Miller-Nobles so...
Horngren’s Financial & Managerial Accounting, 7th edition by Miller-Nobles so...Horngren’s Financial & Managerial Accounting, 7th edition by Miller-Nobles so...
Horngren’s Financial & Managerial Accounting, 7th edition by Miller-Nobles so...
 
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 

Large Gene Interaction Analytics at University at Buffalo, SUNY Giving researchers the ability to speed computations and increase data sets

  • 1. IBM Software Case Study Large Gene Interaction Analytics at University at Buffalo, SUNY Giving researchers the ability to speed computations and increase data sets The State University of New York (SUNY) at Buffalo is home to one of Overview the leading multiple sclerosis (MS) research centers in the world. MS is a devastating, chronic neurological disease that affects nearly one million The need people worldwide. The disease causes physical and cognitive disabilities in Researchers required the ability to quickly build models using a range individuals and is characterized by inflammation and neuro-degeneration of variable types and run them on a of the brain and spinal cord. From the beginning, the genetics of MS were high-performing environment on huge known to be complex and it was apparent that no single gene was likely data sets. causative for the disease. The solution The solution helped the SUNY Buffalo Since 2007, the SUNY team has been looking at data obtained from researchers consolidate all reporting and scanned genomes of MS patients to identify genes whose variations could analysis in one location to improve the contribute to the risk of developing MS. New technologies now enable efficiency, sophistication and impact of their research. hundreds of thousands of genetic variations, called single nucleotide polymorphisms (SNPs), to be obtained from single samples. According The benefit to research lead, Dr. Murali Ramanathan, a critical fact in the study of The SUNY researchers were able to MS is that “gene products work by interacting with both other gene reduce the time required to conduct analysis from 27.2 hours without the products and environmental factors.” IBM Netezza data warehouse appliance to 11.7 minutes with it.
  • 2. IBM Software Case Study “Because of this, researchers have postulated that multiple SNPs— combined with environmental variables—would better explain the risk of “There are lots of good developing MS. Finding these combinations is analogous to the prover- reasons to use Revolution bial search for a needle in a haystack,” says Dr. Ramanathan. Identifying a candidate’s environmental factors that could be used to prevent the Analytics in an analytics disease from progressing in patients was of great importance. Examples appliance super computer of this include sun exposure and vitamin D levels, Epstein-Barr virus like Netezza. It’s faster infection and smoking. and easier to program. The researchers needed to create algorithms that would efficiently It will speed up our identify interactions and attempt ‘parsimony’—an extreme efficiency. computation. And, we The team was looking for the fewest number of SNPs, environmental and phenotypic variable combinations that would help explain the have more data sets presence of MS. available to us because The researchers at the State University of New York developed an of how flexible and quick approach they called AMBIENCE that is distinctive in its use of new Revolution Analytics information theoretic methods along with its versatility and scalability. makes it to add and The theoretic underpinnings of AMBIENCE enable the detection of both linear and non-linear dependencies in the data. The AMBIENCE delete variables in our algorithm is capable of conducting an efficient search of the large model.” combinatorial space because of the unique nature of the information theoretic metrics, which allows for greedy search identification of the —Dr. Murali Ramanathan, University at Buffalo, most promising combinations. State University of New York 2
  • 3. IBM Software Case Study Solution components ●● IBM Netezza® 1000-24 ●● IBM Netezza Analytics Appliance IBM Business Partner ●● Revolution R Enterprise for IBM Netezza Figure 1: A single nucleotide polymorphism (SNP) is a DNA sequence variation occur- ring when a single nucleotide—A, C, G or T—in the shared genome differs in an individual. If many individuals with a disease (or ‘phenotype’) have the same polymorphisms that healthy individuals do not, this may be a clue to finding out genetic clues to what causes a disease (Wikipedia). The need The data sets used in this type of multi-variable research are very large and the analysis is computationally very demanding because the research- ers are looking for significant interactions between thousands of genetic and environmental factors. There are two issues to overcome: crunching through the immense data set and building analytic models that allow the team to look at more than simply first order interactions. The researchers want to see not only which variable is significant, but also which pairs of variables or which three variables are significant. This requires the ability to quickly build models using a range of variable types and run them on a high-performing environment on huge data sets. It also requires the abil- ity to include an almost limitless variety of dependent variable types. 3
  • 4. IBM Software Case Study The computational challenge in gene-environmental interaction analyses is due to a phenomenon called ‘combinatorial explosion.’ Considering that there are thousands of SNPs, the number of combinations of SNPs that have to be assessed for uncovering potential interaction becomes incredibly large. Before they could run the analysis using all of the variables presented in the data set, the researchers needed an analytic framework that would allow them to add and remove variables from the model quickly and easily, without having to write hundreds of lines of code. The sheer number of SNPs combined with environmental variables and phenotype values mean that the amount of computations necessary for data mining could number in the quintillions (18 zeros). Individual SNiP1 SNiP2 SNiP3 SNiPn Phenotype Patient 1 A C C C No Disease Patient 2 T G T A Disease Patient 3 A A T T Disease Patient N C A C No Disease Figure 2: A heuristic example of the SUNY data table The solution SUNY Buffalo is using Revolution R Enterprise for IBM Netezza® in conjunction with the IBM Netezza Analytics Appliance to dramatically simplify and speed up very complex analysis on large data sets. The organization’s researchers knew they needed the level of processing only available through a high-performance computer (HPC). They also needed capabilities found in relational databases—not often included 4
  • 5. IBM Software Case Study in HPC platforms. HPCs typically run their processes only in parallel, breaking up calculations to run simultaneously across multiple processors. In addition, they also often used field-programmable gate array (FPGA) architectures for additional speed. The IBM Netezza data warehouse appliance offered the features and performance the researchers needed and much more. From an analytics perspective, the SUNY Buffalo researchers were able to write a version of their software tools in Revolution R Enterprise for IBM Netezza, so that all their reporting and analysis were consolidated in one location. This prevented the need to move very large amounts of data in and out of the IBM Netezza data warehouse appliance, which would cause further delays. They were also able to use a wider variety of data sets. Due to the nature of SUNY Buffalo’s research work, there is immense value in now being able to use a wide variety of data sets to study the interactions among a greater range of variables. The solution improved the efficiency, sophistication and impact of the research. SUNY’s team adopted R—the core of Revolution R Enterprise for IBM Netezza— because it offered the flexibility to include diverse types of variables such as categorical, discrete Poisson-dependent, or continuous normally- distributed variables by simply adding a few lines of code. In the past, the SUNY team would have to rewrite entire algorithms, requiring a great deal of staff time. Now, scientists can change the algorithm themselves by adding a new entropy function, and focus less on writing algorithms and more on the science. The benefit Once SUNY Buffalo deployed the IBM Netezza data warehouse appliance as its research analytics infrastructure along with Revolution R Enterprise for IBM Netezza, and the genetic data were assembled, the environmental and phenotype data were combined, and the algorithms were customized, the researchers were empowered to look for potential factors contributing to the risk of developing MS. 5
  • 6. IBM Software Case Study The SUNY researchers were able to: ●● ● Use the new algorithms and add multiple variables that before were nearly impossible to achieve. ●● ● Reduce the time required to conduct analysis from 27.2 hours without the IBM Netezza data warehouse appliance to 11.7 minutes with it. ●● ● Carry out their research with little to no database administration. (Unlike other HPC platforms or databases available, the IBM Netezza data warehouse appliance was designed to require a minimum amount of maintenance.) ●● ● Publish multiple articles in scientific journals, with more in process. ●● ● Proceed with studies based on ‘vector phenotypes’—a more complex variable that will further push the IBM Netezza data warehouse appliance platform. Figure 3: One of the articles published by SUNY researchers 6
  • 7. IBM Software Case Study For more information To learn more about the IBM Netezza data warehouse appliances, please contact your IBM representative or IBM Business Partner, or visit the following website: ibm.com/software/data/netezza For more information about Revolution Analytics, visit www.revolutionanalytics.com For more information on SUNY Buffalo, visit: www.buffalo.edu/, or for more information on SUNY’s research, consult the following sample publications: AMBIENCE: a novel approach and efficient algorithm for identifying informa- tive genetic and environmental associations with complex phenotypes. Chanda P, Sucheston L, Zhang A, Brazeau D, Freudenheim JL, Ambrosone C, Ramanathan M. Genetics. 2008 Oct; 180(2): 1191-210. Epub 2008 Sep 9. PMID: 18780753 [PubMed - indexed for MEDLINE] Information-theoretic gene-gene and gene-environment interaction analysis of quantitative traits. Chanda P, Sucheston L, Liu S, Zhang A, Ramanathan M. BMC Genomics. 2009 Nov 4; 10:509. PMID: 19889230. Comparison of information-theoretic to statistical methods for gene-gene interactions in the presence of genetic heterogeneity. Sucheston L, Chanda P, Zhang A, Tritchler D, Ramanathan M. BMC Genomics. 2010 Sep 3; 11:487. PMID: 20815886. The interaction index, a novel information-theoretic metric for prioritizing interacting genetic variations and environmental factors. Chanda P, Sucheston L, Zhang A, Ramanathan M. Eur J Hum Genet. 2009 Oct;17(10): 1274-86. Epub 2009 Mar 18. PMID: 19293841. 7
  • 8. Please Recycle IMC14675-USEN-01