Applications and Trends in
       Data Mining


        Data Mining
             For
   Biological Data Analysis
Factors that led to the
           development
• The past decade has seen an explosive growth in:
     1.Genomics
     2.Proteomics
     3.Functional genomics
     4.Biomedical research

• Identification and comparative analysis of genomes of humans
  and other species for investigation of genetic networks.

• Development of new pharmaceuticals and advances in cancer
  therapies.
• DNA sequences form the foundation of genetic codes of all
  living organisms.

• DNA sequences are composed of four basic building blocks
  called nucleotides:
      1.adenine (A)
      2.cytosine (C)
      3.guanine (G)
      4.thymine (T)

• These four nucleotides (or bases) are combined to form long
  chains that resemble a twisted ladder.
• DNA sequence      … CTA CAC ACG TGT AAC …

• A gene usually comprises hundreds of individual nucleotides
  arranged in a particular order.

• A genome is the complete set of genes of an organism.

• Genomics is the analysis of genome sequences.

• A proteome is the complete set of protein molecules present
  in a cell, tissue, or organism.

• Proteomics is the study of proteome sequences.
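
As a minimal sketch, the DNA fragment shown above can be held as a plain string over the four-letter alphabet A, C, G, T. The helper names `parse_dna` and `triplets` are illustrative assumptions, not part of any library.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")

def parse_dna(text):
    """Strip whitespace, upper-case, and check every symbol is A, C, G, or T."""
    seq = "".join(text.split()).upper()
    invalid = set(seq) - NUCLEOTIDES
    if invalid:
        raise ValueError(f"unexpected symbols: {sorted(invalid)}")
    return seq

def triplets(seq):
    """Split a sequence into consecutive groups of three bases."""
    usable = len(seq) - len(seq) % 3  # drop a trailing partial group, if any
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# The fragment from the slide, with the leading and trailing "…" left out.
fragment = parse_dna("CTA CAC ACG TGT AAC")
print(fragment)
print(Counter(fragment))   # how often each base occurs
print(triplets(fragment))
```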
Data mining may contribute to
the biological data analysis in
    the following aspects.
Biological data mining has
become an essential part of
a new research field called
     bioinformatics.
1)Semantic integration of
heterogeneous, distributed genomic and
proteomic databases.
• Genomic and proteomic data sets are often generated at
  different labs and by different methods.

• They are distributed, heterogeneous, and of wide variety.

• Integration of such data is essential to cross-site analysis of
  biological data.

• Such integration and linkage analysis would facilitate the
  systematic and coordinated analysis of genome and biological
  data.
• This has promoted the development of integrated data
  warehouses to store and manage derived biological data.

• Data cleaning, data integration, reference
  reconciliation, classification, and clustering methods will
  facilitate the integration of biological data and the
  construction of data warehouses for biological data analysis.
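
The cleaning and reconciliation steps above can be sketched roughly as follows. The two labs, their field names, and all record values are hypothetical; the point is only that records are first mapped onto one shared schema and then merged by a common identifier.

```python
# Hypothetical records from two labs that describe genes with different
# field names, identifier casing, and species spellings.
lab_a = [
    {"gene_id": "GENE1", "organism": "Homo sapiens", "seq": "ctacac"},
    {"gene_id": "GENE2", "organism": "Homo sapiens", "seq": "ACGTGT"},
]
lab_b = [
    {"symbol": "gene1 ", "species": "H. sapiens", "sequence": "CTACAC"},
    {"symbol": "gene3", "species": "H. sapiens", "sequence": "TGTAAC"},
]

# Assumed alias table used to reconcile species names.
SPECIES_ALIASES = {"h. sapiens": "Homo sapiens", "homo sapiens": "Homo sapiens"}

def normalize(record, id_key, species_key, seq_key, source):
    """Data cleaning: map one lab-specific record onto a shared schema."""
    return {
        "gene_id": record[id_key].strip().upper(),
        "organism": SPECIES_ALIASES[record[species_key].strip().lower()],
        "sequence": record[seq_key].strip().upper(),
        "sources": {source},
    }

def integrate(*batches):
    """Reference reconciliation: merge records that share a gene_id."""
    warehouse = {}
    for batch in batches:
        for rec in batch:
            existing = warehouse.get(rec["gene_id"])
            if existing is None:
                warehouse[rec["gene_id"]] = rec
            elif existing["sequence"] == rec["sequence"]:
                existing["sources"] |= rec["sources"]
            else:
                raise ValueError(f"conflicting sequences for {rec['gene_id']}")
    return warehouse

warehouse = integrate(
    [normalize(r, "gene_id", "organism", "seq", "lab_a") for r in lab_a],
    [normalize(r, "symbol", "species", "sequence", "lab_b") for r in lab_b],
)
for gene_id, rec in sorted(warehouse.items()):
    print(gene_id, rec["organism"], rec["sequence"], sorted(rec["sources"]))
```

Raising on conflicting sequences is one possible policy; a real warehouse might instead keep both versions and flag them for review.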
2)Alignment, indexing, similarity search, and
comparative analysis of multiple nucleotide/protein
sequences.
• BLAST and FASTA, in particular, are widely used tools for the
  systematic analysis of genomic and proteomic data.

• Biological sequence analysis methods differ from many
  sequential pattern analysis algorithms proposed in data
  mining.

• For protein sequences, two amino acids should also be
  considered a “match” if one can be derived from the other by
  substitutions that are likely to occur in nature.
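
A minimal sketch of such a relaxed "match" is shown below. The similarity groups and the score values are toy assumptions chosen for illustration; they are not a published substitution matrix.

```python
# Toy grouping of amino acids (one-letter codes) with similar chemistry:
# aliphatic I/L/V, acidic D/E, basic K/R. Illustrative assumption only.
SIMILAR_GROUPS = [set("ILV"), set("DE"), set("KR")]

def residue_score(a, b):
    """Return 2 for identical residues, 1 for similar ones, -1 otherwise."""
    if a == b:
        return 2
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 1
    return -1

def ungapped_score(seq1, seq2):
    """Sum per-position scores of two equal-length protein sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(residue_score(a, b) for a, b in zip(seq1, seq2))

print(residue_score("L", "I"))           # similar, not identical
print(ungapped_score("MKLDE", "MRIDA"))  # mixes identical, similar, and mismatched pairs
```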
• There is a combinatorial number of ways to approximately
  align multiple sequences. Methods developed include:
  1)reducing a multiple alignment to a series of pairwise
  alignments and then combining the results.
   2)using Hidden Markov Models (HMMs).

• Multiple sequence alignments can be used to identify highly
  conserved residues among genomes, and these conserved regions
  can be used to build phylogenetic trees to infer evolutionary
  relationships among species.

• Genomic and proteomic sequences isolated from diseased
  and healthy tissues can be compared to identify critical
  differences between them.

• Sequences that occur more frequently in the diseased samples
  may indicate genetic factors of the disease.
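
The pairwise alignment that the first method builds on can be sketched with the classic Needleman-Wunsch dynamic program. This is a minimal illustration: the scoring values (match 1, mismatch -1, gap -1) are assumptions, and it is not how BLAST or FASTA work internally.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))

best, top, bottom = global_alignment("ACGTGT", "ACTGT")
print(best)
print(top)
print(bottom)
```

The same routine could be applied to a sequence from diseased tissue and one from healthy tissue; the gap and mismatch positions in the result are the differences between them.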
3)Discovery of structural patterns and analysis of
genetic networks and protein pathways.
• Protein sequences are folded into 3D structures, and such
  structures interact with each other based on the relative
  position and distances between them.

• Such complex interactions lead to the formation of genetic
  networks and protein pathways.

• It is important to develop powerful and scalable data mining
  methods to discover patterns and to study regularities and
  irregularities among complex biological networks.
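
One simple way to look for structure in such a network is to store it as a graph and inspect node degrees and connected groups. The sketch below assumes a hypothetical list of pairwise protein interactions; the protein names are placeholders.

```python
from collections import defaultdict

# Hypothetical protein-protein interactions (placeholder names).
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P5", "P6"),
]

def build_network(edges):
    """Build an undirected adjacency map from interaction pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def connected_components(graph):
    """Group proteins that are linked directly or indirectly."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

network = build_network(interactions)
# The protein with the most partners is a candidate "hub".
hub = max(sorted(network), key=lambda p: len(network[p]))
print(hub, len(network[hub]))
print(connected_components(network))
```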
4)Association and path analysis: identifying
co-occurring gene sequences and linking genes to
different stages of disease development.
• Many studies have focused on the comparison of one gene
  to another.

• Most diseases are not triggered by a single gene but by a
  combination of genes acting together.

• Association analysis methods can be used to determine the
  kinds of genes that are likely to co-occur in target samples.

• A group of genes may contribute to a disease process; here,
  path analysis is expected to play an important role.
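
Co-occurrence can be illustrated by counting how often pairs of genes appear together across samples, which is the "support" measure used in association analysis. The samples and gene names below are hypothetical, and the 0.5 threshold is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical target samples: the set of genes observed in each one.
samples = [
    {"G1", "G2", "G3"},
    {"G1", "G2"},
    {"G2", "G3"},
    {"G1", "G2", "G4"},
]

def frequent_pairs(samples, min_support):
    """Return gene pairs whose share of samples is at least min_support."""
    counts = Counter()
    for genes in samples:
        # Sorting gives every pair one canonical order, e.g. ("G1", "G2").
        counts.update(combinations(sorted(genes), 2))
    total = len(samples)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }

for pair, support in sorted(frequent_pairs(samples, 0.5).items()):
    print(pair, support)
```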
5)Visualization tools in genetic data analysis.

• Alignments among genomic or proteomic sequences and
  interactions between them can be:
     1)Expressed in graphic forms.
     2)Transformed into various kinds of easy-to-understand
        visual displays.
• They facilitate pattern understanding, knowledge
  discovery, and interactive data exploration.
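
Even a plain-text display can make an alignment easier to read. The sketch below assumes two already-aligned strings of equal length, with `-` for gaps, and draws a bar under each matching position; the function name `render_alignment` is an illustrative assumption.

```python
def render_alignment(a, b):
    """Return three lines: sequence a, a match line, and sequence b."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    marks = "".join(
        "|" if x == y and x != "-" else " "
        for x, y in zip(a, b)
    )
    return "\n".join([a, marks, b])

print(render_alignment("ACGTGT", "AC-TGT"))
```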
Thank you
