Dr. C.V. Suresh Babu
(CentreforKnowledgeTransfer)
institute
DISCUSSION TOPICS
• Data Analytics Lifecycle
• Importance of Data Analytics Lifecycle
• Phase 1: Discovery
• Phase 2: Data Preparation
• Phase 3: Model Planning
• Phase 4: Model Building
• Phase 5: Communicate Results
• Phase 6: Operationalize
• Data Analytics Lifecycle Example
DATA ANALYTICS LIFECYCLE
• The data analytics lifecycle is designed for Big Data problems and data science projects.
• The cycle is iterative, reflecting how real projects proceed.
• To address the distinct requirements of performing analysis on Big Data, a step-by-step methodology is needed to organize the activities and tasks involved in
  • acquiring,
  • processing,
  • analyzing, and
  • repurposing data.
IMPORTANCE OF DATA ANALYTICS LIFECYCLE
• The Data Analytics Lifecycle defines the roadmap of how data is generated, collected, processed, used, and analyzed to achieve business goals.
• It offers a systematic way to manage data and convert it into information that can be used to fulfil organizational and project goals.
• The process provides the direction and methods for extracting information from the data and keeps the work on course toward those business goals.
• Because the lifecycle is circular, data professionals can move through it in either a forward or a backward direction.
• Based on newly received insights, they can decide whether to proceed with their existing research or scrap it and redo the complete analysis.
• The Data Analytics Lifecycle guides them throughout this process.
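The forward-or-backward movement through the lifecycle can be sketched as a small state machine. This is only an illustration: the `ready` flag and the optional `fallback` phase are hypothetical stand-ins for the team's judgement, since the slides do not define formal exit criteria for a phase.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    DISCOVERY = 1
    DATA_PREPARATION = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


def next_phase(current: Phase, ready: bool,
               fallback: Optional[Phase] = None) -> Phase:
    """Move forward when the phase's work holds up, otherwise step back.

    `ready` and `fallback` are hypothetical inputs, not part of the slides.
    """
    if ready:
        # Stay on the final phase once it has been reached.
        return Phase(min(current + 1, Phase.OPERATIONALIZE))
    if fallback is not None:
        if fallback > current:
            raise ValueError("fallback must be an earlier (or the same) phase")
        return fallback
    # Default: go back exactly one phase, but never before Discovery.
    return Phase(max(current - 1, Phase.DISCOVERY))


# New insights undermine the model: return to Discovery and redo the analysis.
restart = next_phase(Phase.MODEL_BUILDING, ready=False, fallback=Phase.DISCOVERY)
```

Passing `fallback=Phase.DISCOVERY` corresponds to scrapping the existing research and redoing the complete analysis; leaving it out steps back a single phase.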
PHASE 1: DISCOVERY
• The data science team learns about and investigates the problem.
• It develops context and understanding.
• It identifies the data sources that are needed and available for the project.
• The team formulates initial hypotheses that can later be tested with data.
PHASE 2: DATA PREPARATION
• This phase covers the steps to explore, preprocess, and condition data prior to modeling and analysis.
• It requires an analytic sandbox; the team performs extract, load, and transform (ELT) to get data into the sandbox.
• Data preparation tasks are likely to be performed multiple times and not in a predefined order.
• Tools commonly used in this phase include Hadoop, Alpine Miner, and OpenRefine.
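As a rough sketch of loading data into a sandbox first and conditioning it there, the following uses an in-memory SQLite database in place of a real analytic sandbox. The table names, columns, and sample rows are invented for illustration; an actual project would use whatever sandbox and tools the team has available.

```python
import sqlite3

# Made-up raw records, as they might arrive from a source system.
RAW_SALES = [
    ("2024-01-05", "outlet-01", " Hammer ", "12.50"),
    ("2024-01-05", "outlet-02", "nails", ""),  # missing price
    ("2024-01-06", "outlet-01", "HAMMER", "12.50"),
]


def load_into_sandbox(rows):
    """Extract + load: copy the raw rows, untouched, into the sandbox."""
    sandbox = sqlite3.connect(":memory:")  # in-memory DB plays the sandbox
    sandbox.execute(
        "CREATE TABLE raw_sales "
        "(sale_date TEXT, outlet TEXT, product TEXT, price TEXT)"
    )
    sandbox.executemany("INSERT INTO raw_sales VALUES (?, ?, ?, ?)", rows)
    return sandbox


def transform_in_sandbox(sandbox):
    """Transform: condition the data inside the sandbox, keeping the raw table."""
    sandbox.execute(
        """
        CREATE TABLE clean_sales AS
        SELECT sale_date,
               outlet,
               lower(trim(product)) AS product,
               CAST(price AS REAL)  AS price
        FROM raw_sales
        WHERE price <> ''
        """
    )
    return sandbox.execute(
        "SELECT product, price FROM clean_sales ORDER BY sale_date"
    ).fetchall()


sandbox = load_into_sandbox(RAW_SALES)
clean_rows = transform_in_sandbox(sandbox)
```

Because the raw table stays in place, the transform step can be rerun with different cleaning rules, which suits a phase whose tasks are repeated and reordered.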
PHASE 3: MODEL PLANNING
• The team explores the data to learn about relationships between variables and subsequently selects key variables and the most suitable models.
• In this phase, the data science team develops data sets for training, testing, and production purposes.
• The models are then built and executed in the next phase, based on the work done in model planning.
• Tools commonly used in this phase include MATLAB and STATISTICA.
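One simple way to explore relationships between variables is to rank candidate inputs by their correlation with the outcome. The sketch below uses the Pearson coefficient, which only captures linear relationships, so it is a first pass rather than a selection method. The variable names and numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_variables(candidates, target):
    """Sort candidate variables by absolute correlation with the target."""
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up weekly observations for one product.
units_sold = [120, 110, 95, 90, 70, 60]
candidates = {
    "price": [9.0, 9.5, 10.0, 10.5, 11.5, 12.0],
    "shelf_position": [3, 1, 2, 3, 1, 2],
}

ranking = rank_variables(candidates, units_sold)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In this invented data, price tracks units sold far more closely than shelf position does, so price would be the first key variable to carry into model building.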
PHASE 4: MODEL BUILDING
• The team develops datasets for testing, training, and production purposes.
• The team also considers whether its existing tools will suffice for running the models or whether it needs a more robust environment for executing them.
• Free or open-source tools: R and PL/R, Octave, WEKA.
• Commercial tools: MATLAB, STATISTICA.
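A minimal sketch of carving one set of records into training, testing, and production datasets follows. The 60/20/20 proportions, the fixed seed, and the reading of "production" as a held-back set standing in for data the deployed model will see are all assumptions; the slides do not specify them.

```python
import random


def split_dataset(records, train_frac=0.6, test_frac=0.2, seed=42):
    """Shuffle records and cut them into training, testing, and production sets."""
    if not 0 < train_frac + test_frac < 1:
        raise ValueError("train_frac + test_frac must leave room for production")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    production = shuffled[n_train + n_test:]  # whatever remains
    return train, test, production


# Record IDs 0..99 stand in for real rows.
train, test, production = split_dataset(range(100))
```

Shuffling before cutting avoids splits that follow the original record order; for time-ordered data a chronological split would usually be the better choice.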
PHASE 5: COMMUNICATE RESULTS
• After executing the model, the team needs to compare the outcomes of modeling with the criteria established for success and failure.
• The team considers how best to articulate findings and outcomes to the various team members and stakeholders, taking into account caveats and assumptions.
• The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey the findings to stakeholders.
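Comparing outcomes with the agreed criteria can be as simple as a table of thresholds checked metric by metric. The metric names, thresholds, and results below are hypothetical; the point is only that the criteria are fixed before the results are read.

```python
def evaluate_against_criteria(results, criteria):
    """Compare model outcomes with success thresholds agreed up front.

    Each criterion maps a metric name to (direction, threshold), where
    direction "min" means the result must be at least the threshold and
    "max" means it must be at most the threshold.
    """
    report = {}
    for metric, (direction, threshold) in criteria.items():
        value = results[metric]
        passed = value >= threshold if direction == "min" else value <= threshold
        report[metric] = {"value": value, "threshold": threshold, "passed": passed}
    return report


# Hypothetical criteria and outcomes for a pricing model.
criteria = {
    "revenue_uplift_pct": ("min", 2.0),
    "forecast_error_pct": ("max", 10.0),
}
results = {"revenue_uplift_pct": 3.1, "forecast_error_pct": 12.4}

report = evaluate_against_criteria(results, criteria)
succeeded = all(entry["passed"] for entry in report.values())
```

A per-metric report like this also gives the narrative for stakeholders a concrete starting point: which criteria were met, which were missed, and by how much.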
PHASE 6: OPERATIONALIZE
• The team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening it to the full enterprise of users.
• This approach enables the team to learn about the performance and related constraints of the model in a production environment on a small scale, and to make adjustments before full deployment.
• The team delivers final reports, briefings, and code.
• Free or open-source tools: Octave, WEKA, SQL, MADlib.
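A pilot of this kind can be sketched as two steps: choose a small, repeatable sample of sites, then gate the wider rollout on how the pilot performed. The 5% sample size, the 2% uplift threshold, and the outlet IDs are illustrative assumptions only.

```python
import random


def choose_pilot_outlets(outlets, fraction=0.05, seed=7):
    """Pick a small, repeatable random sample of outlets for the pilot."""
    size = max(1, round(len(outlets) * fraction))
    return sorted(random.Random(seed).sample(list(outlets), size))


def ready_for_full_rollout(pilot_revenue, control_revenue, min_uplift=0.02):
    """Gate the wider rollout on the pilot beating the control group."""
    uplift = (pilot_revenue - control_revenue) / control_revenue
    return uplift >= min_uplift


# Hypothetical outlet identifiers.
outlets = [f"outlet-{i:03d}" for i in range(1, 201)]
pilot = choose_pilot_outlets(outlets)
```

The outlets left out of the pilot can serve as the control group, so that any adjustments are made on evidence from the small-scale deployment before the model reaches every user.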
DATA ANALYTICS LIFECYCLE EXAMPLE
• Consider a retail store chain that wants to optimize its product prices to boost revenue.
• The chain has thousands of products across hundreds of outlets, making it a highly complex scenario.
• Once you identify the chain’s objective, you find the data you need, prepare it, and go through the Data Analytics Lifecycle process.
• You observe different types of customers, such as ordinary customers and customers like contractors who buy in bulk.
• You suspect that treating the various types of customers differently could lead to the solution.
• However, you don’t have enough information about this and need to discuss it with the client team.
• In this case, you need to get the definition of each customer type, find the data, and conduct hypothesis testing to check whether the customer types affect the model results, so that you get the right output.
• Once you are convinced by the model results, you can deploy the model, integrate it into the business, and roll out the prices you consider optimal across the store’s outlets.
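The hypothesis-testing step in this example could take many forms. One option, sketched here, is a permutation test of whether two customer types differ in average basket value. The test itself is a choice made for this illustration and is not prescribed by the slides, and the basket values are invented, not real store data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign customer-type labels at random
        diff = abs(mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up average basket values per customer (not real store data).
ordinary = [20, 22, 19, 21, 23, 20, 18, 22]
contractor = [35, 40, 38, 36, 41, 39, 37, 42]

p_value = permutation_test(ordinary, contractor)
```

A small p-value would support treating the two customer types differently in the pricing model, while a large one would suggest going back to discovery with the client team before changing the model.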
