DSA – 105 Introduction to
Data Science
Week 3 – Steps involved in Data Science
Ferdin Joe John Joseph, PhD
Faculty of Information Technology
Thai-Nichi Institute of Technology
Week 3
Agenda
• Steps involved in Data Science
Faculty of Information Technology, Thai - Nichi Institute of
Technology
Process in Data Science Life Cycle (DSLC)
1. Business Understanding
Use data science to answer five types of questions:
• How much or how many? (regression)
• Which category? (classification)
• Which group? (clustering)
• Is this weird? (anomaly detection)
• Which option should be taken? (recommendation)
Data Mining
Decide on database usage
• Data collection strategies and process
• Use of SQL queries
• Use of dataframe packages such as pandas
• Use of JSON
• Use of software to store and manage data
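A minimal sketch of the SQL, pandas and JSON items above, assuming Python with pandas installed. The table `orders`, its columns and all values are hypothetical toy data, and an in-memory SQLite database stands in for a real data source.

```python
import json
import sqlite3
from io import StringIO

import pandas as pd

# Hypothetical toy data: an in-memory SQLite database stands in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 120.0), (2, "bob", 75.5), (3, "alice", 30.0)],
)
conn.commit()

# 1) Run an SQL query and load the result straight into a pandas DataFrame.
orders = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY customer",
    conn,
)

# 2) Load JSON records (e.g. as returned by a web API) into a DataFrame.
raw_json = json.dumps([
    {"customer": "alice", "country": "TH"},
    {"customer": "bob", "country": "JP"},
])
customers = pd.read_json(StringIO(raw_json))

# 3) Combine both sources into one table for the later steps.
combined = orders.merge(customers, on="customer", how="left")
print(combined)
```

Aggregating in SQL and joining in pandas is only one possible split of the work; either tool could do both steps.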
Data Cleaning
• Also known as "data janitor" work; often considered the most important component.
• The cleaner the data, the better the decisions.
• It consumes at least 50% of the effort in the entire process.
• E.g. check the data type of each value and convert it where needed,
such as numerical values stored as strings instead of integers.
• E.g. use a consistent format and spelling for categorical data
(‘Male’ vs. ‘male’).
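The two cleaning examples above can be sketched with pandas as follows. The column names and values are hypothetical, and mapping unparseable entries such as "n/a" to missing values is an assumption about how they should be handled.

```python
import pandas as pd

# Hypothetical raw records showing both problems from the slide.
raw = pd.DataFrame({
    "age": ["25", "31", "n/a", "40"],                 # numbers stored as strings
    "gender": ["Male", "male ", "FEMALE", "Female"],  # inconsistent case/spacing
})

clean = raw.copy()
# Convert strings to numbers; entries that cannot be parsed become NaN.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")
# Normalise categorical labels: trim whitespace, lower-case, then map to one spelling.
clean["gender"] = (
    clean["gender"].str.strip().str.lower().map({"male": "Male", "female": "Female"})
)

print(clean.dtypes)
print(clean)
```

Working on a copy keeps the raw data available for comparison, which helps when checking that a cleaning rule did what was intended.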
Data Exploration
• Brainstorm what to do with the ‘cleaned’ data
• Understand the bias and patterns in the data
• Analyze and visualize a random subset of the data
• Look for anomalies and outliers in the data’s pattern
• Form hypotheses about the data and the problem, and about how the
solution should be delivered
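A small pandas sketch of the exploration steps above: take a reproducible random subset, summarise the distribution, and flag outliers. The `score` column and its values are hypothetical, and the 1.5 × IQR rule is one common outlier heuristic chosen here for illustration, not the only option.

```python
import pandas as pd

# Hypothetical cleaned data; replace with your own DataFrame.
df = pd.DataFrame({"score": [52, 55, 58, 60, 61, 63, 64, 66, 70, 150]})

# Inspect a reproducible random subset of rows.
subset = df.sample(n=5, random_state=0)
print(subset)

# Summary statistics give a first feel for the distribution.
print(df["score"].describe())

# Flag outliers with the 1.5 * IQR rule.
q1, q3 = df["score"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["score"] < q1 - 1.5 * iqr) | (df["score"] > q3 + 1.5 * iqr)]
print(outliers)
```

Whether a flagged value is an error or a genuine extreme case is a question for the hypotheses step, not something the rule can decide.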
Feature Engineering
• A feature is a measurable property or attribute of a phenomenon
being observed.
• Feature engineering is the process of using domain knowledge to
transform your raw data into informative features that represent the
business problem you are trying to solve.
• Feature engineering involves two tasks:
• Feature Selection
• Feature Construction
Feature Selection
• Feature selection is the process of cutting down the features that add
more noise than information.
• This reduces the complexity that comes with high-dimensional feature spaces
• There are three families of methods:
• Filter methods (apply a statistical measure to assign a score to each feature)
• Wrapper methods (frame the selection of features as a search problem and
use a heuristic to perform the search)
• Embedded methods (select features as part of training the machine learning
model, keeping those that contribute most to accuracy)
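A minimal sketch of a filter method, assuming pandas: each feature is scored by the absolute Pearson correlation with the target and the top `k` are kept. The function name `filter_select` and the toy columns `x1`, `x2`, `x3` are hypothetical; correlation is just one possible scoring measure.

```python
import pandas as pd


def filter_select(features: pd.DataFrame, target: pd.Series, k: int) -> list[str]:
    """Filter method: score each feature by |Pearson correlation| with the target, keep the top k."""
    scores = features.corrwith(target).abs()
    return scores.nlargest(k).index.tolist()


# Hypothetical toy data: x1 tracks the target closely, x2 less so, x3 barely at all.
X = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [5, 1, 4, 2, 6, 3],
})
y = pd.Series([1.1, 2.0, 2.9, 4.2, 5.0, 6.1])

print(filter_select(X, y, k=2))
```

Because the score is computed per feature and independently of any model, filter methods are cheap but can miss features that are only useful in combination. For wrapper and embedded methods, scikit-learn offers `sklearn.feature_selection.RFE` and `SelectFromModel` respectively.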
Feature Construction
• Involves creating new features from the ones that are already available.
• For example, if you have a feature for age, but your model only cares
about if a person is an adult or minor, you could threshold it at 18,
and assign different categories to instances above and below that
threshold.
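The age example above, sketched with pandas. The ages are hypothetical, and treating exactly 18 as "adult" is an assumption, since the slide only says "above and below" the threshold.

```python
import pandas as pd

# Hypothetical ages; the threshold of 18 follows the example above.
people = pd.DataFrame({"age": [12, 17, 18, 35, 70]})

ADULT_AGE = 18
# Construct a new categorical feature from the raw "age" feature.
# Assumption: an age of exactly 18 counts as "adult".
people["age_group"] = (people["age"] >= ADULT_AGE).map({True: "adult", False: "minor"})
print(people)
```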
Predictive Modelling
• Predictive modeling is where the machine learning finally comes into
your data science project.
• Based on the questions you asked in the business understanding
stage, this is where you decide which model to pick for your problem.
• The model that you end up training depends on the size,
type and quality of your data, how much time and computational
resources you are willing to invest, and the type of output you intend
to derive.
• The trained model needs to be evaluated for accuracy using validation
techniques such as k-fold cross-validation.
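A plain-Python sketch of k-fold cross-validation: the data is split into k folds, the model is trained on k − 1 of them and scored on the held-out fold, and this repeats k times. The helper names and the simple least-squares line used as "the model" are hypothetical choices for illustration; the data is also hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs that split range(n) into k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (the model being validated)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def cross_validate_mse(xs, ys, k=5):
    """Train on k-1 folds, compute MSE on the held-out fold, repeat for every fold."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        scores.append(sum(errors) / len(errors))
    return scores


xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # hypothetical, perfectly linear data
scores = cross_validate_mse(xs, ys, k=5)
print(scores, sum(scores) / len(scores))
```

The folds here are contiguous, so ordered data should be shuffled first; scikit-learn's `KFold(n_splits=5, shuffle=True)` together with `cross_val_score` does this for any estimator.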
Predictive Modelling
• The percentage of correct classifications measures the accuracy of a
classification model
• ROC curves plot the true positive rate against the false positive rate
as the classification threshold varies
• The coefficient of determination (R²), mean squared error (MSE) and mean
absolute error (MAE) measure how well a regression model fits
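The evaluation measures above can be written in a few lines of plain Python, which makes their definitions explicit. The function names and all example values are hypothetical; `roc_points` assumes binary 0/1 labels with at least one example of each class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted class labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the decision threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical predictions, for illustration only.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
y_true, y_pred = [3, -0.5, 2, 7], [2.5, 0.0, 2, 8]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

In practice, scikit-learn's `sklearn.metrics` module provides tested equivalents (`accuracy_score`, `roc_curve`, `mean_squared_error`, `mean_absolute_error`, `r2_score`).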
Data Visualisation
• Combines the fields of communication, psychology, statistics, and art.
• Communicating the data in a simple yet effective and visually pleasing
way.
• Jupyter notebooks give access to many visualization packages, e.g.
Matplotlib
• Drag-and-drop tools such as Tableau and Plotly
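A minimal Matplotlib sketch of a labelled bar chart, assuming Matplotlib is installed. The categories and counts are hypothetical. The off-screen `Agg` backend and saving to an in-memory buffer are used here only so the script runs without a display; in a Jupyter notebook the figure would normally be shown inline instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical category counts to visualise.
labels = ["A", "B", "C"]
counts = [12, 7, 3]

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(labels, counts, color="steelblue")
ax.set_title("Counts per category")
ax.set_xlabel("Category")
ax.set_ylabel("Count")
fig.tight_layout()

# Save the chart as PNG into memory; use plt.show() or a file path in practice.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

A clear title and labelled axes do much of the "simple yet effective" work mentioned above.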
Goals of Data Science Process
• The goal of this process is to keep a data science project moving
forward towards a clear engagement end point.
• We recognize that data science is a research activity and that progress
often entails an approach that moves two steps forward and one step
(or worse) backwards.
• Being able to clearly communicate this to customers can help avoid
misunderstanding and frustration for all parties involved, and increase
the odds of success.
Activity
• Apply the data science process to the Olympic medal tally for events
held after WW2
• Tools and Technologies in Data Science
Halmar dropshipping via API with DroFxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 

Kürzlich hochgeladen (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 

2019 DSA 105 Introduction to Data Science Week 3

DSA – 105 Introduction to Data Science
Week 3 – Steps involved in Data Science
Ferdin Joe John Joseph, PhD
Faculty of Information Technology
Thai-Nichi Institute of Technology

Week 3 Agenda
• Steps involved in Data Science

Process in Data Science Life Cycle (DSLC)

1. Business Understanding
Use data science to answer five types of questions:
• How much or how many? (regression)
• Which category? (classification)
• Which group? (clustering)
• Is this weird? (anomaly detection)
• Which option should be taken? (recommendation)

Data Mining
Decide on database usage:
• Data collection strategies and process
• Using SQL queries
• Using dataframe packages such as pandas
• Using JSON
• Using software to store and manage data

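The collection steps on this slide can be sketched with Python's standard library alone: `json` parses a JSON payload and `sqlite3` stands in for a real database that is queried with SQL. This is a minimal illustration under stated assumptions, not a prescribed setup; the table name `medals`, its columns and the sample values are hypothetical. In a pandas workflow, `pandas.read_json` and `pandas.read_sql` would load the same data straight into a DataFrame.

```python
import json
import sqlite3

# Hypothetical JSON payload (e.g. the body of an API response); values are made up.
RAW_JSON = """
[
  {"country": "Country A", "year": 1948, "gold": 10},
  {"country": "Country B", "year": 1948, "gold": 5},
  {"country": "Country A", "year": 1952, "gold": 12}
]
"""


def load_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def store_and_query(records):
    """Store records in an in-memory SQLite table, then aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE medals (country TEXT, year INTEGER, gold INTEGER)")
        # Named placeholders let each dict be inserted directly.
        conn.executemany(
            "INSERT INTO medals (country, year, gold) VALUES (:country, :year, :gold)",
            records,
        )
        return conn.execute(
            "SELECT country, SUM(gold) AS total_gold FROM medals "
            "GROUP BY country ORDER BY total_gold DESC"
        ).fetchall()
    finally:
        conn.close()


totals = store_and_query(load_records(RAW_JSON))
print(totals)  # [('Country A', 22), ('Country B', 5)]
```

Keeping parsing and storage in separate functions mirrors the slide's split between collecting data and choosing where to store and query it.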
Data Cleaning
• Also known as “data janitor” work; the most important component.
• The cleaner the data, the better the decisions.
• It often consumes at least 50% of the total effort in the process.
• E.g., check the data type of each value and convert it where needed, since numerical values may be stored as integers or as strings.
• E.g., use a consistent format and spelling for categorical data: ‘Male’ vs. ‘male’.

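The two cleaning examples on this slide (fixing data types and unifying category spellings) can be sketched in plain Python. The column names and raw values are hypothetical, and mapping unparseable numbers to `None` is just one possible policy; a project might instead drop or impute them.

```python
# Hypothetical raw rows: ages arrive as mixed ints/strings and gender labels vary.
raw_rows = [
    {"age": "34", "gender": "Male"},
    {"age": 27, "gender": "male "},
    {"age": "n/a", "gender": "FEMALE"},
    {"age": "41", "gender": "Female"},
]


def to_int(value):
    """Convert a value to int; return None when it cannot be parsed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def normalize_category(value):
    """Trim whitespace and apply one capitalisation to a categorical label."""
    return value.strip().capitalize()


def clean(rows):
    """Return new rows with a numeric age and a consistently spelled gender."""
    return [
        {"age": to_int(row["age"]), "gender": normalize_category(row["gender"])}
        for row in rows
    ]


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

With pandas, the same steps are commonly written as `pd.to_numeric(df["age"], errors="coerce")` and `df["gender"].str.strip().str.capitalize()`.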
Data Exploration
• Brainstorm what to do with the ‘cleaned’ data
• Understand the biases and patterns in the data
• Analyze a random subset of the data and visualize it
• Look for anomalies and outliers in the data’s patterns
• Form hypotheses about the data and the problem that guide how the solution should be approached

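A minimal sketch of two exploration steps, drawing a reproducible random subset and flagging outliers, follows. It uses the 1.5 × IQR rule as one common outlier heuristic (an assumption; the slide does not prescribe a method), and the data values are made up.

```python
import random
import statistics


def explore(values, sample_size=5, seed=0):
    """Summarise a column and flag outliers with the 1.5 x IQR rule."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = rng.sample(values, k=min(sample_size, len(values)))

    # Quartiles of the full column (statistics.quantiles needs Python 3.8+).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "subset": subset,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "outliers": [v for v in values if v < low or v > high],
    }


data = [12, 15, 14, 13, 16, 15, 14, 95, 13, 15]  # hypothetical column
summary = explore(data)
print(summary["subset"])
print(summary["outliers"])  # [95]
print(summary["mean"], summary["median"])  # 22.2 14.5
```

The gap between the mean and the median is itself a hint that a single extreme value is skewing the data.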
Feature Engineering
• A feature is a measurable property or attribute of a phenomenon being observed.
• Feature engineering is the process of using domain knowledge to transform raw data into informative features that represent the business problem you are trying to solve.
• There are two tasks in feature engineering:
  • Feature Selection
  • Feature Construction

Feature Selection
• Feature selection is the process of removing features that add more noise than information.
• This reduces the complexity that comes with high-dimensional feature spaces.
• There are three families of methods:
  • Filter methods (apply a statistical measure to assign a score to each feature)
  • Wrapper methods (frame the selection of features as a search problem and use a heuristic to perform the search)
  • Embedded methods (learn which features contribute most to accuracy as part of training the model itself)

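As an illustration of a filter method, the sketch below scores each feature by its absolute Pearson correlation with the target and keeps the top k. Pearson correlation is only one possible statistical measure (chi-squared tests and mutual information are other common choices), and the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def filter_select(features, target, k):
    """Score every feature independently against the target; keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical data: one informative feature, one weak one, one constant.
target = [1, 2, 3, 4, 5]
features = {
    "hours_trained": [2, 4, 6, 8, 10],
    "shoe_size": [42, 38, 44, 40, 43],
    "constant": [7, 7, 7, 7, 7],
}
selected, scores = filter_select(features, target, k=2)
print(selected)  # ['hours_trained', 'shoe_size']
```

Because each feature is scored on its own, filter methods are cheap but blind to interactions between features; wrapper methods retrain a model on candidate subsets instead, and embedded methods (e.g. L1-regularized regression) select features during training. scikit-learn's `SelectKBest` implements filter-style selection.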
Feature Construction
• Involves creating new features from the ones that are already available.
• For example, if you have a feature for age but your model only cares whether a person is an adult or a minor, you could threshold it at 18 and assign different categories to instances above and below that threshold.

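The age example on this slide translates directly into code. This sketch assumes that a person aged exactly 18 counts as an adult (the slide leaves the boundary open), and the records are hypothetical.

```python
ADULT_AGE = 18  # threshold taken from the slide's example


def age_group(age, threshold=ADULT_AGE):
    """Construct a categorical feature from the numeric age feature.

    Assumption: an age equal to the threshold counts as "adult".
    """
    return "adult" if age >= threshold else "minor"


# Hypothetical records; the new feature is added alongside the original one.
people = [{"id": 1, "age": 12}, {"id": 2, "age": 18}, {"id": 3, "age": 45}]
for person in people:
    person["age_group"] = age_group(person["age"])

print([p["age_group"] for p in people])  # ['minor', 'adult', 'adult']
```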
Predictive Modeling
• Predictive modeling is where machine learning finally comes into your data science project.
• Based on the questions you asked in the business understanding stage, this is where you decide which model to pick for your problem.
• The model you end up training will depend on the size, type and quality of your data, how much time and computational resources you are willing to invest, and the type of output you intend to derive.
• The trained model needs to be evaluated for its accuracy using validation techniques such as k-fold cross-validation.

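k-fold cross-validation splits the data into k folds, trains on k - 1 of them and validates on the remaining one, rotating until every fold has served as the validation set once. The sketch below implements the split by hand and averages the validation mean squared error of a deliberately trivial "model" that predicts the mean of the training targets; the data and the baseline model are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once before splitting
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Hypothetical targets; the "model" is a mean-of-training-targets baseline.
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fold_mse = []
for train_idx, val_idx in k_fold_indices(len(ys), k=3):
    prediction = sum(ys[i] for i in train_idx) / len(train_idx)  # "training"
    errors = [(ys[i] - prediction) ** 2 for i in val_idx]  # validation
    fold_mse.append(sum(errors) / len(errors))

print(f"mean MSE over {len(fold_mse)} folds: {sum(fold_mse) / len(fold_mse):.3f}")
```

In practice, `sklearn.model_selection.KFold` and `cross_val_score` provide the same splitting and scoring for real estimators.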
Predictive Modeling
• The percentage of correct classifications (accuracy) is used to measure the performance of a classification model.
• ROC curves plot the true positive rate against the false positive rate.
• The coefficient of determination (R²), mean squared error (MSE) and mean absolute error (MAE) measure the correctness of regression models.

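The metrics named on this slide reduce to short formulas. The sketch below computes accuracy, a single (FPR, TPR) point of an ROC curve, and R², MSE and MAE for regression; the labels, scores and predictions are made-up inputs.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) at one threshold; assumes both classes (0 and 1) occur."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


def regression_metrics(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, plus mean squared and mean absolute error."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mse": ss_res / n,
        "mae": sum(abs(r) for r in residuals) / n,
    }


labels = [1, 0, 1, 1, 0]  # hypothetical true classes
predicted = [1, 0, 0, 1, 0]  # hypothetical predicted classes
scores = [0.9, 0.4, 0.35, 0.8, 0.1]  # hypothetical classifier scores
print(accuracy(labels, predicted))  # 0.8
print(roc_point(labels, scores, 0.5))  # (0.0, 0.6666666666666666)
print(regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0]))
```

Evaluating `roc_point` at each distinct score as the threshold and plotting the resulting points traces the full ROC curve; `sklearn.metrics.roc_curve` computes those points directly.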
Data Visualisation
• Combines the fields of communication, psychology, statistics, and art.
• Communicates the data in a simple, effective and visually pleasing way.
• Jupyter notebooks can use many Python visualization packages, e.g. Matplotlib.
• Drag-and-drop tools such as Tableau and Plotly

Goals of Data Science Process
• The goal of this process is to continue to move a data science project forward towards a clear engagement end point.
• We recognize that data science is a research activity and that progress often entails an approach that moves two steps forward and one step (or worse) backwards.
• Being able to clearly communicate this to customers can help avoid misunderstanding and frustration for all parties involved, and increase the odds of success.

Activity
• Perform the data science process on the Olympic medal tally for events after World War II (WW2).

• Tools and Technologies in Data Science