zekeLabs
Master Guide to become a
Data Scientist
Learning made Simpler !
www.zekeLabs.com
“Goal - Become a Data Scientist”
info@zekeLabs.com | www.zekeLabs.com | +91 8095465880
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Complete Data Science / AI / ML in 20 Modules - 50 hours
1. Numerical Computation using NumPy
2. Essential Statistics & Maths
3. Pandas & scipy for Data Wrangling & Statistics
4. Data Visualization
5. Introducing Machine Learning & Knowing Datasets
6. Data Preprocessing
7. Feature Engineering
8. Feature Selection Techniques
9. Model Evaluation
10. Model Selection
11. Linear Regression
12. Logistic Regression
13. Naive Bayes
14. Trees
15. Ensemble Methods
16. Nearest Neighbors
17. Support Vector Machines
18. Clustering
19. Machine Learning at Scale & Deployment
20. 10 Projects
0. Prerequisite
● Basic Programming using Python
● Object Oriented Programming in Python
● Connecting databases & SQL
● Web scraping
● Parsing
1. Numerical Computation using NumPy - 3 hrs
● Why NumPy ?
● Performance
● Creation
● Access
● Concat & Split
● Axes
● Understanding Vectors
● Reshape
● Matrix Operation
● Utility functions
● Common NumPy utilities
● Broadcasting
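The axes, reshape, concat and broadcasting topics above can be sketched in a few lines. This is only an illustration; the array values are made up.

```python
import numpy as np

# Hypothetical data: 3 samples (rows), 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# axis=0 reduces over rows, giving one mean per column.
col_means = X.mean(axis=0)            # shape (2,)

# Broadcasting: the (2,) vector is stretched across all 3 rows.
centered = X - col_means              # shape (3, 2)

# Reshape to a column vector, then concatenate along axis=1.
ids = np.arange(3).reshape(-1, 1)     # shape (3, 1)
with_ids = np.concatenate([ids, centered], axis=1)  # shape (3, 3)
```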
2. Essential Statistics & Maths - 5 hrs
● Relationships - Deterministic vs Statistical
● Statistics - Descriptive vs Inferential
● Sampling
● Variables
● Distribution
● Summarizing Distribution
● Correlation, Collinearity, Causation
● Probability
● Normal Distribution
● Confidence Interval
● Hypothesis Testing
● Calculus
● Linear Algebra
● Matrix Ops
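As a small illustration of the confidence interval topic above, the sketch below computes an approximate 95% interval for a mean using only the standard library. The sample values are hypothetical, and the normal approximation is an assumption; for a sample this small a t critical value would be the more careful choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of measurements.
sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 5.0, 4.8]

n = len(sample)
m = mean(sample)
s = stdev(sample)                    # sample standard deviation (divides by n - 1)

# Critical value of the standard normal for a two-sided 95% interval.
z = NormalDist().inv_cdf(0.975)

half_width = z * s / sqrt(n)
ci = (m - half_width, m + half_width)
```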
3. Pandas & scipy for Data Wrangling & Statistics - 5 hrs
● Series vs DataFrames
● Loading CSV, JSON, DB etc.
● Access & Filters
● DataFrame
● Exploratory Data Analysis
● Finding & Handling Missing Data
● Duplicate Handling
● Rolling averages
● Applying functions
● Handling Time Series Data
● Merging & Grouping Data
● Pivot Table & Crosstab
● Random data using scipy
● Comparing datasets using scipy
● Analyzing sample using scipy
● Kernel Density Estimation using scipy
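A minimal sketch of a few of the pandas topics above (duplicates, missing data, grouping, pivot tables). The DataFrame and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing value and one duplicate row.
df = pd.DataFrame({
    "city":  ["Pune", "Pune", "Delhi", "Delhi", "Delhi"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Feb"],
    "sales": [100.0, np.nan, 80.0, 120.0, 120.0],
})

df = df.drop_duplicates()                              # drops the repeated Delhi/Feb row
df["sales"] = df["sales"].fillna(df["sales"].mean())   # impute with the column mean

by_city = df.groupby("city")["sales"].sum()            # grouping
table = df.pivot_table(index="city", columns="month", values="sales")  # pivot table
```

Mean imputation is just one option here; dropping rows or using a model-based imputer may suit other datasets better.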
4. Data Visualization - 4 hrs
● Understanding matplotlib
● Plotting Quantitative data
● Plotting Qualitative data
● Histograms
● Frequency Polygons
● Box-Plots
● Bar charts
● Line Graphs
● Scatter Plots
● 3D Plots
● Exploring seaborn & Bokeh
● Introduction to Tableau
● Plotting scatter plot
● Bubble chart
● Bullet chart
● Gantt chart
5. Introducing Machine Learning & Knowing Datasets - 1 hr
● Introduction to Machine Learning
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
● Regression
● Classification
● Clustering
● Machine Learning in Big Companies
● Machine Learning in Small Companies
● Machine Learning in startups
● UCI
● Kaggle
● Inbuilt scikit-learn datasets
● Generating datasets
6. Data Preprocessing - 4 hrs
● Standardize feature
● Normalize
● Encoding categorical features
● Encoding Ordinal Features
● Non-linear transformation
● Polynomial features
● Handling Time Feature
● Rolling Time window
● Custom Transformers
● DictVectorizer, CountVectorizer, TF-IDF
● NLTK - stemming, lemmatization, stop-words
● Skimage library for image processing
● Crop, resize, grayscale
● Outlier detection
● Handling Outlier data
● Handling Imbalanced classes
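The first topics in this module, standardizing and rescaling features, might look like the sketch below with scikit-learn. The data is made up, and MinMaxScaler is shown as one common rescaling choice; scikit-learn's Normalizer (per-row unit norm) is a different transformation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Standardize: zero mean and unit variance per column.
X_std = StandardScaler().fit_transform(X)

# Rescale each column to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)
```

In practice the scaler would be fitted on training data only and then applied to test data with `transform`.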
7. Feature Engineering - 3 hrs
● Principal Component Analysis
● Linear Discriminant Analysis
● Generalized Discriminant Analysis
● FastICA
● Non-negative Matrix Factorization
● TruncatedSVD
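As an illustration of Principal Component Analysis from the list above, this sketch reduces two strongly correlated synthetic features to one component. The data generation is an assumption made purely for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: the second feature is nearly a multiple of the first.
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=200)])

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)        # shape (200, 1)
ratio = pca.explained_variance_ratio_   # share of variance kept by the component
```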
8. Feature Selection - 2 hrs
● SelectKBest for Regression
● SelectKBest for Classification
● Variance Threshold
● Dropping highly correlated features
● Dropping features based on non-null value counts
● SelectFromModel
● Feature Selection using RandomForest
● Based on correlation with target
● Univariate Feature Selection
● Recursive Feature Elimination
9. Model Evaluation - 1 hr
● Why do we need to evaluate at all ?
● Metrics for Classification
● Metrics for Regression
● Clustering metrics
● Probability Calibration
● Pairwise metrics
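To make the classification metrics topic concrete, here is a small pure-Python sketch that derives accuracy, precision, recall and F1 from confusion-matrix counts. The labels and predictions are invented.

```python
# Hypothetical ground truth and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)   # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)   # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)   # false negatives
tn = sum(t == 0 and p == 0 for t, p in pairs)   # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```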
10. Model Selection - 1 hr
● Motivation
● KFold
● StratifiedKFold
● Splitting training testing data
● Cross Validate
● GridSearchCV
● RandomizedSearchCV
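The pieces listed above (train/test split, StratifiedKFold, GridSearchCV) can be combined roughly as follows. The choice of the iris dataset, a k-nearest-neighbors model and this parameter grid are assumptions for the sake of a short example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stratified folds keep the class proportions in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=cv,
)
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
test_accuracy = search.score(X_test, y_test)   # accuracy of the refitted best model
```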
11. Linear Regression - 3 hrs
● Understanding Ordinary Least Squares
● Cost Function
● Bias & Variance
● Coefficients & Intercept
● Simple Linear Regression
● Polynomial Linear Regression
● Ridge
● Lasso
● Elastic Net
● Stochastic Gradient Descent
● Robust Regression
● Problem - Insurance Payout Prediction
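A minimal sketch of ordinary least squares, the coefficient and intercept, and the mean-squared-error cost, using NumPy only. The data is synthetic, generated from an assumed line y = 3x + 2 plus noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from an assumed true model: y = 3x + 2 + noise.
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=100)

# Design matrix with a column of ones so the intercept is estimated too.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: minimize ||A @ w - y||^2.
(coef, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# Cost function: mean squared error of the fitted line.
mse = np.mean((coef * x + intercept - y) ** 2)
```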
12. Logistic Regression - 2 hrs
● Basics of Logistic Regression
● Sigmoid
● Cost Function
● Understanding important hyperparameters
● Predicting linear separator
● Predicting nonlinear decision boundary
● Handling Imbalanced classes
● Project - Predicting if income is less than 50K or more
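The sigmoid and cost function topics above can be illustrated in plain Python. The weights `w` and `b` and the toy data are hypothetical; nothing is trained here, the sketch only evaluates the model and its log loss.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, p_pred, eps=1e-12):
    """Average binary cross-entropy, the usual logistic regression cost."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical one-feature model: p = sigmoid(w * x + b).
w, b = 1.5, -3.0
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1, 1]

probs = [sigmoid(w * x + b) for x in xs]
labels = [int(p >= 0.5) for p in probs]   # decision boundary at x = 2
loss = log_loss(ys, probs)
```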
13. Naive Bayes - 2 hrs
● Bayes Theorem
● Gaussian Naive Bayes
● Multinomial Naive Bayes
● Bernoulli Naive Bayes
● Out-of-core Naive Bayes using partial_fit
● Limitations of Naive Bayes
● Choosing the right variant
● Problem - Mail data classification
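Bayes' theorem, the starting point of this module, can be worked through with a tiny mail example. All probabilities below are made-up numbers chosen for easy arithmetic.

```python
# Hypothetical probabilities for a single word appearing in mail.
p_spam = 0.2                 # prior: P(spam)
p_ham = 1.0 - p_spam         # P(not spam)
p_word_given_spam = 0.6      # likelihood: P(word | spam)
p_word_given_ham = 0.05      # P(word | not spam)

# Total probability of seeing the word at all.
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
```

With these numbers the posterior works out to 0.12 / 0.16 = 0.75, so seeing the word raises the spam probability from 20% to 75%.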
14. Trees - 2 hrs
● Understanding Information Theory
● Entropy
● Decision Tree creation
● Tree for Classification
● Tree for Regression
● Advantages of Decision Tree
● Important Hyper-parameters
● Limitations of Decision Tree
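Entropy and the information gain that drives decision tree creation can be sketched as below. The labels and the candidate split are invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with a 50/50 class mix and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

gain = information_gain(parent, split)
```

A tree builder would evaluate this gain for every candidate split and pick the largest.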
15. Ensemble Methods - 3 hrs
● Bagging vs Boosting
● Forests
● AdaBoost
● XGBoost
● Gradient Tree Boosting
● Voting Classifier
● The role weak estimators play
● Problem - Attack detection on network data
16. Nearest Neighbors - 2 hrs
● Unsupervised Nearest Neighbor
● Nearest Neighbor for Classification
● Nearest Neighbor for Regression
● Effect of k
● Nearest Neighbor Algorithms
● Choosing algorithm
● Nearest Centroid Classifier
● Developing recommendation engine
17. Support Vector Machine - 3 hrs
● Understanding SVM
● Classification
● Regression
● OneClassSVM
● Imbalanced Classes
● Kernel Functions
● Understanding Maths behind it
● Problem - Face recognition
17b. Novelty & Outlier Detection - 1 hr
● Novelty vs Outlier
● OneClassSVM
● Fitting data in an Elliptic Envelope
● Isolation Forest
● Local Outlier Factor
● When to use what
18. Clustering - 3 hrs
● Objectives of clustering
● Agglomerative clustering
● DBSCAN clustering
● KMeans
● Affinity Propagation
● Meanshift clustering
● Spectral clustering
● Hierarchical clustering
● Birch
● Clustering evaluation
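As a rough illustration of KMeans and clustering evaluation from the list above, the sketch below clusters two synthetic, well-separated blobs and scores the result with the silhouette coefficient. The blob positions are an assumption chosen to make the clusters obvious.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two hypothetical, well-separated groups of 2-D points.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Silhouette is close to 1 when clusters are compact and far apart.
score = silhouette_score(X, labels)
```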
19. Deployment & Scaling - 3 hrs
● Bottom-Up approach for dealing with large data
● Extracting features using Hashing Techniques
● Incremental learning
● Serializing data for quicker access
● Running as a Python .egg or wheel
● Model behind REST server
● Persisting & Loading model
● Deploying model behind web application
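Persisting and loading a model, one of the topics above, could be sketched with the standard `pickle` module. The iris dataset and logistic regression are stand-ins chosen for brevity, and the round trip is done in memory rather than through a file.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the fitted model to bytes; these could be written to disk.
blob = pickle.dumps(model)

# Restore it later, e.g. inside a REST server process.
# Only unpickle data from sources you trust.
restored = pickle.loads(blob)

same = np.array_equal(model.predict(X), restored.predict(X))
```

The restored model should be loaded with the same scikit-learn version that produced it, since pickles are not guaranteed to be compatible across versions.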
20. Use Cases
● Credit Risk - Predicting Defaulters
● Amazon Food Review Sentiment
● Predicting Employee Attrition
● Identifying characters in an unknown language
● Predicting insurance payout amount
● Text Categorization
● Churn Prediction
● Attack Prediction on network data
● Identifying faces
● Predict patient stay in hospital
Way Forward - Deep Learning
Complete Deep Learning in 10 Modules - 50 hours
● Basics of TensorFlow & Keras
● Foundations of Neural Network
● Activation Functions & Optimizers
● Regularization Techniques & Loss Functions
● Implementing a Deep Neural Network for Fashion-MNIST
● Introduction to Convolutional Neural Network
● Filters, pooling, strides
● Different initialization techniques
● Implementing a CNN for Fashion-MNIST
● Hyperparameter tuning for CNNs
● Understanding popular pretrained models
● Transfer Learning & Fine Tuning
● Understanding Recurrent Neural Networks
● LSTM
● GRU
● Implementing Text Classification using LSTM
● Autoencoders
● GAN
● Implementing GAN & DCGAN
● Implementing image captioning
● Implementing chatbot
● Implementing MNIST generator
● Hyperparameter tuning
Repositories
● https://github.com/zekelabs/machine-learning-for-beginners
● https://github.com/zekelabs/tensorflow-tutorial/
● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/
● Python learning course - https://www.edyoda.com/resources/videolisting/98/
Thank You !!!
Visit : www.zekeLabs.com for more details
Let us know how we can help your organization upskill its employees to stay up to date in the ever-evolving IT industry.
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com
