SlideShare ist ein Scribd-Unternehmen logo
1 von 36
zekeLabs
Statistics for Data Science
“Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
● Introduction to Statistics
● Importance of Statistics
● Understanding Variables Types
● Descriptive vs Inferential Statistics
Overview of
Statistics
Introduction to Statistics
● Science of learning from data.
● Methodical data collection.
● Employ correct data analysis.
● Presenting analysis effectively.
● Opposite to statistics is “Anecdotal Evidence”.
Importance
● Avoid getting biased samples
● Prevent overgeneralization
● Wrong causality
● Incorrect Analysis
● Applied to any domain
Variables
● Explanatory (predictor or independent)
● Response (outcome or dependent)
● A variable can serve as independent in one
study and dependent in another
Data Types of Variables - Quantitative
versus Qualitative
● Quantitative - Numerical data. Eg. weight, temperature, number_project
● Qualitative - Non-numerical data. Eg. dept, salary
Types of Quantitative Variables
● Continues - any numeric value. Eg. Sqft
● Discrete - count of the presence of a characteristic, result, item, or activity.
Eg. Floor
Qualitative Data: Categorical, Binary, and
Ordinal
● Categorical or Nominal. Eg - dept ( sales, RD etc. )
● Binary. Eg. Left ( 1 or 0 )
● Ordinal. Eg. salary ( low, medium, high )
Choosing Statistical Analysis based on data
type
Types of Statistical Analysis
● Descriptive Statistics - Describes data.
○ Common Tools - Central tendency, Data distribution, skewness
● Inferential Statistics - Draw conclusions from the sample & generalize for
entire population
○ Common Tools - Hypothesis Testing, Confidence Intervals, Regression Analysis
● Measure of Central Tendency
● Measure of Variability
● Visualizing Data
Summarizing Data
Measure of Central Tendency
● Mean - Average of data, suited for continuous data with no outliers
● Median - Middle value of ordered data, suited for continuous data with
outliers
● Mode - Most occuring data, suited for categorical data ( both nominal and
ordinal )
Measure of Variance
● Range
● Interquartile Range
● Variance
● Standard Deviation
Visualizing Continuous Data
● Histogram
● ScatterPlot
Visualizing Continuous Data - 2
● Box-Plot
Visualizing Discrete Data
● Histogram
● Pie
● Basics of Probability
● Conditional Probability
● Discrete Probability Function
● Continuous Probability Function
● Central Limit Theorem
Probability
Distribution
Probability of Single Event
Probability of Two Independent Events
P(A AND B) = P(A) * P(B)
Probability of heads on tossing of two coins P(A) * P(B) = ½ * ½ = ¼
P(A OR B) = P(A) + P(B) - P(A AND B)
Probability of head in 1st flip or probability of head in 2nd flip or both
½ + ½ - ¼ = ¾
Conditional Probability
Probability of an event given the other event has occurred.
P(B|A) - Probability of event B given A has happened
P(A AND B) = P(A) * P(B|A)
Probability of drawing 2 aces = P(drawing one ace from deck) * P(drawing one
ace given already one ace is pulled out)
Probability of drawing 2 aces = 4/52 * 3/51
Probability distribution
● A function describing the likelihood of obtaining possible values that a
random variable can assume.
● Consider salary of employee data, we can create distribution of salary.
● Such distribution is useful to know which outcome is more likely.
● Sum of probability of all outcomes is 1, so every outcome has likelihood
between 0 & 1
● PDF are divided into two types based on data - Discrete and Continues
Discrete Probability Distribution Function
● Probability mass functions for discrete data
● Binomial Distribution for Binary Data (Yes/No)
● Poisson Distribution for count data (No. of cars per family)
● Uniform Distribution for Data with equal probability (Rolling dice)
Binomial Distribution
Poisson Distribution
Uniform Distribution
Probability distribution for continuous data
● Probability mass function for continuous data
● Central tendency, variation & skewness important parameters
● Normal Probability Distribution or Gaussian Distribution or Bell curve
● Lognormal Probability Distribution
Normal Distribution
● A probability function that
describes how the values of a
variable are distributed.
● Symmetric distribution
● Mean = 69, Std = 2.8
● Notation Alert, mu & sigma term
used for entire population
Height Distribution
Normal Distribution - 2
● Empirical Rule of Normal Distribution : 68 - 95 - 99
● Standard Normal Distribution : Mean = 0, Std = 1.0
● Z-scores is a great way to understand where a specific observation fall wrt
entire population. It is basically number of std far from mean.
Lognormal Distribution
● Introduction
● Central Tendency
● Data Distribution
● Skewness
● Correlation
Descriptive
Statistics
Lognormal Distribution
● Introduction
● Hypothesis Testing
● Confidence Intervals
● Regression Analysis
Inferential
Statistics
● Chi-square Test of Independence
● Correlation and Linear Regression
● Analysis of Variance or ANOVA
Relationships
between Variables
Thank You !!!
Visit : www.zekeLabs.com for more details
Let us know how can we help your organization to Upskill the employees to
stay updated in the ever-evolving IT Industry.
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com

Weitere ähnliche Inhalte

Was ist angesagt?

Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysisPramod Toraskar
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptxSadhanaParameswaran
 
Introduction to Data Visualization
Introduction to Data VisualizationIntroduction to Data Visualization
Introduction to Data VisualizationStephen Tracy
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansStat Analytica
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningKnoldus Inc.
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data AnalysisUmair Shafique
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in PythonMarc Garcia
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using PythonNishantKumar1179
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisVishwas N
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science ProcessVishal Patel
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Edureka!
 

Was ist angesagt? (20)

Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Analysis in Python
Data Analysis in PythonData Analysis in Python
Data Analysis in Python
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptx
 
Introduction to Data Visualization
Introduction to Data VisualizationIntroduction to Data Visualization
Introduction to Data Visualization
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By Statisticians
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Data analytics
Data analyticsData analytics
Data analytics
 
Data Management in R
Data Management in RData Management in R
Data Management in R
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data analytics
Data analyticsData analytics
Data analytics
 
Data analytics
Data analyticsData analytics
Data analytics
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in Python
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
 

Ähnlich wie Statistics for data science

Introduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptIntroduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptTripthiDubey
 
Basic for DoE ruchir
Basic for DoE ruchirBasic for DoE ruchir
Basic for DoE ruchirRuchir Shah
 
Statistics in research by dr. sudhir sahu
Statistics in research by dr. sudhir sahuStatistics in research by dr. sudhir sahu
Statistics in research by dr. sudhir sahuSudhir INDIA
 
Basic statistics 1
Basic statistics  1Basic statistics  1
Basic statistics 1Kumar P
 
Basics of statistics
Basics of statisticsBasics of statistics
Basics of statisticsdonthuraj
 
statistics introduction
 statistics introduction statistics introduction
statistics introductionkingstonKINGS
 
Basic stat analysis using excel
Basic stat analysis using excelBasic stat analysis using excel
Basic stat analysis using excelParag Shah
 
Review of Chapters 1-5.ppt
Review of Chapters 1-5.pptReview of Chapters 1-5.ppt
Review of Chapters 1-5.pptNobelFFarrar
 
Statistical Analysis and Hypothesis Tesing
Statistical Analysis and Hypothesis TesingStatistical Analysis and Hypothesis Tesing
Statistical Analysis and Hypothesis TesingManishaPatil932723
 
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in MalaysiaAhmed Elmalla
 
Statistical Methods in Research
Statistical Methods in ResearchStatistical Methods in Research
Statistical Methods in ResearchManoj Sharma
 
EDA and Preprocessing in Tabular and Text data .pptx
EDA and Preprocessing in Tabular and Text data .pptxEDA and Preprocessing in Tabular and Text data .pptx
EDA and Preprocessing in Tabular and Text data .pptxBrajkishore23
 
Quant Data Analysis
Quant Data AnalysisQuant Data Analysis
Quant Data AnalysisSaad Chahine
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxIndhuGreen
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statisticsKapil Dev Ghante
 

Ähnlich wie Statistics for data science (20)

Introduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptIntroduction to Statistics53004300.ppt
Introduction to Statistics53004300.ppt
 
Data in science
Data in science Data in science
Data in science
 
Basic for DoE ruchir
Basic for DoE ruchirBasic for DoE ruchir
Basic for DoE ruchir
 
Statistics in research by dr. sudhir sahu
Statistics in research by dr. sudhir sahuStatistics in research by dr. sudhir sahu
Statistics in research by dr. sudhir sahu
 
Basic statistics 1
Basic statistics  1Basic statistics  1
Basic statistics 1
 
Basics of statistics
Basics of statisticsBasics of statistics
Basics of statistics
 
statistics introduction
 statistics introduction statistics introduction
statistics introduction
 
Basic stat analysis using excel
Basic stat analysis using excelBasic stat analysis using excel
Basic stat analysis using excel
 
POINT_INTERVAL_estimates.ppt
POINT_INTERVAL_estimates.pptPOINT_INTERVAL_estimates.ppt
POINT_INTERVAL_estimates.ppt
 
Review of Chapters 1-5.ppt
Review of Chapters 1-5.pptReview of Chapters 1-5.ppt
Review of Chapters 1-5.ppt
 
Statistical Analysis and Hypothesis Tesing
Statistical Analysis and Hypothesis TesingStatistical Analysis and Hypothesis Tesing
Statistical Analysis and Hypothesis Tesing
 
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
 
Chapter34
Chapter34Chapter34
Chapter34
 
Statistical Methods in Research
Statistical Methods in ResearchStatistical Methods in Research
Statistical Methods in Research
 
Machine learning
Machine learningMachine learning
Machine learning
 
EDA and Preprocessing in Tabular and Text data .pptx
EDA and Preprocessing in Tabular and Text data .pptxEDA and Preprocessing in Tabular and Text data .pptx
EDA and Preprocessing in Tabular and Text data .pptx
 
Quant Data Analysis
Quant Data AnalysisQuant Data Analysis
Quant Data Analysis
 
Descriptive Analysis.pptx
Descriptive Analysis.pptxDescriptive Analysis.pptx
Descriptive Analysis.pptx
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
 

Mehr von zekeLabs Technologies

Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...zekeLabs Technologies
 
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabsDesign Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabszekeLabs Technologies
 
[Webinar] Following the Agile Footprint - zekeLabs
[Webinar] Following the Agile Footprint - zekeLabs[Webinar] Following the Agile Footprint - zekeLabs
[Webinar] Following the Agile Footprint - zekeLabszekeLabs Technologies
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
A curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & KubernetesA curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & KuberneteszekeLabs Technologies
 
Docker - A curtain raiser to the Container world
Docker - A curtain raiser to the Container worldDocker - A curtain raiser to the Container world
Docker - A curtain raiser to the Container worldzekeLabs Technologies
 
Master guide to become a data scientist
Master guide to become a data scientist Master guide to become a data scientist
Master guide to become a data scientist zekeLabs Technologies
 

Mehr von zekeLabs Technologies (20)

Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
 
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabsDesign Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
 
[Webinar] Following the Agile Footprint - zekeLabs
[Webinar] Following the Agile Footprint - zekeLabs[Webinar] Following the Agile Footprint - zekeLabs
[Webinar] Following the Agile Footprint - zekeLabs
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
A curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & KubernetesA curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & Kubernetes
 
Docker - A curtain raiser to the Container world
Docker - A curtain raiser to the Container worldDocker - A curtain raiser to the Container world
Docker - A curtain raiser to the Container world
 
Serverless and cloud computing
Serverless and cloud computingServerless and cloud computing
Serverless and cloud computing
 
SQL
SQLSQL
SQL
 
02 terraform core concepts
02 terraform core concepts02 terraform core concepts
02 terraform core concepts
 
08 Terraform: Provisioners
08 Terraform: Provisioners08 Terraform: Provisioners
08 Terraform: Provisioners
 
Outlier detection handling
Outlier detection handlingOutlier detection handling
Outlier detection handling
 
Nearest neighbors
Nearest neighborsNearest neighbors
Nearest neighbors
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Master guide to become a data scientist
Master guide to become a data scientist Master guide to become a data scientist
Master guide to become a data scientist
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Linear models of classification
Linear models of classificationLinear models of classification
Linear models of classification
 
Grid search, pipeline, featureunion
Grid search, pipeline, featureunionGrid search, pipeline, featureunion
Grid search, pipeline, featureunion
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Essential NumPy
Essential NumPyEssential NumPy
Essential NumPy
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 

Kürzlich hochgeladen

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Kürzlich hochgeladen (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Statistics for data science

  • 2. “Goal - Become a Data Scientist” “A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett “The Plan” “A Goal without a Plan is just a wish”
  • 3. ● Introduction to Statistics ● Importance of Statistics ● Understanding Variables Types ● Descriptive vs Inferential Statistics Overview of Statistics
  • 4. Introduction to Statistics ● Science of learning from data. ● Methodical data collection. ● Employ correct data analysis. ● Presenting analysis effectively. ● Opposite to statistics is “Anecdotal Evidence”.
  • 5. Importance ● Avoid getting biased samples ● Prevent overgeneralization ● Wrong causality ● Incorrect Analysis ● Applied to any domain
  • 6. Variables ● Explanatory (predictor or independent) ● Response (outcome or dependent) ● A variable can serve as independent in one study and dependent in another
  • 7. Data Types of Variables - Quantitative versus Qualitative ● Quantitative - Numerical data. Eg. weight, temperature, number_project ● Qualitative - Non-numerical data. Eg. dept, salary
  • 8. Types of Quantitative Variables ● Continues - any numeric value. Eg. Sqft ● Discrete - count of the presence of a characteristic, result, item, or activity. Eg. Floor
  • 9. Qualitative Data: Categorical, Binary, and Ordinal ● Categorical or Nominal. Eg - dept ( sales, RD etc. ) ● Binary. Eg. Left ( 1 or 0 ) ● Ordinal. Eg. salary ( low, medium, high )
  • 10. Choosing Statistical Analysis based on data type
  • 11. Types of Statistical Analysis ● Descriptive Statistics - Describes data. ○ Common Tools - Central tendency, Data distribution, skewness ● Inferential Statistics - Draw conclusions from the sample & generalize for entire population ○ Common Tools - Hypothesis Testing, Confidence Intervals, Regression Analysis
  • 12. ● Measure of Central Tendency ● Measure of Variability ● Visualizing Data Summarizing Data
  • 13. Measure of Central Tendency ● Mean - Average of data, suited for continuous data with no outliers ● Median - Middle value of ordered data, suited for continuous data with outliers ● Mode - Most occuring data, suited for categorical data ( both nominal and ordinal )
  • 14. Measure of Variance ● Range ● Interquartile Range ● Variance ● Standard Deviation
  • 15. Visualizing Continuous Data ● Histogram ● ScatterPlot
  • 16. Visualizing Continuous Data - 2 ● Box-Plot
  • 17. Visualizing Discrete Data ● Histogram ● Pie
  • 18. ● Basics of Probability ● Conditional Probability ● Discrete Probability Function ● Continuous Probability Function ● Central Limit Theorem Probability Distribution
  • 20. Probability of Two Independent Events P(A AND B) = P(A) * P(B) Probability of heads on tossing of two coins P(A) * P(B) = ½ * ½ = ¼ P(A OR B) = P(A) + P(B) - P(A AND B) Probability of head in 1st flip or probability of head in 2nd flip or both ½ + ½ - ¼ = ¾
  • 21. Conditional Probability Probability of an event given the other event has occurred. P(B|A) - Probability of event B given A has happened P(A AND B) = P(A) * P(B|A) Probability of drawing 2 aces = P(drawing one ace from deck) * P(drawing one ace given already one ace is pulled out) Probability of drawing 2 aces = 4/52 * 3/51
  • 22. Probability distribution ● A function describing the likelihood of obtaining possible values that a random variable can assume. ● Consider salary of employee data, we can create distribution of salary. ● Such distribution is useful to know which outcome is more likely. ● Sum of probability of all outcomes is 1, so every outcome has likelihood between 0 & 1 ● PDF are divided into two types based on data - Discrete and Continues
  • 23. Discrete Probability Distribution Function ● Probability mass functions for discrete data ● Binomial Distribution for Binary Data (Yes/No) ● Poisson Distribution for count data (No. of cars per family) ● Uniform Distribution for Data with equal probability (Rolling dice)
  • 27. Probability distribution for continuous data ● Probability mass function for continuous data ● Central tendency, variation & skewness important parameters ● Normal Probability Distribution or Gaussian Distribution or Bell curve ● Lognormal Probability Distribution
  • 28. Normal Distribution ● A probability function that describes how the values of a variable are distributed. ● Symmetric distribution ● Mean = 69, Std = 2.8 ● Notation Alert, mu & sigma term used for entire population Height Distribution
  • 29. Normal Distribution - 2 ● Empirical Rule of Normal Distribution : 68 - 95 - 99 ● Standard Normal Distribution : Mean = 0, Std = 1.0 ● Z-scores is a great way to understand where a specific observation fall wrt entire population. It is basically number of std far from mean.
  • 31. ● Introduction ● Central Tendency ● Data Distribution ● Skewness ● Correlation Descriptive Statistics
  • 33. ● Introduction ● Hypothesis Testing ● Confidence Intervals ● Regression Analysis Inferential Statistics
  • 34. ● Chi-square Test of Independence ● Correlation and Linear Regression ● Analysis of Variance or ANOVA Relationships between Variables
  • 36. Visit : www.zekeLabs.com for more details Let us know how can we help your organization to Upskill the employees to stay updated in the ever-evolving IT Industry. www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com