Data Analysis Course
Basics & Terminology (Version-1)
 Venkat Reddy
Data Analysis Course
• Data analysis design document
•   Descriptive statistics
•   Data exploration, validation & sanitization
•   Probability distributions examples and applications
•   Simple correlation and regression analysis
•   Multiple linear regression analysis
•   Logistic regression analysis
•   Testing of hypothesis
•   Clustering and decision trees
•   Time series analysis and forecasting
•   Credit Risk Model building-1
•   Credit Risk Model building-2
Note
• This presentation is just class notes. The course notes for the Data
  Analysis Training were written by me, as an aid for myself.
• The best way to treat this is as a high-level summary; the
  actual session went more in depth and contained other
  information.
• Most of this material was written as informal notes, not
  intended for publication
• Please send questions/comments/corrections to
  venkat@trendwiseanalytics.com or 21.venkat@gmail.com
• Please check my website for the latest version of this document
                                         -Venkat Reddy
What is “Statistics”?
• Statistics is the science of data that involves:
  •   Collecting
  •   Classifying
  •   Summarizing
  •   Organizing and
  •   Interpreting
  numerical information.
• Examples:
  •   Cricket batting averages
  •   Stock price
  •   Climatology data such as rainfall amounts, average temperatures
  •   Marketing information
  •   Gambling?
Key Terms
• What is Data?
   • facts or information that is relevant or appropriate to a decision
     maker
• Population?
   • the totality of objects under consideration
• Sample?
   • a portion of the population that is selected for analysis
• Parameter?
   • a summary measure (e.g., mean) that is computed to describe a
     characteristic of the population
• Statistic?
   • a summary measure (e.g., mean) that is computed to describe a
     characteristic of the sample
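The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population values are invented (they are not the course data), and the sample size of 5 is an arbitrary choice.

```python
import random
import statistics

# Hypothetical population: monthly sales (in units) for 20 stores.
# These numbers are invented for illustration only.
population = [120, 135, 98, 143, 110, 125, 132, 101, 99, 150,
              118, 127, 140, 105, 115, 122, 138, 108, 130, 112]

# Parameter: a summary measure computed on the whole population.
population_mean = statistics.mean(population)

# Sample: a portion of the population selected for analysis.
random.seed(42)  # fixed seed so the sketch is repeatable
sample = random.sample(population, k=5)

# Statistic: the same summary measure computed on the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean}")
print(f"Statistic (sample mean):     {sample_mean}")
```

In practice the population is rarely fully observed, so the statistic is what gets computed and the parameter is what it tries to estimate.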
Variables
• Traits or characteristics that can change values from case to
  case.
• Examples:
  •   Age
  •   Gender
  •   Income
  •   Social class
Types Of Variables
• In causal relationships:
          CAUSE          →        EFFECT
   independent variable  →  dependent variable
• Independent variable: a variable that can be controlled or
  manipulated.
• Dependent variable: a variable that cannot be controlled or
  manipulated. Its values are predicted from the independent
  variable.
• Discrete variables are measured in units that cannot be
  subdivided. Example: Number of children
• Continuous variables are measured in units that can be
  subdivided infinitely. Example: Height
Lab
•   Print product sales data
•   Which are the cause variables, and which are the effect variables?
•   Identify the continuous & discrete variables
•   What is the population?
•   Filter data and pick a sample
•   Calculate a parameter (Mean of the population)
•   Calculate a statistic
•   How close is the statistic to the parameter? Is it a good estimate?
•   Self study: Randomly pick 10 samples, calculate the mean for each
    sample. Find the mean of the means & see whether it is a
    good estimate of the population mean
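The self-study step can be tried out with a rough Python sketch like the one below. The lab itself uses the product sales data; here the population is simulated, and the sample size of 30 and the helper name `mean_of_sample_means` are assumptions made only for this illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 simulated sale amounts (not the course data set).
population = [round(random.gauss(500, 100), 2) for _ in range(1000)]
population_mean = statistics.mean(population)  # the parameter


def mean_of_sample_means(data, n_samples=10, sample_size=30):
    """Draw n_samples random samples; return the sample means and their mean."""
    sample_means = [
        statistics.mean(random.sample(data, sample_size))
        for _ in range(n_samples)
    ]
    return sample_means, statistics.mean(sample_means)


sample_means, grand_mean = mean_of_sample_means(population)

print(f"Population mean (parameter): {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i:2d} mean (statistic): {m:.2f}")
print(f"Mean of the 10 sample means: {grand_mean:.2f}")
print(f"Difference from parameter:   {abs(grand_mean - population_mean):.2f}")
```

Individual sample means scatter around the population mean; averaging several of them usually lands closer to it than a typical single sample does.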
Descriptive Statistics
•   Gives us the overall picture of the data
•   Presents data in the form of tables, charts and graphs
•   Includes summary data
•   Avoids inferences
•   Examples:
    • Measures of central location
       • Mean, median, mode and midrange
    • Measures of Variation
       • Variance, Standard Deviation, z-scores
Details later
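As a preview, the measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on invented numbers; it assumes the sample (n - 1) versions of variance and standard deviation, which may differ from the convention used in the session.

```python
import statistics

# Hypothetical data values, invented for illustration.
data = [12, 15, 15, 18, 20, 22, 25, 30]

# Measures of central location
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
midrange = (min(data) + max(data)) / 2  # halfway between smallest and largest

# Measures of variation (sample versions: divide by n - 1)
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# z-score: how many standard deviations a value lies from the mean
z_scores = [(x - mean) / std_dev for x in data]

print(f"mean={mean}, median={median}, mode={mode}, midrange={midrange}")
print(f"variance={variance:.3f}, std_dev={std_dev:.3f}")
print("z-scores:", [round(z, 2) for z in z_scores])
```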
Lab
•   Download product sales data
•   Run proc means to print the descriptive statistics
•   Run proc univariate to print the descriptive statistics
•   Identify Measures of central location
•   Identify Measures of variation
Inferential Statistics
• Make decisions about the overall population using a sample
• “Sampled” data are incomplete but can still be representative
  of the population
• Permits the making of generalizations (inferences) about the
  data
• Probability theory is a major tool used to analyze sampled
  data
-Details later
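One common way to generalize from a sample is to report an interval around the sample mean. The sketch below is an illustration only: the population is simulated, the sample size of 100 is arbitrary, and the interval uses the normal approximation with the critical value 1.96 for roughly 95% coverage.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population (simulated); in practice it is usually not fully observed.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Work only with a sample, as inferential statistics does.
sample = random.sample(population, 100)
n = len(sample)
sample_mean = statistics.mean(sample)

# Standard error of the mean: sample standard deviation / sqrt(n).
std_error = statistics.stdev(sample) / math.sqrt(n)

# Approximate 95% interval for the population mean (normal approximation).
lower = sample_mean - 1.96 * std_error
upper = sample_mean + 1.96 * std_error

print(f"Sample mean: {sample_mean:.2f}")
print(f"Approximate 95% interval: ({lower:.2f}, {upper:.2f})")
print(f"Population mean, for comparison: {statistics.mean(population):.2f}")
```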
Predictive Modeling
• The science of predicting future outcomes based on historical
  events.
• Model Building: “Developing set of equations or
  mathematical formulation to forecast future
  behaviors based on current or historical data.”
• Examples: regression, logistic regression, time series analysis, etc.
-Details later
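As a small taste of model building, here is a sketch of simple linear regression fitted by ordinary least squares in plain Python. The data points, the variable meanings (advertising spend and sales) and the helper names `fit_line` and `predict` are all hypothetical, chosen only to show the idea of fitting an equation to historical data and using it to forecast.

```python
# Hypothetical historical data: advertising spend (x) and sales (y).
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(x, y)


def predict(new_x):
    """Forecast y for a new x using the fitted equation."""
    return intercept + slope * new_x


print(f"Fitted equation: y = {intercept:.2f} + {slope:.2f} * x")
print(f"Forecast for x = 6: {predict(6):.2f}")
```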
Statistical Computer Packages

 Typical Software
   •   SAS
   •   R
   •   SPSS
   •   MINITAB
   •   Excel
Venkat Reddy Konasani
Manager at Trendwise Analytics
venkat@TrendwiseAnalytics.com
21.venkat@gmail.com
+91 9886 768879
