SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Data and Data Collection
Quantitative – Numbers, tests, counting,
measuring
Fundamentally--2 types of data
Qualitative – Words, images,
observations, conversations, photographs
Data Collection Techniques
Observations,
Tests,
Surveys,
Document analysis
(the research literature)
Quantitative Methods
Experiment: Research situation with at
least one independent variable, which is
manipulated by the researcher
Independent Variable: The variable in the
study under consideration. The cause for
the outcome for the study.
Dependent Variable: The variable being
affected by the independent variable.
The effect of the study
y = f(x)
Which is which here?
Key Factors for High Quality
Experimental Design
Data should not be contaminated by poor
measurement or errors in procedure.
Eliminate confounding variables from study or
minimize effects on variables.
Representativeness: Does your sample
represent the population you are studying?
Must use random sample techniques.
What Makes a Good
Quantitative Research Design?
4 Key Elements
1. Freedom from Bias
2. Freedom from Confounding
3. Control of Extraneous Variables
4. Statistical Precision to Test Hypothesis
Bias: When observations favor some
individuals in the population over others.
Confounding: When the effects of two or
more variables cause surprise or confusion.
Extraneous Variables: An extraneous variable
is any variable not being investigated that has
the potential to affect the outcome of a
research study.
Need to identify and minimize these variables.
e.g., Erosion potential as a function of clay content.
rainfall intensity, vegetation & duration would be
considered extraneous variables.
Precision versus accuracy
"Precise" means sharply defined or
measured.
"Accurate" means truthful or correct.
Accurate
Not precise
Neither accurate
nor precise
Not accurate
But precise
Both Accurate
and Precise
Interpreting Results of
Experiments
Goal of research is to draw conclusions.
What did the study mean?
What, if any, is the cause and effect of
the outcome?
Introduction to Sampling
Sampling is the problem of accurately
acquiring the necessary data in order to
form a representative view of the
problem.
This is much more difficult to do than is
generally realized.
Overall Methodology:
* State the objectives of the survey
* Define the target population
* Define the data to be collected
* Define the variables to be determined
* Define the required precision & accuracy
* Define the measurement `instrument'
* Define the sample size & sampling
method, then select the sample
Sampling
selecting the group that you will
actually collect data from in your
research.
Distributions:
When you form a sample you often show
it by a plotted distribution known as a
histogram .
A histogram is the distribution of
frequency of occurrence of a certain
variable within a specified range.
NOT A BAR GRAPH WHICH LOOKS VERY SIMILAR
Interpreting quantitative
findings - Statstical Analysis
Descriptive Statistics : Mean, median,
mode, frequencies
Error analyses
Mean
• In science the term mean is really the
arithmetic mean
• Given by the equation
• X = 1/n xi
n
i=1
Or more simply put, the sum of values divided by the
number of values summed
Median
• Consider the set
• 1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19
– In this case there are 13 values so the median
is the middle value, or (n+1) / 2
– (13+1) /2 = 7
• Consider the set
• 1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16
– In the second case, the mean of the two middle
values is the median or (n+1) /2
(12 + 1) / 2 = 6.5 ~ (6+7) / 2 = 6.5
Or more simply put the mid value separating all
values in the upper 1/2 of the values from those
in the lower half of the values
Mode
The most frequent value in a data set
• Consider the set
• 1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 13, 14, 16, 19
– In this case the mode is 1 because it is the most
common value
• There may be cases where there are more than
one mode as in this case
• Consider the set
• 1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 11, 13, 14, 16, 19
– In this case there are two modes (bimodal) : 1 and 11
because both occur 4 times in the data set.

Weitere ähnliche Inhalte

Ähnlich wie Data and Data Collection in Data Science.ppt

3. measures of central tendency
3. measures of central tendency3. measures of central tendency
3. measures of central tendency
renz50
 
Introduction To Statistics
Introduction To StatisticsIntroduction To Statistics
Introduction To Statistics
albertlaporte
 
Statistics pres 3.31.2014
Statistics pres 3.31.2014Statistics pres 3.31.2014
Statistics pres 3.31.2014
tjcarter
 
Statistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxStatistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docx
darwinming1
 
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docxANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
cullenrjzsme
 
Applied statistics lecture_2
Applied statistics lecture_2Applied statistics lecture_2
Applied statistics lecture_2
Daria Bogdanova
 

Ähnlich wie Data and Data Collection in Data Science.ppt (20)

Stat and prob a recap
Stat and prob   a recapStat and prob   a recap
Stat and prob a recap
 
data_management_review_descriptive_statistics.ppt
data_management_review_descriptive_statistics.pptdata_management_review_descriptive_statistics.ppt
data_management_review_descriptive_statistics.ppt
 
3. measures of central tendency
3. measures of central tendency3. measures of central tendency
3. measures of central tendency
 
1.3 collecting sample data
1.3 collecting sample data1.3 collecting sample data
1.3 collecting sample data
 
BIOSTATISTICS.pptx sidhathab.pptx oral pathology
BIOSTATISTICS.pptx sidhathab.pptx oral pathologyBIOSTATISTICS.pptx sidhathab.pptx oral pathology
BIOSTATISTICS.pptx sidhathab.pptx oral pathology
 
Introduction To Statistics
Introduction To StatisticsIntroduction To Statistics
Introduction To Statistics
 
Statistics pres 3.31.2014
Statistics pres 3.31.2014Statistics pres 3.31.2014
Statistics pres 3.31.2014
 
Statistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxStatistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docx
 
Chapter 11 Psrm
Chapter 11 PsrmChapter 11 Psrm
Chapter 11 Psrm
 
Medical Statistics.ppt
Medical Statistics.pptMedical Statistics.ppt
Medical Statistics.ppt
 
Introduction to Biostatistics_20_4_17.ppt
Introduction to Biostatistics_20_4_17.pptIntroduction to Biostatistics_20_4_17.ppt
Introduction to Biostatistics_20_4_17.ppt
 
2.1 frequency distributions for organizing and summarizing data
2.1 frequency distributions for organizing and summarizing data2.1 frequency distributions for organizing and summarizing data
2.1 frequency distributions for organizing and summarizing data
 
Machine learning pre requisite
Machine learning pre requisiteMachine learning pre requisite
Machine learning pre requisite
 
Week 2 measures of disease occurence
Week 2  measures of disease occurenceWeek 2  measures of disease occurence
Week 2 measures of disease occurence
 
scope and need of biostatics
scope and need of  biostaticsscope and need of  biostatics
scope and need of biostatics
 
STATISTICS.pptx for the scholars and students
STATISTICS.pptx for the scholars and studentsSTATISTICS.pptx for the scholars and students
STATISTICS.pptx for the scholars and students
 
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docxANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
 
Applied statistics lecture_2
Applied statistics lecture_2Applied statistics lecture_2
Applied statistics lecture_2
 
Measures of central tendency and dispersion
Measures of central tendency and dispersionMeasures of central tendency and dispersion
Measures of central tendency and dispersion
 
Basic Statistical Concepts in Machine Learning.pptx
Basic Statistical Concepts in Machine Learning.pptxBasic Statistical Concepts in Machine Learning.pptx
Basic Statistical Concepts in Machine Learning.pptx
 

Kürzlich hochgeladen

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 

Kürzlich hochgeladen (20)

Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4j
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 

Data and Data Collection in Data Science.ppt

  • 1. Data and Data Collection Quantitative – Numbers, tests, counting, measuring Fundamentally--2 types of data Qualitative – Words, images, observations, conversations, photographs
  • 3. Quantitative Methods Experiment: Research situation with at least one independent variable, which is manipulated by the researcher
  • 4. Independent Variable: The variable in the study under consideration. The cause for the outcome for the study. Dependent Variable: The variable being affected by the independent variable. The effect of the study y = f(x) Which is which here?
  • 5. Key Factors for High Quality Experimental Design Data should not be contaminated by poor measurement or errors in procedure. Eliminate confounding variables from study or minimize effects on variables. Representativeness: Does your sample represent the population you are studying? Must use random sample techniques.
  • 6. What Makes a Good Quantitative Research Design? 4 Key Elements 1. Freedom from Bias 2. Freedom from Confounding 3. Control of Extraneous Variables 4. Statistical Precision to Test Hypothesis
  • 7. Bias: When observations favor some individuals in the population over others. Confounding: When the effects of two or more variables cause surprise or confusion. Extraneous Variables: An extraneous variable is any variable not being investigated that has the potential to affect the outcome of a research study. Need to identify and minimize these variables. e.g., Erosion potential as a function of clay content. rainfall intensity, vegetation & duration would be considered extraneous variables.
  • 8. Precision versus accuracy "Precise" means sharply defined or measured. "Accurate" means truthful or correct.
  • 9. Accurate Not precise Neither accurate nor precise Not accurate But precise Both Accurate and Precise
  • 10. Interpreting Results of Experiments Goal of research is to draw conclusions. What did the study mean? What, if any, is the cause and effect of the outcome?
  • 11. Introduction to Sampling Sampling is the problem of accurately acquiring the necessary data in order to form a representative view of the problem. This is much more difficult to do than is generally realized.
  • 12. Overall Methodology: * State the objectives of the survey * Define the target population * Define the data to be collected * Define the variables to be determined * Define the required precision & accuracy * Define the measurement `instrument' * Define the sample size & sampling method, then select the sample
  • 13. Sampling selecting the group that you will actually collect data from in your research. Distributions: When you form a sample you often show it by a plotted distribution known as a histogram . A histogram is the distribution of frequency of occurrence of a certain variable within a specified range. NOT A BAR GRAPH WHICH LOOKS VERY SIMILAR
  • 14.
  • 15.
  • 16. Interpreting quantitative findings - Statstical Analysis Descriptive Statistics : Mean, median, mode, frequencies Error analyses
  • 17. Mean • In science the term mean is really the arithmetic mean • Given by the equation • X = 1/n xi n i=1 Or more simply put, the sum of values divided by the number of values summed
  • 18. Median • Consider the set • 1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19 – In this case there are 13 values so the median is the middle value, or (n+1) / 2 – (13+1) /2 = 7 • Consider the set • 1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16 – In the second case, the mean of the two middle values is the median or (n+1) /2 (12 + 1) / 2 = 6.5 ~ (6+7) / 2 = 6.5 Or more simply put the mid value separating all values in the upper 1/2 of the values from those in the lower half of the values
  • 19. Mode The most frequent value in a data set • Consider the set • 1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 13, 14, 16, 19 – In this case the mode is 1 because it is the most common value • There may be cases where there are more than one mode as in this case • Consider the set • 1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 11, 13, 14, 16, 19 – In this case there are two modes (bimodal) : 1 and 11 because both occur 4 times in the data set.