3. Chapter Sixteen
EXPLORING, DISPLAYING,
AND EXAMINING DATA
4. Types of Data Analysis
• Exploratory data analysis
– the data guide the choice of analysis, or a
revision of the planned analysis
• Confirmatory data analysis
– closer to classical statistical inference in its
use of significance and confidence
– may use information from a closely related
data set, or validate findings by gathering
and analyzing new data
5. Techniques to Display
and Examine Distributions
• Frequency Table
• Visual Displays
– Histograms
– Stem-and-leaf display
– Box plot
• Crosstabulation of Variables
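
The first two displays in this list can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `scores` data and the helper names `frequency_table` and `stem_and_leaf` are hypothetical, and the stem is assumed to be the tens digit of a two-digit whole number.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, 100.0 * c / n) for v, c in sorted(counts.items())]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    return stems

# Hypothetical data: ten whole-number scores.
scores = [12, 15, 21, 21, 24, 30, 37, 37, 37, 41]

for value, count, pct in frequency_table(scores):
    print(f"{value:>3} {count:>2} {pct:5.1f}%")

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Unlike the frequency table, the stem-and-leaf display keeps every individual value visible while still showing the shape of the distribution.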
6. Techniques to Display
and Examine Distributions (cont.)
• Histograms
– Display all intervals in a distribution, even
without observed values
– Examine the shape of the distribution for
skewness, kurtosis, and the modal pattern
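
As a rough sketch of these two points, the code below counts values per interval while keeping empty intervals, then computes moment-based skewness and excess kurtosis. The data, the interval settings, and the function names are assumptions made for illustration; statistical packages often apply small-sample corrections that this sketch omits.

```python
def histogram_counts(values, low, width, n_bins):
    """Count values in [low + i*width, low + (i+1)*width); empty intervals stay at 0."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def central_moment(values, k):
    """k-th central moment (population form, dividing by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Positive when the right tail is longer, negative when the left tail is."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Zero for a normal distribution; positive for heavier tails."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical data with one large value in the right tail.
data = [1, 2, 2, 3, 3, 3, 4, 9]
counts = histogram_counts(data, low=0, width=2, n_bins=5)
modal_interval = counts.index(max(counts))  # index of the tallest bar

for i, c in enumerate(counts):
    print(f"[{2 * i:>2}, {2 * i + 2:>2}) {'*' * c}")
print("skewness:", round(skewness(data), 2))
print("excess kurtosis:", round(excess_kurtosis(data), 2))
```

The interval from 6 to 8 prints as an empty row, which is the point of displaying all intervals: the gap before the value 9 stays visible.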
7. Techniques to Display
and Examine Distributions (cont.)
• Box plot (box-and-whisker plot)
– Rectangular box encompasses the middle
50% of the data values
• Edges of the box are the hinges
– Center line through the width of the box
marks the median
– Whiskers extend from the right and left
hinges to the largest and smallest
values, conventionally only those within
1.5 interquartile ranges of the hinges
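
The pieces of a box plot can be computed directly. The sketch below assumes Tukey-style hinges (medians of the lower and upper halves, with the overall median included in both halves when the count is odd) and the common 1.5 interquartile-range rule for whisker ends; other software may use slightly different quartile definitions. The data and function names are hypothetical.

```python
def median(sorted_vals):
    """Median of an already sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def box_plot_summary(values):
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2            # size of each half, sharing the median if n is odd
    lower_hinge = median(data[:half])
    upper_hinge = median(data[n - half:])
    spread = upper_hinge - lower_hinge          # interquartile range
    low_fence = lower_hinge - 1.5 * spread
    high_fence = upper_hinge + 1.5 * spread
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "hinges": (lower_hinge, upper_hinge),
        "whiskers": (inside[0], inside[-1]),    # extreme values inside the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical data with one extreme value.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the value 30 falls outside the upper fence, so the right whisker stops at 9 and 30 would be plotted as a separate point.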
8. Techniques to Display
and Examine Distributions (cont.)
• Transformation
– To improve interpretation and
compatibility with other data sets
– To enhance symmetry and stabilize
spread
– To improve linear relationships between
and among variables
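
To illustrate the symmetry goal, the sketch below applies a base-10 logarithm to a right-skewed set of values and compares skewness before and after. The `incomes` figures are invented for the example, and the log is only one of several possible transformations (square root and reciprocal are others).

```python
import math

def skewness(values):
    """Moment-based skewness (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data (for example, incomes in thousands).
incomes = [20, 25, 30, 40, 60, 120, 400]
logged = [math.log10(v) for v in incomes]

print("skewness before:", round(skewness(incomes), 2))
print("skewness after: ", round(skewness(logged), 2))
```

The transformed values are still skewed to the right, but noticeably less so, and the largest value no longer dominates the spread.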
9. Improvement & Control Analysis
• Statistical process control
– Uses statistical tools to analyze, monitor,
and improve process performance
– Total Quality Management
– Control chart
• Displays sequential measurements of a
process together with a center line and control
limits
– Upper control limit
– Lower control limit
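
A center line and control limits can be sketched for a chart of subgroup means. The example assumes subgroups of five measurements and uses the standard tabulated constant A2 = 0.577 for that subgroup size, so the limits are the grand mean plus or minus A2 times the average range. The measurements and function names are hypothetical.

```python
A2_N5 = 0.577  # tabulated X-bar chart constant for subgroups of size 5

def xbar_limits(subgroups):
    """Return (lower control limit, center line, upper control limit)."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    center = sum(means) / len(means)          # grand mean
    mean_range = sum(ranges) / len(ranges)    # average subgroup range
    return center - A2_N5 * mean_range, center, center + A2_N5 * mean_range

def out_of_control(subgroup_mean, lcl, ucl):
    """True when a subgroup mean falls outside the control limits."""
    return subgroup_mean < lcl or subgroup_mean > ucl

# Hypothetical sequential measurements, five per subgroup.
subgroups = [
    [10, 11, 9, 10, 10],
    [10, 12, 10, 9, 9],
    [11, 10, 10, 9, 10],
    [9, 10, 11, 10, 10],
]
lcl, center, ucl = xbar_limits(subgroups)
print(f"LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}")
print("mean of 12.0 out of control:", out_of_control(12.0, lcl, ucl))
```

In practice the limits are estimated from a period when the process is believed stable, and later subgroup means are then plotted against them in time order.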
10. Types of Control Charts
• Variables data
(ratio or interval measurements)
– X-bar
– R-charts
– s-charts
• Pareto diagrams
– Bar chart whose percentages sum to 100
percent, with categories ordered from
most to least frequent
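
The numbers behind a Pareto diagram are simple to compute. In this sketch the complaint categories and counts are invented, and `pareto_table` is a hypothetical helper that sorts categories by frequency and accumulates their percentages.

```python
def pareto_table(counts):
    """Return (category, percent, cumulative percent) rows, largest category first."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for category, count in ordered:
        pct = 100.0 * count / total
        cumulative += pct
        rows.append((category, pct, cumulative))
    return rows

# Hypothetical complaint counts by cause.
complaints = {"wrong item": 15, "late delivery": 50, "other": 5, "damaged item": 30}
rows = pareto_table(complaints)

for category, pct, cumulative in rows:
    print(f"{category:<14} {pct:5.1f}%  cumulative {cumulative:5.1f}%")
```

The cumulative column is what makes the display useful: here the two largest causes account for 80 percent of all complaints.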
11. Geographic Information Systems
• Systems of hardware, software, and
procedures that capture, store,
manipulate, integrate, and display
spatially-referenced data
12. Geographic Information Systems (cont.)
• At least four components
– Integrating information from various
sources
– Capturing data
– Projection and restructuring
– Modeling
13. Crosstabulation
• A technique for comparing two
classification variables
– Cells
– Marginals
– Contingency tables
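
A small crosstabulation shows how cells and marginals relate. The two classification variables below (`gender` and `overseas`) are invented for the example, and `crosstab` is a hypothetical helper built on the standard library.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Return cell counts plus the row and column marginal totals."""
    cells = Counter(zip(row_values, col_values))
    row_totals = Counter(row_values)
    col_totals = Counter(col_values)
    return cells, row_totals, col_totals

# Hypothetical classification data for six respondents.
gender = ["M", "M", "F", "F", "F", "M"]
overseas = ["yes", "no", "no", "no", "yes", "no"]

cells, row_totals, col_totals = crosstab(gender, overseas)

for r in sorted(row_totals):
    for c in sorted(col_totals):
        row_pct = 100.0 * cells[(r, c)] / row_totals[r]
        print(f"{r} x {c}: count={cells[(r, c)]}, row percent={row_pct:.1f}")
print("row marginals:", dict(row_totals))
print("column marginals:", dict(col_totals))
```

Row percentages divide each cell by its row marginal; column percentages would divide by the column marginal instead, and the choice depends on which variable is treated as the explanatory one.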
14. Percentaging Errors
• Averaging percentages without
weighting
• Using too-large percentages (>100%)
• Using percentages with very small
samples
• Citing a percentage decrease exceeding
100 percent
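
Two of these errors are easy to demonstrate with arithmetic. The group percentages and sample sizes below are invented, and the helper names are hypothetical.

```python
def weighted_percentage(groups):
    """Combine (percent, sample size) pairs, weighting each percent by its size."""
    total_n = sum(n for _, n in groups)
    return sum(pct * n for pct, n in groups) / total_n

def percent_change(old, new):
    """Change from old to new, as a percentage of the old value."""
    return 100.0 * (new - old) / old

# Hypothetical groups: 80% of 10 respondents and 20% of 90 respondents.
groups = [(80.0, 10), (20.0, 90)]
naive = sum(pct for pct, _ in groups) / len(groups)
weighted = weighted_percentage(groups)
print("unweighted average:", naive)     # misleading: ignores group sizes
print("weighted average:  ", weighted)  # 26 of 100 respondents overall

# An increase can exceed 100 percent, but a decrease in a
# non-negative quantity cannot go past -100 percent.
print("50 -> 150:", percent_change(50, 150))
print("150 -> 0: ", percent_change(150, 0))
```

The unweighted figure of 50 percent overstates the true overall rate of 26 percent because the small group counts as much as the large one.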
15. Other Table-based Analysis
• Automatic Interaction Detection (AID)
– Sequential partitioning procedure that uses
a dependent variable and set of predictors
– Searches among up to 300 variables for
the best single division of the data into
subsets according to each predictor
variable
– Chooses one of those divisions
– Splits the sample, using chi-square tests
to create multi-way splits
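
One step of this kind of partitioning can be sketched as follows. This is a simplified illustration under stated assumptions, not the AID procedure itself: every predictor is binary (so the chi-square statistics share the same degrees of freedom and can be compared directly), only a single split is chosen, and the data and function names are hypothetical. Full procedures also merge categories, compare adjusted significance levels, and repeat the search within each resulting subset.

```python
from collections import Counter

def chi_square(predictor, outcome):
    """Pearson chi-square statistic for the two-way table of predictor by outcome."""
    n = len(outcome)
    cells = Counter(zip(predictor, outcome))
    row_totals = Counter(predictor)
    col_totals = Counter(outcome)
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat

def best_split(predictors, outcome):
    """Name of the predictor most strongly associated with the outcome."""
    return max(predictors, key=lambda name: chi_square(predictors[name], outcome))

# Hypothetical sample of eight cases: a dependent variable and two predictors.
buys = [1, 1, 1, 1, 0, 0, 0, 0]
predictors = {
    "urban": [1, 1, 1, 1, 0, 0, 0, 0],   # perfectly separates buyers
    "female": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to buying
}
chosen = best_split(predictors, buys)
print("split on:", chosen)
subsets = {level: [b for u, b in zip(predictors[chosen], buys) if u == level]
           for level in set(predictors[chosen])}
print("outcome by subset:", subsets)
```

Each subset produced by the chosen split would then be searched again with the remaining predictors, which is what makes the procedure sequential.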