Akram najjar exploiting your data (for printing)
1. Quantified Self
Exploiting Your Data
22 March 2012
Akram Najjar
This talk is an “eye opener”
We will not discuss techniques, or “how” data is analyzed!
We will only talk about “what” such methods can give us
2. What Methods can you Apply to Your Data?
A. The Bell Shaped Curve (Normal Distribution)
B. Correlation of two variables
C. Forecasting using Simple Linear Regression
(Best Line of Fit)
D. Statistical Process Control
Other Tools that work directly on Data . . . .
Goodness of Fit testing
Independence Testing
Moving Averages and Exponential Smoothing
Non-Linear Regression
(polynomial, exponential, logarithmic)
Weighted Index Scoring
Excel: The Pivot Table
Excel: Conditional Formatting
3. A. The Bell Shaped Curve
(The Gaussian or Normal Distribution)
Useful when you have a lot of data
Prepare a Bar Chart or a Frequency Table
Most likely, the data will plot as a Bell Shaped Curve
(Normal/Gauss Curve)
Example: Measurements of most natural variables
Example: Measurements of most manufactured items
Prepare a frequency table of your data
How many times did you get a specific value?
Out of 200 measurements, how many times was your Systolic
Blood Pressure = 110,115, 120, 125, 130, 135, 140 . .
Here are 24 Systolic Blood Pressure
Measurements – They Look like a Bell Curve
[Chart: bar chart of the 24 measurements; vertical axis: How many times?]
Probability of Pressure > 125 = (4 + 2) / 24 = 1/4 = 25%
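The same count-and-divide arithmetic can be sketched in a few lines of Python. Only the two bars above 125 (counts 4 and 2) and the total of 24 come from the slide; every other count in the table below is a made-up placeholder, and `probability_above` is a hypothetical helper name.

```python
# Hypothetical frequency table: systolic reading -> number of times observed.
# Only the counts for 130 and 135 (4 and 2) and the total of 24 are from the slide.
frequency = {105: 1, 110: 3, 115: 5, 120: 5, 125: 4, 130: 4, 135: 2}


def probability_above(freq, threshold):
    """Share of all measurements whose value is strictly above the threshold."""
    total = sum(freq.values())
    above = sum(count for value, count in freq.items() if value > threshold)
    return above / total


p = probability_above(frequency, 125)
print(f"Probability of Pressure > 125 = {p:.0%}")
```

With a frequency table in this shape, any other threshold can be checked by changing the second argument.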
4. If we had 201 measurements . . . .
Total Count in Bars
= Area of Bars
= Probability > 122
= 15.83%
The Bell Shaped Curve
is completely defined by:
a) Average (115) of the data
b) Standard deviation (7) of the data. It
indicates how spread out our data is around
the average.
(Approx. 68% of observations are
between 115-7 and 115+7)
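Because the curve is fixed by just the average and the standard deviation, both figures above can be checked with Python's standard library. This is a minimal sketch that assumes the data follow an exact Normal curve with average 115 and standard deviation 7; real measurements will differ slightly, which is probably why the slide's 15.83% is close to, but not exactly, the theoretical value.

```python
from statistics import NormalDist

# Bell curve defined by the two numbers on the slide: average 115, standard deviation 7.
pressure = NormalDist(mu=115, sigma=7)

# Area under the curve to the right of 122 (one standard deviation above the average).
p_above_122 = 1 - pressure.cdf(122)

# Area between 115-7 and 115+7.
p_within_one_sd = pressure.cdf(115 + 7) - pressure.cdf(115 - 7)

print(f"P(pressure > 122)       = {p_above_122:.2%}")
print(f"P(108 < pressure < 122) = {p_within_one_sd:.2%}")
```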
5. What do we get if we use the Bell Shaped
Curve (Normal Distribution)?
Benefit 1: measuring the spread of our data
Benefit 2: we can now compare specific
scores in two different populations (next slide)
Benefit 3: if we know the measure, we can
compute the probability of it happening
Benefit 4: if we know the probability, we can
work out the cut off measure that will give it
If I have the same score 78 in Courses A and B,
can I say I am doing the same in both?
[Figure labels: my score 78; other marked values 72 and 88]
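One common way to answer this question is to convert each score into a number of standard deviations from its course average (a z-score). The sketch below is an illustration only: reading 72 and 88 as the two course averages is a guess, and the standard deviations (6 and 5) are invented because the slide does not give them.

```python
def z_score(score, mean, sd):
    """How many standard deviations the score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# Assumed values: the averages are a guess at the slide's labels, the sds are invented.
course_a = {"mean": 72, "sd": 6}
course_b = {"mean": 88, "sd": 5}

z_a = z_score(78, **course_a)
z_b = z_score(78, **course_b)

print(f"Course A: z = {z_a:+.1f}")
print(f"Course B: z = {z_b:+.1f}")
```

Under these assumed numbers the same score of 78 is above average in one course and below average in the other, so the two results are not equivalent.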
6. Benefits 3 and 4
Given a specific measurement or range, what is
the probability of its occurrence?
Probability I will get a fever of more than 38 degrees?
Probability flights will be more than 30 minutes late?
Probability my systolic is > 122?
Given the probability, what is the cutoff
measurement?
I want to remain at a sugar level within the top 15%
allowed. What level corresponds to that?
If Human Resources wants the top 15% of results, what is the
passing grade?
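Benefits 3 and 4 are the two directions of the same lookup: measurement to probability, and probability back to measurement. Here is a small sketch for the Human Resources question, assuming exam results follow a Normal curve with an average of 70 and a standard deviation of 10 (both numbers are hypothetical, not from the talk).

```python
from statistics import NormalDist

# Hypothetical exam results: average 70, standard deviation 10.
grades = NormalDist(mu=70, sigma=10)

# Benefit 3: given a measurement, find the probability.
p_above_85 = 1 - grades.cdf(85)

# Benefit 4: given a probability, find the cutoff measurement.
# The top 15% starts where 85% of results lie below.
passing_grade = grades.inv_cdf(1 - 0.15)

print(f"P(grade > 85)         = {p_above_85:.2%}")
print(f"Cutoff for top 15%    = {passing_grade:.1f}")
```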
B. Correlation
If we have two sets of data, how are they related?
Example: Blood Pressure vs Intake of Salt
Example: Advertising Expenditure vs Sales Revenue
Example: Hours walked per day vs Weight in Kilograms
What is the direction of the relationship?
Direct or inverse?
What is the strength of the relationship?
Correlation
We use the Correlation Function (Demonstrate in Excel)
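For readers without Excel at hand, the Pearson correlation coefficient (the quantity Excel's CORREL returns) can be sketched by hand. The walking and weight figures below are invented for illustration, and `pearson` is a hypothetical helper name. The sign of the result gives the direction (direct or inverse) and its size, between 0 and 1, gives the strength.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented sample: hours walked per day vs weight in kilograms.
hours_walked = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
weight_kg = [92, 90, 85, 83, 80, 76]

r = pearson(hours_walked, weight_kg)
direction = "direct" if r > 0 else "inverse"
print(f"r = {r:.3f} ({direction} relationship)")
```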
7. C. Forecasting using Simple Linear
Regression (Best Line of Fit)
If we have an independent variable (X): Sugar Intake
And a dependent variable (Y): Weight
What is the relationship that allows us to forecast
Weight for different Sugar Intakes?
We need two columns: X and Y
Simple Linear Regression allows us to find the Best
Line to fit our data
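As an illustration of what "Best Line" means, here is a minimal least-squares sketch using two columns, X and Y. The sugar and weight numbers are invented, and `fit_line` is a hypothetical helper; the talk itself only demonstrates the idea, not this code.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line Y = slope * X + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented sample: X = daily sugar intake (grams), Y = weight (kilograms).
sugar_grams = [20, 40, 60, 80, 100]
weight_kg = [62, 66, 69, 75, 78]

slope, intercept = fit_line(sugar_grams, weight_kg)

# Forecast the weight for a sugar intake that was not observed.
forecast_70 = slope * 70 + intercept
print(f"Weight = {slope:.3f} * Sugar + {intercept:.1f}")
print(f"Forecast for 70 g: {forecast_70:.2f} kg")
```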
Regression finds the Best Line
that Fits our Observations
[Figure: plot of the observations; Y axis from 0 to 5, horizontal axis from 0 to 8]
8. Which Straight Line Best Fits our Observations?
[Figure: plot of the observations; Y axis from 0 to 5, horizontal axis from 0 to 8]
Multiple Regression: allows us to find the
Equation Y = aX1 + bX2 + cX3 + d
[Figure labels: X1, X2, X3, Y]
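One way to find a, b, c and d is ordinary least squares, solved here through the normal equations in plain Python. This is a sketch under stated assumptions: the helper names `solve` and `fit_multiple` are hypothetical, and the sample rows are constructed to follow Y = 2*X1 - 1*X2 + 0.5*X3 + 3 exactly so the result can be checked by eye. A spreadsheet or statistics package would normally do this step.

```python
def solve(a, b):
    """Solve the square system a * x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple(rows, ys):
    """Least-squares coefficients [a, b, c, d] for Y = a*X1 + b*X2 + c*X3 + d."""
    design = [list(r) + [1.0] for r in rows]  # extra column of 1s carries d
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Constructed sample: each row is (X1, X2, X3); Y follows 2*X1 - X2 + 0.5*X3 + 3.
rows = [(1, 2, 3), (2, 1, 0), (3, 5, 2), (4, 3, 6), (5, 7, 1), (6, 2, 4)]
ys = [4.5, 6.0, 5.0, 11.0, 6.5, 15.0]

a, b, c, d = fit_multiple(rows, ys)
print(f"Y = {a:.2f}*X1 + {b:.2f}*X2 + {c:.2f}*X3 + {d:.2f}")
```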
9. D. Statistical Process Control (SPC)
The Purpose of SPC is to Monitor a Process
SPC allows us to Check if a variable is behaving properly
Over time
Over different locations/departments
Over different events
Over different samples
Control Charts were first used in Bell Labs (1924)
Although mostly used in industry, SPC can be used in any sector
The General Form of a Control Chart: 4 Components
1) UCL : Upper Control Limit
2) AL : Average Line
3) LCL : Lower Control Limit
4) Process Data
Vertical axis: Our Variable
Horizontal axis: The IDs of the Samples OR The Time Series
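The four components can be sketched in code as follows. The talk does not say how the limits are set; placing them three standard deviations either side of the average is a common convention and is an assumption here, as are the sample readings and the helper names.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3):
    """Return (UCL, AL, LCL) computed from a baseline run of the process."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center + sigmas * spread, center, center - sigmas * spread


def out_of_control(data, ucl, lcl):
    """Positions of the points that fall outside the control limits."""
    return [i for i, value in enumerate(data) if value > ucl or value < lcl]


# Invented readings: a baseline used to set the limits, then new process data.
baseline = [20, 22, 19, 21, 20, 23, 18, 21, 20, 22]
ucl, al, lcl = control_limits(baseline)

new_data = [21, 19, 30, 20, 12]
flagged = out_of_control(new_data, ucl, lcl)

print(f"UCL = {ucl:.2f}, AL = {al:.2f}, LCL = {lcl:.2f}")
print("Out-of-control points at positions:", flagged)
```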
10. This Process is “In control”
[Control chart: vertical axis from 0 to 50, with the Upper Limit and Lower Limit lines marked]
This Process is Regularly “Out of Control”
Look for an explanation INSIDE the system
11. This Process is Irregularly “Out of Control”
Look for an explanation OUTSIDE the system
This Process is Irregularly “Out of Control”.
Trends in either direction of 5 or more points
Look for an explanation OUTSIDE the system
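A trend check of this kind can be sketched as a scan for a steadily rising or steadily falling streak. The interpretation used here (each point strictly higher, or strictly lower, than the one before) is an assumption, and `has_trend` is a hypothetical helper name.

```python
def has_trend(values, length=5):
    """True if `length` or more consecutive points rise steadily or fall steadily."""
    up = down = 1  # number of points in the current rising / falling streak
    for prev, cur in zip(values, values[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= length or down >= length:
            return True
    return False


print(has_trend([20, 21, 22, 23, 24]))      # five points rising in a row
print(has_trend([20, 21, 22, 23, 22, 23]))  # the rise is broken after four points
```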
12. The 7 Point Rule: there is a problem if 7 or more points in a
row are all above the average or all below it
Look for an explanation OUTSIDE the system
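The 7 Point Rule lends itself to a short check: count how many points in a row sit on the same side of the average. In this sketch a point lying exactly on the average is treated as breaking the run, which is an assumption the slide does not settle; the function name and sample readings are also invented.

```python
def seven_point_violation(values, average, run_length=7):
    """True if `run_length` or more consecutive points are on one side of the average."""
    streak = 0
    side = 0  # +1 while the run is above the average, -1 while below, 0 for no run
    for value in values:
        current = (value > average) - (value < average)
        if current != 0 and current == side:
            streak += 1
        else:
            streak = 1 if current != 0 else 0
            side = current
        if streak >= run_length:
            return True
    return False


readings = [21, 22, 21, 23, 22, 21, 22]
print(seven_point_violation(readings, average=20))
```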
Types of Control Charts