# Statistics



2013/05/22

Sources: X-Kit Textbook, Chapter 9; Precalculus Textbook, Appendix B: Concepts in Statistics, Par. B.2.

### The goal

Look at ways of summarising a large amount of sample data in just one or two key numbers. Two important aspects of a set of data:

• The LOCATION
• The SPREAD

### Measures of central tendency (location)

• Arithmetic mean (average)
• Mode (the highest point/frequency)
• Median (the middle observation)

### Number of fraudulent cheques received at a bank each week for 30 weeks

| Week | Cheques | Week | Cheques | Week | Cheques |
|---|---|---|---|---|---|
| 1 | 5 | 11 | 3 | 21 | 7 |
| 2 | 3 | 12 | 5 | 22 | 9 |
| 3 | 8 | 13 | 4 | 23 | 4 |
| 4 | 3 | 14 | 7 | 24 | 5 |
| 5 | 3 | 15 | 6 | 25 | 8 |
| 6 | 1 | 16 | 6 | 26 | 6 |
| 7 | 10 | 17 | 9 | 27 | 4 |
| 8 | 4 | 18 | 3 | 28 | 4 |
| 9 | 6 | 19 | 4 | 29 | 10 |
| 10 | 8 | 20 | 5 | 30 | 4 |

### Arithmetic mean

• $\bar{x} = \frac{164}{30} = 5.47$
• To calculate the MEAN, add all the data points in the sample and divide by the number of data points (the sample size).
• The MEAN can be a value that doesn't actually match any observation.
• The MEAN gives us useful information about the location of our frequency distribution.
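The arithmetic mean of the cheque data can be checked with a few lines of Python. This is only an illustrative sketch: the function name `arithmetic_mean` is an assumption made for this example, and the data are the 30 weekly counts listed above.

```python
# Weekly counts of fraudulent cheques, weeks 1 to 30, as listed in the slides.
cheques = [
    5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
    3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
    7, 9, 4, 5, 8, 6, 4, 4, 10, 4,
]


def arithmetic_mean(data):
    """Add all the data points and divide by the sample size."""
    return sum(data) / len(data)


mean = arithmetic_mean(cheques)
print(f"sum = {sum(cheques)}, n = {len(cheques)}, mean = {mean:.2f}")
# prints: sum = 164, n = 30, mean = 5.47

# The mean does not have to match any observation in the sample.
print(mean in cheques)  # prints: False
```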
[Graph: frequency of each number of fraudulent cheques per week, values 1 to 10]

### Calculate the mean

| Data format | Formula | Symbols |
|---|---|---|
| Raw data | $\bar{x} = \frac{\sum x}{n}$ | $x$ is the data points; $n$ is the number of observations |
| Frequency table | $\bar{x} = \frac{\sum xf}{n}$ | $x$ is the data points; $n$ is the number of observations; $f$ is the frequency |
| Frequency table (intervals) | $\bar{x} = \frac{\sum xf}{n}$ | $x$ is the midpoints of the intervals; $n$ is the number of observations; $f$ is the frequency |

### Calculate the mean - frequency table: number of fraudulent cheques per week

| Distinct value | Frequency |
|---|---|
| 1 | 1 |
| 2 | 0 |
| 3 | 5 |
| 4 | 7 |
| 5 | 4 |
| 6 | 4 |
| 7 | 2 |
| 8 | 3 |
| 9 | 2 |
| 10 | 2 |

### Truck data: weights (in tonnes) of 20 fully loaded trucks

| Truck | Weight | Truck | Weight |
|---|---|---|---|
| 1 | 4.54 | 11 | 2.52 |
| 2 | 3.81 | 12 | 5.88 |
| 3 | 4.29 | 13 | 2.95 |
| 4 | 5.16 | 14 | 3.59 |
| 5 | 2.51 | 15 | 3.87 |
| 6 | 4.63 | 16 | 4.17 |
| 7 | 4.75 | 17 | 3.30 |
| 8 | 3.98 | 18 | 5.48 |
| 9 | 5.04 | 19 | 4.26 |
| 10 | 2.80 | 20 | 3.53 |

### Calculate the mean - grouped frequency table: truck data

| Class interval | Frequency | Midpoint |
|---|---|---|
| $2.5 \le x \le 3.0$ | 4 | $(2.5 + 3.0) \div 2 = 2.75$ |
| $3.0 < x \le 3.5$ | 1 | 3.25 |
| $3.5 < x \le 4.0$ | 5 | 3.75 |
| $4.0 < x \le 4.5$ | 3 | 4.25 |
| $4.5 < x \le 5.0$ | 3 | 4.75 |
| $5.0 < x \le 5.5$ | 3 | 5.25 |
| $5.5 < x \le 6.0$ | 1 | 5.75 |

### Mode

• The mode is the value (or, for grouped data, the interval) with the HIGHEST FREQUENCY.
• There can be two or more modes in a set of data; in that case the mode is not a good measure of central tendency.
• MULTI-MODAL data have more than one mode.
• UNI-MODAL data have only one mode.
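The two frequency-table formulas for the mean, and the rule for the mode, can be sketched in Python as follows. The function names are assumptions chosen for this illustration; the frequencies and class intervals are the ones given in the cheque and truck tables.

```python
# Ungrouped frequency table: distinct value -> frequency (cheque data).
cheque_freq = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}

# Grouped frequency table: (lower bound, upper bound, frequency) (truck data).
truck_classes = [
    (2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
    (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1),
]


def mean_from_frequencies(freq):
    """Mean = sum of (value * frequency) divided by the number of observations."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n


def grouped_mean(classes):
    """Same formula, with each interval represented by its midpoint."""
    n = sum(f for _, _, f in classes)
    return sum((lower + upper) / 2 * f for lower, upper, f in classes) / n


def modes(freq):
    """Return every value that reaches the highest frequency."""
    highest = max(freq.values())
    return [x for x, f in freq.items() if f == highest]


print(f"{mean_from_frequencies(cheque_freq):.2f}")  # prints: 5.47
print(f"{grouped_mean(truck_classes):.3f}")         # prints: 4.075
print(modes(cheque_freq))                           # prints: [4]
```

The grouped mean is only an approximation of the mean of the raw weights, because every observation in an interval is treated as if it sat at the midpoint.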
[Graph: frequency of each number of fraudulent cheques per week; the MODE = 4]

### Call centre data: waiting times (in seconds) for 35 randomly selected customers

| Customer | Time | Customer | Time | Customer | Time |
|---|---|---|---|---|---|
| 1 | 75 | 13 | 38 | 25 | 102 |
| 2 | 37 | 14 | 40 | 26 | 115 |
| 3 | 13 | 15 | 22 | 27 | 68 |
| 4 | 90 | 16 | 47 | 28 | 29 |
| 5 | 45 | 17 | 26 | 29 | 142 |
| 6 | 23 | 18 | 57 | 30 | 5 |
| 7 | 104 | 19 | 65 | 31 | 15 |
| 8 | 135 | 20 | 33 | 32 | 10 |
| 9 | 30 | 21 | 9 | 33 | 25 |
| 10 | 73 | 22 | 85 | 34 | 41 |
| 11 | 34 | 23 | 87 | 35 | 49 |
| 12 | 12 | 24 | 16 | | |

### Frequency table: the MODAL CLASS is the interval $25 < x \le 50$

| Class interval | Frequency |
|---|---|
| $0 \le x \le 25$ | 10 |
| $25 < x \le 50$ | 11 |
| $50 < x \le 75$ | 6 |
| $75 < x \le 100$ | 3 |
| $100 < x \le 125$ | 3 |
| $125 < x \le 150$ | 2 |

[Histogram: frequency for the intervals [0;25], (25;50], (50;75], (75;100], (100;125], (125;150]; the modal class is (25;50]]

### The median - raw data

The data are the number of fraudulent cheques received at a bank each week for 30 weeks, as used for the arithmetic mean.

### Median

• Median = 5
• Put all observations in order from smallest to largest; the middle observation is the MEDIAN.

1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10

With 30 observations, the median lies half-way between the 15th and 16th values, which are both 5.
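Finding the median of the raw cheque data can be sketched in Python. The function name `median` is an assumption for this example; for an even number of observations it takes the value half-way between the two middle observations.

```python
# Weekly counts of fraudulent cheques, weeks 1 to 30.
cheques = [
    5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
    3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
    7, 9, 4, 5, 8, 6, 4, 4, 10, 4,
]


def median(data):
    """Sort the observations and return the middle one.

    For an even sample size, return the value half-way between
    the two middle observations.
    """
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sorted(cheques))
print(median(cheques))  # prints: 5.0
```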
### Don't fall into the common trap

• The median is NOT the middle of the range of observations. For example: 1, 1, 1, 1, 1, 3, 9
• The median is 1 (the middle observation).
• The middle of the range, $(1 + 9) \div 2$, is 5. Big difference!

### Median position

| Number of observations | Median position |
|---|---|
| Odd, for example 7 | $\frac{n+1}{2}$ |
| Even, for example 30 | Half-way between $\frac{n}{2}$ and $\frac{n}{2} + 1$ |

### Find the median - frequency table: number of fraudulent cheques per week

| Distinct value | Frequency | Cumulative frequency |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 0 | 1 |
| 3 | 5 | 6 |
| 4 | 7 | 13 |
| 5 | 4 | 17 |
| 6 | 4 | 21 |
| 7 | 2 | 23 |
| 8 | 3 | 26 |
| 9 | 2 | 28 |
| 10 | 2 | 30 |

### Find the median - grouped frequency table: truck data

Weights (in tonnes) of 20 fully loaded trucks.

| Class interval | Frequency | Midpoint |
|---|---|---|
| $2.5 \le x \le 3.0$ | 4 | $(2.5 + 3.0) \div 2 = 2.75$ |
| $3.0 < x \le 3.5$ | 1 | 3.25 |
| $3.5 < x \le 4.0$ | 5 | 3.75 |
| $4.0 < x \le 4.5$ | 3 | 4.25 |
| $4.5 < x \le 5.0$ | 3 | 4.75 |
| $5.0 < x \le 5.5$ | 3 | 5.25 |
| $5.5 < x \le 6.0$ | 1 | 5.75 |

### Find the median from a grouped frequency table

• Median (middle observation)?
• Find the class interval in which that observation lies.

### Calculations

• Raw data: mean, mode, median
• Frequency table (ungrouped data): mean, mode, median
• Frequency table (grouped data): mean, mode, median
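Reading the median off the cumulative frequency column can be sketched in Python as below. The helper names `value_at_position` and `median_from_table` are assumptions made for this illustration; the frequencies are those of the cheque table, and the last lines replay the "common trap" example.

```python
from itertools import accumulate

# Ungrouped frequency table for the cheque data.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [1, 0, 5, 7, 4, 4, 2, 3, 2, 2]
cumulative = list(accumulate(freqs))
print(cumulative)  # prints: [1, 1, 6, 13, 17, 21, 23, 26, 28, 30]


def value_at_position(position):
    """Return the observation at a 1-based position in sorted order.

    It is the first value whose cumulative frequency reaches the position.
    """
    for value, cum in zip(values, cumulative):
        if position <= cum:
            return value
    raise ValueError("position is larger than the sample size")


def median_from_table():
    """Apply the median-position rules for odd and even sample sizes."""
    n = cumulative[-1]
    if n % 2 == 1:
        return value_at_position((n + 1) // 2)
    return (value_at_position(n // 2) + value_at_position(n // 2 + 1)) / 2


print(median_from_table())  # prints: 5.0

# The common trap: the median is not the middle of the range.
trap = sorted([1, 1, 1, 1, 1, 3, 9])
trap_median = trap[(len(trap) + 1) // 2 - 1]    # 4th of 7 observations
middle_of_range = (min(trap) + max(trap)) / 2
print(trap_median, middle_of_range)  # prints: 1 5.0
```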
### How to choose the best measure of location?

• When choosing the best measure of location, we need to look at the SHAPE of the distribution.
• For nearly symmetric data, the mean is the best choice.
• For very skewed (asymmetric) data, the mode or median is better.
• The mean moves further along the tail than the median; it is more sensitive to values far from the centre.

### Shape of the histogram

• A SYMMETRIC histogram: Mean = Median = Mode
• A POSITIVELY SKEWED (skewed to the right) histogram has a longer tail on the right side: Mode < Median < Mean
• A NEGATIVELY SKEWED (skewed to the left) histogram has a longer tail on the left side: Mean < Median < Mode

### Problem

• We can find two very different data sets (one distribution very spread out and another very concentrated) with EQUAL measures of central tendency.
• To get a true idea of our sample, we also have to MEASURE THE SPREAD OF A DISTRIBUTION, also called its dispersion.

### Measures of spread (dispersion)

• Interquartile range
• Variance
• Standard deviation
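Both points can be illustrated with Python's standard `statistics` module. The three small data sets below are made up purely for this sketch and do not come from the slides; the variance and standard deviation shown are the sample versions (dividing by n - 1), which is an assumption since the slides do not say which version they use.

```python
import statistics

# Hypothetical right-skewed sample (illustration only, not from the slides).
right_skewed = [2, 2, 2, 3, 3, 4, 5, 8, 12]
print(statistics.mode(right_skewed))            # prints: 2
print(statistics.median(right_skewed))          # prints: 3
print(round(statistics.mean(right_skewed), 2))  # prints: 4.56

# Two hypothetical samples with the same mean, median and mode ...
concentrated = [4, 5, 5, 5, 6]
spread_out = [1, 5, 5, 5, 9]
for sample in (concentrated, spread_out):
    print(statistics.mean(sample), statistics.median(sample), statistics.mode(sample))

# ... but very different spread (sample variance and standard deviation).
var_concentrated = statistics.variance(concentrated)
var_spread_out = statistics.variance(spread_out)
print(var_concentrated, var_spread_out)
print(round(statistics.stdev(concentrated), 3), round(statistics.stdev(spread_out), 3))
```

In the skewed sample the long right tail pulls the mean above the median, which in turn lies above the mode, matching the ordering Mode < Median < Mean.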
### Measuring spread

• Think of a distribution in terms of percentages: a horizontal axis equally divided into 100 percentiles.
• The 10th percentile marks the point below which 10% of the observations fall, and above which 90% of the observations fall.
• The 50th percentile, below which 50% of the observations lie, is the median.

### Working with a percentile

• $p\%$ of the observations fall below the $p$th percentile.

$$\text{Position} = \frac{p}{100}(n + 1)$$

• Working with the example on fraudulent cheques:

1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10

$$\text{Position of } P_{50} = \frac{50}{100}(30 + 1) = 15.5$$

• 15.5 tells us where to find our 50th percentile.
• 15 tells us which observation to go to, and 0.5 tells us how far to move along the space between that observation and the next highest one.

### Formula

$$P_{50} = x_{15} + 0.5\,(x_{16} - x_{15})$$

$$P_p = x_k + d\,(x_{k+1} - x_k)$$

• $P$ means percentile
• $p$ tells us which percentile
• $k$ is the whole number calculated from the position
• $d$ is the decimal fraction calculated from the position

### Working with percentiles from ungrouped frequency data: number of fraudulent cheques per week

| Distinct value | Frequency | Cumulative frequency |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 0 | 1 + 0 = 1 |
| 3 | 5 | 1 + 5 = 6 |
| 4 | 7 | 6 + 7 = 13 |
| 5 | 4 | 13 + 4 = 17 |
| 6 | 4 | 17 + 4 = 21 |
| 7 | 2 | 21 + 2 = 23 |
| 8 | 3 | 23 + 3 = 26 |
| 9 | 2 | 26 + 2 = 28 |
| 10 | 2 | 28 + 2 = 30 |

### Working with percentiles (and median) from grouped data

• To identify the class interval $L < x \le U$ containing the $p$th percentile:

$$\text{Position} = \frac{p}{100}(n + 1)$$

• The decimal fraction for grouped data is:

$$d = \frac{\text{Position} - \text{sum of class frequencies up to } L}{\text{frequency of class } L < x \le U}$$

• Calculate the $p$th percentile:

$$P_p \approx L + d\,(U - L)$$

### Find the median - grouped frequency table: truck data

Weights (in tonnes) of 20 fully loaded trucks.

| Class interval | Frequency | Cumulative frequency |
|---|---|---|
| $2.5 \le x \le 3.0$ | 4 | 4 |
| $3.0 < x \le 3.5$ | 1 | 5 |
| $3.5 < x \le 4.0$ | 5 | 10 |
| $4.0 < x \le 4.5$ | 3 | 13 |
| $4.5 < x \le 5.0$ | 3 | 16 |
| $5.0 < x \le 5.5$ | 3 | 19 |
| $5.5 < x \le 6.0$ | 1 | 20 |
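The percentile formulas for raw and grouped data can be sketched in Python as follows. The function names and the handling of positions that fall before the first or after the last observation (clamping to the end values) are assumptions of this sketch; the slides do not cover those edge cases.

```python
import math

cheques = [
    5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
    3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
    7, 9, 4, 5, 8, 6, 4, 4, 10, 4,
]

# Grouped truck data: (lower bound L, upper bound U, frequency).
truck_classes = [
    (2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
    (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1),
]


def percentile(data, p):
    """P_p = x_k + d * (x_{k+1} - x_k), with Position = p/100 * (n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    position = p / 100 * (n + 1)
    k = math.floor(position)   # whole-number part: which observation
    d = position - k           # decimal part: how far towards the next one
    if k < 1:                  # assumption: clamp to the smallest value
        return ordered[0]
    if k >= n:                 # assumption: clamp to the largest value
        return ordered[-1]
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])


def grouped_percentile(classes, p):
    """P_p is approximately L + d * (U - L) for the class containing the position."""
    n = sum(f for _, _, f in classes)
    position = p / 100 * (n + 1)
    below = 0                  # sum of class frequencies up to L
    for lower, upper, freq in classes:
        if freq > 0 and position <= below + freq:
            d = (position - below) / freq
            return lower + d * (upper - lower)
        below += freq
    return classes[-1][1]      # assumption: clamp to the top boundary


print(percentile(cheques, 50))                           # prints: 5.0
print(round(grouped_percentile(truck_classes, 50), 4))   # prints: 4.0833
```

For the trucks, the median position is 10.5, which falls in the class $4.0 < x \le 4.5$ (cumulative frequency 10 below it), so $d = 0.5 / 3$ and the estimate is $4.0 + d \times 0.5 \approx 4.08$ tonnes.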