Making sense out of data
(aka doing statistics)
Things you will need
Who I am and what I do
Corey Chivers
PhD Student in Biology at McGill
I study biological invasions using statistics
What is a Statistician?
A statistician is someone who:
●   Turns data into insights.
●   Answers questions about (variation in) the world.
●   Isn't fun to talk to at a party?
Statistics is very cool
Data is Everywhere
Statisticians are in demand
Portrait of a Statistician
The cool kids are calling themselves Data Scientists

Name: Hilary Mason
Title: Chief Data Scientist at bit.ly
Member of Mayor Bloomberg’s Technology and Innovation Advisory Council
From her web bio: “I <3 data and cheeseburgers.”
What do you know about statistics?
●   On a piece of paper, make a list of all the
    words you know about statistics.

●   I'll start:
    –   Average (mean)
    –   Variance
    –   Normal distribution
    –   ...
Despite how exciting we are, statisticians always start by assuming the world is boring
The Null Hypothesis, or H0, is this boring world.

Usually something like “there is no effect of caption size on the lulzyness of LOLcats”
Looking for evidence against the Null Hypothesis
●   The alternative hypothesis (Ha) is that
    something interesting is going on.
    –   Ex: “Bigger captions are, on average, funnier”

●   How would we know?

●   To the internetz!
Collect some sample data!
●   Small caption, fairly humorous
●   Big caption, quite funny
●   Small caption, funny-ish
●   Big caption, peed in pants a little
Dealing with variability
Some small caption images
are funny, and some large
caption images are not funny.


There is variance in the data.


But we want to know if there
is a difference on average.
We'll need to take variance
into account.
Descriptive Statistics
        Measures of Variability

Variance:             $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Standard Deviation:   $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

    Where $x_i$ = the ith value of a distribution
    $n$ = number of values in the sample
    $\bar{x}$ = sample mean
Descriptive Statistics
               Measures of Variability
1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9

Variance and Standard Deviation
Therefore, variance of our dist’n (w/ mean = 5):   $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

     Step 1                 Step 2
    1 - 5 = -4             (-4)^2 = 16
    2 - 5 = -3             (-3)^2 = 9
    3 - 5 = -2             (-2)^2 = 4
    3 - 5 = -2             (-2)^2 = 4
      […]                    […]
    9 - 5 = 4              4^2 = 16

     Step 3:                   16 + 9 + 4 + 4 + […] + 16 = 72
     Step 4 (Variance):        72/18 = 4
     Step 5 (Std Deviation):   √4 = 2
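The five steps above can be followed line by line in a few lines of Python. This is only a sketch of the hand calculation; the variable names are my own, not part of the slides.

```python
import math

values = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9]

n = len(values)                            # 19 values
mean = sum(values) / n                     # 95 / 19 = 5.0

deviations = [x - mean for x in values]    # Step 1: subtract the mean
squares = [d ** 2 for d in deviations]     # Step 2: square each deviation
sum_of_squares = sum(squares)              # Step 3: add them all up
variance = sum_of_squares / (n - 1)        # Step 4: divide by n - 1
std_dev = math.sqrt(variance)              # Step 5: take the square root

print(mean, sum_of_squares, variance, std_dev)  # 5.0 72.0 4.0 2.0
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n-1 denominator, so they are a handy way to check a hand calculation.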
Your turn
Calculate the variance of the heights in your group.

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

1) Write down your heights (xi)
2) Calculate the average (Σxi / n)
3) Subtract the average from each height and square it
4) Add them all up and divide by n-1
Variance
Measures of Central Tendency
             Calculating the Mean
Using the following distribution of values:
1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9

(Arithmetic) Mean – the average of a distribution of values

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$   or   (Sum of values in dist’n) / (Number of values in dist’n)

(1+2+3+3+4+4+4+5+5+5+5+5+6+6+6+7+7+8+9) / 19 = 95 / 19 = 5
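As a quick check of the arithmetic, here is a minimal sketch of the mean as a function. The name `arithmetic_mean` is hypothetical. Note that the mean divides by n, the number of values, while the variance divides by n-1.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values (n, not n - 1)."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)

values = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9]
print(sum(values), len(values), arithmetic_mean(values))  # 95 19 5.0
```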
Could the difference be due to chance?
Remember, we started by assuming that there was no difference (the Null Hypothesis).

If the Null Hypothesis is true, what are the chances that we would observe this much difference between groups?

How do we decide whether the difference is due to chance or not?

By vote???
A better way: (formal) Hypothesis testing
●   Determine in advance the level of error you
    are willing to put up with.
    –   We cannot avoid the chance of errors, but we can
        decide how often we are willing to have them
        happen.
●   Biologists like to use 0.05 (a 1 in 20 chance).
●   We call this α (alpha)

Ronald Fisher: the man behind the idea of NHST (null hypothesis significance testing)
A better way: (formal) Hypothesis testing
●   Calculate how likely a result at least as extreme as
    yours would be if the null were true.

●   If it is less than α, we say that we reject the
    null hypothesis.

●   If we reject the null, we say the results are
    statistically significant.
    “The world is not boring after all!”
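One way to see what "how likely if the null were true" means is to shuffle the group labels. If caption size really has no effect, any split of the ratings into "big" and "small" is as plausible as the one we observed. The sketch below is a permutation test, which is a different technique from the t-test introduced later in these slides, and the funniness ratings are invented purely for illustration.

```python
from itertools import combinations

# Made-up funniness ratings (0-10); NOT real data.
big_captions = [7, 9, 8, 6]
small_captions = [5, 6, 4, 7]

def mean(xs):
    return sum(xs) / len(xs)

observed_diff = mean(big_captions) - mean(small_captions)

pooled = big_captions + small_captions
n_big = len(big_captions)
n_small = len(pooled) - n_big
total = sum(pooled)

n_splits = 0
n_extreme = 0
# Under the null, every way of picking which ratings are "big" is equally likely.
for indices in combinations(range(len(pooled)), n_big):
    sum_big = sum(pooled[i] for i in indices)
    diff = sum_big / n_big - (total - sum_big) / n_small
    n_splits += 1
    # Two-tailed: count splits at least as extreme as the observed one.
    if abs(diff) >= abs(observed_diff):
        n_extreme += 1

p_value = n_extreme / n_splits
alpha = 0.05
print(observed_diff, n_splits, n_extreme, p_value)
print("reject null" if p_value < alpha else "do NOT reject null")
```

With these invented ratings, 10 of the 70 possible splits are at least as extreme as the observed one, so the probability is about 0.14. That is larger than α = 0.05, so the null is not rejected.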
Let's do it!
●   To calculate how likely data like ours would be
    under the null hypothesis (i.e. if the difference is
    due to chance), we need a statistic.

●   But first, some Beer!
Student's t-test
●   William Sealy Gosset figured out how to test if
    a batch of beer was significantly different from
    the standard.
●   While working for the Guinness brewing company,
    he was forbidden to publish academic research, so
    he published his method under the pseudonym 'Student'.
Student's t-test
   The t-value is calculated using the following equation:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Where $\bar{x}_1$ and $\bar{x}_2$ are the means of the experimental and control groups;
$s_1^2$ and $s_2^2$ are the variances of the experimental and control groups;
$n_1$ and $n_2$ are the sample sizes for the experimental and control groups.
t-test
    State your alpha level

                    α = 0.05

If there is really no difference between the means (the null
hypothesis is true), there is a 5% chance that the t-test will
detect one anyway.
Calculating your t-value
                      Name-brand             Generic-brand
                       (Group 1)               (Group 2)
Mean # of chips       $\bar{x}_1$ = 15.3     $\bar{x}_2$ = 11.2
Standard Deviation    $s_1$ = 2.4            $s_2$ = 4.3
n (sample size)       $n_1$ = 3              $n_2$ = 3

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

According to the data above: calculated t = 1.4
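To check the calculated t, the formula can be typed in directly. This is a minimal sketch and the function name `t_value` is my own. It takes standard deviations and squares them, because the formula needs variances.

```python
import math

def t_value(mean1, mean2, sd1, sd2, n1, n2):
    """t = (mean1 - mean2) / sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Group 1 = name-brand, Group 2 = generic-brand (values from the table above).
t = t_value(mean1=15.3, mean2=11.2, sd1=2.4, sd2=4.3, n1=3, n2=3)
print(round(t, 1))  # 1.4
```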
Alternate Hypothesis
You can only test ONE possible alternate hypothesis at
any one time. The one chosen depends on what you are
looking to find.
Alternative hypothesis: 2 types
   2-tailed
      Non-directional (general): not specifying a
        direction.
          “The two groups are not the same”
   1-tailed
      Directional (specific): specify direction
          “Group A is greater than group B.”
Look up the Critical t-value
In order to find your critical t-value, you need 3 pieces of
    information:
   1. Whether the alternate hypothesis is 1- or 2-tailed
   2. Alpha level (usually = 0.05)
   3. Degrees of freedom (df = n-1)

Calculating degrees of freedom (df)
Degrees of Freedom = n-1
What if you have 2 different sample sizes (n1 and n2)…
which do you pick to calculate your degrees of freedom?
A: df = the smaller of (n1 - 1) and (n2 - 1)
Looking up your Critical t-value
Compare your ‘calculated’ t-value with your ‘critical’ t-value
It is the comparison between the calculated t-value and the critical t that
will determine whether you can reject or fail to reject your null
hypothesis
a) If ‘calculated’ > ‘critical’, then: reject null hyp.
         “My observed data are really unlikely under the
        null hypothesis, therefore I reject the null
        hypothesis!”
b) If ‘calculated’ < ‘critical’, then: do NOT reject null
hyp.
         “My observed data are consistent with the null
        hypothesis, therefore I have no reason to believe
        that it is not true.”
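Putting the lookup and the comparison together for the chips example (calculated t = 1.4, n1 = n2 = 3) might look like the sketch below. The small dictionary holds two-tailed critical values for α = 0.05 copied from a standard t-table. The function names are hypothetical, and a real analysis would use a full table or a statistics package.

```python
# Two-tailed critical t-values for alpha = 0.05, keyed by degrees of freedom.
CRITICAL_T_TWO_TAILED_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def degrees_of_freedom(n1, n2):
    """The rule from the slides: the smaller of (n1 - 1) and (n2 - 1)."""
    return min(n1 - 1, n2 - 1)

def decide(calculated_t, n1, n2):
    df = degrees_of_freedom(n1, n2)
    critical_t = CRITICAL_T_TWO_TAILED_05[df]
    # For a two-tailed test, compare the size of t, ignoring its sign.
    if abs(calculated_t) > critical_t:
        return "reject null hyp."
    return "do NOT reject null hyp."

print(degrees_of_freedom(3, 3))  # 2
print(decide(1.4, 3, 3))         # do NOT reject null hyp.
```

With only three bags per brand, the critical value (4.303) is far larger than the calculated 1.4, so this small sample gives no grounds to reject the null.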
What if we are measuring a
    category, rather than a number?
●   The t-test lets us compare the value of some
    attribute between two groups.
    –   Do mutant fruit flies live longer than wild type?
    –   Does IQ differ between Dawson and Laurier students?
    –   Does drug x decrease blood pressure?
●   The dependent variable is quantitative:
    –   Life span
    –   IQ
    –   Blood pressure
What if we are measuring a
    category, rather than a number?
●   Chi-squared test lets us test hypotheses
    about categories.
    –   Are there more cars of a certain colour getting speeding
        tickets?
    –   Is the ratio of dominant to recessive phenotypes 3:1?
    –   Do chromosomes assort independently?
●   The dependent variable is categorical:
    –   Car colour
    –   Phenotype
    –   Chromosome donor
Chi-square or T-test???
How do you know which one you need?

T-Test
• the dependent variable is quantitative (e.g. height, weight, etc.)
• data can be organized as two lists of numbers
Example (dependent variable: heart rate):
    Room temp (bpm)    Cold temp (bpm)
    178                86
    169                89
    192                55

Chi-square Test
• the dependent variable is qualitative (aka nominal data) (e.g. gender, colour, etc.)
• data can be easily tabulated as counts
Example (dependent variable: gender):
    Male      98
    Female    102
Steps to performing a chi-square test

1.  State your null hypothesis
2.  State your alternate hypothesis
3.  State your alpha level (usually α = 0.05)
4.  Calculate your ‘calculated chi-square value’
5.  Look up your ‘critical chi-square value’ (from chi-square
    table)
6. Compare your ‘calculated’ and ‘critical’ values
   a) If ‘calculated’ > ‘critical’, conclusion: reject null hyp.
   b) If ‘calculated’ < ‘critical’, conclusion: do NOT reject null
       hyp.
7. State your conclusion
Sample hypotheses for chi-square
Sex ratio in our class

Null hypothesis:
1. There is no difference between the frequency of men and women in the class

Alternative hypothesis:
2. There is a difference between the frequency of men and women in the class

Chi-square can only test non-directional alt. hyp.
Calculating Chi-square
       ‘Calculated’ chi-square values are calculated
       using the following formula:

       $\chi^2 = \sum \frac{(O - E)^2}{E}$

       O = observed
       E = expected

       Calculating the chi-square is easier using the following table:

Gender    O    E    O-E    (O-E)^2    (O-E)^2 / E
Female
Male

χ2 = sum of last column =
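Here is a sketch of how the table could be filled in, borrowing the counts from the earlier chi-square example (98 male, 102 female) as a stand-in for the class. The expected counts assume the null hypothesis of equal frequencies. The function name `chi_square` is my own.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

categories = ["Female", "Male"]
observed = [102, 98]                           # borrowed example counts
total = sum(observed)
expected = [total / len(observed)] * len(observed)   # 100 each under the null

# Print the table row by row: O, E, O-E, (O-E)^2, (O-E)^2 / E
for name, o, e in zip(categories, observed, expected):
    print(name, o, e, o - e, (o - e) ** 2, (o - e) ** 2 / e)

chi2 = chi_square(observed, expected)
print(round(chi2, 2))  # 0.08
```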
Looking up the Critical χ2
To find the critical χ2, you need the alpha level and the df.
df for a χ2 test = (# of categories) – 1

In our example, df = 2-1 = 1
Compare your ‘calculated’ chi-sq with your ‘critical’ chi-sq
It is the comparison between the calculated chi-sq and the critical chi-sq
that will determine whether you can reject or fail to reject your
null hypothesis
a) If ‘calculated’ > ‘critical’, then: reject null hyp.
         “My observed data are really unlikely under the
        null hypothesis, therefore I reject the null
        hypothesis!”
b) If ‘calculated’ < ‘critical’, then: do NOT reject null
hyp.
         “My observed data are consistent with the null
        hypothesis, therefore I have no reason to believe
        that it is not true.”
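The lookup and comparison can be sketched the same way as for the t-test. The critical values below are for α = 0.05, copied from a standard chi-square table. The calculated value 0.08 is what the formula gives for counts of 98 and 102 against an expected 100 each. The function name is hypothetical.

```python
# Critical chi-square values for alpha = 0.05, keyed by degrees of freedom.
CRITICAL_CHI2_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_square_decision(calculated, n_categories):
    df = n_categories - 1                  # df = (# of categories) - 1
    critical = CRITICAL_CHI2_05[df]
    if calculated > critical:
        verdict = "reject null hyp."
    else:
        verdict = "do NOT reject null hyp."
    return df, critical, verdict

print(chi_square_decision(0.08, 2))  # (1, 3.841, 'do NOT reject null hyp.')
```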
Statistics just might save your life
Questions for Corey
●   You can email me!
    Corey.chivers@mail.mcgill.ca

●   I blog about statistics:
    bayesianbiologist.com

●   I tweet about statistics:
    @cjbayesian

Statistics for CEGEP Biology

  • 1. Making sense out of data (aka doing statistics)
  • 3. Who I am and what I do Corey Chivers PhD Student in Biology at McGill I study biological invasions using statistics
  • 4. What is a Statistician?
  • 5. What is a Statistician? A statistician is someone who:
  • 6. What is a Statistician? A statistician is someone who: ● Turns data into insights.
  • 7. What is a Statistician? A statistician is someone who: ● Turns data into insights. ● Answers questions about the world.
  • 8. What is a Statistician? A statistician is someone who: ● Turns data into insights. ● Answers questions about variation in the world.
  • 9. What is a Statistician? A statistician is someone who: ● Turns data into insights. ● Answers questions about variation in the world. ● Isn't fun to talk to at a party?
  • 14. Portrait of a Statistician
  • 15. Portrait of a Statistician ?
  • 16. Portrait of a Statistician The cool kids are calling themselves Data Scientists
  • 17. Portrait of a Statistician The cool kids are calling themselves Data Scientists Name: Hilary Mason Title: Chief Data Scientist at bit.ly member of Mayor Bloomberg’s Technology and Innovation Advisory Council From her web bio: “I <3 data and cheeseburgers.”
  • 18. What do you know about statistics? ● On a piece of paper, make a list of all the words you know about statistics. ● I'll start: – Average (mean) – Variance – Normal distribution – ...
  • 19. Despite how exciting we are, statisticians always start by assuming the world is boring. The Null Hypothesis, or H0, is this boring world.
  • 20. Despite how exciting we are, statisticians always start by assuming the world is boring. The Null Hypothesis, or H0, is this boring world. Usually something like “there is no effect of caption size on the lulzyness of LOLcats”.
  • 21. Looking for evidence against the Null Hypothesis ● The alternative hypothesis (Ha) is that something interesting is going on. – Ex: “Bigger captions are, on average, funnier” ● How would we know?
  • 22. Looking for evidence against the Null Hypothesis ● The alternative hypothesis (Ha) is that something interesting is going on. – Ex: “Bigger captions are, on average, funnier” ● How would we know? ● To the internetz!
  • 23. Collect some sample data! Small caption, fairly humorous. Big caption, quite funny. Small caption, funny-ish. Big caption, peed in pants a little.
  • 24. Dealing with variability Some small caption images are funny, and some large caption images are not funny. There is variance in the data. But we want to know if there is a difference on average. We'll need to take variance into account.
  • 25. Descriptive Statistics: Measures of Variability. Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$. Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$. Where $x_i$ = the ith value of a distribution, $n$ = number of values in the sample, $\bar{x}$ = sample mean.
  • 26. Descriptive Statistics: Measures of Variability. Variance and Standard Deviation for the values 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, using $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$. Therefore, variance of our dist’n (w/ mean = 5): Step 1 (subtract the mean): 1−5 = −4, 2−5 = −3, 3−5 = −2, 3−5 = −2, […], 9−5 = 4. Step 2 (square each difference): (−4)² = 16, (−3)² = 9, (−2)² = 4, (−2)² = 4, […], 4² = 16. Step 3 (add them up): 16 + 9 + 4 + 4 + […] + 16 = 72. Step 4 (Variance): 72/18 = 4. Step 5 (Std Deviation): √4 = 2.
  • 27. Your turn: Calculate the variance of the heights in your group, using $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$. 1) Write down your heights ($x_i$). 2) Calculate the average ($\sum x_i / n$). 3) Subtract the average from each height and square the result. 4) Add them all up and divide by $n-1$.
  • 29. Measures of Central Tendency: Calculating the Mean. Using the following distribution of values: 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9. (Arithmetic) Mean – the average of a distribution of values: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, or (sum of values in dist’n) / (number of values in dist’n). Here: (1+2+3+3+4+4+4+5+5+5+5+5+6+6+6+7+7+8+9) / 19 = 95 / 19 = 5.
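
As a quick check on the hand calculations for the mean, variance and standard deviation, here is a minimal Python sketch using the same 19 values. The function names `sample_mean` and `sample_variance` are just illustrative choices, not anything from the slides.

```python
import math

# The 19 values used in the worked examples on the slides.
values = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9]

def sample_mean(xs):
    """Sum of the values divided by the number of values (n)."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = sample_mean(xs)
    squared_deviations = [(x - m) ** 2 for x in xs]
    return sum(squared_deviations) / (len(xs) - 1)

mean = sample_mean(values)
variance = sample_variance(values)
std_dev = math.sqrt(variance)  # standard deviation is the square root of the variance

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```

Swapping in your group's heights for `values` gives the answer to the "Your turn" exercise.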
  • 30. Could the difference be due to chance? Remember, we started by assuming that there was no difference (the Null Hypothesis). If the Null Hypothesis is true, what are the chances that we observed this amount of difference between groups? How do we decide whether the difference is due to chance or not? By vote???
  • 31. A better way: (formal) Hypothesis testing ● Determine in advance the level of error you are willing to put up with. – We cannot avoid the chance of errors, but we can decide how often we are willing to have them happen. ● Biologists like to use 0.05 (a 1 in 20 chance). ● We call this α (alpha)
  • 32. A better way: (formal) Hypothesis testing ● Determine in advance the level of error you are willing to put up with. – We cannot avoid the chance of errors, but we can decide how often we are willing to have them happen. ● Biologists like to use 0.05 (a 1 in 20 chance). ● We call this α (alpha) Ronald Fisher: The man behind the idea of NHST
  • 33. A better way: (formal) Hypothesis testing ● Calculate how likely data at least as extreme as yours would be if the null were true. ● If it is less than α, we say that we reject the null hypothesis. ● If we reject the null, we say the results are statistically significant.
  • 34. A better way: (formal) Hypothesis testing ● Calculate how likely data at least as extreme as yours would be if the null were true. ● If it is less than α, we say that we reject the null hypothesis. ● If we reject the null, we say the results are statistically significant. “The world is not boring after all!”
  • 35. Let's do it! ● To calculate how likely our data would be under the null hypothesis (i.e. the difference is due to chance), we need a statistic. ● But first, some Beer!
  • 36. Student's t-test ● William Sealy Gosset figured out how to test whether a batch of beer was significantly different from the standard. While working for the Guinness brewing company, he was forbidden to publish academic research, so he published his method under the pseudonym 'Student'.
  • 37. Student's t-test. The t-value is calculated using the following equation: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$. Where $\bar{x}_1$ and $\bar{x}_2$ are the means of the experimental and control groups; $s_1^2$ and $s_2^2$ are the variances of the experimental and control groups; $n_1$ and $n_2$ are the sample sizes for the experimental and control groups.
  • 39. t-test: State your alpha level. α = 0.05. If there is really no difference between the means, there is a 5% chance that the t-test will detect one anyway.
  • 40. Calculating your t-value. Name-brand (Group 1): mean # of chips $\bar{x}_1 = 15.3$, standard deviation $s_1 = 2.4$, sample size $n_1 = 3$. Generic-brand (Group 2): mean # of chips $\bar{x}_2 = 11.2$, standard deviation $s_2 = 4.3$, sample size $n_2 = 3$. According to the data above, with $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$: calculated t = 1.4.
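
The chip-count calculation can be reproduced with a few lines of Python. This is only a sketch: the helper name `t_value` is made up for illustration, and it takes standard deviations as input (as the slide reports them), squaring them because the formula uses variances.

```python
import math

def t_value(mean1, sd1, n1, mean2, sd2, n2):
    """t-value from group means, standard deviations and sample sizes."""
    # The formula uses variances, so square each standard deviation.
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Name-brand (Group 1) vs. generic-brand (Group 2), values from the slide.
t_calc = t_value(15.3, 2.4, 3, 11.2, 4.3, 3)
print(round(t_calc, 1))  # 1.4
```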
  • 41. Alternate Hypothesis. You can only test ONE possible alternate hypothesis at any one time. The one chosen depends on what you are looking to find. Alternative hypothesis, 2 types: 2-tailed, non-directional (general): not specifying a direction, e.g. “The two groups are not the same.” 1-tailed, directional (specific): specifying a direction, e.g. “Group A is greater than group B.”
  • 42. Look up the Critical t-value. In order to find your critical t-value, you need 3 pieces of information: 1. Whether the alternate hypothesis is 1- or 2-tailed. 2. Alpha level (usually = 0.05). 3. Degrees of freedom (df = n−1). Calculating degrees of freedom (df): Degrees of Freedom = n−1. What if you have 2 different sample sizes (n1 and n2)… which do you pick to calculate your degrees of freedom? A: df = the smaller of (n1−1) and (n2−1).
  • 43. Looking up your Critical t-value
  • 44. Compare your ‘calculated’ t-value with your ‘critical’ t-value. It is the difference in values between the t-value and critical t that will determine whether you can reject or fail to reject your null hypothesis. a) If ‘calculated’ > ‘critical’, then: reject null hyp. “My observed data are really unlikely under the null hypothesis, therefore I reject the null hypothesis!” b) If ‘calculated’ < ‘critical’, then: do NOT reject null hyp. “My observed data are consistent with the null hypothesis, therefore I have no reason to believe that it is not true.”
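
The lookup-and-compare step might be sketched in Python as below. The small dictionary holds two-tailed critical t-values for α = 0.05 copied from a standard t-table (only df 1 to 5 are included here); the function names are illustrative, and the sketch assumes a 2-tailed alternative, so it compares the absolute value of the calculated t.

```python
# Two-tailed critical t-values for alpha = 0.05, from a standard t-table.
# Only the first few degrees of freedom are listed in this sketch.
CRITICAL_T_TWO_TAILED_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def degrees_of_freedom(n1, n2):
    """Rule from the slides: the smaller of (n1 - 1) and (n2 - 1)."""
    return min(n1 - 1, n2 - 1)

def decide(t_calc, n1, n2):
    """Compare the calculated t-value with the critical t-value."""
    df = degrees_of_freedom(n1, n2)
    t_crit = CRITICAL_T_TWO_TAILED_05[df]
    if abs(t_calc) > t_crit:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Chip example: calculated t of about 1.44 with n1 = n2 = 3, so df = 2.
decision = decide(1.44, 3, 3)
print(decision)  # fail to reject the null hypothesis
```

With only three bags per brand, the critical value (4.303) is large, so even a difference of about four chips is not enough to reject the null.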
  • 46. What if we are measuring a category, rather than a number? ● The t-test lets us compare the value of some attribute between two groups. – Do mutant fruit flies live longer than wild type? – Does IQ differ between Dawson and Laurier students? – Does drug x decrease blood pressure? ● The dependent variable is quantitative: – Life span – IQ – Blood pressure
  • 47. What if we are measuring a category, rather than a number? ● Chi-squared test lets us test hypotheses about categories. – Are there more cars of a certain colour getting speeding tickets? – Is the ratio of dominant to recessive phenotypes 3:1? – Do chromosomes assort independently? ● The dependent variable is categorical: – Car colour – Phenotype – Chromosome donor
  • 48. Chi-square or T-test??? How do you know which one you need? T-Test: • the dependent variable is quantitative (e.g. height, weight, etc.) • data can be organized as two lists of numbers. Example (dependent variable: heart rate): Room temp (bpm): 178, 169, 192; Cold temp (bpm): 86, 89, 55. Chi-square Test: • the dependent variable is qualitative (aka nominal data) (e.g. gender, colour, etc.) • data can be easily tabulated as counts. Example (dependent variable: gender): Male: 98; Female: 102.
  • 49. Steps to performing a chi-square test 1. State your null hypothesis 2. State your alternate hypothesis 3. State your alpha level (usually α = 0.05) 4. Calculate your ‘calculated chi-square value’ 5. Look up your ‘critical chi-square value’ (from chi-square table) 6. Compare your ‘calculated’ and ‘critical’ values a) If ‘calculated’ > ‘critical’, conclusion: reject null hyp. b) If ‘calculated’ < ‘critical’, conclusion: do NOT reject null hyp. 7. State your conclusion
  • 50. Sample hypotheses for chi-square: Sex ratio in our class. 1. Null hypothesis: There is no difference between the frequency of men and women in the class. 2. Alternative hypothesis: There is a difference between the frequency of men and women in the class. Chi-square can only test non-directional alt. hyp.
  • 51. Calculating Chi-square. ‘Calculated’ chi-square values are calculated using the following formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed and E = expected. Calculating the chi-square is easier using a table with one row per category (Female, Male) and the columns: Gender | O | E | O−E | (O−E)² | (O−E)²/E. χ2 = sum of last column.
  • 52. Looking up the Critical χ2 To find the critical χ2 , you need the alpha level and the df. Df for a χ2 test = (# of categories) – 1 In our example, df = 2-1 = 1
  • 53. Compare your ‘calculated’ chi-sq with your ‘critical’ chi-sq. It is the difference between the calculated chi-sq and critical chi-sq that will determine whether you can reject or fail to reject your null hypothesis. a) If ‘calculated’ > ‘critical’, then: reject null hyp. “My observed data are really unlikely under the null hypothesis, therefore I reject the null hypothesis!” b) If ‘calculated’ < ‘critical’, then: do NOT reject null hyp. “My observed data are consistent with the null hypothesis, therefore I have no reason to believe that it is not true.”
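
The chi-square steps for the sex-ratio example could look roughly like this in Python. The observed counts (102 women, 98 men) are hypothetical numbers chosen for illustration, not class data from the talk; the critical value 3.841 is the standard chi-square table entry for df = 1 and α = 0.05.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical class counts, for illustration only: [female, male].
observed = [102, 98]

# Null hypothesis: no difference in frequency, so expect an even split.
total = sum(observed)
expected = [total / 2, total / 2]

chi_sq = chi_square(observed, expected)

# df = (# of categories) - 1 = 1; critical value for alpha = 0.05 from a table.
CRITICAL_CHI_SQ = 3.841
reject_null = chi_sq > CRITICAL_CHI_SQ

print(round(chi_sq, 2), reject_null)  # 0.08 False
```

Here the calculated value is far below the critical value, so these made-up counts would be consistent with an even sex ratio.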
  • 54. Statistics just might save your life
  • 55. Questions for Corey ● You can email me! Corey.chivers@mail.mcgill.ca ● I blog about statistics: bayesianbiologist.com ● I tweet about statistics: @cjbayesian