STATISTICS BASICS
BBA CENTERED
yash sadrani
RK UNIVERSITY RAJKOT
CONTENTS
1. Hypothesis
2. Null hypothesis
3. Regression
4. Correlation
5. Exponential Distribution
6. Type I and type II errors
7. Alternative hypothesis
8. Central tendency
9. Bayes' theorem
10. Chebyshev’s Theorem
11. Simple random sampling
12. Descriptive statistics
13. Statistical inference
14. Characteristics of good estimator
15. Properties of the test for independence
16. Utility of regression studies
17. Advantages of Sample Surveys
18. The hypergeometric distribution
19. Leptokurtic distribution
20. The interquartile range
Hypothesis
When a possible correlation or similar relation between phenomena is investigated, such as, for
example, whether a proposed remedy is effective in treating a disease, that is, at least to some extent
and for some patients, the hypothesis that a relation exists cannot be examined the same way one
might examine a proposed new law of nature: in such an investigation a few cases in which the tested
remedy shows no effect do not falsify the hypothesis. Instead, statistical tests are used to determine
how likely it is that the overall effect would be observed if no real relation as hypothesized exists. If that
likelihood is sufficiently small (e.g., less than 1%), the existence of a relation may be assumed.
Otherwise, any observed effect may as well be due to pure chance.
In statistical hypothesis testing two hypotheses are compared, which are called the null hypothesis and
the alternative hypothesis. The null hypothesis is the hypothesis that states that there is no relation
between the phenomena whose relation is under investigation, or at least not of the form given by the
alternative hypothesis. The alternative hypothesis, as the name suggests, is the alternative to the null
hypothesis: it states that there is some kind of relation. The alternative hypothesis may take several
forms, depending on the nature of the hypothesized relation; in particular, it can be two-sided (for
example: there is some effect, in a yet unknown direction) or one-sided (the direction of the
hypothesized relation, positive or negative, is fixed in advance).
Conventional significance levels for testing hypotheses are .10, .05, and .01. The criteria for deciding
whether the null hypothesis is rejected and the alternative hypothesis is accepted must all be determined
in advance, before the observations are collected or inspected. If these criteria are determined later,
when the data to be tested are already known, the test is invalid.
It is important to mention that the above procedure depends on the number of participants (units, or
sample size) included in the study. For instance, the sample size may be too small to reject a null
hypothesis; it is therefore recommended to specify the sample size from the beginning. It is advisable to
define a small, medium and large effect size for each of the important statistical tests used to test the
hypotheses.
A statistical hypothesis test is a method of statistical inference using data from a scientific study. In
statistics, a result is called statistically significant if it is unlikely to have occurred by
chance alone, according to a pre-determined threshold probability, the significance level. The phrase
"test of significance" was coined by statistician Ronald Fisher.[1] These tests are used in determining
what outcomes of a study would lead to a rejection of the null hypothesis for a pre-specified level of
significance; this can help to decide whether results contain enough information to cast doubt
on conventional wisdom, given that conventional wisdom has been used to establish the null hypothesis.
The critical region of a hypothesis test is the set of all outcomes which cause the null hypothesis to be
rejected in favor of the alternative hypothesis. Statistical hypothesis testing is sometimes called
confirmatory data analysis, in contrast to exploratory data analysis, which may not have pre-specified
hypotheses.
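To make the decision rule concrete, here is a minimal Python sketch of a significance test on invented data: a hypothetical coin flipped 100 times, with the null hypothesis that the coin is fair. Doubling the one-tail probability is a common approximation for the two-sided binomial p-value.

```python
# Hypothetical study: is a coin fair? H0: p = 0.5, tested at alpha = 0.05.
from scipy.stats import binom

n, heads = 100, 61        # invented data: 61 heads in 100 flips
alpha = 0.05              # significance level, fixed before seeing the data

# binom.sf(k, n, p) = P(X > k), so this is P(X >= 61) under H0, doubled
p_value = 2 * binom.sf(heads - 1, n, 0.5)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.4f} >= {alpha}: do not reject H0")
```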
Example 1 – Philosopher's beans
The following example was produced by a philosopher describing scientific methods generations before
hypothesis testing was formalized and popularized.
Few beans of this handful are white.
Most beans in this bag are white.
Therefore: Probably, these beans were taken from another bag.
This is a hypothetical inference.
The beans in the bag are the population. The handful are the sample. The null hypothesis is that the
sample originated from the population. The criterion for rejecting the null-hypothesis is the "obvious"
difference in appearance (an informal difference in the mean). The interesting result is that
consideration of a real population and a real sample produced an imaginary bag. The philosopher was
considering logic rather than probability. To be a real statistical hypothesis test, this example requires
the formalities of a probability calculation and a comparison of that probability to a standard.
A simple generalization of the example considers a mixed bag of beans and a handful that contain either
very few or very many white beans. The generalization considers both extremes. It requires more
calculations and more comparisons to arrive at a formal answer, but the core philosophy is unchanged:
if the composition of the handful is greatly different from that of the bag, then the sample probably
originated from another bag. The original example is termed a one-sided or a one-tailed test while the
generalization is termed a two-sided or two-tailed test.
Null hypothesis
In statistical inference of observed data of a scientific experiment, the null hypothesis refers to a general
or default position: that there is no relationship between two measured phenomena,[1] or that a
potential medical treatment has no effect.[2] Rejecting or disproving the null hypothesis – and thus
concluding that there are grounds for believing that there is a relationship between two phenomena or
that a potential treatment has a measurable effect – is a central task in the modern practice of science,
and gives a precise sense in which a claim is capable of being proven false.
In statistical significance testing, the null hypothesis, often denoted H0 (read “H-naught”), is generally
assumed true until evidence indicates otherwise (e.g., H0: μ = 500 hours). The concept of a null
hypothesis is used differently in two approaches to statistical inference, though, problematically, the
same term is used. In the significance testing approach of Ronald Fisher, a null hypothesis is potentially
rejected or disproved on the basis of data that are significantly unlikely under its assumption, but never accepted
or proved. In the hypothesis testing approach of Jerzy Neyman and Egon Pearson, a null hypothesis is
contrasted with an alternative hypothesis, and these are decided between on the basis of data, with
certain error rates. These two approaches criticized each other, though today a hybrid approach is
widely practiced and presented in textbooks. This hybrid is in turn criticized as incorrect and incoherent
– see statistical hypothesis testing. Statistical significance plays a pivotal role in statistical hypothesis
testing where it is used to determine if a null hypothesis can be rejected or retained.
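As a hedged illustration of the H0: μ = 500 hours example mentioned above, the sketch below runs a two-sided z-test; the sample size, sample mean, and known population standard deviation are all invented for the example.

```python
# Testing H0: mu = 500 hours against a two-sided alternative.
import math
from scipy.stats import norm

mu0 = 500.0                          # value claimed by the null hypothesis
n, xbar, sigma = 36, 485.0, 42.0     # assumed sample figures and known sd

z = (xbar - mu0) / (sigma / math.sqrt(n))   # standardized test statistic
p_value = 2 * norm.sf(abs(z))               # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")    # small p is evidence against H0
```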
Regression
In statistics, regression analysis is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analyzing several variables, when the focus is on the
relationship between a dependent variable and one or more independent variables. More specifically,
regression analysis helps one understand how the typical value of the dependent variable (or 'Criterion
Variable') changes when any one of the independent variables is varied, while the other independent
variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of
the dependent variable given the independent variables – that is, the average value of the dependent
variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other
location parameter of the conditional distribution of the dependent variable given the independent
variables. In all cases, the estimation target is a function of the independent variables called the
regression function. In regression analysis, it is also of interest to characterize the variation of the
dependent variable around the regression function which can be described by a probability distribution.
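A minimal sketch, on invented data, of estimating such a regression function by ordinary least squares:

```python
# Fit E[Y | X = x] = intercept + slope * x by least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent variable

slope, intercept = np.polyfit(x, y, 1)    # degree-1 polynomial fit
print(f"estimated regression function: E[Y|X=x] = {intercept:.2f} + {slope:.2f} x")
```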
1. (Psychology) psychol the adoption by an adult or adolescent of behaviour more appropriate to a child,
esp as a defence mechanism to avoid anxiety
2. (Statistics) statistics
a. the analysis or measure of the association between one variable (the dependent variable) and one or
more other variables (the independent variables), usually formulated in an equation in which the
independent variables have parametric coefficients, which may enable future values of the dependent
variable to be predicted
b. (as modifier): regression curve.
3. (Astronomy) astronomy the slow movement around the ecliptic of the two points at which the
moon's orbit intersects the ecliptic. One complete revolution occurs about every 19 years
4. (Geological Science) geology the retreat of the sea from the land
5. (Statistics) the act of regressing
6. (Logic) the act of regressing
Correlation
In statistics, dependence is any statistical relationship between two random variables or two sets of
data. Correlation refers to any of a broad class of statistical relationships involving dependence.
Familiar examples of dependent phenomena include the correlation between the physical statures of
parents and their offspring, and the correlation between the demand for a product and its price.
Correlations are useful because they can indicate a predictive relationship that can be exploited in
practice. For example, an electrical utility may produce less power on a mild day based on the
correlation between electricity demand and weather. In this example there is a causal relationship,
because extreme weather causes people to use more electricity for heating or cooling; however,
statistical dependence is not sufficient to demonstrate the presence of such a causal relationship (i.e.,
correlation does not imply causation).
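A minimal sketch of measuring such a dependence, using invented temperature and electricity-demand figures in the spirit of the utility example above:

```python
# Pearson correlation between two invented variables.
import numpy as np

temperature = np.array([10, 15, 20, 25, 30, 35])  # hypothetical daily highs
demand      = np.array([60, 55, 50, 52, 65, 80])  # hypothetical power demand

r = np.corrcoef(temperature, demand)[0, 1]        # Pearson's r
print(f"correlation r = {r:.3f}")                 # dependence, not causation
```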
1. A causal, complementary, parallel, or reciprocal relationship, especially a structural, functional, or
qualitative correspondence between two comparable entities: a correlation between drug abuse and
crime.
2. Statistics The simultaneous change in value of two numerically valued random variables: the positive
correlation between cigarette smoking and the incidence of lung cancer; the negative correlation
between age and normal vision.
3. An act of correlating or the condition of being correlated.
Exponential Distribution
The Exponential distribution is used to describe survival times.
Suppose that some device has the same hazard rate λ at each moment. The survival time is therefore
1/λ on average.
Let the random variable X denote the time of failure. X then follows the exponential distribution with
parameter λ. The probability density function of X is

f_X(x) = λ exp(−λx) if x ≥ 0, and f_X(x) = 0 otherwise.

The expected value of X is 1/λ and the variance is 1/λ².
Example:
A man enters a bank at 4pm. There is one person in front of him in the queue. Suppose that the length
of time an individual spends with a teller is an exponential random variable with mean 7 minutes.
Let X be the length of time the man in front spends with the teller; λ = 1/7, so X ~ Exp(1/7). The
probability that the man who entered the bank at 4pm has to wait more than 10 minutes to be served is
P(X > 10) = 1 − F_X(10), where F_X(10) is the cumulative distribution function of the exponential
distribution evaluated at t = 10. The cumulative distribution function of the exponential distribution is

F_X(t) = ∫₀ᵗ λ exp(−λx) dx = [−exp(−λx)]₀ᵗ = 1 − exp(−λt).

The probability that the man has to wait more than 10 minutes is therefore

1 − (1 − exp(−λt)) = exp(−10/7) ≈ 0.240.
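As a quick check, this minimal Python sketch reproduces the calculation above; the figures come from the worked example, nothing else is assumed.

```python
# Numerical check of the worked example: P(X > 10) for rate lambda = 1/7.
import math

lam = 1 / 7                          # rate parameter (mean service time 7 min)
p_wait = math.exp(-lam * 10)         # survival function 1 - F_X(10)
print(f"P(X > 10) = {p_wait:.3f}")   # about 0.240, matching the hand calculation
```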
Type I and type II errors
In statistics, a null hypothesis is a statement that the thing being studied produces no effect or makes no
difference. An example of a null hypothesis is the statement "This diet has no effect on people's weight."
Usually an experimenter frames a null hypothesis with the intent of rejecting it: that is, intending to run
an experiment which produces data that shows that the thing under study does make a difference.
A type I error (or error of the first kind) is the incorrect rejection of a true null hypothesis. It is a false
positive. Usually a type I error leads one to conclude that a supposed effect or relationship exists when
in fact it doesn't. Examples of type I errors include a test that shows a patient to have a disease when in
fact the patient does not have the disease, a fire alarm going off indicating a fire when in fact there is no
fire, or an experiment indicating that a medical treatment should cure a disease when in fact it does not.
A type II error (or error of the second kind) is the failure to reject a false null hypothesis. It is a false
negative. Examples of type II errors would be a blood test failing to detect the disease it was designed to
detect, in a patient who really has the disease; a fire breaking out and the fire alarm not ringing; or a
clinical trial of a medical treatment failing to show that the treatment works when really it does.
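The following simulation sketch illustrates the type I error rate: when the null hypothesis really is true, a test at the .05 level should produce a false positive in roughly 5% of trials. The normal data and the one-sample t-test are assumptions made for the illustration.

```python
# Estimate the type I error rate of a t-test when H0 (mean = 0) is true.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
alpha, trials, false_positives = 0.05, 2000, 0

for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=30)  # H0 is true here
    _, p = ttest_1samp(sample, popmean=0.0)
    if p < alpha:
        false_positives += 1                          # a type I error

print(f"type I error rate ~ {false_positives / trials:.3f}")  # near 0.05
```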
Alternative hypothesis
In statistical hypothesis testing, the alternative hypothesis (or maintained hypothesis or research
hypothesis) and the null hypothesis are the two rival hypotheses which are compared by a statistical
hypothesis test. An example might be where water quality in a stream has been observed over many
years and a test is made of the null hypothesis that there is no change in quality between the first and
second halves of the data against the alternative hypothesis that the quality is poorer in the second half
of the record.
Central tendency
In statistics, a central tendency (or, more commonly, a measure of central tendency) is a central value or
a typical value for a probability distribution.[1] It is occasionally called an average or just the center of
the distribution. The most common measures of central tendency are the arithmetic mean, the median
and the mode. A central tendency can be calculated for either a finite set of values or for a theoretical
distribution, such as the normal distribution. Occasionally authors use central tendency (or centrality) to
mean "the tendency of quantitative data to cluster around some central value". [2][3] This meaning
might be expected from the usual dictionary definitions of the words tendency and centrality. Those
authors may judge whether data has a strong or a weak central tendency based on the statistical
dispersion, as measured by the standard deviation or something similar.
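A minimal sketch computing the three common measures on a small invented data set:

```python
# Mean, median and mode of an invented sample.
import statistics

data = [2, 3, 3, 5, 7, 10, 3]

print("mean:  ", statistics.mean(data))    # arithmetic mean
print("median:", statistics.median(data))  # middle value when sorted
print("mode:  ", statistics.mode(data))    # most frequent value (3)
```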
Bayes' theorem
In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) is a result
that is of importance in the mathematical manipulation of conditional probabilities. It is a result that
derives from the more basic axioms of probability.
When applied, the probabilities involved in Bayes' theorem may have any of a number of probability
interpretations. In one of these interpretations, the theorem is used directly as part of a particular
approach to statistical inference. In particular, with the Bayesian interpretation of probability, the
theorem expresses how a subjective degree of belief should rationally change to account for evidence:
this is Bayesian inference, which is fundamental to Bayesian statistics. However, Bayes' theorem has
applications in a wide range of calculations involving probabilities, not just in Bayesian inference.
An Introduction to Bayes' Theorem
Bayes' Theorem is a theorem of probability theory originally stated by the Reverend Thomas Bayes. It
can be seen as a way of understanding how the probability that a theory is true is affected by a new
piece of evidence. It has been used in a wide variety of contexts, ranging from marine biology to the
development of "Bayesian" spam blockers for email systems. In the philosophy of science, it has been
used to try to clarify the relationship between theory and evidence. Many insights in the philosophy of
science involving confirmation, falsification, the relation between science and pseudoscience, and other
topics can be made more precise, and sometimes extended or corrected, by using Bayes' Theorem.
These pages will introduce the theorem and its use in the philosophy of science.
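The theorem itself states that P(A|B) = P(B|A) P(A) / P(B). A worked sketch with invented disease-screening figures shows how a prior degree of belief is updated by new evidence:

```python
# Bayes' theorem on an invented screening test.
p_disease = 0.01             # prior: 1% of the population has the disease
p_pos_given_disease = 0.95   # test sensitivity
p_pos_given_healthy = 0.05   # false-positive rate

# Total probability of a positive result (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # about 0.161
```

Even with a fairly accurate test, the low prior keeps the posterior modest; this is the kind of belief revision the Bayesian interpretation describes.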
Chebyshev’s Theorem:
For any set of data (either population or sample) and for any constant k greater than 1, the proportion
of the data that must lie within k standard deviations on either side of the mean is at least

1 − 1/k².

In ordinary words, Chebyshev’s Theorem says the following about sample or population data:
1) Start at the mean.
2) Back off k standard deviations below the mean and then advance k standard deviations
above the mean.
3) The fractional part of the data in the interval described will be at least 1 − 1/k² (we assume k > 1).
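A minimal sketch evaluating the bound for a few values of k:

```python
# Chebyshev's bound: at least 1 - 1/k^2 of the data lies within k sd of the mean.
for k in [1.5, 2, 3]:
    bound = 1 - 1 / k**2
    print(f"k = {k}: at least {bound:.1%} of the data within {k} sd of the mean")
```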
Simple random sampling
In a simple random sample (SRS) of a given size, all such subsets of the frame are given an equal
probability. Furthermore, any given pair of elements has the same chance of selection as any other such
pair (and similarly for triples, and so on). This minimises bias and simplifies analysis of results. In
particular, the variance between individual results within the sample is a good indicator of variance in
the overall population, which makes it relatively easy to estimate the accuracy of results.
However, SRS can be vulnerable to sampling error because the randomness of the selection may result
in a sample that doesn't reflect the makeup of the population. For instance, a simple random sample of
ten people from a given country will on average produce five men and five women, but any given trial is
likely to overrepresent one sex and underrepresent the other. Systematic and stratified techniques
attempt to overcome this problem by "using information about the population" to choose a more
"representative" sample.
One of the best ways to achieve unbiased results in a study is through random sampling. Random
sampling includes choosing subjects from a population through unpredictable means. In its simplest
form, subjects all have an equal chance of being selected out of the population being researched.
Simple random sampling is a method of selecting a sample (a random sample) from a statistical
population in such a way that every possible sample that could be selected has a predetermined
probability of being selected.
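A minimal sketch of drawing a simple random sample; the frame of 100 numbered units is invented for the example:

```python
# Simple random sample without replacement: every size-10 subset of the
# frame has the same probability of being chosen.
import random

frame = list(range(1, 101))        # hypothetical sampling frame of 100 units
random.seed(42)                    # reproducibility of the sketch
sample = random.sample(frame, 10)  # SRS of size n = 10
print(sample)
```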
What is the difference between the coefficient of determination and the coefficient of correlation?
The coefficient of correlation is the “R” value given in the summary table in the Regression output. R
square, also called the coefficient of determination, is R multiplied by R. In other words, the coefficient
of determination is the square of the coefficient of correlation.
R square, or the coefficient of determination, shows the percentage of variation in y that is explained by
all the x variables together. The higher, the better. It is always between 0 and 1; it can never be negative,
since it is a squared value.
It is easy to explain R square in terms of regression. It is not so easy to explain R in terms of regression.
For example, if the coefficient of correlation R is .850 (or 85%), then the coefficient of determination
R square is .723 (or 72.3%). R square is simply the square of R, i.e. R times R.
Coefficient of correlation: the degree of relationship between two variables, say x and y. It can range
between -1 and 1. A value of 1 indicates that the two variables move in unison: they rise and fall together
and have perfect correlation. A value of -1 means the two variables are perfect opposites: one goes up
and the other goes down, in a perfectly negative way. Any two variables in this universe can be argued to
have a correlation value. If they are not correlated, the correlation value can still be computed: it would
be 0. The correlation value always lies between -1 and 1 (passing through 0, which means no correlation
at all). Correlation can rightfully be explained for simple linear regression, because there is only one x
and one y variable. For multiple linear regression, R is computed, but it is then difficult to explain because
multiple variables are involved. That is why R square is the better term: you can explain R square for both
simple and multiple linear regressions.
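A minimal sketch, on invented data, making the relationship concrete: for simple linear regression, R square is just R times R.

```python
# Coefficient of correlation (r) and coefficient of determination (r^2).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2, 6.8])

r = np.corrcoef(x, y)[0, 1]   # coefficient of correlation
print(f"r = {r:.3f}, R^2 = {r**2:.3f}")
```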
Descriptive statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of
information,[1] or the quantitative description itself. Descriptive statistics are distinguished from
inferential statistics (or inductive statistics), in that descriptive statistics aim to summarize a sample,
rather than use the data to learn about the population that the sample of data is thought to represent.
This generally means that descriptive statistics, unlike inferential statistics, are not developed on the
basis of probability theory.[2] Even when a data analysis draws its main conclusions using inferential
statistics, descriptive statistics are generally also presented. For example, in a paper reporting on a study
involving human subjects, there typically appears a table giving the overall sample size, sample sizes in
important subgroups (e.g., for each treatment or exposure group), and demographic or clinical
characteristics such as the average age, the proportion of subjects of each sex, and the proportion of
subjects with related comorbidities.
Some measures that are commonly used to describe a data set are measures of central tendency and
measures of variability or dispersion. Measures of central tendency include the mean, median and
mode, while measures of variability include the standard deviation (or variance), the minimum and
maximum values of the variables, kurtosis and skewness.[3]
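A minimal sketch computing the measures listed above for one invented variable, assuming scipy is available for skewness and kurtosis:

```python
# Common descriptive statistics for a single variable.
import numpy as np
from scipy.stats import skew, kurtosis

data = np.array([23, 25, 31, 34, 35, 38, 41, 45, 52, 60])  # invented ages

print("mean:    ", np.mean(data))
print("median:  ", np.median(data))
print("std:     ", np.std(data, ddof=1))    # sample standard deviation
print("min/max: ", data.min(), data.max())
print("skewness:", skew(data))              # asymmetry of the distribution
print("kurtosis:", kurtosis(data))          # excess kurtosis (normal = 0)
```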
Statistical inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random
variation, for example, observational errors or sampling variation.[1] More substantially, the terms statistical
inference, statistical induction and inferential statistics are used to describe systems of procedures that can be
used to draw conclusions from datasets arising from systems affected by random variation,[2] such as
observational errors, random sampling, or random experimentation.[1] Initial requirements of such a system of
procedures for inference and induction are that the system should produce reasonable answers when applied to
well-defined situations and that it should be general enough to be applied across a range of situations. Inferential
statistics are used to test hypotheses and make estimations using sample data.
The outcome of statistical inference may be an answer to the question "what should be done next?", where this
might be a decision about making further experiments or surveys, or about drawing a conclusion before
implementing some organizational or governmental policy.
Characteristics of a good estimator
There are four main properties associated with a "good" estimator. These are:
1) Unbiasedness: the expected value of the estimator (or the mean of the estimator) is simply the figure
being estimated. In statistical terms, E(estimate of Y) = Y.
2) Consistency: the estimator converges in probability to the estimated figure. In other words, as the
sample size approaches the population size, the estimator gets closer and closer to the estimated value.
3) Efficiency: the estimator has a low variance, usually relative to other estimators (which is called
relative efficiency); otherwise, the variance of the estimator is minimized.
4) Robustness: the mean-squared error of the estimator is minimized relative to other estimators. The
estimator should also be unbiased and consistent.
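A simulation sketch of the first two properties: averaging the sample mean over many repeated samples illustrates unbiasedness, and the shrinking spread as n grows illustrates consistency. The normal population is an assumption made for the illustration.

```python
# The sample mean as an estimator of the population mean.
import numpy as np

rng = np.random.default_rng(1)
true_mean = 10.0

for n in [10, 100, 1000]:
    # 5000 repeated samples of size n, each reduced to its sample mean
    estimates = rng.normal(true_mean, 5.0, size=(5000, n)).mean(axis=1)
    print(f"n={n:5d}: mean of estimates = {estimates.mean():.3f}, "
          f"sd of estimates = {estimates.std():.3f}")
# The mean of the estimates stays near 10 (unbiasedness) while their
# spread shrinks as n grows (consistency).
```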
Stats: Test for Independence
In the test for independence, the claim is that the row and column variables are independent of each other. This is
the null hypothesis.
The multiplication rule said that if two events were independent, then the probability of both occurring was the
product of the probabilities of each occurring. This is key to working the test for independence. If you end up
rejecting the null hypothesis, then the assumption must have been wrong and the row and column
variables are dependent. Remember, all hypothesis testing is done under the assumption that the null
hypothesis is true.
The test statistic used is the same as the chi-square goodness-of-fit test. The principle behind the test for
independence is the same as the principle behind the goodness-of-fit test. The test for independence is always a
right tail test.
In fact, you can think of the test for independence as a goodness-of-fit test where the data is arranged into table
form. This table is called a contingency table.
The test statistic has a chi-square distribution when the following assumptions are met:
The data are obtained from a random sample.
The expected frequency of each category must be at least 5.
The following are properties of the test for independence:
The data are the observed frequencies.
The data are arranged into a contingency table.
The degrees of freedom are the degrees of freedom for the row variable times the degrees of freedom
for the column variable. It is not one less than the sample size; it is the product of the two degrees of
freedom.
It is always a right tail test.
It has a chi-square distribution.
The expected value is computed by taking the row total times the column total and dividing by the grand
total.
The value of the test statistic doesn't change if the order of the rows or columns is switched.
The value of the test statistic doesn't change if the rows and columns are interchanged (transpose of
the matrix).
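A minimal sketch implementing the expected-frequency and degrees-of-freedom rules above on an invented 2×3 contingency table:

```python
# Chi-square test for independence, computed from first principles.
import numpy as np
from scipy.stats import chi2

observed = np.array([[20, 30, 25],
                     [30, 20, 25]], dtype=float)

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand = observed.sum()

expected = row_totals @ col_totals / grand   # row total * column total / grand total
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)  # (rows-1)*(cols-1)
p_value = chi2.sf(stat, dof)                 # right-tail test

print(f"chi-square = {stat:.3f}, dof = {dof}, p = {p_value:.4f}")
```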
Utility of regression studies
Regression models can be used to help understand and explain relationships among variables; they can also be
used to predict actual outcomes. In this course you will learn how multiple linear regression models are derived,
use software to implement them, learn what assumptions underlie the models, learn how to test whether your
data meet those assumptions and what can be done when those assumptions are not met, and develop strategies
for building and understanding useful models.
Advantages of Sample Surveys
Cost Reduction
In most cases, conducting a sample survey costs less than a census survey. If fewer people are surveyed,
fewer surveys need to be produced, printed, shipped, administered, and analyzed. Further, fewer data
reports are often required, thus reducing the amount of time and expense needed to analyze and
distribute the results.
Generalizability of Results
If conducted properly, the results of a sample survey can still be generalized to the entire population, meaning
that the sample results can be considered representative of the views of the entire target population. Sampling
strategies should be firmly aligned with the overarching survey goals to ensure the utilization of a proper sample
frame and sample size.
Timeliness
Sample surveys can typically be printed, distributed, administered, and analyzed more quickly than census
surveys. As a result, a shorter turnaround time for results is often achieved.
Identification of Strengths & Opportunities
As with census surveys, results from a properly conducted sample survey can also be used to identify strengths
and opportunities and develop plans for meaningful change.
Cost: By comparison with a complete enumeration of the same population, a sample may be based on
data for only a small number of the units comprising that population. A sample survey may thus be very
much less expensive to conduct than a comparable complete enumeration.
Time: Being small in scale, a sample survey is not only less expensive than a census; the desired information is
obtained in much less time.
Scope: The smaller scale is likely to permit the collection of a wider range of survey data and allow a
wider choice of methods of observation, measurement or questioning than is usually feasible with a
complete enumeration.
Respondents' Convenience: The sample survey considerably reduces the overall burden on the
respondents, in that only a few, not all, of the individuals in the population are put to the trouble of
having to answer questions or provide information.
Labor: Sampling saves labor. A smaller staff is required, both for fieldwork and for tabulating and processing the data.
Flexibility: In certain types of investigation, highly skilled and trained personnel or even specialized
equipment are needed to collect data. A complete enumeration in such cases is impracticable, and hence
sample surveys, being more flexible and having greater scope, are more appropriate for this type of
inquiry.
Data Processing: The data-processing requirement for a sample survey is likely to be much less than for a complete
count. Whereas a complete count may well require a computer to process the data, a sample survey can often be
processed manually with fewer people and less logistical support.
Accuracy: A sample survey can employ personnel of higher quality, with intensive training, and more
careful supervision of the fieldwork is possible. As a result, observations, measurements, or questioning
for a sample survey can often be carried out more carefully, and thus yield results subject to smaller
non-sampling error than is generally practicable in a more extensive complete enumeration, usually at a
much lower cost.
Feasibility: There are situations where complete enumeration is not feasible and thus a sample survey is
necessary. There are also instances where it is not practicable to enumerate all the units due to their
perishable or fragile nature. The alternative in these situations is to take only a few of the units. For
example, consider the problem of checking the quality of mango juice produced by a local company. One
way to test the quality is to drink the entire lot, which is impracticable. Testing of electric bulbs, screws,
glass, and medicine are all examples of this type, where sampling is necessary.
The hypergeometric distribution applies to sampling without replacement from a
finite population whose elements can be classified into two mutually exclusive categories like Pass/Fail,
Male/Female or Employed/Unemployed. As random selections are made from the population, each subsequent
draw decreases the population causing the probability of success to change with each draw.
The following conditions characterise the hypergeometric distribution:
The result of each draw can be classified into one of two categories.
The probability of a success changes on each draw.
A random variable X follows the hypergeometric distribution if its probability mass function (pmf) is given by:[1]

P(X = k) = C(K, k) · C(N − K, n − k) / C(N, n)

where:
N is the population size,
K is the number of success states in the population,
n is the number of draws,
k is the number of successes, and
C(a, b) is a binomial coefficient.

The pmf is positive when max(0, n + K − N) ≤ k ≤ min(K, n).
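A minimal sketch of the pmf, using math.comb for the binomial coefficients; the batch figures in the example are invented:

```python
# Hypergeometric pmf: P(X = k) = C(K, k) C(N-K, n-k) / C(N, n).
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Probability of k successes in n draws without replacement from a
    population of N units containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Example: exactly 2 defective items when drawing 5 from a batch of 20
# that contains 6 defectives.
print(f"{hypergeom_pmf(2, N=20, K=6, n=5):.4f}")   # about 0.3522
```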
Leptokurtic distribution
In probability theory and statistics, kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning curved,
arching) is any measure of the "peakedness" of the probability distribution of a real-valued random variable.[1] In
a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution and, just
as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of
estimating it from a sample from a population. There are various interpretations of kurtosis, and of how particular
measures should be interpreted; these are primarily peakedness (width of peak), tail weight, and lack of shoulders
(distribution primarily peak and tails, not in between).
One common measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the fourth moment
of the data or population, but it has been argued that this really measures heavy tails, and not peakedness.[2] For
this measure, higher kurtosis means more of the variance is the result of infrequent extreme deviations, as
opposed to frequent modestly sized deviations. It is common practice to use an adjusted version of Pearson's
kurtosis, the excess kurtosis, to provide a comparison of the shape of a given distribution to that of the normal
distribution. Distributions with negative or positive excess kurtosis are called platykurtic or leptokurtic
distributions, respectively.
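A simulation sketch contrasting excess kurtosis for a normal sample and a heavy-tailed one; the Student t distribution with 5 degrees of freedom is chosen here as a standard leptokurtic example:

```python
# Excess kurtosis: near 0 for normal data, positive for leptokurtic data.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=100_000)            # mesokurtic baseline
heavy_tailed = rng.standard_t(df=5, size=100_000)   # leptokurtic

print("normal:", kurtosis(normal_sample))   # close to 0
print("t(5):  ", kurtosis(heavy_tailed))    # clearly above 0
```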
The interquartile range
The interquartile range (IQR), also called the midspread or middle fifty, is a measure of statistical
dispersion, being equal to the difference between the upper and lower quartiles:[1][2] IQR = Q3 − Q1. In
other words, the IQR is the first quartile subtracted from the third quartile; these quartiles can be clearly
seen on a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed mid-range, and is
the most significant basic robust measure of scale.
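A minimal sketch of the computation on invented data:

```python
# IQR = Q3 - Q1, the third quartile minus the first quartile.
import numpy as np

data = np.array([7, 15, 36, 39, 40, 41])   # invented ordered data
q1, q3 = np.percentile(data, [25, 75])
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```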

Weitere ähnliche Inhalte

Was ist angesagt?

Statistical hypothesis
Statistical hypothesisStatistical hypothesis
Statistical hypothesis
Hasnain Baber
 
Test of hypothesis
Test of hypothesisTest of hypothesis
Test of hypothesis
vikramlawand
 
S5 w1 hypothesis testing & t test
S5 w1 hypothesis testing & t testS5 w1 hypothesis testing & t test
S5 w1 hypothesis testing & t test
Rachel Chung
 
Formulating hypotheses
Formulating hypothesesFormulating hypotheses
Formulating hypotheses
Aniket Verma
 

Was ist angesagt? (20)

Hypothesis
HypothesisHypothesis
Hypothesis
 
Hypothesis
HypothesisHypothesis
Hypothesis
 
Testing of hypothesis
Testing of hypothesisTesting of hypothesis
Testing of hypothesis
 
Statistical hypothesis
Statistical hypothesisStatistical hypothesis
Statistical hypothesis
 
Test of hypothesis
Test of hypothesisTest of hypothesis
Test of hypothesis
 
Types of Hypothesis-Advance Research Methodology
Types of Hypothesis-Advance Research MethodologyTypes of Hypothesis-Advance Research Methodology
Types of Hypothesis-Advance Research Methodology
 
Hypothesis types, formulation, and testing
Hypothesis types, formulation, and testingHypothesis types, formulation, and testing
Hypothesis types, formulation, and testing
 
Hypothesis Testing. Inferential Statistics pt. 2
Hypothesis Testing. Inferential Statistics pt. 2Hypothesis Testing. Inferential Statistics pt. 2
Hypothesis Testing. Inferential Statistics pt. 2
 
T‑tests
T‑testsT‑tests
T‑tests
 
Hypothesis
Hypothesis Hypothesis
Hypothesis
 
S5 w1 hypothesis testing & t test
S5 w1 hypothesis testing & t testS5 w1 hypothesis testing & t test
S5 w1 hypothesis testing & t test
 
Research
ResearchResearch
Research
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively
 
Basics of Educational Statistics (Hypothesis and types)
Basics of Educational Statistics (Hypothesis and types)Basics of Educational Statistics (Hypothesis and types)
Basics of Educational Statistics (Hypothesis and types)
 
Formulating hypotheses
Formulating hypothesesFormulating hypotheses
Formulating hypotheses
 
Hypothesis
HypothesisHypothesis
Hypothesis
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
P-Value: a true test of significance in agricultural research
P-Value: a true test of significance in agricultural researchP-Value: a true test of significance in agricultural research
P-Value: a true test of significance in agricultural research
 
types of hypothesis
types of hypothesistypes of hypothesis
types of hypothesis
 

Ähnlich wie Statistics basics

What do you think will likely happen when a cell containing 1 suc.docx
What do you think will likely happen when a cell containing 1 suc.docxWhat do you think will likely happen when a cell containing 1 suc.docx
What do you think will likely happen when a cell containing 1 suc.docx
alanfhall8953
 
OBSERVATIONAL STUDIES PPT.pptx
OBSERVATIONAL STUDIES PPT.pptxOBSERVATIONAL STUDIES PPT.pptx
OBSERVATIONAL STUDIES PPT.pptx
KrishnaveniManubolu
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
karlhennesey
 

Ähnlich wie Statistics basics (20)

20 OCT-Hypothesis Testing.ppt
20 OCT-Hypothesis Testing.ppt20 OCT-Hypothesis Testing.ppt
20 OCT-Hypothesis Testing.ppt
 
HYPOTHESIS
HYPOTHESISHYPOTHESIS
HYPOTHESIS
 
Hypotheses
HypothesesHypotheses
Hypotheses
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Chapter 9-10.James Dean Brown.Farajnezhad
Chapter 9-10.James Dean Brown.FarajnezhadChapter 9-10.James Dean Brown.Farajnezhad
Chapter 9-10.James Dean Brown.Farajnezhad
 
What do you think will likely happen when a cell containing 1 suc.docx
What do you think will likely happen when a cell containing 1 suc.docxWhat do you think will likely happen when a cell containing 1 suc.docx
What do you think will likely happen when a cell containing 1 suc.docx
 
ch 2 hypothesis
ch 2 hypothesisch 2 hypothesis
ch 2 hypothesis
 
Hypothesis Formulation
Hypothesis Formulation Hypothesis Formulation
Hypothesis Formulation
 
NULL AND ALTERNATIVE HYPOTHESIS.pptx
NULL AND ALTERNATIVE HYPOTHESIS.pptxNULL AND ALTERNATIVE HYPOTHESIS.pptx
NULL AND ALTERNATIVE HYPOTHESIS.pptx
 
Testing of hypothesis and tests of significance
Testing of hypothesis and tests of significanceTesting of hypothesis and tests of significance
Testing of hypothesis and tests of significance
 
Parmetric and non parametric statistical test in clinical trails
Parmetric and non parametric statistical test in clinical trailsParmetric and non parametric statistical test in clinical trails
Parmetric and non parametric statistical test in clinical trails
 
OBSERVATIONAL STUDIES PPT.pptx
OBSERVATIONAL STUDIES PPT.pptxOBSERVATIONAL STUDIES PPT.pptx
OBSERVATIONAL STUDIES PPT.pptx
 
Types of hypothesis
Types of hypothesisTypes of hypothesis
Types of hypothesis
 
Statistics
StatisticsStatistics
Statistics
 
research (hypothesis).pdf
research (hypothesis).pdfresearch (hypothesis).pdf
research (hypothesis).pdf
 
hypothesis testing overview
hypothesis testing overviewhypothesis testing overview
hypothesis testing overview
 
Population_ sample and hypothesis.pdf
Population_ sample and hypothesis.pdfPopulation_ sample and hypothesis.pdf
Population_ sample and hypothesis.pdf
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
 
pengajian malaysia
pengajian malaysiapengajian malaysia
pengajian malaysia
 
RURAL RESEARCH METHOD AND METHODOLOGY ,,, Hypothesis In Research
RURAL RESEARCH METHOD AND METHODOLOGY ,,,  Hypothesis In ResearchRURAL RESEARCH METHOD AND METHODOLOGY ,,,  Hypothesis In Research
RURAL RESEARCH METHOD AND METHODOLOGY ,,, Hypothesis In Research
 

Mehr von Sadrani Yash

Mehr von Sadrani Yash (12)

Why social media marketing is inevitable for educational institutes Business ...
Why social media marketing is inevitable for educational institutes Business ...Why social media marketing is inevitable for educational institutes Business ...
Why social media marketing is inevitable for educational institutes Business ...
 
Budget 2014 effects,impacts and benifits
Budget 2014 effects,impacts and benifitsBudget 2014 effects,impacts and benifits
Budget 2014 effects,impacts and benifits
 
Facebook
FacebookFacebook
Facebook
 
Mission vision statements of top companies
Mission vision statements of top companiesMission vision statements of top companies
Mission vision statements of top companies
 
Cost Accounting
Cost AccountingCost Accounting
Cost Accounting
 
BREAK EVEN ANALYSIS
BREAK EVEN ANALYSISBREAK EVEN ANALYSIS
BREAK EVEN ANALYSIS
 
Entreprenure case report
Entreprenure case reportEntreprenure case report
Entreprenure case report
 
E commerce
E commerce E commerce
E commerce
 
HDFC Bank Industrial Report
HDFC Bank Industrial Report  HDFC Bank Industrial Report
HDFC Bank Industrial Report
 
Presentation on economic case
Presentation on economic casePresentation on economic case
Presentation on economic case
 
CSR of ITC
CSR of ITCCSR of ITC
CSR of ITC
 
Cadbury products, history and takeovers
Cadbury products, history and takeoversCadbury products, history and takeovers
Cadbury products, history and takeovers
 

Kürzlich hochgeladen

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Kürzlich hochgeladen (20)

Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 

Statistics basics

  • 1. STATISTICS BASICS BBA CENTERED yash sadrani RK UNIVERSITY RAJKOT
  • 2. 1 CONTENTS 1. Hypothesis 2. Null hypothesis 3. Regression 4. cor•re•la•tion 5. Exponential Distribution 6. Alternative hypothesis 7. Central tendency 8. Central tendency 9. Bayes' theorem 10. Chebyshev’s Theorem 11. Simple random sampling 12. Descriptive statistics 13. Statistical inference 14. Characteristics of good estimator 15. properties of the test for independence
  • 3. 2 16. Utility of regression studies 17. Advantages of Sample Surveys 18. The hypergeometric distribution 19. leptokurtic distribution 20. the interquartile range
  • 4. 3 Hypothsys When a possible correlation or similar relation between phenomena is investigated, such as, for example, whether a proposed remedy is effective in treating a disease, that is, at least to some extent and for some patients, the hypothesis that a relation exists cannot be examined the same way one might examine a proposed new law of nature: in such an investigation a few cases in which the tested remedy shows no effect do not falsify the hypothesis. Instead, statistical tests are used to determine how likely it is that the overall effect would be observed if no real relation as hypothesized exists. If that likelihood is sufficiently small (e.g., less than 1%), the existence of a relation may be assumed. Otherwise, any observed effect may as well be due to pure chance. In statistical hypothesis testing two hypotheses are compared, which are called the null hypothesis and the alternative hypothesis. The null hypothesis is the hypothesis that states that there is no relation between the phenomena whose relation is under investigation, or at least not of the form given by the alternative hypothesis. The alternative hypothesis, as the name suggests, is the alternative to the null hypothesis: it states that there is some kind of relation. The alternative hypothesis may take several forms, depending on the nature of the hypothesized relation; in particular, it can be two-sided (for example: there is some effect, in a yet unknown direction) or one-sided (the direction of the hypothesized relation, positive or negative, is fixed in advance). Conventional significance levels for testing the hypotheses are .10, .05, and .01. Whether the null hypothesis is rejected and the alternative hypothesis is accepted, all must be determined in advance, before the observations are collected or inspected. If these criteria are determined later, when the data to be tested is already known, the test is invalid. It is important to mention that the above procedure is actually dependent on the number of the participants (units or sample size) that is included in the study. For instance, the sample size may be too small to reject a null hypothesis and, therefore, is recommended to specify the sample size from the beginning. It is advisable to define a small, medium and large effect size for each of a number of the important statistical tests which are used to test the hypotheses. A statistical hypothesis test is a method of statistical inference using data from a scientific study. In statistics, a result is called statistically significant if it has been predicted as unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level. The phrase "test of significance" was coined by statistician Ronald Fisher.[1] These tests are used in determining what outcomes of a study would lead to a rejection of the null hypothesis for a pre-specified level of significance; this can help to decide whether results contain enough information to cast doubt
  • 5. 4 onconventional wisdom, given that conventional wisdom has been used to establish the null hypothesis. The critical region of a hypothesis test is the set of all outcomes which cause the null hypothesis to be rejected in favor of the alternative hypothesis. Statistical hypothesis testing is sometimes called confirmatory data analysis, in contrast to exploratory data analysis, which may not have pre-specified hypotheses. Example 1 – Philosopher's beans The following example was produced by a philosopher describing scientific methods generations before hypothesis testing was formalized and popularized. Few beans of this handful are white. Most beans in this bag are white. Therefore: Probably, these beans were taken from another bag. This is an hypothetical inference. The beans in the bag are the population. The handful are the sample. The null hypothesis is that the sample originated from the population. The criterion for rejecting the null-hypothesis is the "obvious" difference in appearance (an informal difference in the mean). The interesting result is that consideration of a real population and a real sample produced an imaginary bag. The philosopher was considering logic rather than probability. To be a real statistical hypothesis test, this example requires the formalities of a probability calculation and a comparison of that probability to a standard. A simple generalization of the example considers a mixed bag of beans and a handful that contain either very few or very many white beans. The generalization considers both extremes. It requires more calculations and more comparisons to arrive at a formal answer, but the core philosophy is unchanged; If the composition of the handful is greatly different that of the bag, then the sample probably originated from another bag. The original example is termed a one-sided or a one-tailed test while the generalization is termed a two-sided or two-tailed test.
  • 6. 5 Null hypothesis In statistical inference of observed data of a scientific experiment, the null hypothesis refers to a general or default position: that there is no relationship between two measured phenomena,[1] or that a potential medical treatment has no effect.[2] Rejecting or disproving the null hypothesis – and thus concluding that there are grounds for believing that there is a relationship between two phenomena or that a potential treatment has a measurable effect – is a central task in the modern practice of science, and gives a precise sense in which a claim is capable of being proven false. In statistical significance, the null hypothesis is often denoted H0 (read “H-naught”), is generally assumed true until evidence indicates otherwise (e.g., H0: μ = 500 hours). The concept of a null hypothesis is used differently in two approaches to statistical inference, though, problematically, the same term is used. In the significance testing approach of Ronald Fisher, a null hypothesis is potentially rejected or disproved on the basis of data that is significantly under its assumption, but never accepted or proved. In the hypothesis testing approach of Jerzy Neyman and Egon Pearson, a null hypothesis is contrasted with an alternative hypothesis, and these are decided between on the basis of data, with certain error rates. These two approaches criticized each other, though today a hybrid approach is widely practiced and presented in textbooks. This hybrid is in turn criticized as incorrect and incoherent – see statistical hypothesis testing. Statistical significance plays a pivotal role in statistical hypothesis testing where it is used to determine if a null hypothesis can be rejected or retained. regression In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'Criterion Variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution. 1. (Psychology) psychol the adoption by an adult or adolescent of behaviour more appropriate to a child, esp as a defence mechanism to avoid anxiety 2. (Statistics) statistics
  • 7. 6 a. the analysis or measure of the association between one variable (the dependent variable) and one or more other variables (the independent variables), usually formulated in an equation in which the independent variables have parametric coefficients, which may enable future values of the dependent variable to be predicted b. (as modifer): regression curve. 3. (Astronomy) astronomy the slow movement around the ecliptic of the two points at which the moon's orbit intersects the ecliptic. One complete revolution occurs about every 19 years 4. (Geological Science) geology the retreat of the sea from the land 5. (Statistics) the act of regressing 6. (Logic) the act of regressing cor·re·la·tion In statistics, dependence is any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling; however, statistical dependence is not sufficient to demonstrate the presence of such a causal relationship (i.e., correlation does not imply causation). 1. A causal, complementary, parallel, or reciprocal relationship, especially a structural, functional, or qualitative correspondence between two comparable entities: a correlation between drug abuse and crime. 2. Statistics The simultaneous change in value of two numerically valued random variables: the positive correlation between cigarette smoking and the incidence of lung cancer; the negative correlation between age and normal vision. 3. An act of correlating or the condition of being correlated.
Exponential Distribution

The exponential distribution is used to describe survival times. Suppose that some device has the same hazard rate λ at each moment; the survival time is therefore 1/λ on average. Let the random variable X denote the time of failure. X then follows the exponential distribution with parameter λ. The probability density function of X is

f_X(x) = \lambda e^{-\lambda x} \text{ for } x \ge 0, \qquad f_X(x) = 0 \text{ otherwise.}

The expected value of X is 1/λ and the variance is 1/λ².

Example: A man enters a bank at 4 pm. There is one person in front of him in the queue. Suppose that the length of time an individual spends with a teller is an exponential random variable with mean 7 minutes. Let X be the length of time the man in front spends with the teller; then λ = 1/7 and X ~ Exp(1/7). The probability that the man who entered the bank at 4 pm has to wait more than 10 minutes to be served is P(X > 10) = 1 − F_X(10), where F_X is the cumulative distribution function of the exponential distribution:

F_X(t) = \int_0^t \lambda e^{-\lambda x}\, dx = \left[ -e^{-\lambda x} \right]_0^t = 1 - e^{-\lambda t}.

The probability that the man has to wait more than 10 minutes is therefore

1 - \left(1 - e^{-\lambda t}\right) = e^{-10/7} \approx 0.240.
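As a sanity check on the arithmetic above, here is a minimal sketch using only the standard library; the Monte Carlo part is just an illustrative cross-check, not part of the original example.

```python
# Check of the bank-queue example: P(X > 10) for an exponential
# random variable with mean 7 (i.e. rate lambda = 1/7).
import math
import random

rate = 1 / 7

# Closed form: the survival function of the exponential distribution.
exact = math.exp(-rate * 10)
print(f"exact     P(X > 10) = {exact:.3f}")   # ~ 0.240

# Monte Carlo cross-check with the standard library.
random.seed(0)
n = 100_000
hits = sum(random.expovariate(rate) > 10 for _ in range(n))
print(f"simulated P(X > 10) = {hits / n:.3f}")
```

Type I and type II errors

In statistics, a null hypothesis is a statement that the thing being studied produces no effect or makes no difference. An example of a null hypothesis is the statement "This diet has no effect on people's weight." Usually an experimenter frames a null hypothesis with the intent of rejecting it: that is, intending to run an experiment which produces data showing that the thing under study does make a difference.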
A type I error (or error of the first kind) is the incorrect rejection of a true null hypothesis. It is a false positive. Usually a type I error leads one to conclude that a supposed effect or relationship exists when in fact it doesn't. Examples of type I errors include a test that shows a patient to have a disease when in fact the patient does not have the disease, a fire alarm going off when in fact there is no fire, or an experiment indicating that a medical treatment should cure a disease when in fact it does not.

A type II error (or error of the second kind) is the failure to reject a false null hypothesis. It is a false negative. Examples of type II errors would be a blood test failing to detect the disease it was designed to detect in a patient who really has the disease, a fire alarm failing to ring when a fire breaks out, or a clinical trial of a medical treatment failing to show that the treatment works when it really does.

Alternative hypothesis

In statistical hypothesis testing, the alternative hypothesis (or maintained hypothesis or research hypothesis) and the null hypothesis are the two rival hypotheses compared by a statistical hypothesis test. An example might be where water quality in a stream has been observed over many years, and a test is made of the null hypothesis that there is no change in quality between the first and second halves of the data, against the alternative hypothesis that the quality is poorer in the second half of the record.

Central tendency

In statistics, a central tendency (or, more commonly, a measure of central tendency) is a central value or a typical value for a probability distribution.[1] It is occasionally called an average or just the center of the distribution. The most common measures of central tendency are the arithmetic mean, the median and the mode. A central tendency can be calculated for either a finite set of values or for a theoretical distribution, such as the normal distribution.

Occasionally authors use central tendency (or centrality) to mean "the tendency of quantitative data to cluster around some central value".[2][3] This meaning might be expected from the usual dictionary definitions of the words tendency and centrality. Those authors may judge whether data has a strong or a weak central tendency based on the statistical dispersion, as measured by the standard deviation or something similar.
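The three common measures of central tendency are easy to compute. A minimal sketch, with a small made-up sample:

```python
# Computing the three common measures of central tendency
# for a small invented sample.
from statistics import mean, median, mode

data = [2, 3, 3, 4, 5, 7, 11]

print(f"mean   = {mean(data):.2f}")   # arithmetic mean: 5.00
print(f"median = {median(data)}")     # middle value: 4
print(f"mode   = {mode(data)}")       # most frequent value: 3
```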
Bayes' theorem

In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) is a result that is of importance in the mathematical manipulation of conditional probabilities. It derives from the more basic axioms of probability. When applied, the probabilities involved in Bayes' theorem may have any of a number of probability interpretations. In one of these interpretations, the theorem is used directly as part of a particular approach to statistical inference. In particular, with the Bayesian interpretation of probability, the theorem expresses how a subjective degree of belief should rationally change to account for evidence: this is Bayesian inference, which is fundamental to Bayesian statistics. However, Bayes' theorem has applications in a wide range of calculations involving probabilities, not just in Bayesian inference.

An Introduction to Bayes' Theorem

Bayes' theorem is a theorem of probability theory originally stated by the Reverend Thomas Bayes. It can be seen as a way of understanding how the probability that a theory is true is affected by a new piece of evidence. It has been used in a wide variety of contexts, ranging from marine biology to the development of "Bayesian" spam blockers for email systems. In the philosophy of science, it has been used to try to clarify the relationship between theory and evidence. Many insights in the philosophy of science involving confirmation, falsification, the relation between science and pseudoscience, and other topics can be made more precise, and sometimes extended or corrected, by using Bayes' theorem. A small worked example follows.
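The theorem states P(A|B) = P(B|A)·P(A) / P(B). The sketch below applies it to a classic diagnostic-test scenario; all the probabilities are invented illustration values, not data from the text.

```python
# A worked example of Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B),
# using a classic (and entirely made-up) diagnostic-test scenario.

p_disease = 0.01             # prior: 1% of the population has the disease
p_pos_given_disease = 0.95   # test sensitivity
p_pos_given_healthy = 0.05   # false-positive rate

# Total probability of a positive result (law of total probability).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # ~ 0.161
```

Note how the low prior keeps the posterior modest even with a fairly accurate test; this is the kind of belief revision the theorem formalizes.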
Chebyshev's Theorem

For any set of data (either population or sample) and for any constant k greater than 1, the proportion of the data that must lie within k standard deviations on either side of the mean is at least

1 - \frac{1}{k^2}.

In ordinary words, Chebyshev's Theorem says the following about sample or population data:
1) Start at the mean.
2) Back off k standard deviations below the mean and then advance k standard deviations above the mean.
3) The fraction of the data in the interval so described will be at least 1 − 1/k² (we assume k > 1).

A quick numerical check of this bound is sketched after this section.

Simple random sampling

In a simple random sample (SRS) of a given size, all subsets of the sampling frame are given an equal probability. Furthermore, any given pair of elements has the same chance of selection as any other such pair (and similarly for triples, and so on). This minimises bias and simplifies analysis of results. In particular, the variance between individual results within the sample is a good indicator of variance in the overall population, which makes it relatively easy to estimate the accuracy of results.

However, SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that doesn't reflect the makeup of the population. For instance, a simple random sample of ten people from a given country will on average produce five men and five women, but any given trial is likely to overrepresent one sex and underrepresent the other. Systematic and stratified techniques attempt to overcome this problem by using information about the population to choose a more "representative" sample.
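Here is the numerical check of Chebyshev's theorem promised above: a minimal sketch, with invented data, comparing the observed fraction of values within k standard deviations of the mean against the guaranteed lower bound 1 − 1/k².

```python
# Numerical check of Chebyshev's theorem: for any data set, at least
# 1 - 1/k^2 of the values lie within k standard deviations of the mean.
# The data are arbitrary illustration values.
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9, 12, 15, 18, 25]
m, s = mean(data), pstdev(data)

for k in (1.5, 2, 3):
    lo, hi = m - k * s, m + k * s
    frac = sum(lo <= x <= hi for x in data) / len(data)
    bound = 1 - 1 / k**2
    print(f"k={k}: {frac:.2f} of data in [{lo:.1f}, {hi:.1f}] "
          f"(Chebyshev guarantees at least {bound:.2f})")
```

The observed fractions always meet or exceed the bound; the theorem gives a guarantee that holds for any data set, however skewed.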
One of the best ways to achieve unbiased results in a study is through random sampling. Random sampling means choosing subjects from a population through unpredictable means; in its simplest form, all subjects have an equal chance of being selected out of the population being researched. More formally, it is a method of selecting a sample (a random sample) from a statistical population in such a way that every possible sample that could be selected has a predetermined probability of being selected.

What is the difference between the coefficient of determination and the coefficient of correlation?

The coefficient of correlation is the "R" value given in the summary table of a regression output. R square, also called the coefficient of determination, is obtained by multiplying R by itself; in other words, the coefficient of determination is the square of the coefficient of correlation.

R square, or the coefficient of determination, shows the percentage of variation in y which is explained by all the x variables together. The higher, the better. It is always between 0 and 1; it can never be negative, since it is a squared value. It is easy to explain R square in terms of regression; it is not so easy to explain R in terms of regression.

For example, if the coefficient of correlation is the R value .850 (or 85%), then the coefficient of determination is the R square value .723 (or 72.3%). R square is simply the square of R, i.e. R times R.
Coefficient of Correlation: the degree of relationship between two variables, say x and y. It can range between −1 and 1. A value of 1 indicates that the two variables move in unison: they rise and fall together and have perfect correlation. A value of −1 means that the two variables are perfect opposites: one goes up as the other goes down, in a perfectly negative way. Any two variables can be argued to have a correlation value; if they are not correlated, the correlation value can still be computed, and it would be 0. The correlation value always lies between −1 and 1, passing through 0, which means no correlation at all (perfectly unrelated).

Correlation can be readily explained for simple linear regression, because there is only one x and one y variable. For multiple linear regression R is still computed, but it is then difficult to interpret because multiple variables are involved. That is why R square is the better term: R square can be explained both for simple linear regressions and for multiple linear regressions.

Descriptive statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of information,[1] or the quantitative description itself. Descriptive statistics are distinguished from inferential statistics (or inductive statistics) in that descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory.[2] Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, in a paper reporting on a study involving human subjects, there typically appears a table giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, and the proportion of subjects with related comorbidities.

Some measures commonly used to describe a data set are measures of central tendency and measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance), the minimum and maximum values of the variables, kurtosis and skewness.[3]
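A minimal sketch of the descriptive measures just listed, computed for a small invented sample with the standard library:

```python
# Common descriptive statistics for a small invented sample.
from statistics import mean, median, mode, stdev

sample = [23, 25, 25, 27, 30, 31, 34, 41]

print(f"n      = {len(sample)}")
print(f"mean   = {mean(sample):.2f}")
print(f"median = {median(sample)}")
print(f"mode   = {mode(sample)}")
print(f"stdev  = {stdev(sample):.2f}")   # sample standard deviation
print(f"min    = {min(sample)}, max = {max(sample)}")
```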
Statistical inference

In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation.[1] More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation,[2] such as observational errors, random sampling, or random experimentation.[1] Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations.

Inferential statistics are used to test hypotheses and make estimations using sample data. The outcome of statistical inference may be an answer to the question "what should be done next?", where this might be a decision about making further experiments or surveys, or about drawing a conclusion before implementing some organizational or governmental policy.

Characteristics of a good estimator

There are four main properties associated with a "good" estimator:
1) Unbiasedness: the expected value of the estimator (the mean of the estimator) equals the quantity being estimated. In statistical terms, E(estimate of Y) = Y.
2) Consistency: the estimator converges in probability to the quantity being estimated. In other words, as the sample size grows, the estimator gets closer and closer to the true value.
3) Efficiency: the estimator has a low variance, usually relative to other estimators (this is called relative efficiency); otherwise, the variance of the estimator is minimized.
4) Robustness: the mean-squared error of the estimator is minimized relative to other estimators. The estimator should also remain unbiased and consistent.

Stats: Test for Independence

In the test for independence, the claim is that the row and column variables are independent of each other; this is the null hypothesis. The multiplication rule says that if two events are independent, then the probability of both occurring is the product of the probabilities of each occurring. This is key to working the test for independence. If you end up
rejecting the null hypothesis, then the assumption must have been wrong and the row and column variables are dependent. Remember, all hypothesis testing is done under the assumption that the null hypothesis is true.

The test statistic used is the same as in the chi-square goodness-of-fit test, and the principle behind the test for independence is the same as the principle behind the goodness-of-fit test. The test for independence is always a right-tail test. In fact, you can think of the test for independence as a goodness-of-fit test where the data are arranged into table form; this table is called a contingency table.

The test statistic has a chi-square distribution when the following assumptions are met:
- The data are obtained from a random sample.
- The expected frequency of each category is at least 5.

The following are properties of the test for independence (a worked sketch follows this list):
- The data are the observed frequencies.
- The data are arranged into a contingency table.
- The degrees of freedom are the degrees of freedom for the row variable times the degrees of freedom for the column variable. It is not one less than the sample size; it is the product of the two degrees of freedom.
- It is always a right-tail test.
- It has a chi-square distribution.
- The expected value is computed by taking the row total times the column total and dividing by the grand total.
- The value of the test statistic doesn't change if the order of the rows or columns is switched.
- The value of the test statistic doesn't change if the rows and columns are interchanged (transpose of the matrix).
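Here is the worked sketch referred to above: the chi-square test statistic for independence on a 2×2 contingency table. The observed frequencies are invented illustration values.

```python
# Chi-square test statistic for independence on a 2x2 contingency
# table of made-up observed frequencies.

observed = [[30, 20],    # e.g. rows = group A/B, columns = pass/fail
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count = (row total * column total) / grand total.
        exp = row_totals[i] * col_totals[j] / grand
        chi_sq += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_sq:.2f} with {df} degree(s) of freedom")
# Compare against a chi-square critical value (3.84 at the 0.05 level
# for 1 df); larger values lead to rejecting independence.
```

Note that the degrees of freedom come out as (rows − 1) × (columns − 1), exactly as the property list states.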
Utility of regression studies

Regression models can be used to help understand and explain relationships among variables; they can also be used to predict actual outcomes. In a course on regression, one learns how multiple linear regression models are derived, how to use software to implement them, what assumptions underlie the models, how to test whether the data meet those assumptions and what can be done when they are not met, and how to develop strategies for building and understanding useful models.

Advantages of Sample Surveys

Cost Reduction: In most cases, conducting a sample survey costs less than a census survey. If fewer people are surveyed, fewer surveys need to be produced, printed, shipped, administered, and analyzed. Further, fewer data reports are often required, so the amount of time and expense needed to analyze and distribute the results is reduced.

Generalizability of Results: If conducted properly, the results of a sample survey can still be generalized to the entire population, meaning that the sample results can be considered representative of the views of the entire target population. Sampling strategies should be firmly aligned with the overarching survey goals to ensure the use of a proper sample frame and sample size.

Timeliness: Sample surveys can typically be printed, distributed, administered, and analyzed more quickly than census surveys. As a result, a shorter turnaround time for results is often achieved.

Identification of Strengths & Opportunities: As with census surveys, results from a properly conducted sample survey can be used to identify strengths and opportunities and to develop plans for meaningful change.

Cost: By comparison with a complete enumeration of the same population, a sample may be based on data for only a small number of the units comprising that population. A sample survey may thus be very much less expensive to conduct than a comparable complete enumeration.
Time: Being small in scale, a sample survey is not only less expensive than a census; the desired information is also obtained in much less time.

Scope: The smaller scale is likely to permit the collection of a wider range of survey data and to allow a wider choice of methods of observation, measurement or questioning than is usually feasible with a complete enumeration.

Respondents' Convenience: A sample survey considerably reduces the overall burden on respondents, in that only a few, not all, of the individuals in the population are put to the trouble of having to answer questions or provide information.

Labor: Sampling saves labor. A smaller staff is required, both for fieldwork and for tabulating and processing the data.

Flexibility: In certain types of investigation, highly skilled and trained personnel or even specialized equipment are needed to collect data. A complete enumeration in such cases is impracticable, and hence sample surveys, being more flexible and of greater scope, are more appropriate for this type of inquiry.

Data Processing: The data-processing requirement for a sample survey is likely to be much less than for a complete count. Whereas a complete count may well require a computer to process the data, a sample survey can often be processed manually, with fewer people and less logistic support.

Accuracy: A sample survey can employ personnel of higher quality, with intensive training, and more careful supervision of fieldwork is possible. As a result, observations, measurements, and questioning for a sample survey can often be carried out more carefully, yielding results subject to smaller non-sampling error than is generally practicable in a more extensive complete enumeration, usually at a much lower cost.

Feasibility: There are situations where complete enumeration is not feasible and a sample survey is therefore necessary. There are also instances where it is not practicable to enumerate all the units because of their perishable or fragile nature; the alternative in such situations is to take only a few of the units. For example, consider the problem of checking the quality of mango juice produced by a local company: one way to test the quality would be to drink the entire lot, which is impracticable. Testing of electric bulbs, screws, glass, or medicine are all examples of this type, where sampling is necessary.
The hypergeometric distribution

The hypergeometric distribution applies to sampling without replacement from a finite population whose elements can be classified into two mutually exclusive categories, like Pass/Fail, Male/Female or Employed/Unemployed. As random selections are made from the population, each subsequent draw decreases the population, causing the probability of success to change with each draw. The following conditions characterise the hypergeometric distribution:
- The result of each draw can be classified into one of two categories.
- The probability of a success changes on each draw.

A random variable X follows the hypergeometric distribution if its probability mass function (pmf) is given by:[1]

P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}

where:
- N is the population size
- K is the number of success states in the population
- n is the number of draws
- k is the number of successes
- \binom{a}{b} is a binomial coefficient

The pmf is positive when \max(0,\, n + K - N) \le k \le \min(K, n).
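The pmf above translates directly into code. A minimal sketch, using an invented example of drawing 10 items without replacement from a lot of 50 that contains 5 defectives:

```python
# The hypergeometric pmf given above, built from the binomial
# coefficients in the standard library. The lot/defect numbers are
# an invented illustration.
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws from N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 5, 10
# k runs from max(0, n + K - N) to min(K, n), here 0 to 5.
for k in range(0, 6):
    print(f"P(X = {k}) = {hypergeom_pmf(k, N, K, n):.4f}")
```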
leptokurtic distribution

In probability theory and statistics, kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning curved, arching) is any measure of the "peakedness" of the probability distribution of a real-valued random variable.[1] In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution and, just as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample from a population. There are various interpretations of kurtosis, and of how particular measures should be interpreted; these are primarily peakedness (width of peak), tail weight, and lack of shoulders (distribution primarily peak and tails, not in between).

One common measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the fourth moment of the data or population, but it has been argued that this really measures heavy tails, and not peakedness.[2] For this measure, higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. It is common practice to use an adjusted version of Pearson's kurtosis, the excess kurtosis, to compare the shape of a given distribution to that of the normal distribution. Distributions with negative or positive excess kurtosis are called platykurtic or leptokurtic distributions, respectively.

the interquartile range

The interquartile range (IQR), also called the midspread or middle fifty, is a measure of statistical dispersion, being equal to the difference between the upper and lower quartiles:[1][2] IQR = Q3 − Q1. In other words, the IQR is the first quartile subtracted from the third quartile; these quartiles can be clearly seen on a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed mid-range, and is the most significant basic robust measure of scale.
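Both quantities above can be computed in a few lines. A minimal sketch, with invented data, of the IQR and a simple moment-based excess kurtosis (one of several estimators mentioned in the text; bias-corrected sample versions exist but are omitted here):

```python
# IQR and a simple (population-moment) excess kurtosis for an
# invented data set, using only the standard library (Python 3.8+).
from statistics import mean, quantiles

data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]

# Interquartile range: Q3 - Q1. quantiles(..., n=4) returns the
# three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")

# Excess kurtosis via moments: m4 / m2^2 - 3, which is 0 for a normal
# distribution; positive = leptokurtic, negative = platykurtic.
m = mean(data)
m2 = sum((x - m) ** 2 for x in data) / len(data)
m4 = sum((x - m) ** 4 for x in data) / len(data)
print(f"excess kurtosis = {m4 / m2**2 - 3:.2f}")
```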