Data Analytics
By,
Vrushali Solanke.
Basics of Data Analytics:
 Analytics:
i) It is the systematic computational analysis of data.
ii) It is the discovery, interpretation, and communication of meaningful
patterns in data.
iii) It relies on the simultaneous application of statistics, computer
programming, and operations research to quantify performance.
Data Analytics: It is the science of examining raw data with the purpose of
drawing conclusions.
Data Analytics: It is a process of inspecting, cleansing, transforming, and
modelling data with the goal of discovering useful information, informing
conclusions, and supporting decision making.
Need for Data Analytics:
 Data and information are increasing rapidly, so the volume of information
available to us in the future is hard to predict.
 It is crucial to integrate this data. If it is wasted, a lot of valuable
information will be lost.
 Previously, a skilled analyst was required to process the data; these
days, the massive amount of data can no longer be processed by humans alone.
 So there is a need for tools that operate on this data at high speed and
efficiency and help businesses make better decisions.
 This is why Data Analytics is important.
What is Data Analytics?
 It uses quantitative or qualitative techniques.
 It is the science of drawing insights from raw information sources.
 It encompasses many diverse types of data analysis.
 It is primarily conducted in business-to-consumer (B2C) applications.
Why analytics is important?
Data Analysis:
Data analytics vs. Data analysis
Analysis
Explore potential future events
Analysis vs. Analytics
Overview of Data analytics Lifecycles
 Data Analytics is the science of examining raw data with the purpose of drawing
conclusions about the information.
 There are 6 phases in the data analytics lifecycle:
1. Discovery:
i) The team learns the business domain.
ii) The team assesses the resources available to support the project in terms of people,
technology, time, and data.
iii) The team frames the business problem as an analytics challenge that can be addressed in
subsequent phases and formulates initial hypotheses to test and begin learning the data.
2. Data Preparation:
i) The team requires an analytical sandbox in which it can work with data and perform
analytics for the duration of the project.
ii) The team needs to execute an extract, load, and transform (ELT) or extract, transform, and
load (ETL) process, sometimes abbreviated ETLT, to get data into the sandbox. Data should be
transformed in this process so the team can work with it and analyze it.
iii) This phase includes the steps to explore, preprocess, and condition data prior to modeling
and analytics.
3. Model Planning:
i) The team determines the methods, techniques, and workflow it intends to follow for the
subsequent model building phase.
ii) The team explores the data to learn about the relationships between variables.
4. Model Building:
i) In this phase, the team develops datasets for testing, training, and production purposes.
ii) The team builds and executes models based on the work done in the model planning phase.
iii) The team finds out whether its existing tools will be sufficient for running the models or
whether it will need a more robust environment for executing models and workflows.
5. Communicate Results:
i) The team, in collaboration with stakeholders, determines whether the results of the project
are a success or a failure based on the criteria developed in phase 1.
ii) The team should identify key findings, quantify the business value, and develop a narrative
to summarize and convey the findings to stakeholders.
6. Operationalize:
i) The team delivers the final reports, briefings, code, and technical documents.
ii) The team may run a pilot project to implement the model in a production environment.
Importance of Data Analytics for Business:
1. Improving efficiency:
2. Market Understandings:
3. Cost Reduction:
4. Faster and Better Decision Making:
5. New Products/ Services:
6. Industry knowledge:
7. Witnessing the opportunity:
Difference between Data Science and Data Analytics
Sr. No. | Terms | Data Science | Data Analytics
1 | Scope | Macro | Micro
2 | Focus on | Providing strategic actionable insights into the world | Providing operational observations into issues
3 | Skills required | Mathematical, technical, and strategic knowledge is necessary | Data analytics and visualization skills required
4 | Big data | Deals with big data | Not necessary to deal with big data
5 | Major fields | Machine learning, AI, search engine engineering, corporate analytics | Healthcare, gaming, travel, industries with immediate data needs
 What Are Diagnostic Analytics?
 Diagnostic analytics are a form of advanced analytics that focus on explaining why something
has happened based on data analysis. Like a doctor investigating a patient’s symptoms, they aim
to understand the underlying issues and determine why an issue is happening.
 Its capabilities allow users to identify anomalies by highlighting areas that could require further
study, which are pinpointed when trends or data points raise questions that can’t be answered
easily or without digging deeper. Some questions that would have to be addressed with
diagnostic analytics include:
• Why did this marketing campaign fail?
• Why have sales increased without any increased marketing attention for a certain region?
• Why did employee performance fall during this month?
 Diagnostic analytics also address other questions that have no obvious answer from a single
data source.
 Diagnostic analytics offer data discovery, drill-down, data mining, and data correlation. Drilling
down into the data allows users to identify potential sources of the anomalies discovered in the
first step. Analysts can use these capabilities to examine patterns both within and external to the
data to draw an informed conclusion. Probability theory, filtering, regression analysis, and
time-series analysis are all useful tools for diagnostic analytics.
 What Are Descriptive Analytics?
 Descriptive analytics describe or summarize raw data and make it interpretable by
humans.
 Put simply, descriptive analytics answer the question “What has
happened?”
 Descriptive analytics are useful because they allow us to learn from past behaviours and
understand how they might influence future outcomes.
 The main objective of descriptive analytics is to find out the reasons behind previous
successes or failures.
 A common example is the reports that provide historical insights
regarding a company’s production, financials, operations, sales, inventory, and customers.
 Most social analytics are descriptive analytics. They summarize certain groupings
based on simple counts of events, such as the number of followers, likes, posts, and fans.
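The "simple counts of events" idea can be sketched in a few lines of Python. This is only an illustration: the event log and the function name `summarize_events` are hypothetical and not taken from any particular analytics tool.

```python
from collections import Counter

# Hypothetical event log: each entry is one interaction on a social page.
events = ["like", "follow", "like", "share", "like", "follow"]

def summarize_events(log):
    """Return a count of each event type (a simple descriptive summary)."""
    return Counter(log)

summary = summarize_events(events)
for event_type, count in summary.most_common():
    print(f"{event_type}: {count}")
```

The summary only reports what has already happened; it makes no statement about why it happened or what will happen next.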
 What Are Predictive Analytics?
 Predictive and descriptive analytics have opposite objectives, but they are very closely
related, because you need accurate information about the past to make predictions
about the future. Predictive tools attempt to fill in gaps in the available data. If descriptive
analytics answer the question “What happened in the past?”, predictive analytics answer the
question “What might happen in the future?”
 Predictive analytics take historical data from various systems and use it to highlight
patterns. Then algorithms, statistical models, and machine learning are employed to
capture the correlations between targeted data sets.
 The most common commercial example is a credit score. Banks use historical information
to predict whether or not a candidate is likely to keep up with payments. It works in much
the same way for manufacturers, except that they are usually trying to find out whether products
will sell. Predictive analytics focus on the future of the business.
 Predictive analytics can be used throughout the organization, from forecasting customer
behavior and purchasing patterns to identifying trends in sales activities.
 What Are Prescriptive Analytics?
 Of diagnostic, predictive, descriptive, and prescriptive analytics, the last is the most
recent addition to the business intelligence landscape. These tools enable companies to
view potential decisions and, based on both current and historical data, follow them
through to a likely outcome. They provide recommendations regarding actions that will take
advantage of the predictions.
 Like predictive analytics, prescriptive analytics won’t be right 100% of the time, because
they work with estimates. However, they provide the best way of “seeing into the future”
and determining the viability of decisions before they’re made.
 The difference between the two is that prescriptive analytics offer opinions as to why a
particular outcome is likely. They can then offer recommendations based on this
information. To achieve this, they use algorithms, machine learning, and computational
modeling.
 If predictive analytics answer “What might happen?”, then prescriptive analytics
answer “What do we have to do to make it happen?” or “How will this action change the
outcome?” Prescriptive analytics deal more with trial and error and have a bit of a
hypothesis-testing nature.
 Summary of the Different Types
 Diagnostic analytics ask about the present. They drill down into why something has
happened and help users diagnose issues.
 Descriptive analytics ask about the past. They want to know what has been happening to
the business and how this is likely to affect future sales.
 Predictive analytics ask about the future. They are concerned with what outcomes can
happen and which outcomes are most likely.
 Finally, prescriptive tools ask about the present’s impact on the future. They want to know
the best course of action right now in order to positively impact the future. In other
words, they’re the decision makers.
Statistical Inference:
 Statistical inference is a technique for analyzing results and drawing
conclusions from data that are subject to random variation.
 Statistics can be classified into two categories:
1. Descriptive Statistics
2. Inferential Statistics
Descriptive statistics describe the data, whereas inferential statistics help you make predictions
from the data. In inferential statistics, the data are taken from a sample and allow
you to generalize to the population. Informally, to infer is to make an educated guess
about something that is not observed directly.
 The purpose of statistics is to describe information and to make predictions from it.
 The basic principle of statistical inference is that conclusions about a population of
interest can be drawn using information contained in a sample from that population.
 Statistical inference is the procedure through which inferences about a population are made
based on certain characteristics calculated from a sample of data drawn from that
population.
 Statistical inference is the process of generating conclusions about a population from a noisy
sample. Without statistical inference we are simply living in data; with statistical inference
we are trying to generate knowledge.
Definition of Statistical inference: It is the method of drawing, and measuring the reliability of,
conclusions about a population based on information obtained from a sample of that population.
 Statistical inference can be contrasted with exploratory data analysis.
 Statistical inference requires navigating a set of assumptions and tools and subsequently
thinking about how to draw conclusions from data.
 Descriptive statistics: A descriptive statistic is a summary statistic that
quantitatively describes or summarizes features of a collection of information. Descriptive
statistics are used as a preliminary step before formal inferences are drawn; inference itself
emphasizes the population quantities of interest about which we wish to draw conclusions.
 The conclusion of a statistical inference is a statistical proposition.
 There are two broad areas of statistical inference:
1) Statistical estimation
2) Statistical hypothesis testing
1) Statistical estimation: It is concerned with best estimating the value, or range of values, of
a particular population parameter. There are two types of statistical estimation:
i) Point estimation: Here, we estimate an unknown parameter using a single number that
is calculated from the sample data. This single value serves as a "best guess" or
"best estimate" of the unknown population parameter.
ii) Interval estimation: Here, we estimate an unknown parameter (for example, the mean of a
population) using an interval, or range of values, that is likely to contain the true value
of that parameter.
2) Hypothesis testing: It is concerned with deciding whether the study data are consistent, at
some level of agreement, with a particular value of a population parameter. In hypothesis testing
we begin with a claim about the population (called the null hypothesis) and check whether or not
the data obtained from the sample provide evidence against this claim.
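As a rough illustration of point estimation, interval estimation, and hypothesis testing, the sketch below uses only Python's standard library. The sample values and the function names are hypothetical, and the interval and test rely on a normal approximation; for a sample this small, a t-distribution would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 10 measurements (illustrative values only).
sample = [52.1, 49.8, 50.5, 51.2, 48.9, 50.7, 51.9, 49.5, 50.2, 51.0]

def point_estimate(data):
    """Point estimate of the population mean: the sample mean."""
    return mean(data)

def interval_estimate(data, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(data) / sqrt(len(data))
    centre = mean(data)
    return centre - half_width, centre + half_width

def z_test_p_value(data, null_mean):
    """Two-sided p-value for H0: population mean == null_mean (z approximation)."""
    z = (mean(data) - null_mean) / (stdev(data) / sqrt(len(data)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

estimate = point_estimate(sample)
low, high = interval_estimate(sample)
p_value = z_test_p_value(sample, null_mean=50.0)
print(estimate, (low, high), p_value)
```

A small p-value would count as evidence against the null hypothesis that the population mean is 50; a large one means the sample is consistent with that claim.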
Population:
 In statistics and quantitative methodology, data are collected and selected from a
statistical population by means of defined procedures. There are two types of data
sets: population and sample. When we calculate the mean deviation, variance,
and standard deviation, we need to know whether we are referring to the entire population or
only to a sample. The population size is commonly denoted by N and the sample size by n; the
sample variance and standard deviation divide by n - 1 rather than by n. Let us look at
population and sample data sets in detail.
 Population: It includes all the elements of the data set. Measurable characteristics of the
population, such as the mean and standard deviation, are known as parameters. For example, all
people living in India make up the population of India.
 There are different types of population. They are:
• Finite Population
• Infinite Population
• Existent Population
• Hypothetical Population
Let us discuss all the types one by one.
Finite Population
 A finite population is also known as a countable population: its units can be counted. In other
words, it is a population of individuals or objects that is finite in number. For statistical analysis, a
finite population is more convenient than an infinite one. Examples of finite populations are
the employees of a company and the potential consumers in a market.
Infinite Population
 An infinite population is also known as an uncountable population: counting the units in the
population is not possible. An example is the number of germs in a patient’s body, which cannot
practically be counted.
Existent Population
 An existent population is a population of concrete individuals. In other words, a population
whose units are available in solid form is known as an existent population. Examples are books and students.
Hypothetical Population
 A population whose units are not available in solid form is known as a hypothetical population. A
population consists of sets of observations, objects, etc. that all have something in common, and in some
situations these exist only hypothetically. Examples are the outcomes of rolling a die and the outcomes of
tossing a coin.
Sample
A sample includes one or more observations drawn from the population, and a measurable characteristic of a
sample is called a statistic. Sampling is the process of selecting the sample from the population.
For example, a group of some of the people living in India is a sample of the population of India.
Basically, there are two types of sampling. They are:
•Probability sampling
•Non-probability sampling
Probability Sampling
In probability sampling, the population units are not selected at the discretion of the researcher. Instead,
defined procedures are followed which ensure that every unit of the population has a fixed, known probability
of being included in the sample. Such a method is also called random sampling. Some of the techniques used for
probability sampling are:
•Simple random sampling
•Cluster sampling
•Stratified Sampling
•Disproportionate sampling
•Proportionate sampling
•Optimum allocation stratified sampling
•Multi-stage sampling
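Two of the techniques listed above, simple random sampling and proportionate stratified sampling, can be sketched with Python's `random` module. The population, the strata names, and the function names are hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k units without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def proportionate_stratified_sample(strata, fraction, seed=None):
    """Sample the same fraction from every stratum (dict: name -> list of units)."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population of 100 employee IDs split into two strata.
population = list(range(100))
strata = {"office": population[:80], "field": population[80:]}

srs = simple_random_sample(population, 10, seed=1)
stratified = proportionate_stratified_sample(strata, 0.1, seed=1)
print(srs)
print({name: len(units) for name, units in stratified.items()})
```

With a 10% sampling fraction, the stratified sample keeps the 80/20 split of the population, which a simple random sample only matches on average.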
Non Probability Sampling
In non-probability sampling, the population units can be selected at the discretion of the researcher. Such samples
rely on human judgement for selecting units and have no theoretical basis for estimating the characteristics of
the population. Some of the techniques used for non-probability sampling are:
•Quota sampling
•Judgement sampling
•Purposive sampling
Population and Sample Examples
•All the people who have ID proofs are the population, and the group of people who have only a
voter ID is a sample.
•All the students in a class are the population, whereas the top 10 students in the class are a
sample.
•All the members of the parliament are the population, and the female members present there are a
sample.
Population and Sample Formulas
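We give here the formulas for mean absolute deviation (MAD), variance, and
standard deviation for a population and for a sample. Suppose N denotes the
size of the population and n the sample size; the sample variance and standard deviation
use the divisor n - 1. The formulas are:
Mean absolute deviation: MAD = Σ|x_i - mean| / (number of observations)
Population variance: σ² = Σ(x_i - μ)² / N
Sample variance: s² = Σ(x_i - x̄)² / (n - 1)
Standard deviation: the square root of the corresponding variance (σ or s)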
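The population and sample versions of these measures can be compared using Python's standard `statistics` module, where `pvariance` and `pstdev` divide by the number of observations and `variance` and `stdev` divide by one less. The data values are hypothetical, and the helper `mean_absolute_deviation` (with divisor n) is an assumption made for this sketch, since conventions for the sample MAD vary.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical data set (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

def mean_absolute_deviation(values):
    """Average absolute distance from the mean (divisor n is an assumption here)."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)

# Treating the data as a whole population: divide by N.
population_variance = pvariance(data)
population_sd = pstdev(data)

# Treating the data as a sample: divide by n - 1.
sample_variance = variance(data)
sample_sd = stdev(data)

mad = mean_absolute_deviation(data)
print(population_variance, population_sd, sample_variance, sample_sd, mad)
```

The sample variance is always slightly larger than the population variance computed on the same numbers, because the divisor is smaller.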
Difference between Population and Sample
Some of the key differences between population and sample are given below:
Comparison | Population | Sample
Meaning | Collection of all the units or elements that possess common characteristics | A subgroup of the members of the population
Includes | Each and every element of a group | Only a handful of units of the population
Characteristics | Parameter | Statistic
Data Collection | Complete enumeration or census | Sampling or sample survey
Focus on | Identification of the characteristics | Making inferences about the population
Statistical modeling
1.Statistical Model:
 Definition: A statistical model is a mathematical model that embodies a set of statistical
assumptions concerning the generation of sample data (and similar data from a larger population).
 A statistical model combines inferences based on collected data with an understanding of the population
in order to predict information in an idealized form. This means that a statistical model can be an equation
or a visual representation of information based on research that has already been collected over time.
 Statistical models are part of the foundation of statistical inference.
 Essentially, all statistical models exist to infer relationships between different types of variables, and
because there are different types of variables, there are different types of statistical models. Some types of
model include regression, analysis of variance, analysis of covariance, and chi-square.
2.Statistical Modeling:
 Statistical modeling is an approach to statistical data analysis that helps researchers
discover something about a phenomenon that is assumed to exist. This approach helps
explain the variability found in a dataset.
 It is a strategy that brings together estimation and hypothesis testing under the same
umbrella.
 This modeling approach constructs summary models that display current knowledge. The
models are then “fitted” to data.
 A general modelling framework:
Data = Pattern + Residual
where Pattern is the systematic or ‘explained’ variation and
Residual is the leftover or ‘unexplained’ variation.
In simple terms, statistical modelling is a simplified, mathematically formalized way to
approximate reality (i.e. what generates your data) and optionally to make predictions from this
approximation.
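The decomposition Data = Pattern + Residual can be made concrete with a straight-line model. The sketch below fits y = a + b*x by ordinary least squares in plain Python; the observations and the function name `fit_line` are hypothetical, and a straight line is only one possible choice of pattern.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical observations (illustrative values only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
pattern = [a + b * x for x in xs]                 # systematic, "explained" part
residual = [y - p for y, p in zip(ys, pattern)]   # leftover, "unexplained" part

for y, p, r in zip(ys, pattern, residual):
    print(f"data={y:.2f}  pattern={p:.2f}  residual={r:+.2f}")
```

Each observation equals its fitted value plus its residual, and for a least-squares line with an intercept the residuals sum to zero.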
The basic steps in the statistical model building process are:
1. Model selection: In this step, plots of the data, process knowledge, and assumptions about the
process are used to determine the form of the model to be fitted to the data.
2. Model fitting: Using the selected model, and possibly information about the data, an
appropriate model-fitting method is used to estimate the unknown parameters in the model.
3. Model validation: When the parameter estimates have been made, the model is carefully
assessed to see whether the underlying assumptions of the analysis appear plausible. If the
assumptions seem valid, the model can be used to answer the scientific questions that prompted
the modeling effort. If the validation identifies problems with the current model,
the modeling process is repeated using information from the model validation.
Probability Distribution:
 In statistics, a probability distribution gives the probability of each outcome of a
random experiment or event. It provides the probabilities of the different possible
occurrences.
 To recall, probability is a measure of the uncertainty of various phenomena. For example, if
you throw a die, probability describes how likely each possible outcome is. A
distribution can be defined for any random experiment whose outcome is not certain
or cannot be predicted.
Probability Distribution Definition
 A probability distribution assigns probabilities to the possible outcomes of a random event. It is
defined on the underlying sample space, the set of possible outcomes of the
random experiment. This set could be a set of real numbers, a set of vectors, or a
set of any other entities. It is a part of probability and statistics.
1. Probability:
Probability means possibility. Probability theory is the branch of mathematics that deals with the
occurrence of random events. A probability is expressed as a value from zero to one and was
introduced in mathematics to predict how likely events are to happen.
The meaning of probability is basically the extent to which something is likely to happen. This
is the basic probability theory, which is also used in probability distributions, where you
learn the probabilities of the outcomes of a random experiment.
To find the probability of a single event, we should first know the total number of
possible outcomes.
2. Random experiments: A random experiment is an experiment
whose outcome cannot be predicted with certainty.
Suppose we toss a coin: we cannot predict whether it will come up Head or Tail.
A possible result of a random experiment is called an outcome, or sample point, and the set
of all outcomes is called the sample space. With the help of these experiments or events, we can
always create a probability table in terms of variables and probabilities.
Probability of an event E, when all outcomes are equally likely:
P(E) = Number of favorable outcomes / Total number of outcomes
3. Sample Space: It is the set of all possible outcomes of a random experiment.
4. Random Variables:
A random variable is a variable whose possible values are numerical outcomes of a random experiment.
P(X) represents the probability of X.
P(X = x) refers to the probability that the random variable X is equal to a particular value, denoted by x.
For example, P(X = 1) refers to the probability that the random variable X is equal to 1.
Consider an example: suppose you flip a coin two times. This simple statistical experiment has 4
possible outcomes: HH, HT, TH, TT. Now let the variable X represent the number of heads that result from
the experiment. The variable X can take the values 0, 1, or 2.
The table below represents the probability distribution of the random variable X:
Number of Heads Probability
0 0.25
1 0.50
2 0.25
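The table can be reproduced by enumerating the sample space directly. The following is a small sketch in Python, assuming a fair coin so that all four outcomes are equally likely; the function name `heads_distribution` is chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(flips):
    """Probability distribution of the number of heads in `flips` fair coin tosses."""
    outcomes = list(product("HT", repeat=flips))   # sample space
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {heads: Fraction(n, len(outcomes)) for heads, n in sorted(counts.items())}

distribution = heads_distribution(2)
for heads, probability in distribution.items():
    print(heads, float(probability))
```

Each probability is the number of favorable outcomes divided by the total number of outcomes, and the probabilities sum to 1.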
Probability Distribution:
A probability distribution is a function that describes the likelihood of each possible value
that a random variable can assume.
Definition: The probability distribution of a random variable X is the system of numbers
X : x1 x2 ……… xn
P(X) : p1 p2 ……… pn
where the real numbers x1, x2, …, xn are the possible values of the random variable X and pi is the
probability that X takes the value xi, i.e. P(X = xi) = pi.
Each pi lies between 0 and 1, and the sum of the probabilities over
all possible values must be equal to 1.
A probability distribution may be either discrete or continuous.
A discrete distribution means that X can assume one of a countable number of values (finite or
countably infinite).
A continuous distribution means that X can assume any of an uncountably infinite number of
values, such as any real number in an interval.
In both cases, the distribution is the function that maps the values of the
random variable to probabilities.
1. Discrete probability distribution: Three frequently used discrete distributions are:
i) The Binomial distribution: It is used to compute probabilities for a process in which only one of
two possible outcomes may occur on each trial.
Example: Rolling a die 50 times and counting the number of sixes (0, 1, 2, 3, …, 50). Here, the
random variable X is the number of “successes”, that is, the number of times a six occurs. The
probability of getting a six on each roll is 1/6.
ii) The Geometric distribution: It is used to determine the probability that the first success
occurs on a specified trial.
Example: Suppose the probability that an athlete achieves a distance of 6 m in the long jump is 0.7.
The geometric distribution gives the probability of the number of attempts the athlete will
need to achieve a 6 m jump. The probability of first succeeding on the second attempt is 0.3 * 0.7 = 0.21, and
the probability of first succeeding on the third attempt is 0.3 * 0.3 * 0.7 = 0.063.
iii) The Poisson distribution: It is used to measure the probability that a given number of events will
occur during a given time frame.
Example: Suppose that, on average, 1 bus arrives at a bus stop in a span of 30 minutes.
The Poisson distribution can be used to model the probability of different numbers of buses, X, coming
to the bus stop within the next 30 minutes, where X can take the values 0, 1, 2, 3, 4, and so on.
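The three discrete distributions can be written as short probability mass functions. This sketch uses only Python's `math` module; the function names are chosen for illustration, and the geometric version counts the trial on which the first success occurs, matching the long-jump example.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, rate):
    """P(X = k) events in an interval when `rate` events occur on average."""
    return rate**k * exp(-rate) / factorial(k)

# Binomial: probability of exactly 10 sixes in 50 rolls of a fair die.
print(binomial_pmf(10, 50, 1 / 6))
# Geometric: athlete succeeds with probability 0.7 on each attempt.
print(geometric_pmf(2, 0.7), geometric_pmf(3, 0.7))
# Poisson: on average 1 bus per 30 minutes; probabilities of 0 to 4 buses.
print([poisson_pmf(k, 1) for k in range(5)])
```

The geometric values for the second and third attempts agree with the hand calculations 0.3 * 0.7 and 0.3 * 0.3 * 0.7.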
2. Continuous probability distribution:
i) Uniform distribution: In statistics, the uniform distribution is a type of
probability distribution in which all possible outcomes are equally likely. A deck of
cards has a uniform distribution within it, since drawing a heart, club,
diamond, or spade is equally likely (strictly, this is a discrete uniform distribution; in the
continuous case, every value in an interval is equally likely).
ii) Normal Distribution: The normal distribution is the most important probability
distribution in statistics because it fits many natural phenomena.
For example, heights, blood pressure, measurement error, and IQ scores approximately follow the
normal distribution. It is also known as the Gaussian distribution and the bell curve.
In a normal distribution, data is symmetrically distributed with no skew.
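The symmetry of the normal distribution can be explored with `statistics.NormalDist` from Python's standard library. The mean of 100 and standard deviation of 15 are hypothetical parameters chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters: scores with mean 100 and standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

def probability_between(dist, low, high):
    """P(low <= X <= high) for a normally distributed X."""
    return dist.cdf(high) - dist.cdf(low)

within_one_sd = probability_between(scores, 85, 115)
within_two_sd = probability_between(scores, 70, 130)
print(round(within_one_sd, 4), round(within_two_sd, 4))
```

Roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two, whatever the mean and standard deviation are.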
Correlation
 If a change in one variable appears to be accompanied by a change in another variable,
the two variables are said to be correlated, and this interdependence is called correlation
or co-variation.
 Correlation analysis is a method of statistical evaluation used to study the strength of the
relationship between two numerically measured, continuous variables (e.g. height and
weight). This type of analysis is useful when we want to establish whether there are possible
connections between variables.
 In short, the tendency of two variables to vary together is called
correlation or co-variation.
 If correlation is found between two variables, it means that when there is a systematic
change in one variable, there is also a systematic change in the other; the variables alter
together over a certain period of time.
 If a correlation is found, it can be either positive or negative, depending on the
numerical values measured.
 Knowledge of the correlation gives us an idea of the direction and intensity of change in
a variable when the correlated variable changes.
 Correlation denotes interdependency among variables. For a correlation between two
phenomena to be meaningful, there should be a plausible relationship linking them;
without one, an observed correlation may be purely coincidental.
 If two variables vary in such a way that movements in one are accompanied by movements
in the other, it is tempting to describe them as having a cause-and-effect relationship, but
co-movement alone does not establish this.
 Causation usually produces correlation, but correlation does not necessarily imply causation.
A strong positive or strong negative correlation between two variables
does not mean that one variable is caused by the other. A strong correlation on its own never
proves a cause-effect relationship between two variables.
 Coefficient of correlation:
 To measure the degree of association or relationship between two variables quantitatively,
an index of relationship is used, termed the coefficient of correlation.
 The coefficient of correlation is a numerical index that tells us to what extent the two variables
are related and to what extent variations in one variable go with variations in
the other. The coefficient of correlation is symbolized by either r or ρ (rho) and ranges
from -1 to +1 (-1 <= r <= 1).
 Techniques for Measuring Correlation:
 Three important statistical tools used to measure correlation are: Scatter diagrams, Karl
Pearson's coefficient of correlation, and Spearman's rank correlation.
 1. Scatter Diagram:
 A scatter diagram visually presents the nature of association without giving any specific
numerical value. In this technique, the values of the two variables are plotted as points on
graph paper.
 From a scatter diagram, one can get a fairly good idea of the nature of the relationship. The
degree of closeness of the scatter points and their overall direction
enable us to examine the relationship.
 If all the points lie on a line, the correlation is perfect and is said to be unity. If the scatter
points are widely dispersed around the line, the correlation is low.
 The correlation is said to be linear if the scatter points lie near a line or on a line. The scatter
diagrams in the figure give us an idea of the relationship between two variables.
 2. Karl Pearson's Coefficient of Correlation:
 A numerical measure of the linear relationship between two variables is given by Karl Pearson's
coefficient of correlation.
 A relationship is said to be linear if it can be represented by a straight line. The coefficient is
also known as the product moment correlation and the simple correlation coefficient.
 It gives a precise numerical value of the degree of linear relationship between two variables
X and Y. The linear relationship may be given by Y = a + bX.
 This type of relation may be described by a straight line. The intercept that the line makes on
the Y axis is given by a, and the slope of the line is given by b, which gives the change in the value of
Y for a small change in the value of X. On the other hand, if the relation cannot be
represented by a straight line, as in Y = X^2 (with X values symmetric about zero), the value of the
coefficient will be zero. This clearly shows that zero correlation need not mean the absence of any
type of relation between the two variables.
 The value of the correlation coefficient lies between minus one and plus one, -1 <= r <= 1.
 The product moment correlation, or Karl Pearson's measure of correlation, is computed from the
deviations of X and Y from their means: r = Σ(X - X̄)(Y - Ȳ) / sqrt(Σ(X - X̄)² Σ(Y - Ȳ)²).
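Pearson's coefficient can be computed directly from deviations about the means, r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² Σ(y - ȳ)²). The sketch below is a plain-Python illustration; the function name `pearson_r` and the data are hypothetical, with a = 3 and b = 2 chosen arbitrarily for the line Y = a + bX.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
linear = [3 + 2 * x for x in xs]      # Y = a + bX with a = 3, b = 2
inverse = [3 - 2 * x for x in xs]     # same line with a negative slope
squared = [x**2 for x in xs]          # Y = X^2, a non-linear relation

print(pearson_r(xs, linear), pearson_r(xs, inverse), pearson_r(xs, squared))
```

The squared case shows why a coefficient of zero does not rule out a relationship: Y is completely determined by X, yet there is no linear association for X values symmetric about zero.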
Correlation is of the following types:
1. Positive correlation:
 The values of one variable increase as the values of the other increase, so the two
variables change in the same direction. High numerical values of one variable relate to
high numerical values of the other, i.e. 0 < r < 1.
 For example, height and weight, study time and grades.
2. Negative correlation:
 The values of one variable decrease as the values of the other increase, or vice versa, so the
variables change in opposite directions. High numerical values of one
variable relate to low numerical values of the other, i.e. -1 < r < 0.
 For example, price and quantity demanded, alcohol consumption and driving ability.
3. No correlation:
 An increase or decrease in the values of one variable has no linear association with the values of the other. If
r = 0 the two variables are uncorrelated: there is no linear relation between them.
4. Perfect positive correlation:
 When a change in one variable X is accompanied by a change of equal proportion in the
other variable Y in the same direction, the two variables are said to have a
perfect positive correlation, i.e. r = 1.
5. Perfect negative correlation:
 When a change in X is accompanied by a change of equal proportion in Y
in the opposite direction, the correlation between X and Y is called a perfect
negative correlation, i.e. r = -1.
 If there is correlation between two numerical sets of data, positive or negative, the
coefficient worked out can allow you to predict future trends between the two variables.
However, you must remember that you cannot be 100% sure that your prediction will be
correct because correlation does not determine cause or effect.
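The five cases listed above can be summarised in a small helper that maps a coefficient to its type. This is only a sketch: the function name `correlation_type` and the tolerance `tol` (used to absorb floating-point rounding near -1, 0 and +1) are assumptions for illustration.

```python
def correlation_type(r, tol=1e-9):
    """Name the type of correlation for a coefficient r between -1 and +1."""
    if r < -1 - tol or r > 1 + tol:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if abs(r - 1) <= tol:
        return "perfect positive"
    if abs(r + 1) <= tol:
        return "perfect negative"
    if abs(r) <= tol:
        return "no correlation"
    return "positive" if r > 0 else "negative"


# Illustrative coefficients, one for each case.
labels = [correlation_type(r) for r in (1.0, 0.6, 0.0, -0.3, -1.0)]
print(labels)
```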
3. Spearman's Rank Correlation:
 Spearman's coefficient of correlation measures the linear association between the ranks
assigned to individual items according to their attributes.
 Attributes are variables that cannot be measured numerically, such as the intelligence of
people, physical appearance, honesty, etc. Ranking may be a better alternative to
quantification of qualities.
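One common way to compute Spearman's coefficient is to replace each value by its rank (tied values share the average of their ranks) and then apply Pearson's formula to the ranks. The sketch below assumes that approach. The helper names and the two judges' rankings of five candidates are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical ranks given by two judges to the same five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)
print(rho)
```

Because only the ordering matters, any strictly increasing relation between two variables gives a coefficient of +1, even when the relation is not a straight line.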

Regression:
 Regression analysis is a statistical tool used for the investigation of relationships
between variables. It is a method of predicting or estimating one variable knowing the
value of the other variable.
 Estimation is required in different fields in everyday life. A businessman wants to know
the effect of increase in advertising expenditure on sales or a doctor wishes to observe
the effect of a new drug on patients.
 An economist is interested in finding the effect of change in demand pattern of some
commodities on prices. Usually, we seek to ascertain the causal effect of one variable
upon another.
 We use a regression model to understand how changes in the predictor values are
associated with changes in the response mean. Regression analysis can help in
investigating a cause and effect relationship between variables, although a fitted association alone does not prove causation.
 We can also use regression to make predictions based on the values of the predictors.
It plays a significant role in many human activities, as it is a powerful and flexible tool
which is used to estimate past, present or future events on the basis of past or present
events.
 Regression analysis is also used to find trends in data. It will provide you with an equation
for a graph so that you can make predictions about your data.
 For example, you might guess that there is a connection between how much you eat and
how much you weigh; regression analysis can help you to quantify that.
 If you have been putting on weight over the last few years, it can predict how much you
will weigh in ten years' time if you continue to put on weight at the same rate. It will also
give you a slew of statistics that tell you how accurate your model is.
 Thus, regression analysis models the relationships between a response variable and one
or more predictor variables. In simple words, regression analysis is used to model the
relationship between a dependent variable and one or more independent variables.
 Response variables are also known as dependent variables, regressands, y-variables, and
outcome variables. Typically, you want to determine whether changes in the predictors are
associated with changes in the response.
 Predictor variables are also known as independent variables, regressors, x-variables, and
input variables. A predictor variable explains changes in the response. Typically, you want
to determine how changes in one or more predictors are associated with changes in the
response.
For example, in a plant growth study, the response variable is the amount of growth that
occurs during the study. The investigators want to determine how changes in the
predictors are associated with changes in plant growth. The predictors are the amount of
fertilizer applied, the soil moisture, and the amount of sunlight.
Definition:
 “The statistical technique that expresses a functional relationship between two or
more variables in the form of an equation, to estimate the value of a variable,
based on the given value of another variable is called regression analysis".
 The variable whose value is to be estimated is called dependent variable and the
variable whose value is used to estimate this value is called independent
variable.
 The linear algebraic equations that express a dependent variable in terms of an
independent variable are called linear regression equations.
 In terms of statistical inference, regression analysis is concerned with the
parameters of the regression equation that obtains between two or more variables
in the population.
 There are a variety of regression methodologies that you choose based on the
type of response variable, the type of model that is required to provide an
adequate fit to the data, and the estimation method.
The overall objectives of regression analysis can be summarized as follows:
1. To determine whether or not a relationship exists between two variables.
2. To describe the nature of the relationship, should one exist, in the form of a mathematical
equation.
3. To assess the degree of accuracy of description or prediction achieved by the regression
equation.
4. In the case of multiple regression, to assess the relative importance of the various predictor
variables in their contribution to variation in the criterion variable.
Types of Regression Models
The two basic types of regression analysis are:
1. Simple Regression Analysis:
 It is used to estimate the relationship between a dependent variable and a single independent
variable. Regression models that involve one explanatory variable are called simple regression models.
 For example, the relationship between crop yields and rainfall.
2. Multiple Regression Analysis:
 It is used to estimate the relationship between a dependent variable and two or more independent
variables.
 When two or more explanatory variables are involved, the relationships are called multiple
regressions.
 For example, the relationship between the salaries of employees and their experience and education.
 Multiple regression analysis introduces several additional complexities but may produce more realistic
results than simple regression analysis.
 Regression models are also divided into linear and nonlinear
models, depending on whether the relationship between the response and explanatory variables is
linear or nonlinear.
 In a simple linear regression there are two variables, x and y, where y depends on, or is
influenced by, x. Here y is called the dependent or criterion variable and x is the independent or predictor
variable.
 The regression line of y on x is expressed as:
y = a + bx
 where a = constant (the y-intercept) and b = regression coefficient (the slope). In this equation, a and b are the two
regression parameters. While there are a number of possible criteria for choosing a best-fitting
line, one of the most useful is the least squares criterion.
 The slope b of the best-fitting line, based on the least squares criterion, can be shown to be
$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
where the summation is over all n pairs of $(x_i, y_i)$ values.
 The value of a, the y-intercept, can in turn be shown to be a function of b, $\bar{x}$ and $\bar{y}$, i.e.
$a = \bar{y} - b\bar{x}$
 In the following plot we can observe the linear relationship between the mileage and displacement of cars.
The green points are the actual observations, while the fitted black line is the line of regression.
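The least squares line y = a + bx can be sketched in a few lines of code using the standard formulas: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and the intercept is the mean of y minus b times the mean of x. The function name `fit_line` and the displacement and mileage figures below are made up for illustration. They are not the data behind the plot.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data: engine displacement (litres) and mileage (km per litre).
displacement = [1.0, 1.5, 2.0, 2.5, 3.0]
mileage = [20, 18, 15, 13, 9]

a, b = fit_line(displacement, mileage)
print(f"mileage = {a:.2f} + ({b:.2f}) * displacement")
```

With these made-up figures the slope is negative, matching the usual pattern that larger engines give lower mileage.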
 Regression Analysis:
Steps in Regression Analysis:
Regression analysis includes the following steps:
Step 1: Statement of the Problem under Consideration:
 The first important step in conducting any regression analysis is to specify the problem
and the objectives to be addressed by the regression analysis.
 The wrong formulation or the wrong understanding of the problem will give the wrong
statistical inferences. The choice of variables depends upon the objectives of study and
understanding of the problem.
Step 2: Choice of Relevant Variables:
 Once the problem is carefully formulated and objectives have been decided, the next
question is to choose the relevant variables.
 It has to be kept in mind that the correct choice of variables determines whether the statistical
inferences are correct.
 For example, in any agricultural experiment, the yield depends on explanatory variables
like quantity of fertilizer, rainfall, irrigation, temperature etc. These variables are denoted by
X1, X2, ..., Xk as a set of k explanatory variables.
Step 3: Collection of Data on Relevant Variables:
 Once the objective of the study is clearly stated and the variables are chosen, the next
task is to collect data on the relevant variables. The data are essentially the
measurements on these variables.
 For example, suppose we want to collect data on age. For this, it is important to know
how to record it. Either the date of birth can be recorded, which gives the exact
age on any specific date, or the age in completed years as on a specific date.
 Moreover, it is also important to decide whether the data are to be collected on
quantitative variables or qualitative variables.
 Examples of quantitative variables include height and weight, while examples of qualitative
variables include hair color, religion and gender. Quantitative variables are often
represented in units of measurement, and qualitative variables are represented in non-
numerical terms.
Step 4: Specification of Model:
 The experimenter or the person working in the subject usually helps in determining the
form of the model. Only the form of the tentative model can be ascertained and it will
depend on some unknown parameters. For example, a general form will be like
$y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$
where $\varepsilon$ is the random error reflecting mainly the difference between the observed value of y and
the value of y obtained through the model. The form of $f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k)$
can be linear as well as nonlinear depending on the form of the parameters $\beta_1, \beta_2, \ldots, \beta_k$. A
model is said to be linear if it is linear in parameters.
For example,
$y = \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \varepsilon$
$y = \beta_1 + \beta_2 \ln X_2 + \varepsilon$
are linear models, whereas
$y = \beta_1 X_1 + \beta_2^2 X_2 + \beta_3 X_2 + \varepsilon$
$y = (\ln \beta_1) X_1 + \beta_2 X_2 + \varepsilon$
are non-linear models.
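To illustrate what "linear in parameters" means in practice, the sketch below fits a model of the form y = β1 + β2 ln X with ordinary least squares. The model is a curve in X, but after the substitution z = ln X it is a straight line in z, so the straight-line formulas apply unchanged. The data are generated without noise from β1 = 2 and β2 = 3 purely for demonstration, and `fit_line` is an assumed helper name.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic, noise-free data generated from y = 2 + 3 * ln(x).
x = [1, 2, 4, 8, 16]
y = [2 + 3 * math.log(v) for v in x]

# Transform the predictor; the parameters still enter the model linearly.
z = [math.log(v) for v in x]
beta1, beta2 = fit_line(z, y)
print(beta1, beta2)
```

No such substitution of the predictors turns a model like y = (ln β1) X1 + β2 X2 + ε into a linear one in β1, which is why it counts as non-linear.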
 Step 5: Choice of Method for Fitting the Data:
 After the model has been defined and the data have been collected, the next task
is to estimate the parameters of the model based on the collected data. This is
also referred to as parameter estimation or model fitting.
 Parameter estimates (also called coefficients) give the change in the response
associated with a one-unit change of a predictor, all other predictors being held
constant.
 The most commonly used method of estimation is the least squares method.
Under certain assumptions, the least squares method produces estimators with
desirable properties. The other estimation methods are the maximum likelihood
method, ridge method, principal components method etc.
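To make the difference between two of these methods concrete, the sketch below compares least squares with a simple form of the ridge method for a single predictor. It assumes the ridge penalty `lam * b**2` is applied to the slope only, with the intercept left unpenalised, in which case the slope becomes Sxy / (Sxx + lam). The names `fit_least_squares`, `fit_ridge` and `lam`, and the data, are illustrative assumptions.

```python
def centered_sums(xs, ys):
    """Return the means of x and y, and the sums Sxy and Sxx of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return mean_x, mean_y, sxy, sxx


def fit_least_squares(xs, ys):
    """Minimise the sum of squared errors of y = a + b*x."""
    mean_x, mean_y, sxy, sxx = centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_ridge(xs, ys, lam):
    """Minimise the sum of squared errors plus lam * b**2 (slope penalised only)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    mean_x, mean_y, sxy, sxx = centered_sums(xs, ys)
    if sxx + lam == 0:
        raise ValueError("x must vary or lam must be positive")
    b = sxy / (sxx + lam)
    return mean_y - b * mean_x, b


# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 9.6]

a_ols, b_ols = fit_least_squares(x, y)
a_ridge, b_ridge = fit_ridge(x, y, lam=2.0)
print(b_ols, b_ridge)
```

The penalty shrinks the slope towards zero, and setting `lam` to zero recovers the least squares fit.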
 Step 6: Fitting of Model:
 The estimation of the unknown parameters using an appropriate method provides the values of
the parameters. Substituting these values in the equation gives us a usable model. This is
termed model fitting.
 The estimates of the parameters $\beta_1, \ldots, \beta_k$ in the model
$y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$
are denoted as $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$, which gives the fitted model as
$y = f(X_1, X_2, \ldots, X_k; \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k)$
 When the value of y is obtained for given values of $X_1, X_2, \ldots, X_k$, it is denoted as $\hat{y}$
and called the fitted value.
 The fitted equation is used for prediction. In this case, $\hat{y}$ is termed the predicted value.
Note that the fitted value is the one where the values used for the explanatory variables
correspond to one of the n observations in the data, whereas the predicted value is the
one obtained for any set of values of the explanatory variables. It is not generally
recommended to predict the y-values for values of the explanatory variables
which lie outside the range of the data. When the values of the explanatory variables are
future values, the predicted values are called forecasted
values.
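The distinction between fitted and predicted values, and the caution about predicting outside the range of the data, can be sketched for a simple straight-line model as follows. The data and the helper names `fit_line` and `predict` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Made-up observations.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 9.6]
a, b = fit_line(xs, ys)

# Fitted values: the model evaluated at the observed x values.
fitted = [a + b * x for x in xs]


def predict(x_new):
    """Return the predicted value and whether x_new lies within the observed range."""
    inside_range = min(xs) <= x_new <= max(xs)
    return a + b * x_new, inside_range


y_inside, ok_inside = predict(2.5)   # within the range of the data
y_outside, ok_outside = predict(10)  # outside the range, so treat with caution
print(y_inside, ok_inside, y_outside, ok_outside)
```

Returning the range flag alongside the value leaves the decision about whether to trust an out-of-range prediction to the caller.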
Step 7: Model Validation and Criticism:
 The validity of statistical methods to be used for regression analysis depends on various
assumptions. These assumptions are essentially the assumptions for the model and the
data.
 The quality of statistical inferences heavily depends on whether these assumptions are
satisfied or not. To ensure that these assumptions are valid and satisfied, care is
needed from the beginning of the experiment.
 One has to be careful in choosing the required assumptions and in examining whether the
assumptions are valid for the given experimental conditions. It is also important to
identify the situations in which the assumptions may not be met.
 The validation of the assumptions must be made before drawing any statistical conclusion.
Any departure from validity of assumptions will be reflected in the statistical inferences. In
fact, the regression analysis is an iterative process where the outputs are used to
diagnose, validate, criticize and modify the inputs.
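The slides do not name specific checks, but two commonly used ones are the residuals (observed minus fitted values) and the coefficient of determination R², the share of the variation in y accounted for by the fitted line. The sketch below computes both for a straight-line fit. The data and the function names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def diagnostics(xs, ys):
    """Return the residuals and R squared of the least squares line."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    mean_y = sum(ys) / len(ys)
    sse = sum(e ** 2 for e in residuals)        # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)    # total variation
    if sst == 0:
        raise ValueError("R squared is undefined when y is constant")
    return residuals, 1 - sse / sst


# Made-up observations.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 9.6]
residuals, r_squared = diagnostics(xs, ys)
print(residuals, r_squared)
```

Residuals that show a clear pattern when plotted against x suggest that the straight-line form or another assumption needs revisiting, which feeds the iterative process described above.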
 Step 8: Using the Chosen Model(s) for the Solution of the posed problem and
forecasting:
 The determination of the explicit form of the regression equation is the ultimate objective of
regression analysis. The result is a good and valid relationship between the study variable and
the explanatory variables.
 The regression equation helps in understanding the interrelationships among the variables.
Such regression equation can be used for several purposes.
 For example, to determine the role of any explanatory variable in the joint relationship in
any policy formulation, to forecast the values of response variable for given set of values of
explanatory variables.
Applications or uses of Regression Analysis:
1. Predictive Analytics:
 Predictive analytics, i.e. forecasting future opportunities and risks, is the most prominent
application of regression analysis in business. Demand analysis, for instance, predicts the
number of items which a consumer will probably purchase.
 However, demand is not the only dependent variable when it comes to business.
Regression analysis can go far beyond forecasting the impact on direct revenue.
 For example, insurance companies rely heavily on regression analysis to estimate the
credit standing of policyholders and the possible number of claims in a given time period.
2. Operation Efficiency:
 Regression models can also be used to optimize business processes. A factory manager,
for example, can create a statistical model to understand the impact of oven temperature
on the shelf life of the cookies baked in those ovens.
 In a call center, we can analyze the
relationship between the wait times of callers and the number of complaints. Data-driven decision
making eliminates guesswork, hypothesis and corporate politics from decision making.
 This improves business performance by highlighting the areas that have the
maximum impact on operational efficiency and revenues.
3. Supporting Decisions:
 Today businesses are overloaded with data on finances, operations and customer purchases.
Increasingly, executives are leaning on data analytics to make informed business
decisions.
 Regression analysis can bring a scientific angle to the management of any business. By
reducing the tremendous amount of raw data into actionable information, regression analysis
leads the way to smarter and more accurate decisions. This technique acts
as a useful tool to test a hypothesis before diving into execution.

4. Correcting Errors:
 Regression is not only great for lending empirical support to management decisions but also for
identifying errors in judgment.
 For example, a retail store manager may believe that extending shopping hours will greatly increase sales. Regression analysis, however,
may indicate that the increase in revenue might not be sufficient to support the rise in operating
expenses due to longer working hours (such as additional employee labor charges).
 Hence, regression analysis can provide quantitative support for decisions and prevent mistakes
due to a manager's intuitions.
5. New Insights:
 Over time businesses have gathered a large volume of unorganized data that has the
potential to yield valuable insights. However, this data is useless without proper analysis.
 Regression analysis techniques can find a relationship between different variables by
uncovering patterns that were previously unnoticed.
 For example, analysis of data from point of sale systems and purchase accounts may
highlight market patterns like an increase in demand on certain days of the week or at certain
times of the year. By acting on these insights, you can maintain optimal stock and personnel before a spike in demand
arises.
Difference between Correlation and Regression:

| Sr. No. | Basis for comparison | Correlation | Regression |
|---|---|---|---|
| 1 | Meaning | Correlation is a statistical measure which determines the co-relationship or association of two variables. | Regression describes how an independent variable is numerically related to the dependent variable. |
| 2 | Usage | To represent the linear relationship between two variables. | To fit a best line and estimate one variable on the basis of another variable. |
| 3 | Dependent and independent variables | No difference | Both variables are different |
| 4 | Indicates | The correlation coefficient indicates the extent to which two variables move together. | Regression indicates the impact of a unit change in the known variable (x) on the estimated variable (y). |
| 5 | Objective | To find a numerical value expressing the relationship between variables. | To estimate values of a random variable on the basis of the values of a fixed variable. |
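Row 3 of the comparison can be checked numerically: the correlation coefficient is the same whichever variable is taken first, whereas the regression slope of y on x differs from the slope of x on y. The sketch below uses made-up data and assumed names. It also shows the standard identity that the product of the two slopes equals r².

```python
import math


def deviation_sums(us, vs):
    """Return (Suu, Svv, Suv): sums of squares and products of deviations."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    svv = sum((v - mean_v) ** 2 for v in vs)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    return suu, svv, suv


# Made-up observations.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 9.6]

sxx, syy, sxy = deviation_sums(x, y)
r_xy = sxy / math.sqrt(sxx * syy)

# Swap the roles of the two variables.
syy_swapped, sxx_swapped, syx = deviation_sums(y, x)
r_yx = syx / math.sqrt(syy_swapped * sxx_swapped)

b_y_on_x = sxy / sxx   # slope when y is the dependent variable
b_x_on_y = sxy / syy   # slope when x is the dependent variable
print(r_xy, r_yx, b_y_on_x, b_x_on_y)
```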

 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excelysmaelreyes
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一F sss
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 

Kürzlich hochgeladen (20)

1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 

Regression and correlation

  • 2. Basics of Data Analytics
    Analytics:
    i) It is the systematic computational analysis of data.
    ii) It is the discovery, interpretation and communication of meaningful patterns in data.
    iii) It relies on the simultaneous application of statistics, computer programming and operations research to quantify performance.
    Data analytics: the science of examining raw data with the purpose of drawing conclusions.
    Data analytics: a process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information, informing conclusions and supporting decision making.
  • 6. Need for Data Analytics
    - Data and information are growing rapidly, so the volume of information available to us in the future is hard to predict.
    - It is crucial to integrate this data. If it is wasted, a lot of valuable information will be lost.
    - Previously, a skilled analyst was enough to process the data; today the amount of data is too massive for a human being to process.
    - So there is a need for tools that operate on this data with high speed and efficiency and help the business make better decisions.
    - This is why data analytics is important.
  • 7. What is Data Analytics?
    - It is a set of quantitative and qualitative techniques.
    - It is the science of drawing insights from raw information sources.
    - It encompasses many diverse types of data analysis.
    - It is primarily conducted in business-to-consumer (B2C) applications.
  • 8. Why is analytics important?
  • 10. Data analytics vs. Data analysis Analysis Explore potential future events
  • 13. Overview of the Data Analytics Lifecycle
    - Data analytics is the science of examining raw data with the purpose of drawing conclusions about the information.
    - There are 6 phases in the lifecycle of data analytics:
    1. Discovery:
    i) The team learns the business domain.
    ii) The team assesses the resources available to support the project in terms of people, technology, time and data.
    iii) The team frames the business problem as an analytics challenge that can be addressed in subsequent phases, and formulates initial hypotheses to test and to begin learning the data.
    2. Data Preparation:
    i) The team requires an analytical sandbox in which it can work with the data and perform analytics for the duration of the project.
    ii) The team needs to execute extract, load and transform (ELT) or extract, transform and load (ETL) processes, sometimes abbreviated together as ETLT, to get data into the sandbox. Data should be transformed in this process so the team can work with it and analyze it.
    iii) This phase includes the steps to explore, preprocess and condition data prior to modeling and analysis.
  • 14. 3. Model Planning:
    i) The team determines the methods, techniques and workflow it intends to follow for the subsequent model building phase.
    ii) The team explores the data to learn about the relationships between variables.
    4. Model Building:
    i) The team develops datasets for testing, training and production purposes.
    ii) The team builds and executes models based on the work done in the model planning phase.
    iii) The team determines whether its existing tools will suffice for running the models, or whether it needs a more robust environment for executing models and workflows.
    5. Communicate Results:
    i) The team, in collaboration with stakeholders, determines whether the results of the project are a success or a failure based on the criteria developed in phase 1.
    ii) The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey the findings to stakeholders.
    6. Operationalize (labelled "Optimization" on the slide):
    i) The team delivers the final reports, briefings, code and technical documents.
    ii) The team may run a pilot project to implement the model in a production environment.
  • 15. Importance of Data Analytics for Business
    1. Improving efficiency
    2. Market understanding
    3. Cost reduction
    4. Faster and better decision making
    5. New products and services
    6. Industry knowledge
    7. Witnessing the opportunity
  • 16. Difference between Data Science and Data Analytics
    Sr. No. | Term | Data Science | Data Analytics
    1 | Scope | Macro | Micro
    2 | Focus | Providing strategic, actionable insights into the world | Providing operational observations into issues
    3 | Skills required | Mathematical, technical and strategic knowledge is necessary | Data analytics and visualization skills are required
    4 | Big data | Deals with big data | Not necessary to deal with big data
    5 | Major fields | Machine learning, AI, search engine engineering, corporate analytics | Healthcare, gaming, travel, industries with immediate data needs
  • 20. What Are Diagnostic Analytics?
    - Diagnostic analytics are a form of advanced analytics that focus on explaining why something has happened, based on data analysis. Like a doctor investigating a patient's symptoms, they aim to understand the underlying issues and determine why an issue is happening.
    - Their capabilities allow users to identify anomalies by highlighting areas that could require further study. These are pinpointed when trends or data points raise questions that cannot be answered easily or without digging deeper. Some questions that would have to be addressed with diagnostic analytics include:
      • Why did this marketing campaign fail?
      • Why have sales increased without any increased marketing attention for a certain region?
      • Why did employee performance fall during this month?
    - The same applies to other questions that have no obvious answer from a single data source.
    - Diagnostic analytics offer data discovery, drill-down, data mining and data correlation. Drilling down into the data allows users to identify potential sources for the anomalies discovered in the first step. Analysts can use these capabilities to examine patterns both within and external to the data and draw an informed conclusion. Probability theory, filtering, regression analysis and time-series data analysis are all useful tools for this process.
  • 21. What Are Descriptive Analytics?
    - They describe or summarize raw data and turn it into something that is interpretable by humans.
    - A simpler way to define descriptive analytics is that they answer the question "What has happened?"
    - Descriptive analytics are useful because they allow us to learn from past behaviour and understand how it might influence future outcomes.
    - The main objective of descriptive analytics is to find out the reasons behind previous successes or failures.
    - A common example is the reports that provide historical insight into a company's production, financials, operations, sales, inventory and customers.
    - Most social analytics are descriptive analytics. They summarize certain groupings based on simple counts of events, such as the number of followers, likes, posts and fans.
  • 22. What Are Predictive Analytics?
    - Predictive and descriptive analytics have different objectives, but they are very closely related, because you need accurate information about the past to make predictions about the future. Predictive tools attempt to fill in gaps in the available data. If descriptive analytics answer the question "What happened in the past?", predictive analytics answer the question "What might happen in the future?"
    - Predictive analytics take historical data from various systems and use it to highlight patterns. Algorithms, statistical models and machine learning are then employed to capture the correlations between targeted data sets.
    - The most common commercial example is a credit score. Banks use historical information to predict whether or not a candidate is likely to keep up with payments. It works in much the same way for manufacturers, except that they are usually trying to find out whether products will sell. Predictive analytics focus on the future of the business.
    - Predictive analytics can be used throughout the organization, from forecasting customer behaviour and purchasing patterns to identifying trends in sales activities.
  • 23. What Are Prescriptive Analytics?
    - Of diagnostic, predictive, descriptive and prescriptive analytics, the last is the most recent addition to the business intelligence landscape. These tools enable companies to view potential decisions and, based on both current and historical data, follow them through to a likely outcome. They provide recommendations on actions that will take advantage of the predictions.
    - Like predictive analytics, prescriptive analytics will not be right 100% of the time, because they work with estimates. However, they provide the best way of "seeing into the future" and determining the viability of decisions before they are made.
    - The difference between the two is that prescriptive analytics offer opinions as to why a particular outcome is likely, and can then offer recommendations based on this information. To achieve this, they use algorithms, machine learning and computational modeling.
    - If predictive analytics answer "What might happen?", then prescriptive analytics answer "What do we have to do to make it happen?" or "How will this action change the outcome?" Prescriptive analytics deal more with trial and error and have something of a hypothesis-testing nature.
  • 24. Summary of the Different Types
    - Diagnostic analytics ask about the present. They drill down into why something has happened and help users diagnose issues.
    - Descriptive analytics ask about the past. They want to know what has been happening to the business and how this is likely to affect future sales.
    - Predictive analytics ask about the future. They are concerned with what outcomes can happen and which outcomes are most likely.
    - Finally, prescriptive analytics ask about the present's impact on the future. They want to know the best course of action right now in order to positively impact the future. In other words, they are the decision makers.
  • 25. Statistical Inference
    - Statistical inference is a technique by which you analyze results and draw conclusions from data that are subject to random variation.
    - Statistics can be classified into two categories:
    1. Descriptive statistics
    2. Inferential statistics
    Descriptive statistics describe the data, whereas inferential statistics help you make predictions from the data. In inferential statistics, the data are taken from a sample and allow you to generalize to the population. Informally, "inference" means an educated guess about something that was not observed directly.
    - The purpose of statistics is to describe and predict information.
    - The basic principle of statistical inference is that conclusions about a population of interest can be made using information contained in a sample from that population.
  • 26. - Statistical inference is the procedure through which inferences about a population are made based on certain characteristics calculated from a sample of data drawn from that population.
    - Statistical inference is the process of generating conclusions about a population from a noisy sample. Without statistical inference we are simply living in the data; with statistical inference we are trying to generate knowledge.
    Definition of statistical inference: it is the method of drawing conclusions about a population, and measuring the reliability of those conclusions, based on information obtained from a sample of the population.
    - Statistical inference can be contrasted with exploratory data analysis.
    - Statistical inference requires navigating a set of assumptions and tools, and then thinking about how to draw conclusions from data.
    - Descriptive statistics: a descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information. Descriptive statistics are used as a preliminary step before formal inferences are drawn. Inference, by contrast, emphasizes the population quantities of interest about which we wish to draw conclusions.
    - The conclusion of a statistical inference is a statistical proposition.
  • 27. - There are two broad areas of statistical inference: 1) statistical estimation and 2) statistical hypothesis testing.
    1) Statistical estimation: it is concerned with best estimating the value, or range of values, of a particular population parameter. There are two types of statistical estimation:
    i) Point estimation: we estimate an unknown parameter using a single number calculated from the sample data. This single value serves as a "best guess" or "best estimate" of the unknown population parameter.
    ii) Interval estimation: we estimate an unknown parameter using an interval of values that is likely to contain the true value of that parameter. For example, we may evaluate the population mean by computing a range of values within which the mean is most likely to be located.
    2) Hypothesis testing: it is concerned with deciding whether the study data are consistent, at some level of agreement, with a particular value of a population parameter. We begin with a claim about the population (called the null hypothesis) and check whether or not the data obtained from the sample provide evidence against this claim.
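The three ideas on this slide can be sketched in a few lines of Python. This is an illustrative sketch only: the sample values and the null value `mu_0` are made up, and the 95% interval uses the normal critical value 1.96 as an approximation (for a sample this small a t critical value would usually be preferred).

```python
import math
import statistics

# Hypothetical sample of 10 measurements (illustrative values only).
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]
n = len(sample)

# Point estimation: a single number as the best guess for the population mean.
point_estimate = statistics.mean(sample)

# Standard error of the mean; stdev() uses the n - 1 divisor.
std_error = statistics.stdev(sample) / math.sqrt(n)

# Interval estimation: approximate 95% interval around the point estimate.
z = 1.96  # assumed normal critical value
interval = (point_estimate - z * std_error, point_estimate + z * std_error)

# Hypothesis testing: null hypothesis "population mean equals mu_0".
mu_0 = 5.5  # hypothetical claimed value
test_statistic = (point_estimate - mu_0) / std_error
reject_null = abs(test_statistic) > z

print(point_estimate, interval, test_statistic, reject_null)
```

Here the interval sits well below 5.5, so the sample provides evidence against the claimed value and the null hypothesis is rejected at roughly the 5% level.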
  • 28. Population
    - In statistics, as in quantitative methodology, data are collected and selected from a statistical population using defined procedures. There are two types of data set: population and sample. When we calculate the mean deviation, variance or standard deviation, we need to know whether we are referring to the entire population or only to sample data. If the population has N elements, the population formulas divide by N; for a sample of n observations, the sample variance and standard deviation divide by n - 1. Let us look at population and sample data sets in detail.
    - Population: it includes all the elements of the data set. Measurable characteristics of the population, such as the mean and standard deviation, are known as parameters. For example, all people living in India make up the population of India.
    - There are different types of population:
      • Finite population
      • Infinite population
      • Existent population
      • Hypothetical population
  • 29. Let us discuss the types one by one.
    Finite population
    - A finite population is also known as a countable population: the units in the population can be counted. In other words, it is a population in which the individuals or objects are finite in number. For statistical analysis, a finite population is more advantageous than an infinite one. Examples of finite populations are the employees of a company or the potential consumers in a market.
    Infinite population
    - An infinite population is also known as an uncountable population: counting the units in the population is not possible. An example is the germs in a patient's body, which for practical purposes cannot be counted.
    Existent population
    - An existent population is a population of concrete individuals. In other words, a population whose units are available in solid form is known as an existent population. Examples are books and students.
    Hypothetical population
    - A population whose units are not available in solid form is known as a hypothetical population. A population consists of sets of observations or objects that all have something in common, and in some situations the population is only hypothetical. Examples are the outcomes of rolling a die or tossing a coin.
  • 30. Sample
    A sample includes one or more observations drawn from the population. A measurable characteristic of a sample is a statistic. Sampling is the process of selecting the sample from the population. For example, some of the people living in India form a sample of the population.
    There are two types of sampling:
      • Probability sampling
      • Non-probability sampling
    Probability sampling
    In probability sampling, the population units cannot be selected at the discretion of the researcher. Instead, defined procedures ensure that every unit of the population has a fixed, known probability of being included in the sample. Such a method is also called random sampling. Some of the techniques used for probability sampling are:
      • Simple random sampling
      • Cluster sampling
      • Stratified sampling
      • Disproportionate sampling
      • Proportionate sampling
      • Optimum allocation stratified sampling
      • Multi-stage sampling
    Non-probability sampling
    In non-probability sampling, the population units can be selected at the discretion of the researcher. These samples rely on human judgement for selecting units and have no theoretical basis for estimating the characteristics of the population. Some of the techniques used for non-probability sampling are:
      • Quota sampling
      • Judgement sampling
      • Purposive sampling
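Two of the probability sampling techniques listed on this slide can be sketched with Python's standard `random` module. The population records, the `region` field and the helper `proportionate_stratified_sample` are hypothetical names introduced only for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 units split into two strata (60 / 40).
population = [
    {"id": i, "region": "north" if i < 60 else "south"} for i in range(100)
]

# Simple random sampling: every unit has the same chance of selection.
simple_sample = rng.sample(population, 10)

def proportionate_stratified_sample(units, key, fraction, rng):
    """Draw the same fraction from each stratum defined by `key`."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    chosen = []
    for members in strata.values():
        k = round(len(members) * fraction)
        chosen.extend(rng.sample(members, k))
    return chosen

stratified_sample = proportionate_stratified_sample(
    population, "region", 0.10, rng
)
north_count = sum(1 for u in stratified_sample if u["region"] == "north")
print(len(simple_sample), len(stratified_sample), north_count)
```

With proportionate allocation the sample keeps the 60/40 split of the population, which a simple random sample only matches on average.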
  • 31. Population and Sample Examples
      • All the people who have ID proofs are the population, and the group of people who have only a voter ID is the sample.
      • All the students in the class are the population, whereas the top 10 students in the class are the sample.
      • All the members of the parliament are the population, and the female members present there are the sample.
    Population and Sample Formulas
    Here are the formulas for mean absolute deviation (MAD), variance and standard deviation for a population and for a sample. Let N denote the size of the population with mean μ, and n the size of the sample with mean x̄. Then:
    Population MAD = Σ|xᵢ − μ| / N; sample MAD = Σ|xᵢ − x̄| / n
    Population variance σ² = Σ(xᵢ − μ)² / N; sample variance s² = Σ(xᵢ − x̄)² / (n − 1)
    Population standard deviation σ = √σ²; sample standard deviation s = √s²
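The difference between the population and sample versions of these measures can be checked with Python's standard `statistics` module, which provides both. The data values are hypothetical, and `mean_absolute_deviation` is a small helper written for this sketch (it is not a library function).

```python
import statistics

# Hypothetical observations (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    centre = statistics.mean(values)
    return sum(abs(x - centre) for x in values) / len(values)

mad = mean_absolute_deviation(data)

# Treating the data as the whole population: divide by N.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Treating the data as a sample: divide by n - 1.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(mad, population_variance, population_sd, sample_variance, sample_sd)
```

The sample variance (32/7) is larger than the population variance (32/8) for the same numbers, because dividing by n - 1 compensates for estimating the mean from the sample itself.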
  • 32. Difference between Population and Sample
    Some of the key differences between population and sample are given below:
    Comparison | Population | Sample
    Meaning | Collection of all the units or elements that possess common characteristics | A subgroup of the members of the population
    Includes | Each and every element of a group | Only a handful of units of the population
    Characteristics | Parameter | Statistic
    Data collection | Complete enumeration or census | Sampling or sample survey
    Focus on | Identification of the characteristics | Making inferences about the population
  • 33. Statistical Modeling
    1. Statistical model
    - Definition: a statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population).
    - A statistical model combines inference based on collected data with an understanding of the population, and is used to predict information in an idealized form. This means that a statistical model can be an equation or a visual representation of information based on research that has already been collected over time.
    - Statistical models are part of the foundation of statistical inference.
    - Essentially, all statistical models exist to find relationships between different types of variables, and because there are different types of variables, there are different types of statistical models. Some of the types of model include regression, analysis of variance, analysis of covariance and chi-square.
  • 34. 2. Statistical modeling
    - Statistical modeling is an approach to statistical data analysis that helps researchers discover something about a phenomenon that is assumed to exist. This approach helps explain the variability found in the dataset.
    - It is a strategy that brings estimation and hypothesis testing together under the same umbrella.
    - This modeling approach constructs summary models that display current knowledge. The models are then "fitted" to data.
    - A general modelling framework: Data = Pattern + Residual, where
    Pattern: systematic or "explained" variation.
    Residual: leftover or "unexplained" variation.
    In simple terms, statistical modelling is a simplified, mathematically formalized way to approximate reality (i.e. what generates your data) and, optionally, to make predictions from this approximation.
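The framework Data = Pattern + Residual can be illustrated with a straight-line model fitted by ordinary least squares. This is one possible choice of "pattern", picked here only as an example; the `x` and `y` values are hypothetical.

```python
import statistics

# Hypothetical paired observations (illustrative values only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Ordinary least squares estimates for the line y = intercept + slope * x.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Pattern: the systematic, "explained" part of each observation.
pattern = [intercept + slope * xi for xi in x]

# Residual: the leftover, "unexplained" part.
residual = [yi - pi for yi, pi in zip(y, pattern)]

for yi, pi, ri in zip(y, pattern, residual):
    print(f"data={yi:.2f}  pattern={pi:.2f}  residual={ri:+.2f}")
```

Each observation is recovered exactly by adding its pattern and residual terms, and for a least-squares line with an intercept the residuals sum to zero.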
  • 35. The basic steps in the statistical model building process are:
    1. Model selection: plots of the data, process knowledge and assumptions about the process are used to determine the form of the model to be fitted to the data.
    2. Model fitting: using the selected model, and possibly information about the data, an appropriate model-fitting method is used to estimate the unknown parameters in the model.
    3. Model validation: once the parameter estimates have been made, the model is carefully assessed to see whether the underlying assumptions of the analysis appear plausible. If the assumptions seem valid, the model can be used to answer the scientific questions that prompted the modeling effort. If validation identifies problems with the current model, the modeling process is repeated using information from the validation step.
  • 36. Probability Distribution
    - In statistics, a probability distribution gives the probability of each outcome of a random experiment or event. It provides the probabilities of the different possible occurrences.
    - To recall, probability is a measure of the uncertainty of various phenomena. For example, if you throw a die, probability describes how likely each possible outcome is. A distribution can be defined for any random experiment whose outcome is not certain or cannot be predicted.
    Probability distribution definition
    - A probability distribution assigns probabilities to the possible outcomes of a random event. It is defined on the underlying sample space, the set of possible outcomes of the random experiment. This set could be a set of real numbers, a set of vectors or a set of any other entities. It is a part of probability and statistics.
  • 37. 1. Probability: probability means possibility. It is a branch of mathematics that deals with the occurrence of random events. Its value ranges from zero to one. Probability was introduced in mathematics to predict how likely events are to happen; the probability of an event is the extent to which it is likely to happen. This basic probability theory is also used in probability distributions, which describe how likely each outcome of a random experiment is. To find the probability of a single event, we first need to know the total number of possible outcomes.
    2. Random experiments: a random experiment is an experiment whose outcome cannot be predicted. For example, if we toss a coin, we cannot predict whether it will come up heads or tails. A possible result of a random experiment is called an outcome, or sample point, and the set of all outcomes is called the sample space. With the help of these experiments or events, we can always build a probability table in terms of a variable and its probabilities.
    Probability of an event: P(E) = Number of favourable outcomes / Total number of outcomes
  • 38. 3. Sample space: the set of all possible outcomes of a random experiment.
    4. Random variable: a variable whose possible values are numerical outcomes of a random experiment.
    P(X) represents the probability of X.
    P(X = x) refers to the probability that the random variable X is equal to a particular value, denoted by x. For example, P(X = 1) is the probability that the random variable X is equal to 1.
    As an example, suppose you flip a coin two times. This simple experiment has 4 possible outcomes: HH, HT, TH, TT. Now let the variable X represent the number of heads that result from the experiment. X can take the values 0, 1 or 2.
    The table below shows the probability distribution of the random variable X:
    Number of heads | Probability
    0 | 0.25
    1 | 0.50
    2 | 0.25
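The two-coin table can be reproduced by enumerating the sample space and applying P(E) = favourable outcomes / total outcomes. The sketch below uses only the Python standard library; the variable names are chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two coin flips: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

# Random variable X = number of heads in each outcome.
heads_counts = Counter(outcome.count("H") for outcome in sample_space)

# Probability distribution of X: favourable outcomes / total outcomes.
distribution = {
    heads: Fraction(count, len(sample_space))
    for heads, count in sorted(heads_counts.items())
}

# Probability of the event "at least one head".
favourable = [o for o in sample_space if "H" in o]
p_at_least_one_head = Fraction(len(favourable), len(sample_space))

for heads, prob in distribution.items():
    print(heads, float(prob))
print("P(at least one head) =", p_at_least_one_head)
```

The probabilities 1/4, 1/2 and 1/4 match the table and sum to 1, as any probability distribution must.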
  • 39. Probability Distribution: A probability distribution is a function that describes the likelihood of obtaining the possible values that a random variable can assume. Definition: the probability distribution of a random variable X is the system of numbers
    X    : x1  x2  ...  xn
    P(X) : p1  p2  ...  pn
  where the real numbers x1, x2, ..., xn are the possible values of the random variable X, and pi is the probability that X takes the value xi, i.e. P(X = xi) = pi. Each pi lies between 0 and 1, and the sum of the probabilities over all possible values must be equal to 1. A probability distribution may be either discrete or continuous. A discrete distribution means that X can assume one of a countable number of values (finite, or countably infinite such as 0, 1, 2, ...). A continuous distribution means that X can assume any of an uncountably infinite number of values, such as every real number in an interval. In short, a probability distribution is the function that maps each realized value of the random variable to its probability.
  • 40. 1. Discrete probability distribution: Three frequently used discrete distributions are: i) The binomial distribution is used to compute probabilities for a process in which only one of two possible outcomes may occur on each trial. Example: rolling a die 50 times and counting the sixes. Here the random variable X is the number of "successes", that is, the number of times a six occurs (0, 1, 2, 3, ..., 50), and the probability of getting a six on each roll is 1/6. ii) The geometric distribution is used to determine the probability that the first success occurs on a specified trial. Example: suppose the probability that an athlete achieves a distance of 6 m in the long jump is 0.7. The geometric distribution gives the probability of the number of attempts the athlete needs to achieve a 6 m jump. The probability of first succeeding on the second attempt is 0.3 * 0.7 = 0.21, and on the third attempt it is 0.3 * 0.3 * 0.7 = 0.063. iii) The Poisson distribution is used to measure the probability that a given number of events will occur during a given time frame. Example: suppose that, on average, 1 bus arrives at a bus stop in a span of 30 minutes. The Poisson distribution can be used to model the probability of different numbers of buses, X, arriving at the stop within the next 30 minutes, where X can take the values 0, 1, 2, 3, 4 and so on.
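The three discrete distributions above can be evaluated directly from their standard probability mass functions. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the choice of "exactly 8 sixes" for the die example is a hypothetical value that does not come from the slides.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(X = k): the first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    """P(X = k): k events in an interval whose mean count is lam."""
    return exp(-lam) * lam**k / factorial(k)

# Die example: probability of exactly 8 sixes in 50 rolls (8 is hypothetical).
p_eight_sixes = binomial_pmf(8, 50, 1 / 6)

# Long-jump example: first 6 m jump on the 2nd or 3rd attempt.
p_second = geometric_pmf(2, 0.7)  # 0.3 * 0.7
p_third = geometric_pmf(3, 0.7)   # 0.3 * 0.3 * 0.7

# Bus example: no bus in the next 30 minutes when the mean is 1 bus.
p_no_bus = poisson_pmf(0, 1.0)

print(p_eight_sixes, p_second, p_third, p_no_bus)
```

Summing `binomial_pmf(k, 50, 1/6)` over k = 0, ..., 50 gives 1 (up to rounding), which matches the requirement that the probabilities of a distribution sum to 1.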
  • 41. 2. Continuous probability distribution: i) Uniform distribution: In statistics, the uniform distribution is a type of probability distribution in which all possible outcomes are equally likely. For example, drawing a card from a deck is uniform over the suits, since a heart, club, diamond or spade is equally likely (strictly, this is a discrete uniform distribution; in the continuous case, X is equally likely to fall in any two intervals of equal length within its range). ii) Normal Distribution: The normal distribution is the most important probability distribution in statistics because it fits many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores approximately follow the normal distribution. It is also known as the Gaussian distribution and the bell curve. In a normal distribution, data is symmetrically distributed around the mean with no skew.
  • 42. Correlation: If a change in one variable appears to be accompanied by a change in another variable, the two variables are said to be correlated, and this inter-dependence is called correlation or co-variation. Correlation analysis is a method of statistical evaluation used to study the strength of the relationship between two numerically measured, continuous variables (e.g. height and weight). This type of analysis is useful when we want to establish whether there are possible connections between variables. In short, the tendency of two variables to vary together is called correlation or co-variation. If correlation is found between two variables, it means that when there is a systematic change in one variable, there is also a systematic change in the other; the variables alter together over a certain period of time. If correlation is found, it can be either positive or negative, depending on the numerical values measured. Knowledge of correlation gives us an idea of the direction and intensity of change in a variable when the correlated variable changes.
  • 43. Correlation denotes the interdependency among variables. For a correlation between two phenomena to be meaningful, there should be some plausible relationship between them; without one, an observed correlation may be purely coincidental. If two variables vary in such a way that movements in one are accompanied by movements in the other, the variables are said to be correlated; this alone does not establish a cause-and-effect relationship. Causation generally implies correlation, but correlation does not necessarily imply causation. A strong positive or strong negative correlation between two variables does not mean that one variable is caused by the other; a strong correlation by itself never establishes a cause-effect relationship. Coefficient of correlation: To measure the degree of association between two variables quantitatively, an index of relationship is used, termed the coefficient of correlation. The coefficient of correlation is a numerical index that tells us to what extent the two variables are related and to what extent variations in one variable go with variations in the other. It is usually symbolized by r or ρ (rho) and ranges from -1 to +1 (-1 <= r <= 1).
  • 44. Techniques for Measuring Correlation: Three important statistical tools used to measure correlation are scatter diagrams, Karl Pearson's coefficient of correlation, and Spearman's rank correlation. 1. Scatter Diagram: A scatter diagram visually presents the nature of the association without giving any specific numerical value. In this technique, the values of the two variables are plotted as points on graph paper. From a scatter diagram, one can get a fairly good idea of the nature of the relationship: the degree of closeness of the scatter points and their overall direction enable us to examine it. If all the points lie on a line, the correlation is perfect and is said to be unity in magnitude. If the scatter points are widely dispersed around the line, the correlation is low. The correlation is said to be linear if the scatter points lie on or near a line. Scatter diagrams in Fig. give us an idea of the relationship between two variables.
  • 45. 2. Karl Pearson's Coefficient of Correlation: A numerical measure of the linear relationship between two variables is given by Karl Pearson's coefficient of correlation. It is also called the product moment correlation and the simple correlation coefficient. A relationship is said to be linear if it can be represented by a straight line. The coefficient gives a precise numerical value of the degree of linear relationship between two variables. The linear relationship may be given by Y = a + bX. The intercept that the line makes on the Y axis is given by a, and the slope of the line is given by b, which is the change in the value of Y per unit change in the value of X. On the other hand, if the relation cannot be represented by a straight line, as in Y = X^2 with X values spread symmetrically about zero, the value of the coefficient will be zero. This shows that zero correlation need not mean the absence of any type of relation between the two variables. The value of the correlation coefficient lies between minus one and plus one: -1 <= r <= 1.
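  • 46. The product moment correlation, or Karl Pearson's measure of correlation, is given by $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$, where the sums run over all n pairs $(x_i, y_i)$ and $\bar{x}$, $\bar{y}$ are the sample means.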
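A short sketch may help make the product moment coefficient concrete. It computes r as the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The data are hypothetical: an exactly linear increasing relation, an exactly linear decreasing relation, and the curved relation Y = X^2 on X values symmetric about zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient of paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [-2, -1, 0, 1, 2]  # hypothetical X values, symmetric about zero

r_positive = pearson_r(xs, [3 + 2 * x for x in xs])  # Y = 3 + 2X
r_negative = pearson_r(xs, [3 - 2 * x for x in xs])  # Y = 3 - 2X
r_curved = pearson_r(xs, [x**2 for x in xs])         # Y = X^2

print(r_positive, r_negative, r_curved)
```

The third case shows a coefficient of zero even though Y is completely determined by X, which is why a zero coefficient only rules out a linear relation.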
  • 47. Correlation is of the following types: 1. Positive correlation: The values of one variable increase as the values of the other increase, so the two variables change in the same direction. High numerical values of one variable relate to high numerical values of the other, i.e. 0 < r < 1. For example, height and weight, or study time and grades. 2. Negative correlation: The values of one variable decrease as the values of the other increase, or vice versa, so the variables change in opposite directions. High numerical values of one variable relate to low numerical values of the other, i.e. -1 < r < 0. For example, price and quantity demanded, or alcohol consumption and driving ability. 3. No correlation: An increase or decrease in the values of one variable has no linear association with the values of the other. If r = 0, the two variables are uncorrelated; there is no linear relation between them.
  • 48. 4. Perfect positive correlation: When a change in one variable X is accompanied by a change of equal proportion in the other variable Y in the same direction, the two variables are said to have a perfect positive correlation, i.e. r = 1. 5. Perfect negative correlation: If a change in X is accompanied by a change of equal proportion in Y but in the opposite direction, the correlation between X and Y is called a perfect negative correlation, i.e. r = -1. If there is correlation between two numerical sets of data, positive or negative, the coefficient worked out can help you to predict future trends between the two variables. However, you must remember that you cannot be 100% sure that your prediction will be correct, because correlation does not establish cause and effect.
  • 49.
  • 50. 3. Spearman's Rank Correlation: Spearman's coefficient of correlation measures the linear association between the ranks assigned to individual items according to their attributes. Attributes are variables which cannot be numerically measured, such as the intelligence of people, physical appearance, honesty, etc. Ranking may be a better alternative to quantification of qualities.
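As an illustration of rank correlation, the sketch below ranks two sets of scores and applies the usual shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of each item. The slides do not state this formula; it is the standard one and is valid only when there are no tied ranks, which the sketch assumes. The judges' scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

# Hypothetical scores given by two judges to the same five candidates.
judge_a = [86, 70, 92, 65, 78]
judge_b = [75, 72, 95, 60, 80]

rho = spearman_rho(judge_a, judge_b)
print(ranks(judge_a), ranks(judge_b), rho)
```

Because only the ranks are used, any relation that preserves the ordering (for example y = x^3 on positive x) gives rho = 1 even when it is not a straight line in the original values.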
  • 51. Regression: Regression analysis is a statistical tool used for the investigation of relationships between variables. It is a method of predicting or estimating one variable knowing the value of the other variable. Estimation is required in different fields in everyday life. A businessman wants to know the effect of an increase in advertising expenditure on sales, or a doctor wishes to observe the effect of a new drug on patients. An economist is interested in finding the effect of a change in the demand pattern of some commodities on prices. Usually, we seek to ascertain the causal effect of one variable upon another. We use a regression model to understand how changes in the predictor values are associated with changes in the response mean. Regression analysis can help in examining a cause and effect relationship between variables, although an association alone does not prove causation. We can also use regression to make predictions based on the values of the predictors. It plays a significant role in many human activities, as it is a powerful and flexible tool used to estimate past, present or future events on the basis of past or present events.
  • 52.  Regression analysis is also used to find trends in data. It will provide you with an equation for a graph so that you can make predictions about your data.  For example, you might guess that there is a connection between how much you eat and how much you weigh; regression analysis can help you to quantify that.  If you have been putting on weight over the last few years, it can predict how much you will weigh in ten years time if you continue to put on weight at the same rate. It will also give you a slew of statistics to tell you how accurate your model is.  Thus, regression analysis models the relationships between a response variable and one or more predictor variables. In simple words, regression analysis is used to model the relationship between a dependent variable and one or more independent variables.  Response variables are also known as dependent variables, Regressand, y-variables, and outcome variables. Typically, you want to determine whether changes in the predictors are associated with changes in the response.  Predictor variables are also known as independent variables, Regressor, x-variables, and input variables. A predictor variable explains changes in the response. Typically, you want to determine how changes in one or more predictors are associated with changes in the response.
  • 53.
  • 54. For example, in a plant growth study, the response variable is the amount of growth that occurs during the study. The investigators want to determine how changes in the predictors are associated with changes in plant growth. The predictors are the amount of fertilizer applied, the soil moisture, and the amount of sunlight.
  • 55. Definition:  “The statistical technique that expresses a functional relationship between two or more variables in the form of an equation, to estimate the value of a variable, based on the given value of another variable is called regression analysis".  The variable whose value is to be estimated is called dependent variable and the variable whose value is used to estimate this value is called independent variable.  The linear algebraic equations that express a dependent variable in terms of an independent variable are called Linear Regression Equation.  In terms of statistical inference, regression analysis is concerned with the parameters of the regression equation that obtains between two or more variables in the population.  There are a variety of regression methodologies that you choose based on the type of response variable, the type of model that is required to provide an adequate fit to the data, and the estimation method.
  • 56. The overall objectives of regression analysis can be summarized as follows: 1. To determine whether or not a relationship exists between two variables. 2. To describe the nature of the relationship, should one exist, in the form of a mathematical equation. 3. To assess the degree of accuracy of description or prediction achieved by the regression equation. 4. In the case of multiple regression, to assess the relative importance of the various predictor variables in their contribution to variation in the criterion variable. Types of Regression Models
  • 57. The two basic types of regression analysis are: 1. Simple Regression Analysis: It is used to estimate the relationship between a dependent variable and a single independent variable. Regression models that involve one explanatory variable are called simple regression. For example, the relationship between crop yields and rainfall. 2. Multiple Regression Analysis: It is used to estimate the relationship between a dependent variable and two or more independent variables. When two or more explanatory variables are involved, the relationships are called multiple regressions. For example, the relationship between the salaries of employees and their experience and education. Multiple regression analysis introduces several additional complexities but may produce more realistic results than simple regression analysis. Regression models are also divided into linear and nonlinear models, depending on whether the relationship between the response and explanatory variables is linear or nonlinear. In a simple linear regression, there are two variables x and y, where y depends on, or is influenced by, x. Here y is called the dependent or criterion variable, and x is the independent or predictor variable.
  • 58.
  • 59. The regression line of y on x is expressed as: y = a + bx, where a = constant (the y-intercept) and b = regression coefficient (the slope). In this equation, a and b are the two regression parameters. While there are a number of possible criteria for choosing a best-fitting line, one of the most useful is the least squares criterion. The slope b of the best-fitting line, based on the least squares criterion, can be shown to be $b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$, where the summation is over all n pairs of $(x_i, y_i)$ values. The value of a, the y-intercept, can in turn be shown to be a function of b, $\bar{x}$ and $\bar{y}$, i.e. $a = \bar{y} - b\bar{x}$.
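The least squares line y = a + bx can be computed in a few lines. This is a minimal sketch, assuming the standard least squares results (slope = sum of products of deviations divided by the sum of squared x deviations, intercept = mean of y minus slope times mean of x). The rainfall and crop yield figures are hypothetical and only echo the crop-yield example used for simple regression.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of products of deviations over sum of squared x deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: rainfall and crop yield for five seasons.
rainfall = [10, 20, 30, 40, 50]
crop_yield = [25, 41, 62, 79, 103]

a, b = fit_line(rainfall, crop_yield)
print(f"yield = {a:.2f} + {b:.2f} * rainfall")
```

Here b is the estimated change in yield per one-unit increase in rainfall, and a is the value of the line at zero rainfall.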
  • 60. In the following plot we can observe the linear relationship between the mileage and displacement of cars. The green points are actual observations, while the fitted black line is the line of regression. Regression Analysis:
  • 61. Steps in Regression Analysis: Regression analysis includes the following steps: Step 1: Statement of the Problem under Consideration: The first important step in conducting any regression analysis is to specify the problem and the objectives to be addressed by the regression analysis. A wrong formulation or a wrong understanding of the problem will give wrong statistical inferences. The choice of variables depends upon the objectives of the study and the understanding of the problem. Step 2: Choice of Relevant Variables: Once the problem is carefully formulated and the objectives have been decided, the next question is to choose the relevant variables. It has to be kept in mind that a correct choice of variables is needed for correct statistical inferences. For example, in any agricultural experiment, the yield depends on explanatory variables like quantity of fertilizer, rainfall, irrigation, temperature etc. These variables are denoted by X1, X2, ..., Xk as a set of k explanatory variables.
  • 62. Step 3: Collection of Data on Relevant Variables: Once the objective of the study is clearly stated and the variables are chosen, the next task is to collect data on the relevant variables. The data are essentially the measurements on these variables. For example, suppose we want to collect data on age. For this, it is important to know how to record it. Either the date of birth can be recorded, which will provide the exact age on any specific date, or the age can be recorded in completed years as on a specific date. Moreover, it is also important to decide whether the data are to be collected as quantitative variables or qualitative variables. Examples of quantitative variables include height and weight, while examples of qualitative variables include hair color, religion and gender. Quantitative variables are often represented in units of measurement, and qualitative variables are represented in non-numerical terms.
  • 63. Step 4: Specification of Model: The experimenter or the person working in the subject usually helps in determining the form of the model. Only the form of the tentative model can be ascertained, and it will depend on some unknown parameters. For example, a general form will be like $y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$, where $\varepsilon$ is the random error reflecting mainly the difference between the observed value of y and the value of y obtained through the model. The form of $f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k)$ can be linear as well as nonlinear, depending on the form of the parameters $\beta_1, \beta_2, \ldots, \beta_k$. A model is said to be linear if it is linear in parameters. For example, $y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$ and $y = \beta_0 + \beta_1 \ln X + \varepsilon$ are linear models, whereas $y = \beta_1^2 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$ and $y = (\ln \beta_1) X_1 + \beta_2 X_2 + \varepsilon$ are non-linear models.
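To illustrate what "linear in parameters" buys in practice, the sketch below fits a model of the form y = b0 + b1 * ln(X). Although y is not a straight-line function of X, it is a straight-line function of u = ln(X), so ordinary least squares on (u, y) recovers the parameters. The data are hypothetical and generated without noise from b0 = 2 and b1 = 3, so the fit should return those values up to rounding.

```python
from math import log

def fit_line(us, ys):
    """Least squares intercept and slope for y = b0 + b1 * u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    s_uu = sum((u - mean_u) ** 2 for u in us)
    b1 = s_uy / s_uu
    b0 = mean_y - b1 * mean_u
    return b0, b1

# Hypothetical noise-free data from y = 2 + 3 * ln(x).
xs = [1, 2, 4, 8, 16]
ys = [2 + 3 * log(x) for x in xs]

# Transform the predictor; the model is linear in b0 and b1.
us = [log(x) for x in xs]
beta0, beta1 = fit_line(us, ys)
print(beta0, beta1)
```

A model that is non-linear in its parameters cannot in general be handled by this kind of one-step transformation of the predictors and usually needs an iterative fitting method.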
  • 64. Step 5: Choice of Method for Fitting the Data: After the model has been defined and the data have been collected, the next task is to estimate the parameters of the model based on the collected data. This is also referred to as parameter estimation or model fitting. A parameter estimate (also called a coefficient) is the change in the response associated with a one-unit change in the predictor, all other predictors being held constant. The most commonly used method of estimation is the least squares method. Under certain assumptions, the least squares method produces estimators with desirable properties. Other estimation methods include the maximum likelihood method, the ridge method, the principal components method etc.
  • 65. Step 6: Fitting of Model: Estimating the unknown parameters with an appropriate method provides the values of the parameters. Substituting these values in the equation gives us a usable model. This is termed model fitting. The estimates of the parameters $\beta_1, \ldots, \beta_k$ in the model $y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$ are denoted $\hat{\beta}_1, \ldots, \hat{\beta}_k$, which gives the fitted model $\hat{y} = f(X_1, X_2, \ldots, X_k; \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k)$. When the value of y is obtained for given values of $X_1, X_2, \ldots, X_k$, it is denoted $\hat{y}$ and called the fitted value. The fitted equation is used for prediction; in this case, $\hat{y}$ is termed the predicted value. Note that the fitted value is the one where the values used for the explanatory variables correspond to one of the n observations in the data, whereas the predicted value is the one obtained for any set of values of the explanatory variables. It is generally not recommended to predict y-values for values of the explanatory variables that lie outside the range of the data. When the values of the explanatory variables are future values, the predicted values are called forecasted values.
  • 66. Step 7: Model Validation and Criticism: The validity of the statistical methods used for regression analysis depends on various assumptions. These are essentially assumptions about the model and the data. The quality of statistical inferences heavily depends on whether these assumptions are satisfied or not. For these assumptions to be valid and satisfied, care is needed from the beginning of the experiment. One has to be careful in choosing the required assumptions and in examining whether they are valid for the given experimental conditions. It is also important to identify the situations in which the assumptions may not be met. The assumptions must be validated before drawing any statistical conclusion. Any departure from the validity of the assumptions will be reflected in the statistical inferences. In fact, regression analysis is an iterative process in which the outputs are used to diagnose, validate, criticize and modify the inputs.
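One simple way to begin criticizing a fitted model is to look at its residuals, the differences between observed and fitted values. The sketch below is only an illustration under stated assumptions: the slides do not name a specific diagnostic, so the residual check and the coefficient of determination R^2 (the share of the variation in y accounted for by the line) are chosen here as common examples, and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b

# Hypothetical observations that are close to, but not exactly on, a line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
fitted = [a + b * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Residual and total sums of squares.
mean_y = sum(ys) / len(ys)
ss_res = sum(e**2 for e in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(residuals)
print(r_squared)
```

For a least squares line with an intercept the residuals sum to zero by construction, so what matters in practice is whether they show a pattern (for example a curve or a widening spread) when examined against x or the fitted values.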
  • 67. Step 8: Using the Chosen Model(s) for the Solution of the Posed Problem and Forecasting: Determining the explicit form of the regression equation is the ultimate objective of regression analysis. The result should be a good and valid relationship between the study variable and the explanatory variables. The regression equation helps in understanding the interrelationships among the variables. Such a regression equation can be used for several purposes, for example, to determine the role of any explanatory variable in the joint relationship in any policy formulation, or to forecast the values of the response variable for a given set of values of the explanatory variables.
  • 68.
  • 69. Applications or uses of Regression Analysis: 1. Predictive Analytics: Predictive analytics, i.e. forecasting future opportunities and risks, is the most prominent application of regression analysis in business. Demand analysis, for instance, predicts the number of items which a consumer will probably purchase. However, demand is not the only dependent variable when it comes to business. Regression analysis can go far beyond forecasting the impact on direct revenue. For example, insurance companies rely heavily on regression analysis to estimate the credit standing of policyholders and the possible number of claims in a given time period. 2. Operation Efficiency: Regression models can also be used to optimize business processes. A factory manager, for example, can create a statistical model to understand the impact of oven temperature on the shelf life of the cookies baked in those ovens. In a call center, we can analyze the relationship between the wait times of callers and the number of complaints. Data-driven decision making eliminates guesswork, hypothesis and corporate politics from decision making. This improves business performance by highlighting the areas that have the maximum impact on operational efficiency and revenues.
  • 70. 3. Supporting Decisions: Today businesses are overloaded with data on finances, operations and customer purchases. Increasingly, executives are leaning on data analytics to make informed business decisions. Regression analysis can bring a scientific angle to the management of any business. By reducing the tremendous amount of raw data into actionable information, regression analysis leads the way to smarter and more accurate decisions. This technique is a useful tool for testing a hypothesis before diving into execution. 4. Correcting Errors: Regression is not only great for lending empirical support to management decisions but also for identifying errors in judgment. For example, a retail store manager may believe that extending shopping hours will greatly increase sales. Regression analysis, however, may indicate that the increase in revenue might not be sufficient to support the rise in operating expenses due to longer working hours (such as additional employee labor charges). Hence, regression analysis can provide quantitative support for decisions and prevent mistakes due to a manager's intuitions.
  • 71. 5. New Insights:  • Over time businesses have gathered a large volume of unorganized data that has the potential to yield valuable insights. However, this data is useless without proper analysis.  • Regression analysis techniques can find a relationship between different variables by uncovering patterns that were previously unnoticed.  • For example, analysis of data from point of sales systems and purchase accounts may highlight market patterns like increase in demand on certain days of the week or at certain times of the year. You can maintain optimal stock and personnel before a spike in demand arises by acknowledging these insights.
  • 72. Correlation vs. Regression:
    Sr. No. | Basis for comparison | Correlation | Regression
    1 | Meaning | Correlation is a statistical measure which determines the co-relationship or association of two variables. | Regression describes how an independent variable is numerically related to the dependent variable.
    2 | Usage | To represent the linear relationship between two variables. | To fit a best line and estimate one variable on the basis of another variable.
    3 | Dependent and independent variables | No distinction is made between the two variables. | The two variables play different roles (dependent and independent).
    4 | Indicates | The correlation coefficient indicates the extent to which two variables move together. | Regression indicates the impact of a unit change in the known variable (x) on the estimated variable (y).
    5 | Objective | To find a numerical value expressing the relationship between variables. | To estimate values of a random variable on the basis of the values of a fixed variable.