2. DATA SET
• A dataset contains information about a sample. A dataset consists
of cases. In most datasets, we have cases and variables.
3. Case
• Cases refer to the individuals / objects in a dataset.
• When data are collected from humans, we sometimes call them participants.
When data are collected from animals, the term subjects is often used. Another
synonym is experimental unit. So, cases are nothing but the objects in the
collection.
• In a study, cases can be many different things. They can be individual patients
or groups of patients. But they can also be, for instance, companies, schools, or
countries.
• Each case has one or more attributes or qualities, called variables, which are
characteristics of the case.
4. Variable
• Variables are characteristics of cases that can take on different values (in other
words, something that can vary).
• Variables are the attributes. They are the characteristics or qualities of a person,
animal, or object that you can count or measure.
• A person’s age, a dog’s weight, the height of a building,
income, province or country of birth, grades obtained at school, and type of
housing are all examples of variables.
• The value of the variable can vary, or change. It can take on different values.
• We can also have a constant. The value of a constant is fixed for all cases in
the study.
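The distinction between a variable and a constant can be checked mechanically: a characteristic whose value is the same for every case is a constant, and one that differs between cases is a variable. The following is a minimal sketch of that idea. The three cases and their values are made up purely for illustration, and the function name `split_variables_and_constants` is hypothetical.

```python
# Hypothetical dataset: each dict is one case (a student),
# each key is a characteristic measured for that case.
dataset = [
    {"reading_minutes": 20, "exam_grade": 70, "grade_level": 3},
    {"reading_minutes": 35, "exam_grade": 85, "grade_level": 3},
    {"reading_minutes": 20, "exam_grade": 90, "grade_level": 3},
]


def split_variables_and_constants(cases):
    """Return (variables, constants).

    A characteristic is a variable if it takes more than one distinct
    value across the cases, and a constant if it takes exactly one.
    """
    variables, constants = [], []
    for name in cases[0]:
        distinct_values = {case[name] for case in cases}
        if len(distinct_values) > 1:
            variables.append(name)
        else:
            constants.append(name)
    return variables, constants


variables, constants = split_variables_and_constants(dataset)
print("Variables:", variables)  # ['reading_minutes', 'exam_grade']
print("Constants:", constants)  # ['grade_level']
```

Note that this only describes the sample at hand: a characteristic that is constant in one study (such as grade level here) may well vary in another.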
5. Illustration: The following dataset contains 10
cases and 3 variables which we measure for
each case:
• Each case has 3 variables
• Each case has a value for each of the 3
variables (points, assists, and rebounds).
6. Problem: Study Time & Grades
• A teacher wants to know if third grade students who spend more time
reading at home get higher homework and exam grades.
• Point out the case, variable and constant.
Problem: Dog Food
• A researcher wants to know if dogs who are fed only canned food have
different body mass indexes (BMI) than dogs who are fed only hard
food. They collect BMI data from 50 dogs who eat only canned food and 50
dogs who eat only hard food.
• Point out the case, variable and constant.
Problem: Age & Weight of Sea Otters
• Researchers are studying the relationship between age and weight in a
sample of 100 male sea otters (Enhydra lutris).
• Point out the case, variable and constant.
7. Problem: Study Time & Grades
• The students are the cases. There are three variables: amount of time
spent reading at home, homework grades, and exam grades. The grade
level of the students is a constant because all students are in the third
grade.
Problem: Dog Food
• The cases are the dogs. There are two variables: type of food
and BMI. A constant would be subspecies, because all cases are domestic
dogs.
Problem: Age & Weight of Sea Otters
• The 100 otters are the cases. There are two variables: age and weight.
Biological sex is a constant because all subjects are male. Species is also
a constant.
8. Types of Variables
• We can have many different kinds of
variables, representing different
characteristics. For this reason,
there are various levels of
measurement, or different types of
variables.
• Variables may be classified into two
main categories: categorical and
numeric.
• Each category is then divided into two
subcategories: nominal or ordinal for
categorical variables, discrete or
continuous for numeric variables.
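The two-level classification above can be written down as a small lookup. This is only a sketch: the names `VariableType` and `main_category` are hypothetical, and the example variables are taken from elsewhere in these slides.

```python
from enum import Enum


class VariableType(Enum):
    """The four subcategories of variables."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


def main_category(vtype):
    """Map a subcategory to its main category (categorical or numeric)."""
    if vtype in (VariableType.NOMINAL, VariableType.ORDINAL):
        return "categorical"
    return "numeric"


# Example variables from these slides, with their subcategory.
examples = {
    "type of dwelling": VariableType.NOMINAL,
    "highest level of education": VariableType.ORDINAL,
    "children per household": VariableType.DISCRETE,
    "weight in kilograms": VariableType.CONTINUOUS,
}

for name, vtype in examples.items():
    print(f"{name}: {main_category(vtype)} / {vtype.value}")
```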
9. Categorical Variables:
• A categorical variable (also called qualitative variable) refers to a characteristic
that can’t be quantified.
• Categorical variables can be either nominal or ordinal.
• Categorical variables provide groupings that may have
• no logical order, or
• a logical order with inconsistent differences between groups (e.g., the
difference between 1st place and 2nd place in a race is not equivalent to
the difference between 3rd place and 4th place).
10. Example: Favorite Ice Cream Flavor
• A teacher conducts a poll in her class. She asks her students if they would prefer
chocolate, vanilla, or strawberry ice cream at their class party. Preferred ice
cream flavor is a categorical variable because the different flavors are categories
with no meaningful order or magnitude.
Example: Highest Level of Education
• A census asks residents for the highest level of education they have obtained: less
than high school, high school, 2-year degree, 4-year degree, master's degree,
doctoral/professional degree. This is a categorical variable. While there is a
meaningful order of educational attainment, the differences between each
category are not consistent. For example, the difference between high school and
2-year degree is not the same as the difference between a master's degree and a
doctoral/professional degree. Because the intervals are not equal, this variable
cannot be classified as quantitative.
11. Nominal Variable:
• A nominal variable is made up of
various categories which have no
order.
• The categories differ from one
another, and there is no ranking
order.
• A nominal variable is one that
describes a name, label or category
without natural order. Sex and type
of dwelling are examples of nominal
variables.
Example:
• The gender of a patient (male or
female) or the state where they live.
• Similarly, in the example here, the
variable “mode of transportation for
travel to work” is also nominal.
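Because nominal categories have no order, the natural way to summarize them is to count how often each category occurs. The sketch below does this with the standard library's `collections.Counter`. The list of responses is invented for illustration and is not data from these slides.

```python
from collections import Counter

# Hypothetical responses for "mode of transportation for travel to work".
responses = ["car", "bus", "car", "bicycle", "bus", "car"]

# Counting is meaningful for nominal data; sorting or averaging is not.
counts = Counter(responses)
most_common_mode, frequency = counts.most_common(1)[0]

print(counts)
print(f"Most frequent category: {most_common_mode} ({frequency} responses)")
```

The most frequent category (the mode) is a sensible summary here, whereas a mean or a ranking of the categories would have no interpretation.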
12. Ordinal Variable:
• An ordinal variable is a variable whose values
are defined by an order relation between
the different categories. There is not only a
difference between the categories of a
variable; there is also an order.
• An example might be employees classified as
highest paid, average paid, and lowest paid.
• Similarly, in the example here, the variable
“behaviour” is ordinal because the category
“Excellent” is better than the category “Very
good,” which is better than the category
“Good,” etc. There is some natural ordering,
but it is limited since we do not know by how
much “Excellent” behaviour is better than
“Very good” behaviour.
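One way to work with an ordinal variable is to store the categories in their natural order and compare positions in that order. The sketch below uses only the three behaviour levels named above; the names `LEVELS`, `RANK`, and `is_better`, and the sample ratings, are hypothetical.

```python
# Behaviour levels named in the text, listed from lowest to highest.
LEVELS = ["Good", "Very good", "Excellent"]

# Position in the list gives the rank of each category.
RANK = {level: position for position, level in enumerate(LEVELS)}


def is_better(a, b):
    """Return True if category a ranks above category b."""
    return RANK[a] > RANK[b]


# Hypothetical ratings, sorted using the natural order of the categories.
ratings = ["Very good", "Excellent", "Good", "Very good"]
sorted_ratings = sorted(ratings, key=RANK.get)

print(is_better("Excellent", "Very good"))  # True
print(sorted_ratings)
```

The ranks 0, 1, 2 only encode order. Subtracting them would not tell us how much better one category is than another, which is exactly the limitation described above.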
13. Numeric Variables:
• A numeric variable (also called quantitative variable) is a quantifiable
characteristic whose values are numbers (except numbers which are codes
standing in for categories).
• Numerical values can be placed in a meaningful order with consistent intervals.
• For example, weight, height, and length can all be written numerically.
• Numeric variables may be either continuous or discrete.
14. Example: Weight
• A team of medical researchers weighs participants in kilograms. Weight in
kilograms is a quantitative variable because it takes on numerical values
with meaningful magnitudes and equal intervals.
Example: Running Distance
• A runner records the distance he runs each day in miles. Distance in miles is
a quantitative variable because it takes on numerical values with
meaningful magnitudes and equal intervals.
Example: Children per Household
• A census asks every household in a city how many children under the age
of 18 reside there. Number of children in a household is
a quantitative variable because it has a numerical value with a meaningful
order and equal intervals.
15. Continuous Variable:
• A variable is continuous if the possible values of the variable form an
interval. It can take any value in an interval.
• An example is the height of a patient. Someone can be 172
centimeters tall or 174 centimeters tall, but also 170.2461 centimeters tall.
• Here, we don’t have a set of separate numbers, but an infinite range
of values.
• So, a variable that can hold any value between its minimum value and its
maximum value is called a continuous variable. Otherwise, it is a
discrete variable.
16. Discrete Variable:
• A variable is discrete if its possible values form a set of separate
numbers.
• Discrete data typically come from counting. Counting is done with whole numbers like 0,
1, 2, 3.
• Illustration: Suppose you are collecting information about cancer patients.
For each patient, you want to record the following
information.
• Sample code number: id number
• Clump Thickness: 1 – 10
• Uniformity of Cell Size: 1 – 10
• Class: (2 for benign, 4 for malignant)
• Here, the cancer patients themselves are cases and all these characteristics
of the patients are variables.
• Uniformity of Cell Size (1 – 10) is an example of a discrete variable.
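The difference between discrete and continuous variables shows up in which values count as valid. The sketch below assumes that Uniformity of Cell Size is recorded as a whole number from 1 to 10, and that a patient's height in centimeters may be any real number within a plausible interval. The function names and the height bounds (0 to 300 cm) are assumptions made for illustration.

```python
def is_valid_uniformity(value):
    """Discrete: only the separate whole numbers 1, 2, ..., 10 are possible."""
    is_whole_number = isinstance(value, int) and not isinstance(value, bool)
    return is_whole_number and 1 <= value <= 10


def is_valid_height_cm(value, low=0.0, high=300.0):
    """Continuous: any real number inside the interval (low, high) is possible.

    The bounds are illustrative assumptions, not values from the slides.
    """
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and low < value < high


print(is_valid_uniformity(7))        # True: a whole number in range
print(is_valid_uniformity(7.5))      # False: falls between two possible values
print(is_valid_height_cm(170.2461))  # True: any value in the interval is allowed
```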
17. Value
Values are the possible outcomes for a single variable. They can differ from case to case. Values
can be numbers or named categories. For example, the variable “Gender” has two values, “male”
and “female”. Here, some people (cases) are men and some are women.
Example: People
• Cases: Individuals such as X, Y, Z
• Variable: Gender is the variable, since it varies among the individuals. It is coded as Male = 1
and Female = 2.
• Values: The values for the variable “Gender” are 1 and 2, which are codes assigned for our
convenience.
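The coding scheme above (Male = 1, Female = 2) can be sketched as a lookup table. The codes are labels, not quantities, so the useful operation is decoding them back to category names rather than doing arithmetic on them. Which code each of the cases X, Y, and Z receives is a hypothetical assignment made for illustration.

```python
# Coding scheme from the example: the numbers are labels, not quantities.
GENDER_LABELS = {1: "Male", 2: "Female"}

# Hypothetical coded values for the cases X, Y and Z.
coded_cases = {"X": 1, "Y": 2, "Z": 1}

# Decode each case's value back to its category name.
decoded = {case: GENDER_LABELS[code] for case, code in coded_cases.items()}

for case, label in decoded.items():
    print(f"Case {case}: Gender = {label}")
```

An average of these codes could be computed, but it would have no meaning, because Gender remains a categorical variable even when its values are written as numbers.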
Example: Income - In the case of “income”, the values are the actual numbers
• Case: X and Y
• Variable: Income
• Value: Rs. 25,000/- and 30,000/-