Introduction - Using Stata
INTRODUCTION
Review of Statistics. Stata and Excel introduction
DESCRIPTIVE STATISTICS
• Mean – arithmetic mean, arithmetic average.
• Sum of the data values divided by the number of observations
• Mode
• Median
• Minimum, maximum
• Variance
• Standard deviation
MEAN
• Mean - arithmetic mean, arithmetic average. Sum of the data values divided by the
number of observations
• Example: Calculate the mean for the hypothetical data for shipments of peanuts from a
U.S. exporter to five Canadian cities
• Montreal – 640,000 pounds
• Ottawa – 15,000 pounds
• Toronto – 285,000 pounds
• Vancouver – 228,000 pounds
• Winnipeg – 45,000 pounds
• Notes: Σ means sum, so the mean is x̄ = (Σ xᵢ) / n, where n is the number of observations; the unit of observation here is a Canadian city
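As a cross-check on the arithmetic, the peanut example can be worked through in a few lines of Python. Python is not part of these slides (which use Excel and Stata); it is used here only as a neutral calculator, and the variable names are illustrative.

```python
from statistics import mean

# Shipments in pounds, copied from the example above
shipments = {
    "Montreal": 640_000,
    "Ottawa": 15_000,
    "Toronto": 285_000,
    "Vancouver": 228_000,
    "Winnipeg": 45_000,
}

total = sum(shipments.values())   # the Σ in the formula
n = len(shipments)                # number of observations (cities)
average = total / n               # sum divided by number of observations

# The library function gives the same answer as the formula
library_average = mean(list(shipments.values()))

print(total, n, average)
```

The sum is 1,213,000 pounds over 5 cities, so the mean shipment is 242,600 pounds.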
MEAN CONT’D
• In Excel:
• Click on fx and find the function name, or type in
• =AVERAGE(range of data)
• In Stata:
• Import data by clicking on “File” (upper left corner) -> “Import” -> pick the format of the file ->
find it by clicking “Browse” -> tick the box “Import first row as variable names”
• mean peanuts (Stata commands are lowercase and take the variable name without parentheses)
STATA
• Stata is a powerful tool for researchers and applied economists.
• Infinitely extensible, gives users the same development tools used by the company’s professional
programmers
• Google is your best friend
• Stata has a few windows:
• bottom middle – the Command window, where you type in commands;
• top middle – the Results window, where the commands you submitted and their output appear;
• left – the History window (called Review in older versions), listing all of the commands you have run;
• right – the Variables window, listing all of the variables in your dataset
• To view your dataset you can click on “Data Editor” or “Data Browser”
STATA
• Right now there is no data in Stata. We first have to load the data. The way you
load data into Stata (or any other statistical software) depends on the type of
data file you have
• Text data, such as comma-delimited files (.csv)
• Excel files (.xlsx)
• Stata files (.dta)
• Please find the dataset “grades” on Blackboard. What type of file is it?
• Stata: File -> Import -> type of file. Please tick “Import first row as variable names”
• If you want to load a different dataset, first type clear in the Command window
STATA LOGS AND DO-FILES
• log – records your work in Stata; start one before you do anything else!
• .do file – lets you save a series of commands and rerun them
• Try to make your own log and .do file
• Click on “Log” -> “Begin” -> give it a name -> save it in a location convenient for you. This starts
a log; when you exit Stata, the log is closed and saved automatically.
• Click on “Do-file Editor” and start typing commands. You save it like any other document
(“Save” -> give it a name, save in a convenient location).
• To run the commands in the do-file, simply click “Run” at the top of the do-file
EXAMPLE
• Calculate the mean of the student grades in Excel and in Stata
• You will find the dataset “grades” on Blackboard
• Make sure your work in Stata is recorded in a log
• What is the unit of observation in the dataset (i.e. whose grades are these)?
• How many observations are there?
• What is the average grade in that class?
SMALLEST AND LARGEST OBSERVATION
• You might be wondering whether anyone got 100 in the class, or what the highest and
lowest grades in the class were.
• We can find out by looking at the data, by sorting the data, and by using the minimum and
maximum functions in Excel and Stata
• To sort data:
• In Excel: highlight the data you want to sort, “Data” -> “Sort”
• In Stata: sort variablename (ascending order only)
• gsort +variablename (ascending) or gsort -variablename (descending)
• Once you have sorted the data you can see what the first and last observations are
• Functions in Excel: =MIN(data), =MAX(data)
• Functions in Stata: summarize variablename (the output includes the minimum and maximum)
• Minimum and maximum let you know if you have outliers in your data or there are certain
problems with your data
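The sort-then-inspect idea and the direct minimum and maximum functions give the same answer, which a short Python sketch can illustrate. The grades below are hypothetical and are not the course's “grades” dataset.

```python
# Hypothetical grades, invented for illustration only
grades = [88, 72, 95, 61, 79, 100, 84]

# Sorting: ascending (like sort / gsort +var) and descending (like gsort -var)
ascending = sorted(grades)
descending = sorted(grades, reverse=True)

# After an ascending sort, the first and last observations are the extremes
lowest_from_sort = ascending[0]
highest_from_sort = ascending[-1]

# Direct functions, analogous to =MIN(data) and =MAX(data) in Excel
lowest = min(grades)
highest = max(grades)

print(ascending)
print(lowest, highest)
```

A minimum or maximum outside the possible range (for example a grade of 640) would be a quick sign of a data-entry problem.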
APPLICATION 1. USE EXCEL
• Use the UNRATE (unemployment rate) dataset to find out…
• The average unemployment rate between 1948 and 2018
• What were the maximum and minimum unemployment rates during that period?
• Any thoughts on your findings?
• TIP… Stata can pull data directly from FRED. There are two ways of accessing the FRED database…
• The freduse command (user-written, so it might need to be installed)…. freduse UNRATE, clear
• File >> Import >> Federal Reserve Economic Data (FRED)
APPLICATION 2. USE GRADES2 TO ANSWER THE FOLLOWING
• In Stata:
• What is the minimum grade in that class?
• What is the maximum grade in that class?
• What is the average grade in that class?
• How do the minimums, maximums, and averages compare across the two classes?
STANDARD DEVIATION
• I want to calculate how dispersed the students’ grades are compared to the average
grade in the class
• Standard deviation (square root of variance) – spread of the observations around the
mean value
• Why is it useful? We can find out how much the data fluctuate around the mean in a
dataset and compare datasets. It also helps flag possible outliers in a dataset
so we can check them and correct any that are errors.
• Examples: income in different cities, unemployment in different regions, returns on
different companies’ stocks
STANDARD DEVIATION CONT’D
• In Excel the function for standard deviation is: =STDEV(data)
• In Stata the standard deviation is part of the summarize command output
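To make the definition concrete, here is a minimal Python sketch of the sample standard deviation (dividing by n - 1, which is what Excel's STDEV and Stata's summarize report). The two classes below are hypothetical numbers chosen to have the same mean but different spreads.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: sqrt of summed squared deviations over n - 1."""
    n = len(values)
    avg = sum(values) / n
    squared_deviations = [(v - avg) ** 2 for v in values]
    return sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical grades: both classes average 80
class_a = [70, 75, 80, 85, 90]   # widely spread around the mean
class_b = [78, 79, 80, 81, 82]   # tightly clustered around the mean

sd_a = sample_sd(class_a)
sd_b = sample_sd(class_b)

# The hand-written formula agrees with the standard library
library_sd_a = stdev(class_a)

print(round(sd_a, 2), round(sd_b, 2))
```

Even though the averages are identical, the larger standard deviation for the first class shows its grades are more dispersed.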
STANDARD DEVIATION APPLICATIONS
• Find the standard deviation for both of the classes and compare them. What conclusion
can you draw?
• What was the standard deviation of the unemployment rate before and after outliers
were corrected? What conclusion can you draw?
VARIANCE
• Closely tied to standard deviation
• Variance = squared standard deviation
• Measure of how far away the observations are in a dataset from the mean
• To find variance in Excel: =VAR(datarange)
• To find variance in Stata: square the standard deviation by hand, or use display r(Var)
right after the summarize command
• Stata retains a number of calculations (behind the scenes). To see them, type:
• return list
• There are other tools for calculating summary statistics…
• help tabstat
• tabstat UNRATE, s(var)
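The relationship between variance and standard deviation can be checked numerically. This is a small Python sketch using made-up unemployment rates (not the actual UNRATE series); it uses the sample variance, which divides by n - 1.

```python
from math import isclose
from statistics import stdev, variance

# Hypothetical unemployment rates in percent, for illustration only
rates = [3.5, 4.0, 5.5, 6.0, 8.0]

var_direct = variance(rates)      # sample variance computed directly
var_from_sd = stdev(rates) ** 2   # squared sample standard deviation

# The two routes agree up to floating-point rounding
same = isclose(var_direct, var_from_sd)

print(var_direct, var_from_sd, same)
```

Because of floating-point rounding the two numbers may differ in the last decimal places, which is why the comparison uses a tolerance rather than exact equality.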
USING STATA AND EXCEL AS A CALCULATOR
• To find variance you can always square standard deviation
• di r(Var)
• di r(sd)^2
• To use Excel as a calculator you have to type “=” into a cell and then what you are
trying to calculate
• In Stata you have to type in the word “display” and then what you are trying to calculate
• For example, if the standard deviation is 1.6, then to calculate the variance:
• Excel: =1.6^2 (or =1.6*1.6)
• Stata: display 1.6^2 (or display 1.6*1.6)
CREATING A NEW VARIABLE
• You can create new variables in Excel and Stata. This skill will be useful later on in the
class
• For now let’s imagine the professor gives everyone in the first class a 1-point curve (one
percentage point), and calculate their new grades
• In Excel, in a new cell type in: =(cell with data)+1, hover over the bottom right corner of the
new cell and double-click; the column should populate with the calculated values. What is
the class average now that everyone has received extra credit?
• Let’s import the grades into Stata and do the same. To create a new variable (here named var):
• generate var = classgrade + 1
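The effect of the curve on the class average can be previewed with a short Python sketch. The grades are hypothetical, and the list name classgrade simply mirrors the variable name used in the Stata command above.

```python
# Hypothetical grades standing in for the classgrade variable
classgrade = [88, 72, 95, 61, 79]

# New variable: every grade plus one point, like generate var = classgrade + 1
curved = [g + 1 for g in classgrade]

mean_before = sum(classgrade) / len(classgrade)
mean_after = sum(curved) / len(curved)

print(mean_before, mean_after)
```

Adding the same constant to every observation shifts the mean by exactly that constant, so the average rises by one point while the spread of the grades is unchanged.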
BAR CHARTS
• You would like to find out how many people in the class received an A, B, C, or D.
• A good way to look at that is to create a distribution chart (histogram) that shows
how many students received each grade
• In Excel: highlight the data -> Insert -> Histogram -> right-click on the x-axis labels to change
the number of bins and their range
• In Stata click on Graphics -> Histogram. There are many options; let’s go through some of
them
• Variable – classgrade
• Width of bins – 10 (this is how “wide” each grade category is)
• Lower limit of first bin – 60 (assuming no one failed the class)
• Y-axis – frequency
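What the histogram options do can be sketched in Python: each grade is assigned to a bin that starts at the lower limit (60) and is 10 points wide, and the frequency is the count in each bin. The grades are hypothetical, and the helper name bin_start is an illustrative choice, not a Stata or Excel function.

```python
from collections import Counter

def bin_start(grade, lower=60, width=10):
    """Return the lower edge of the bin containing this grade."""
    return lower + ((grade - lower) // width) * width

# Hypothetical grades, all between 60 and 99
grades = [62, 67, 71, 75, 78, 83, 85, 88, 91, 96]

# Frequency of grades in each bin, keyed by the bin's lower edge
counts = Counter(bin_start(g) for g in grades)

for start in sorted(counts):
    print(f"{start}-{start + 9}: {counts[start]}")
```

With these numbers the bins 60-69, 70-79, 80-89, and 90-99 contain 2, 3, 3, and 2 grades, which are the bar heights a frequency histogram would show.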
BAR CHARTS CONT’D
• We can create bar charts to compare the same variable over time (e.g. unemployment) or
across different units (e.g. income across different cities)
• Let’s create a bar chart over time using the unemployment rate data in Excel
• Highlight the unemployment rate column by clicking on the column name
• Click “Insert” (in the ribbon at the top) -> pick bar chart (2D column)
• Right-click on the x-axis labels -> Select Data -> Edit -> select the range (years column) by
highlighting it
• To add labels to the axes, click on the chart -> “+” symbol at the upper-right corner -> tick Axis
Titles -> type the titles into the boxes
LINE CHARTS
• Showing the progression of a variable over time is easier with a line chart
• Load the unemployment rate data into Stata
• Click “Graphics” on the top left -> Twoway graph -> Create -> Line plot type -> Y variable is
unemployment rate, X variable is year -> Submit
• To save your graph: File -> Save As -> pick the file type that will make it easy for you to
open the graph
• https://fred.stlouisfed.org/series/UNRATE/ compare your graphs to FRED data
GDP OVER TIME IN US, MEXICO, AND CANADA
• Please google “GDP per capita by country world bank” -> pick the one in current US$
(why do we have to use GDP per capita in current dollars?) -> download the csv file
• Use Ctrl-F to find GDP for the US, Mexico, and Canada. Copy and paste each country’s GDP
into a new document
• Delete third and fourth columns
• Create a line chart. What conclusion can we draw about the relative economic growth of
these countries?
CORRELATION
• Is it possible to improve your score during the semester, or is the grade on the first exam
closely related to the grade at the end of the semester?
• Use the grades3.xlsx dataset to answer this question
• Import the dataset into Stata. We are going to plot the observed points on a graph
where the axes are exam grade and class grade
• To do so type in: scatter exam1 classgrade (the first variable goes on the y-axis, the second on the x-axis)
• We can tell that there is a positive relationship between the two variables
• The graph that you created is called a scatterplot. Looking at a scatterplot gives a rough
sense of whether there is a relationship between variables in the data. We can also make
an educated guess about whether the relationship between the two variables is positive or
negative by looking at a scatterplot
• Can you think of two variables that might be positively or negatively related?
CALIFORNIA SCHOOLS DATASET
• The dataset includes data on California’s school districts in the 1998-1999 school year
• It includes average test scores for 5th graders in each school district
• The description of the dataset is in the Word document titled “California Test Scores”
• Let’s look at the relationship between total enrollment and test scores
• Stata: scatter testscr enrl_tot
• Take a look at the data description and think about what could be related to the test scores.
Is it a positive or a negative relationship?
CORRELATION COEFFICIENT
• We don’t have to guess whether there is a relationship between two variables and
whether the relationship is positive or negative
• We will use something called “correlation coefficient” (usually denoted r) to answer that
• If r is between 0 and 1 the relationship is positive
• If r is between -1 and 0 the relationship is negative
• The closer the absolute value of r is to 1, the stronger the relationship
• The closer the absolute value of r is to 0, the weaker the relationship
• In Stata, to find the correlation coefficient type in: correlate variable1 variable2
• In Excel, to find the correlation coefficient type in: =CORREL(variable1 range, variable2 range)
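For readers curious about what the correlation functions compute, here is a minimal Python sketch of the Pearson correlation coefficient written out by hand. The exam and class grades are hypothetical (they are not the grades3 data), and pearson_r is an illustrative helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-movement of x and y scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical first-exam and final class grades for five students
exam1 = [55, 65, 70, 80, 90]
classgrade = [60, 70, 72, 85, 88]

r = pearson_r(exam1, classgrade)

# Extreme cases: points exactly on a rising or falling straight line
r_perfect_positive = pearson_r([1, 2, 3], [2, 4, 6])
r_perfect_negative = pearson_r([1, 2, 3], [6, 4, 2])

print(round(r, 3), r_perfect_positive, r_perfect_negative)
```

In this made-up example r is close to 1, indicating a strong positive relationship; the two extreme cases show the bounds of +1 and -1.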
DO IT YOURSELF TIME
• Try to create a scatterplot for the grades3 dataset in Excel
• Hint: a scatterplot is just a type of chart; your steps would be similar to creating a bar
chart in Excel
• Try to find the correlation coefficient for the grades3 dataset in Excel (on Slido)
• Hint: the correlation coefficient is a type of function. This should be similar to finding an
average or a standard deviation in Excel.
LINE OF BEST FIT
• Line of best fit is the line that best represents all of the data points on a scatterplot
• Like any straight line it has an intercept and a slope
• The equation of a straight line is: y=mx+b
• Where b – intercept with the y-axis, m – the slope of the line
• If the line of best fit for a scatterplot is y=-3x+2, this means that 2 is the intercept with the
Y-axis and -3 is the slope of the line.
• When x = 0, y = 2
• Since the slope is negative the relationship between the two variables is negative.
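The slope and intercept of a line of best fit can be computed with the ordinary least-squares formulas, sketched here in Python. The points are constructed to lie exactly on y = -3x + 2, so the fit should recover m = -3 and b = 2; the helper name best_fit_line is illustrative.

```python
def best_fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # intercept: the line passes through the means
    return m, b

# Points generated from y = -3x + 2
xs = [0, 1, 2, 3]
ys = [-3 * x + 2 for x in xs]

m, b = best_fit_line(xs, ys)

# Value of the fitted line at x = 0 is the intercept
y_at_zero = m * 0 + b

print(m, b, y_at_zero)
```

With real data the points will not sit exactly on a line, and the same formulas give the line that minimizes the sum of squared vertical distances to the points.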
EXAMPLE: LINE OF BEST FIT FOR CLASS GRADES
• Once you have created a scatterplot in Excel you can add the line of best fit to it
• Click on the “+” in the upper-right corner, tick “Trendline”
• You can see that the line of best fit is upward-sloping => the relationship between the
two variables is positive
• To find out the equation of the line, right-click on it -> Format Trendline -> tick “Display Equation on chart”
• What are the intercept and the slope of the line? What conclusion can we draw from
knowing those numbers?
• Do they make sense?
CONCLUSION
• We have reviewed descriptive statistics. What are some of the descriptive stats we have
discussed?
• How can we find them in Excel?
• How can we find them in Stata?
• What types of charts have you learned to create? How can you do this in Stata/Excel?
• If the correlation coefficient is -1 what does it mean? 0? 0.2?

Introduction - Using Stata

  • 1. INTRODUCTION
    • Review of statistics. Introduction to Stata and Excel
  • 2. DESCRIPTIVE STATISTICS
    • Mean (arithmetic mean, arithmetic average): the sum of the data values divided by the number of observations
    • Mode
    • Median
    • Minimum, maximum
    • Variance
    • Standard deviation
  • 3. MEAN
    • Mean (arithmetic mean, arithmetic average): the sum of the data values divided by the number of observations
    • Example: calculate the mean of the hypothetical shipments of peanuts from a U.S. exporter to five Canadian cities
      • Montreal: 640,000 pounds
      • Ottawa: 15,000 pounds
      • Toronto: 285,000 pounds
      • Vancouver: 228,000 pounds
      • Winnipeg: 45,000 pounds
    • Notes: Σ means sum; the unit of observation here is a Canadian city
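The Σ in the note on this slide belongs to the usual formula for the mean. As a worked version for the five shipments (the symbols x_i for the shipment to city i and n = 5 for the number of cities are notation assumed here, not taken from the slide):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{640{,}000 + 15{,}000 + 285{,}000 + 228{,}000 + 45{,}000}{5}
        = \frac{1{,}213{,}000}{5}
        = 242{,}600 \text{ pounds}
```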
  • 4. MEAN CONT’D
    • In Excel:
      • Click on fx and find the function name, or type it in
      • =average(range of data)
    • In Stata:
      • Import the data by clicking “File” (upper left corner) -> “Import” -> pick the format of the file -> find the file by clicking “Browse” -> tick the box “Import first row as variable names”
      • mean peanuts
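A minimal sketch of the Stata side of this slide, assuming the five shipments are typed in by hand instead of imported. The variable names city and peanuts are assumptions chosen for illustration.

```stata
* Enter the five hypothetical shipments by hand
* (city and peanuts are assumed variable names)
clear
input str9 city long peanuts
"Montreal"  640000
"Ottawa"     15000
"Toronto"   285000
"Vancouver" 228000
"Winnipeg"   45000
end

* Either command reports the mean of peanuts
summarize peanuts
mean peanuts

* summarize also stores its results; r(mean) holds the mean
quietly summarize peanuts
display r(mean)
```

The last line should agree with the result of =average(...) over the same five cells in Excel.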
  • 5. STATA
    • Stata is a powerful tool for researchers and applied economists.
    • It is highly extensible and gives users the same development tools used by the company’s professional programmers
    • Google is your best friend
    • Stata has a few windows:
      • bottom middle: the command window, where you type in the commands
      • top middle: the results window, where the commands you submitted and their output appear
      • left: all of the commands you have run
      • right: all of the variables you have in your dataset
    • To view your dataset you can click on “Data Editor” or “Data Browser”
  • 6. STATA
    • Right now there is no data in Stata. We first have to load the data into it. How you load data into Stata (or any other statistical software) depends on the type of data file you have
      • Text data, such as comma-delimited files (.csv)
      • Excel files (.xlsx)
      • Stata files (.dta)
    • Please find the dataset “grades” on Blackboard. What type of file is it?
    • Stata: File -> Import -> type of file. Please tick “Import first row as variable names”
    • If you want to load a different dataset, first type “clear” in the command window
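The menu steps on this slide have command equivalents. The sketch below first writes a tiny .csv so that there is a file to import; peanuts_demo.csv, grades.xlsx and grades.dta are hypothetical file names, and the data are placeholders.

```stata
* Build a tiny dataset and write it out as a .csv
* (peanuts_demo.csv is a hypothetical file in the working directory)
clear
input str9 city long peanuts
"Montreal" 640000
"Ottawa"    15000
end
export delimited using "peanuts_demo.csv", replace

* Drop the data in memory, then import the text file;
* varnames(1) treats the first row as variable names
clear
import delimited using "peanuts_demo.csv", varnames(1) clear
describe

* For other file types the commands would be (placeholder file names):
*   import excel using "grades.xlsx", firstrow clear
*   use "grades.dta", clear
```

The firstrow and varnames(1) options play the role of the “import first row as variable names” tick box.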
  • 7. STATA LOGS AND DO-FILES
    • Log: records your work in Stata. Start one before you do anything else!
    • .do file: lets you record a series of commands
    • Try to make your own log and .do file
      • Click on “Log” -> “Begin” -> give it a name -> save it in a location convenient for you (this starts a log; when you exit Stata the log will automatically save).
      • Click on “Do-file Editor” and start typing up commands. You save it like any other document (“Save” -> give it a name, save in a convenient location).
      • To run the commands in the do-file, simply click “Run” at the top of the do-file
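A log can also be started and closed from inside a do-file, which keeps the record and the commands together. This is a minimal sketch; intro_session.log is a hypothetical file name.

```stata
* Close any log left open from an earlier session (no error if none is open)
capture log close

* Start a plain-text log in the working directory
log using "intro_session.log", replace text

* Everything run from here on is recorded in the log
display "Hello from my first do-file"
display 2 + 2

* Close the log when you are done
log close
```

Saved as a .do file, these lines can be run in one go with the “Run” button described above.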
  • 8. EXAMPLE
    • Calculate the mean of the student grades in Excel and in Stata
    • You will find the dataset “grades” on Blackboard
    • Make sure your work in Stata is recorded in a log
    • What is the unit of observation in the dataset (i.e. whose grades are these)?
    • How many observations are there?
    • What is the average grade in that class?
  • 9. SMALLEST AND LARGEST OBSERVATION
    • You might be wondering whether anyone got 100 in the class, or what the highest and lowest grades in the class were.
    • We can find out by looking at the data, by sorting the data, or by using the minimum and maximum functions in Excel and Stata
    • To sort data:
      • In Excel: highlight the data you want to sort, “Data” -> “Sort”
      • In Stata: sort variablename (ascending order)
      • gsort +variablename (ascending) or gsort -variablename (descending)
    • Once you have sorted the data you can see what the first and last observations are
    • Functions in Excel: =min(data), =max(data)
    • In Stata: summarize variablename (the output includes the minimum and maximum)
    • The minimum and maximum can reveal outliers or other problems in your data
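The sorting and summarize steps can be sketched as follows. The five grades are made-up placeholder values, not the course's “grades” dataset; classgrade is the variable name used later in the slides.

```stata
* Made-up grades for illustration only
clear
input classgrade
78
92
65
100
84
end

* Ascending sort: the first observation is the lowest grade
sort classgrade
display classgrade[1]
display classgrade[_N]

* Descending sort: now the first observation is the highest grade
gsort -classgrade
display classgrade[1]

* summarize reports min and max directly and stores them in r()
summarize classgrade
display r(min)
display r(max)
```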
  • 10. APPLICATION 1. USE EXCEL
    • Use the UNRATE (unemployment rate) dataset to find:
      • The average unemployment rate between 1948 and 2018
      • The maximum and minimum unemployment rate during that period
      • Any thoughts on your findings?
    • Tip: Stata can pull data directly from FRED. There are two ways of accessing the FRED database:
      • The freduse command (might need to be installed): freduse UNRATE, clear
      • File >> Import >> Federal Reserve Economic Database
  • 11. APPLICATION 2. USE GRADES2 TO ANSWER THE FOLLOWING
    • In Stata:
      • What is the minimum grade in that class?
      • What is the maximum grade in that class?
      • What is the average grade in that class?
    • How do the minimums, maximums, and averages compare across the two classes?
  • 12. STANDARD DEVIATION
    • I want to calculate how dispersed the students’ grades are around the average grade in the class
    • Standard deviation (the square root of the variance): the spread of the observations around the mean value
    • Why is it useful? It tells us how much the data fluctuate around the mean, lets us compare the spread of different datasets, and helps flag potential outliers that we may want to investigate or remove.
    • Examples: income in different cities, unemployment in different regions, returns on different companies’ stock
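The slides describe the standard deviation in words only. For reference, the sample standard deviation, which is the version reported by Stata's summarize and Excel's =stdev(), can be written as below; the symbols x_i, \bar{x}, n and s are notation assumed here.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```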
  • 13. STANDARD DEVIATION CONT’D
    • In Excel the function for standard deviation is: =stdev(data)
    • In Stata the standard deviation is part of the summarize command output (the “Std. dev.” column)
  • 14. STANDARD DEVIATION APPLICATIONS
    • Find the standard deviation for both of the classes and compare them. What conclusion can you draw?
    • What was the standard deviation of the unemployment rate before and after outliers were corrected? What conclusion can you draw?
  • 15. VARIANCE
    • Closely tied to standard deviation
    • Variance = squared standard deviation
    • A measure of how far the observations in a dataset are from the mean
    • To find the variance in Excel: =var(datarange)
    • To find the variance in Stata: square the standard deviation by hand, or use display r(Var) after the summarize command
    • Stata retains a number of calculations behind the scenes. To see them, type:
      • return list
    • There are other tools for calculating summary statistics:
      • help tabstat
      • tabstat UNRATE, s(var)
  • 16. USING STATA AND EXCEL AS A CALCULATOR
    • To find the variance you can always square the standard deviation
      • di r(Var)
      • di r(sd)^2
    • To use Excel as a calculator, type “=” into a cell and then what you are trying to calculate
    • In Stata, type the word “display” and then what you are trying to calculate
    • For example, if the standard deviation is 1.6, then to calculate the variance in
      • Excel: =1.6^2 (or =1.6*1.6)
      • Stata: display 1.6^2 (or display 1.6*1.6)
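The variance commands from the last two slides can be tried together on a small example. The eight grades below are made-up placeholder values chosen so that the mean is 75 and the sample variance is 32/7 (about 4.57).

```stata
* Made-up grades for illustration only
clear
input classgrade
72
74
74
74
75
75
77
79
end

* summarize stores the standard deviation and variance in r()
quietly summarize classgrade
display r(sd)
display r(Var)
display r(sd)^2

* Show everything summarize left behind
return list

* tabstat reports chosen statistics in one table
tabstat classgrade, statistics(sd variance)

* display also works as a plain calculator
display 1.6^2
```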
  • 17. CREATING A NEW VARIABLE
    • You can create new variables in Excel and Stata. This skill will be useful later on in the class
    • For now let’s imagine the professor gives everyone in the first class a 1% curve (one extra point) and calculate their new grades
    • In Excel, in a new cell type: =“cell with data”+1, then hover over the bottom right corner of the new cell and double-click; the column should populate with the calculated values. What is the class average once everyone has received the extra credit?
    • Let’s import the grades into Stata and do the same. To create a new variable:
      • generate var=classgrade+1
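A minimal sketch of the curve in Stata. The grades are made-up placeholder values, and curvedgrade is a hypothetical variable name used here in place of var so that the new column is easier to recognize.

```stata
* Made-up grades for illustration only
clear
input classgrade
72
74
74
74
75
75
77
79
end

* Add one point to every grade (curvedgrade is a hypothetical name)
generate curvedgrade = classgrade + 1

* Compare the original and curved grades side by side
summarize classgrade curvedgrade
```

Adding the same constant to every observation shifts the mean by that constant and leaves the standard deviation unchanged, which the summarize output makes visible.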
  • 18. BAR CHARTS
    • You would like to find out how many people in the class received an A, B, C, or D.
    • The best way to look at that is to create a distribution chart (histogram) that shows how many students received each grade
    • In Excel: highlight the data -> Insert -> Histogram -> right-click on the x-axis labels to change the number of bins and their range
    • In Stata: click on Graphics -> Histogram. There are many options; let’s go through some of them
      • Variable: classgrade
      • Width of bins: 10 (this is how “wide” each grade category is)
      • Lower limit of first bin: 60 (assuming no one failed the class)
      • Y-axis: frequency
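The dialog settings listed on this slide correspond to options of the histogram command. A sketch with ten made-up placeholder grades, all of them at least 60 so that the lower limit of the first bin is valid:

```stata
* Made-up grades for illustration only
clear
input classgrade
62
67
71
74
78
81
85
88
93
97
end

* width(10): each bin spans ten points; start(60): first bin begins at 60;
* frequency: the y-axis shows counts rather than densities
histogram classgrade, width(10) start(60) frequency

* Cross-check one bar by counting the grades in the 90-100 range
count if inrange(classgrade, 90, 100)
```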
  • 19. BAR CHARTS CONT’D
    • We can create bar charts to compare the same variable over time (e.g. unemployment) or across different units (e.g. income across different cities)
    • Let’s create a bar chart over time using the unemployment rate data in Excel
      • Highlight the unemployment rate column by clicking on the column name twice
      • Click “Insert” (top right) -> pick a bar chart (2D column)
      • Left-click on the x-axis labels -> Select Data -> Edit -> select the range (years column) by highlighting it
      • To add labels to the axes, click on the chart -> “+” symbol at the right corner -> tick Axis Titles -> type the titles into the boxes
  • 20. LINE CHARTS
    • Showing the progression of a variable over time is easier with a line chart
    • Load the unemployment rate data into Stata
    • Click “Graphics” on the top left -> Twoway graph -> Create -> Line plot type -> Y variable is the unemployment rate, X variable is the year -> Submit
    • To save your graph: File -> Save as -> pick the file type that will make it easy for you to open the graph
    • Compare your graphs to the FRED data: https://fred.stlouisfed.org/series/UNRATE/
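The menu path above has a one-line command equivalent. In this sketch the variable names year and unrate are assumptions, and the four values are made-up placeholders, not FRED data; unrate_line.gph is a hypothetical file name.

```stata
* Placeholder values for illustration only (not actual FRED figures)
clear
input year unrate
2001 5.0
2002 6.0
2003 5.5
2004 4.5
end

* Line chart with the unemployment rate on the y-axis and year on the x-axis
twoway line unrate year, ytitle("Unemployment rate (%)") xtitle("Year")

* Save the graph in Stata's own format; graph export can write other
* formats (for example .png) where the installation supports them
graph save "unrate_line.gph", replace
```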
  • 21. GDP OVER TIME IN US, MEXICO, AND CANADA
    • Please google “GDP per capita by country world bank” -> pick the one in current US$ (why do we have to use GDP per capita in current dollars?) -> download the csv file
    • Use Ctrl-F to find the GDP for the US, Mexico and Canada. Copy and paste each country’s GDP into a new document
    • Delete the third and fourth columns
    • Create a line chart. What conclusion can we draw about the relative economic growth of these countries?
  • 22. CORRELATION
    • Is it possible to improve your score during the semester, or is the grade on the first exam closely related to the grade at the end of the semester?
    • Use the grades3.xlsx dataset to answer this question
    • Import the dataset into Stata. We are going to plot the observed points on a graph whose axes are the exam grade and the class grade
    • To do so type: scatter exam1 classgrade
    • We can tell that there is a positive relationship between the two variables
    • The graph that you created is called a scatterplot. By looking at a scatterplot we can often tell whether there is a relationship between two variables in the data, and make an educated guess whether that relationship is positive or negative
    • Can you think of two variables that might be positively or negatively related?
  • 23. CALIFORNIA SCHOOLS DATASET
    • The dataset includes data on California’s school districts in the 1998-1999 school year
    • It includes average test scores for 5th graders in each school district
    • The description of the dataset is in the Word document titled “California Test Scores”
    • Let’s look at the relationship between total enrollment and test scores
    • Stata: scatter testscr enrl_tot
    • Take a look at the data description and think about what could be related to the test scores. Is it a positive or a negative relationship?
  • 24. CORRELATION COEFFICIENT
    • We don’t have to guess whether there is a relationship between two variables and whether the relationship is positive or negative
    • We will use something called the “correlation coefficient” (usually denoted r) to answer that
    • If r is between 0 and 1, the relationship is positive
    • If r is between -1 and 0, the relationship is negative
    • The closer the absolute value of r is to 1, the stronger the relationship
    • The closer the absolute value of r is to 0, the weaker the relationship
    • In Stata, to find the correlation coefficient type: correlate variable1 variable2
    • In Excel, to find the correlation coefficient type: =correl(variable1, variable2)
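A short sketch that draws a scatterplot and then computes the correlation coefficient. The five pairs of grades are made-up placeholder values, not the grades3 data; the variable names exam1 and classgrade follow the earlier slide.

```stata
* Made-up grades for illustration only
clear
input exam1 classgrade
60 65
70 72
80 85
90 88
100 95
end

* Scatterplot: the first variable goes on the y-axis, the second on the x-axis
scatter exam1 classgrade

* Correlation coefficient; with two variables it is also stored in r(rho)
correlate exam1 classgrade
display r(rho)
```

Because higher exam grades go with higher class grades in these placeholder values, r comes out positive and close to 1.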
  • 25. DO IT YOURSELF TIME
    • Try to create a scatterplot for the grades3 dataset in Excel
      • Hint: a scatterplot is just a type of chart, so your steps will be similar to creating a bar chart in Excel
    • Try to find the correlation coefficient for the grades3 dataset in Excel (on Slido)
      • Hint: the correlation coefficient is a type of function. This should be similar to finding an average or a standard deviation in Excel.
  • 26. LINE OF BEST FIT
    • The line of best fit is the line that best represents all of the data points on a scatterplot
    • Like any straight line it has an intercept and a slope
    • The equation of a straight line is: y=mx+b
      • where b is the intercept with the y-axis and m is the slope of the line
    • If the line of best fit for a scatterplot is y=-3x+2, then 2 is the intercept with the y-axis and -3 is the slope of the line.
      • When x = 0, y = 2
      • Since the slope is negative, the relationship between the two variables is negative.
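The slides do not name a method for finding the line of best fit. One common choice is the least-squares line, which Stata's regress command estimates; treating that as the intended method is an assumption. The sketch below uses four made-up points that lie exactly on y=-3x+2, so the estimates recover the slope and intercept from the example above.

```stata
* Four placeholder points lying exactly on the line y = -3x + 2
clear
input x y
0 2
1 -1
2 -4
3 -7
end

* Least-squares fit of y on x
regress y x

* Slope (m) and intercept (b) of the fitted line
display _b[x]
display _b[_cons]

* Scatterplot with the fitted line drawn on top
twoway (scatter y x) (lfit y x)
```

With real data the points will not fall exactly on a line, and the fitted slope and intercept are the values that make the line as close as possible to the points in the least-squares sense.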
  • 27. EXAMPLE: LINE OF BEST FIT FOR CLASS GRADES
    • Once you have created a scatterplot in Excel you can add the line of best fit to it
    • Click on the “+” in the upper-right corner and tick “Trendline”
    • You can see that the line of best fit is upward-sloping, so the relationship between the two variables is positive
    • To find the equation of the line, left-click on it -> Format -> Display Equation on chart
    • What are the intercept and the slope of the line? What conclusion can we draw from knowing those numbers?
    • Do they make sense?
  • 28. CONCLUSION
    • We have reviewed descriptive statistics. What are some of the descriptive statistics we have discussed?
      • How can we find them in Excel?
      • How can we find them in Stata?
    • What types of charts have you learned to create? How can you create them in Stata and Excel?
    • If the correlation coefficient is -1, what does it mean? 0? 0.2?