Individual Level Predictive Analytics
Improving Student Enrolment Outcomes
Stephen Childs, Institutional Analyst @sechilds
CIRPA/PNAIRP 2016, Kelowna, BC
November 7, 2016
Office of Institutional Analysis
Why Predictive Analytics and IR
- Higher education institutions collect more data
- IR offices have experts in institutional data
- IR offices are seeking ways to add more value
- Machine learning and predictive models are in the news
Opportunity… or Crisis?
- Predictive analytics requires a different skill set
- A different set of software tools is required
- You may be the only analyst working on this in your office
- Requesters expect you to be the expert
- There may be resistance to implementing insights from predictive analytics
The Way Forward
- Add these skills to your IR toolkit
- Find tools that work with your existing ones
- Develop your understanding and expertise
- Develop a community of practice
Learning Outcomes
- Have a high-level understanding of what predictive analytics does and how it works.
- Have a concrete series of steps to follow.
- Know the vocabulary of machine learning and statistical modeling.
- Know what tools can be used for this, and how they work with existing tools.
- Know how we select, train, and test models for prediction.
- Learn some of the challenges in predictive modeling.
Outline
- Introduction (already done?)
- Introduction to Machine Learning
- Model Building Steps
- Tool Overview
- Customer Education
- Challenges
- Building Community
About Me
Machine Learning
- Contrast with statistics
- Supervised and unsupervised learning
- Classification and regression
- Different algorithms
Predictive Data Analysis Steps
Goal → Data Access → Analysis File → Model → Delivery
STEP 1: Define Your Goal
- Sets the scope of your analysis
- Provides input into model selection
- Identifies stakeholders
- Helps you discover what data is available
- Revise it as the project progresses
STEP 2: Get Access to Your Data
- Three different types of data:
  - Operational SIS (student information system)
  - Data warehouse (snapshots)
  - Predictive analytics data
- Talk to your DBA to find out which tables hold the data you need
- Think of other data to add:
  - Residence, CRM
  - Socio-economic data
STEP 3: Build an Analysis File
- Extract – Transform – Load (ETL)
  - Use as much existing ETL as you can
  - Join tables together
  - Work with a programmer, but the analyst drives
- It is hard to capture the timeline of the application:
  - When did they apply?
  - When were they accepted?
  - When did they register?
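The join and the application timeline can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the project's actual ETL: the tables `students` and `application_events`, their columns, and the event codes `APPLY`, `ACCEPT` and `REGISTER` are all hypothetical. The query joins the event log to the student table and pivots it to one row per student, keeping the earliest timestamp of each event type; a student who never reached a stage gets `NULL` (Python `None`).

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical schema and made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE students (student_id INTEGER PRIMARY KEY, program TEXT);
        CREATE TABLE application_events (
            student_id INTEGER, event_code TEXT, event_ts TEXT
        );
        INSERT INTO students VALUES (1, 'BSc'), (2, 'BA');
        INSERT INTO application_events VALUES
            (1, 'APPLY',    '2016-01-15'),
            (1, 'ACCEPT',   '2016-03-01'),
            (1, 'REGISTER', '2016-07-20'),
            (2, 'APPLY',    '2016-04-02'),
            (2, 'APPLY',    '2016-02-10');  -- student 2 applied twice
        """
    )
    return conn


def build_timeline(conn: sqlite3.Connection) -> List[Tuple]:
    """One row per student: (student_id, program, applied, accepted, registered)."""
    query = """
        SELECT s.student_id,
               s.program,
               MIN(CASE WHEN e.event_code = 'APPLY'    THEN e.event_ts END) AS applied,
               MIN(CASE WHEN e.event_code = 'ACCEPT'   THEN e.event_ts END) AS accepted,
               MIN(CASE WHEN e.event_code = 'REGISTER' THEN e.event_ts END) AS registered
        FROM students AS s
        LEFT JOIN application_events AS e ON e.student_id = s.student_id
        GROUP BY s.student_id, s.program
        ORDER BY s.student_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in build_timeline(make_demo_db()):
        print(row)
```

ISO-8601 date strings sort correctly as text, which is why `MIN` picks the earliest date here. Taking the earliest event is itself an assumption; if students re-apply or change programs, you may want the latest event, or one row per application instead.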
STEP 3: Build an Analysis File – Tools
STEP 3: Build a Data Analysis File – Best Practices
- Test your ETL process (automated testing is better)
- Save your data in a database (an existing one, or SQLite)
- Append rows to a table with a timestamp and a test indicator
- Keep track of the program version
- Keep a changelog
- Capture more data than you need, then filter it for analysis
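A minimal sketch of the append-only pattern in SQLite, assuming a hypothetical `applicant_snapshot` table and a hard-coded `PROGRAM_VERSION` string (in practice this might be a semantic version number or a git commit hash). Each ETL run appends rows stamped with the run time, a test flag, and the program version, and never updates existing rows, so earlier runs remain available for comparison.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional, Sequence, Tuple

# Hypothetical version string; could be a semantic version or a git commit hash.
PROGRAM_VERSION = "0.1.0"


def init_snapshot_table(conn: sqlite3.Connection) -> None:
    """Create the append-only snapshot table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS applicant_snapshot (
            student_id      INTEGER NOT NULL,
            status          TEXT,
            run_ts          TEXT NOT NULL,    -- when this ETL run happened (UTC)
            is_test         INTEGER NOT NULL, -- 1 = test run, 0 = production run
            program_version TEXT NOT NULL     -- ETL version that wrote the row
        )
        """
    )


def append_snapshot(
    conn: sqlite3.Connection,
    rows: Sequence[Tuple[int, str]],
    is_test: bool = False,
    run_ts: Optional[str] = None,
) -> int:
    """Append (student_id, status) rows stamped with run metadata; return the row count."""
    run_ts = run_ts or datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO applicant_snapshot VALUES (?, ?, ?, ?, ?)",
            [(sid, status, run_ts, int(is_test), PROGRAM_VERSION) for sid, status in rows],
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_snapshot_table(conn)
    append_snapshot(conn, [(1, "APPLIED"), (2, "ACCEPTED")], is_test=True)
    append_snapshot(conn, [(1, "ACCEPTED"), (2, "REGISTERED")])
    production = conn.execute(
        "SELECT COUNT(*) FROM applicant_snapshot WHERE is_test = 0"
    ).fetchone()[0]
    print("production rows:", production)
```

Because test runs are flagged rather than written to a separate table, analysis queries simply filter on `is_test = 0`.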
STEP 4: Develop a Model
[Diagram: Student Characteristics (independent variables, features) → function / algorithm / formula → Outcomes (dependent variable)]
STEP 4: Develop a Model – Things to Watch Out For
- Missing data
- Multiple models
- Model testing
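Model testing and comparing multiple models both rest on evaluating each candidate on data it was not fitted to. The sketch below is a minimal illustration with hypothetical helpers (`holdout_split`, `fit_majority`, `fit_threshold`) and a single made-up feature; it assumes rows are independent, whereas a real enrolment file might be better split by intake year. Every candidate is fitted on the training rows and scored only on the held-out rows, and a trivial majority-class baseline is included so that a "good" accuracy can be judged against doing nothing clever.

```python
import random
from collections import Counter


def holdout_split(rows, test_frac=0.25, seed=42):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority(train):
    """Baseline model: always predict the most common outcome in the training rows."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority


def fit_threshold(train):
    """Predict 1 when the feature is at least a cut-off chosen on the training rows."""
    def train_accuracy(cut):
        return sum((x >= cut) == bool(y) for x, y in train) / len(train)

    best_cut = max(sorted({x for x, _ in train}), key=train_accuracy)
    return lambda x: int(x >= best_cut)


def accuracy(model, rows):
    """Share of (feature, outcome) rows the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


if __name__ == "__main__":
    # Hypothetical rows: (high-school average, enrolled flag)
    rows = [(avg, int(avg >= 75)) for avg in range(60, 100)]
    train, test = holdout_split(rows)
    for name, fit in [("majority baseline", fit_majority), ("threshold rule", fit_threshold)]:
        model = fit(train)  # fitted on training rows only
        print(f"{name}: test accuracy {accuracy(model, test):.2f}")
```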
STEP 4: Develop a Model – Accuracy
- Refer back to your goal: there is no universal measure of accuracy
- The model is used for decision making or resource allocation
- Assign a loss to incorrect predictions and minimize it
- Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC)
- Bias-variance trade-off and overfitting
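ROC and AUC can be computed from scored predictions in a few lines. This sketch, with made-up scores and labels and hypothetical function names, builds the ROC curve by sweeping the threshold over every distinct score, integrates it with the trapezoid rule, and checks the result against the equivalent pairwise definition: AUC is the probability that a randomly chosen positive case (for example, a student who enrolled) scores higher than a randomly chosen negative one, with ties counting half.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at each distinct threshold, strictest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # made-up model scores
    labels = [1, 1, 0, 1, 0, 0, 1, 0]                   # made-up outcomes
    print("AUC (trapezoid):", auc_trapezoid(roc_points(scores, labels)))
    print("AUC (pairwise): ", auc_pairwise(scores, labels))
```

AUC summarizes ranking quality across all thresholds, which is also its limitation here: it does not say which threshold suits your goal or which kind of error is more costly.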
STEP 5: Deliver Your Results
- Set up delivery early
- Meet with your audience and set expectations
- How will the data be used? Refer back to your goal
- Dashboards
- Data files
STEP 5: Delivery to Students
- Present information to students carefully:
  - Present a positive outlook
  - Don’t personalize it; talk about a group of similar students
- The factors in the model may matter less than unobserved factors
- Understand the difference between causation and correlation
- Beware the self-fulfilling prophecy
Cathy O’Neil
- @mathbabe, mathbabe.org
- Mathematician, former hedge-fund quant
Weapons of Math Destruction
- Three factors make a model a WMD:
  - Is the participant aware of the model? Is the model opaque or invisible?
  - Does the model work against the participant’s interest? Is it unfair? Does it create feedback loops?
  - Can the model scale?
Experience So Far
- Getting the data took longer than anticipated
- Working with the data was a great learning experience
- Automated process for harvesting data
- Starting to work on the delivery end
Challenges
- Data quality
- Not enough RHS (right-hand-side, i.e. explanatory) variables
- More categorical variables than in typical ML problems
Community of Practice
- Predictive Analytics Roundtable
- Mailing list (more discussion in future)
- http://mailman.ucalgary.ca/mailman/listinfo/predictive-l
- Stephen.Childs@ucalgary.ca
- @sechilds #CIRPA2016
- PyData, other user groups

Speaker Notes

  1. Different skills: the data needs and setup are different. Predictive analytics is very different from reporting. The terminology often comes from machine learning and computer science, while IR is more grounded in traditional statistics. Those requesting predictive analytics may not know much about what they want (a familiar story in IR), but you need to be the expert; they expect that.
  2. Add predictive analytics and machine learning to the toolkit. Find software tools that work with existing tools. Learn the vocabulary around this discipline. Build more understanding of machine learning, statistics, and related topics. Don’t work alone: develop a community of practice.
  3. Analyst and researcher. MA in Economics from WLU. History with EPRI and uCalgary. Analyst role and technical skills vs. programmer/analyst.
  4. Machine learning comes out of computer science, with a different tradition and terminology. It is related to artificial intelligence, and it is widely used in the tech world: it affects your life on a daily basis.
     Statistics and machine learning: Leo Breiman’s “Statistical Modeling: The Two Cultures” paper and the responses to it.
     - Statistics: assume a data generation process and try to learn about it.
     - Machine learning: apply algorithms to data, with no such assumption.
     - The goals are different.
     Types of machine learning: supervised vs. unsupervised learning; classification vs. regression.
     Machine learning algorithms: OLS, logistic regression, decision trees and random forests, ensemble models.
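To make the regression half of "classification vs. regression" concrete: OLS with one predictor has a closed-form solution, sketched below. The example, predicting a continuous first-year GPA from a high-school average, uses made-up numbers and hypothetical function names. A statistician would typically go on to examine the slope and its uncertainty; a machine-learning workflow would focus on how well `predict_ols` does on new students.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares for y = a + b * x with a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope: covariance over variance
    a = y_bar - b * x_bar  # intercept: line passes through the means
    return a, b


def predict_ols(a: float, b: float, x: float) -> float:
    """Predicted continuous outcome (here: a hypothetical first-year GPA)."""
    return a + b * x


if __name__ == "__main__":
    hs_avg = [70, 75, 80, 85, 90]         # made-up high-school averages
    gpa = [2.0, 2.5, 3.0, 3.5, 4.0]       # made-up first-year GPAs
    a, b = fit_ols(hs_avg, gpa)
    print(f"GPA = {a:.2f} + {b:.3f} * average")
    print("prediction for an 82 average:", round(predict_ols(a, b, 82), 2))
```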
  5. Define your goal. Get access to data. Build an analysis file. Develop a model. Deliver your results.
  6. It is best to start with a written document describing the goals of your project. Otherwise you are really starting with an “unwritten” one, and that can cause confusion later (e.g., the Canadian and American constitutions vs. the British one).
  7. The operational SIS data is the source; you perform an ETL process to get the snapshot.
     - How many people use snapshots?
     - Of those, are the snapshot fields different from the SIS fields?
     You will probably need to build your own ETL process for the predictive analytics data. If you want to make real-time predictions, you need access to the operational data. There may already be a view in place that does most of what you want.
  8. Ask: raise your hand if you are comfortable joining database tables together (or any type of tables). Keep your hands up, and also raise your hand if you have someone close by who can help you with that. Your job as the analyst is to keep your eye on the goal. Working with a programmer lets you do pair programming, which is great if you can do it. Students can change their minds throughout the process, and the university can reject them from a program. Figure out the significant events in the person’s record and capture their timestamps. We also found that the effective dates are not always WHEN something was added to the database!
  9. Option 1: use a programming language (Python, R, SQL). Option 2: use a graphical data blending tool (for prototyping or for the whole project). Graphical tools are better for prototyping because they get you started quickly. Code is best: it is easier to maintain, makes changes easier to track, and handles complexity better, but it has a higher barrier to entry! There are a number of ETL “moves” you can learn; they work regardless of the tool and will be very useful when talking with programmers. Focus on learning the moves, not the syntax. Draw diagrams!
  10. Testing: you need to make sure your ETL is correct, and if you can automate this testing, you are ahead of the curve. Compare against individual records to see if your file makes sense. Modularize data transformations so you can test them with fake data that covers likely cases. Databases are awesome: use an existing one or set up a simple one (SQLite); you have this expertise at your institution! This lets you start creating DAILY snapshots, which will come in handy next year. Think about the table structure and talk to your colleagues! Program version: git hash, version number (semantic versioning).
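Modularized transformations can be tested in isolation with fake records. This sketch uses a hypothetical transformation, `days_to_register`, that derives one analysis-file field from two ISO dates. The test function covers the likely cases (a normal record, same-day registration, a missing registration, and out-of-order dates) and is named `test_...` so that a runner such as pytest would discover it, though it also runs on its own.

```python
from datetime import date
from typing import Optional


def days_to_register(applied: Optional[str], registered: Optional[str]) -> Optional[int]:
    """Days from application to registration (ISO dates), or None if either is missing.

    Raises ValueError if registration precedes application, so bad source data
    surfaces during testing instead of silently entering the analysis file.
    """
    if applied is None or registered is None:
        return None
    delta = (date.fromisoformat(registered) - date.fromisoformat(applied)).days
    if delta < 0:
        raise ValueError(f"registered {registered} before applied {applied}")
    return delta


def test_days_to_register() -> None:
    # Fake records covering the likely cases.
    assert days_to_register("2016-01-15", "2016-07-20") == 187  # normal record
    assert days_to_register("2016-03-01", "2016-03-01") == 0    # same day
    assert days_to_register("2016-03-01", None) is None         # never registered
    try:
        days_to_register("2016-07-20", "2016-01-15")            # out-of-order dates
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for out-of-order dates")


if __name__ == "__main__":
    test_days_to_register()
    print("all tests passed")
```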
  11. What is a model? At its core, it is a way to relate the characteristics of students to their outcomes. You can think of it as a formula that takes the data you have and transforms it into a prediction of the outcome you care about. There are a number of different types of models, but most software gives you the same interface to all of them. The important thing is understanding what the algorithm is doing and how to “tune” it.
  12. Missing data: grades example, geographic data, gender example.
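The notes do not say how the grades, geographic, or gender examples were handled. One common approach, assumed here rather than taken from the talk, is to keep missingness as information: fill a missing number with a value learned from the training data and add a 0/1 indicator, and treat a missing category as its own level. The helper names below are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple


def impute_with_indicator(
    values: List[Optional[float]], fill: Optional[float] = None
) -> Tuple[List[float], List[int], float]:
    """Fill missing numbers and return a 0/1 'was missing' flag for each row.

    On training data, leave `fill` as None to use the median of observed values
    (this raises StatisticsError if every value is missing). On test or new data,
    pass the training fill value so every dataset is filled the same way.
    """
    if fill is None:
        fill = median(v for v in values if v is not None)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags, fill


def category_or_missing(value: Optional[str]) -> str:
    """Keep 'not reported' as its own category instead of dropping the row."""
    return value if value not in (None, "") else "MISSING"


if __name__ == "__main__":
    train_grades = [82.0, None, 75.0, 90.0, None]  # made-up grades
    filled, flags, fill = impute_with_indicator(train_grades)
    print(filled, flags, fill)
    print(impute_with_indicator([None, 60.0], fill=fill))  # reuse the training fill
    print([category_or_missing(g) for g in ["F", None, "M", ""]])
```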
  13. Confusion matrix! Making no assumptions about costs means you are assuming a false positive is as bad as a false negative.
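A confusion matrix plus explicit costs turns "make no assumptions" into a visible choice. In this sketch the label 1 marks a hypothetical student who does not register, and the costs are illustrative assumptions: `cost_fn` is the cost of missing such a student, `cost_fp` the cost of contacting a student who would have registered anyway. The hypothetical helper `best_threshold` picks the cut-off that minimizes total cost over the candidate thresholds on the given data.

```python
from typing import Sequence, Tuple


def confusion_matrix(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, fn, tn) when predicting 1 for every prob >= threshold."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_loss(probs, labels, threshold, cost_fp, cost_fn) -> float:
    """Weighted count of errors at one threshold."""
    _, fp, fn, _ = confusion_matrix(probs, labels, threshold)
    return cost_fp * fp + cost_fn * fn


def best_threshold(probs, labels, cost_fp, cost_fn) -> float:
    """Candidate cut-off with the lowest total loss (lowest cut-off wins ties)."""
    candidates = sorted(set(probs)) + [1.01]  # 1.01 means "flag nobody"
    return min(candidates, key=lambda t: total_loss(probs, labels, t, cost_fp, cost_fn))


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # made-up predicted risks
    labels = [1, 1, 0, 1, 0, 0, 1, 0]                  # 1 = did not register
    print("equal costs:", best_threshold(probs, labels, cost_fp=1, cost_fn=1))
    print("misses 5x costlier:", best_threshold(probs, labels, cost_fp=1, cost_fn=5))
```

With equal costs on this example data, the chosen threshold is 0.6; making a missed student five times as costly lowers it to 0.3, so more students are flagged.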
  14. Our models should never serve as a gatekeeper to services or access to education. The only case where that should happen is an experiment, and you need REB (research ethics board) approval for that.