At the University of Calgary, we used real-time data on student applications to provide Enrolment Services with better predictive analytics on the students who were offered a place at the University. IR offices are well placed to leverage institutional data to make these predictions. Our knowledge of the data and analytical tools can make us leaders in predictive analytics at our institutions. This presentation will discuss the issues involved in developing models, selecting the best one, and putting it to use. These lessons apply to many other situations.
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrolment
1. Individual Level Predictive Analytics
Improving Student Enrolment Outcomes
Stephen Childs, Institutional Analyst @sechilds
CIRPA/PNAIRP 2016, Kelowna, BC
November 7, 2016
Office of Institutional Analysis
2. Why Predictive Analytics and IR
Higher Education Institutions collect more data
IR offices have experts in institutional data
IR offices are seeking ways to add more value
Machine learning and predictive models are in the news
3. Opportunity… or Crisis?
Predictive analytics require a different skill set
A different set of software tools is required
You may be the only analyst working on this in your office
Requesters expect you to be the expert
Resistance to implementing insights from predictive analytics
4. The way forward
Add these skills to your IR toolkit
Find tools that work with your existing ones
Develop your understanding and expertise
Community of Practice
5. Learning Outcomes
Have a high-level understanding of what predictive analytics
does and how it works.
Have a concrete series of steps to follow.
Know the vocabulary of machine learning and statistical
modeling.
Know what tools can be used for this – and how they work
with existing tools
Know how we select, train, and test models for prediction
Learn some of the challenges in predictive modeling
6. Outline
Introduction (already done??)
Introduction to Machine Learning
Model Building Steps
Tool Overview
Customer Education
Challenges
Building Community
10. STEP 1: Define Your Goal
Sets the scope of your analysis
Provides input into model selection
Identifies stakeholders
Discover what data is available
Revise as the project progresses
11. STEP 2: Get Access to your Data
Three different types of data:
—Operational SIS
—Data Warehouse – snapshots
—Predictive Analytics Data
Talk to your DBA to find the relevant tables
Think of other data to add:
—Residence, CRM
—Socio-economic data
12. STEP 3: Build an Analysis File
Extract – Transform – Load
—Use as much existing ETL as you can
—Join tables together
—Work with a programmer – but analyst drives
Hard to capture the timeline of the application
—When did they apply?
—When were they accepted?
—When did they register?
14. STEP 3: Build a Data Analysis File – Best Practices
Test your ETL process (automated is better)
Save your data in a database (existing one, SQLite)
Append timestamped rows to the table and use a test indicator
Keep track of program version
Keep a changelog
Capture more data, then filter that for analysis
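The append-and-timestamp practice above can be sketched with Python's standard-library sqlite3 module. The database, table, and column names here are hypothetical:

```python
import sqlite3
from datetime import datetime, timezone

# In-memory database for illustration; point this at a file in practice.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS applications (
        applicant_id TEXT,
        status       TEXT,
        snapshot_ts  TEXT,    -- when this row was captured
        is_test      INTEGER  -- 1 for test-run rows, 0 for production
    )
""")

def append_snapshot(rows, is_test=False):
    """Append timestamped rows rather than overwriting history."""
    ts = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO applications VALUES (?, ?, ?, ?)",
        [(r["applicant_id"], r["status"], ts, int(is_test)) for r in rows],
    )
    conn.commit()

append_snapshot([{"applicant_id": "A001", "status": "admitted"}], is_test=True)
```

Because rows are appended rather than updated, earlier snapshots remain queryable – which is what makes daily snapshots possible later.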
15. STEP 4: Develop a model
[Diagram: student characteristics (independent variables, or features) feed into a function / algorithm / formula that produces predicted outcomes (the dependent variable).]
16. STEP 4: Develop a Model – Things to Watch Out For
Missing data
Multiple models
Model testing
17. STEP 4: Develop a Model - Accuracy
Refer back to your goal – no universal measure of accuracy
Model used for decision making/resource allocation
Assign loss based on incorrect predictions – minimize it
Receiver Operating Characteristic (ROC) and Area Under the
Curve (AUC)
Bias-Variance Trade Off and Overfitting
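To make AUC concrete with toy numbers (a pure-Python sketch, not tied to any modeling library): AUC is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative one.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted registration probabilities vs. actual outcomes.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))  # → 0.8888888888888888 (8 of 9 pairs ranked correctly)
```

An AUC of 0.5 is coin-flipping; 1.0 is perfect ranking – which is why it is a useful single-number summary when you have not yet committed to a decision threshold.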
18. STEP 5: Deliver Your Results
Set up delivery early
Meet with your audience – set expectations
How will the data be used – refer back to goal
Dashboards
Data files
19. STEP 5: Delivery to Students
Information has to be presented to students carefully
—Present a positive outlook
—Don’t personalize it – talk about a group of similar
students.
The factors in the model may matter less than unobserved factors.
Difference between causality and correlation.
Beware the self-fulfilling prophecy
21. Weapons of Math Destruction
Three factors make a model a WMD:
—Is the participant aware of the model? Is the model
opaque or invisible?
—Does the model work against the participant’s interest? Is
it unfair? Does it create feedback loops?
—Can the model scale?
22. Experience So Far
Getting the data took longer than anticipated
Working with the data was a great learning experience
Automated process for harvesting data
Starting to work on the delivery end
24. Community of Practice
Predictive Analytics Roundtable
Mailing List – more discussion in future
http://mailman.ucalgary.ca/mailman/listinfo/predictive-l
Stephen.Childs@ucalgary.ca
@sechilds #CIRPA2016
PyData, other user groups
Editor's Notes
Different skills – data needs and setup are different. Predictive analytics are very different from reporting.
Terminology often comes from machine learning/computer science – IR more grounded in traditional statistics
Those requesting predictive analytics may not know much about what they want. (A familiar story in IR.) But you need to be the expert – they expect that.
Add predictive analytics and machine learning to the toolkit
Find software tools that work with existing tools
Learn the vocabulary around this discipline
More understanding of machine learning, statistics and other stuff
Don’t work alone – develop a community of practice.
Analyst and Researcher - MA Economics from WLU
History with EPRI and uCalgary
Analyst role and technical skills vs. programmer/analyst
Machine learning comes out of computer science – different tradition and terminology
It is related to artificial intelligence
It is widely used in the tech world – it impacts your life on a daily basis
Statistics and Machine Learning - the two cultures paper & response
Statistics - assume a Data Generation Process and want to learn about that
Machine Learning - Algorithms applied to data - no such assumption
Goals are different
Types of Machine Learning
Supervised vs. Unsupervised Learning
Classification vs. Regression
Machine Learning Algorithms
OLS
Logistic Regression
Decision Trees - Random Forest
Ensemble Models
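Supervised classification in miniature: hold out a test set, fit a model on the training portion, and score it on data it has not seen. This pure-Python sketch uses a deliberately trivial one-feature threshold model and hypothetical (admission average, registered) pairs – a stand-in for the real algorithms listed above:

```python
import random

def train_test_split(data, test_fraction=0.25, seed=42):
    """Shuffle and split (feature, label) pairs into train and test sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train):
    """A trivial classifier: predict 1 when the feature exceeds a
    threshold; pick the threshold that maximizes training accuracy."""
    candidates = sorted({x for x, _ in train})
    return max(candidates,
               key=lambda t: sum((x > t) == y for x, y in train))

# Hypothetical data: (admission average, registered?) pairs.
data = [(60, 0), (65, 0), (70, 0), (75, 1), (80, 1),
        (85, 1), (90, 1), (62, 0), (88, 1), (72, 0)]
train, test = train_test_split(data)
threshold = fit_threshold(train)
test_accuracy = sum((x > threshold) == y for x, y in test) / len(test)
```

Swapping the threshold model for logistic regression or a random forest changes only the fitting step; the split/fit/score loop stays the same.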
Define your goal
Get access to data
Build an analysis file
Develop a model
Deliver your results
It is best to start with a written document describing the goals of your project.
Otherwise you are really starting with an “unwritten” one – and that can cause confusion later.
e.g. Canadian and American constitutions vs. British
The operational SIS data is the source. You perform an ETL process to get the snapshot.
* How many people use snapshots?
* Of those, are the snapshot fields different from the SIS fields?
You will probably need to generate your own ETL process for the predictive analytics data.
If you want to do real time predictions – you need access to that operational data.
There may be a view already in place for you - that does most of what you want.
Ask – Raise your hand if you are comfortable joining database tables together – or any type of tables
Keep your hands up – also raise your hands if you have someone close by who can help you with that. Your job as the analyst is to keep your eye on the goal.
Working with a programmer lets you do pair programming. (Which is great if you can do it.)
Students can change their minds throughout the process – the university can reject them from a program.
Figure out the significant events in the person’s record and capture that time stamp. We also found out that the effective dates are not always WHEN something was added into the database!
Option 1 – use a programming language – Python, R, SQL
Option 2 – use a graphical data blending tool (for prototyping or the whole project)
Graphical tools are better for prototyping – getting you started quickly.
Code is the best – easier to maintain, easier to track changes, handles complexity better, but higher barrier to entry!!
There are a number of ETL “moves” you can learn – they work regardless of the tool – and will be very useful in talking with programmers.
Focus on learning the moves, not the syntax. Draw diagrams!
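One of those moves is the left join: keep every row from the main table and attach matching columns from a secondary one. A pure-Python sketch with hypothetical application and residence tables (in practice you would do this in SQL or with a data-frame merge):

```python
applications = [
    {"applicant_id": "A001", "program": "Arts"},
    {"applicant_id": "A002", "program": "Science"},
]
residence = [
    {"applicant_id": "A001", "building": "Rundle Hall"},
]

def left_join(left, right, key):
    """Keep every left row; attach matching right columns where present."""
    lookup = {row[key]: row for row in right}
    return [{**l, **lookup.get(l[key], {})} for l in left]

joined = left_join(applications, residence, "applicant_id")
# A001 gains a "building" column; A002 keeps only its application fields.
```

The same move, drawn as a diagram, reads identically whether the tool is Python, SQL, or a graphical blender.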
Testing – you need to make sure your ETL is good – if you can automate this testing… you are ahead of the curve
Compare to individual records – see if your file makes sense
Modularize data transformations – so you can test with fake data that covers likely cases
Databases are awesome – use an existing one or set up a simple one (SQLite) – you have this expertise at your institution!!
This lets you start creating DAILY snapshots – which will come in handy next year! Think about the table structure – talk to your colleagues!!
Program version – git hash, version number (semantic versioning)
What is a model? At its core, it is a way to relate the characteristics of students to their outcomes.
You can think of it as a formula that takes the data that you have – and modifies it to produce the outcome you want.
There are a number of different types of models, but most programs will give you the same interface to all types.
The important thing is understanding what the algorithm is doing – and how to “tune” it.
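A minimal sketch of that common interface, loosely modelled on the fit/predict convention scikit-learn popularized (both models here are toys, not real algorithms):

```python
class MajorityClassModel:
    """Baseline: always predict the most common training outcome."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict(self, X):
        return [self.label for _ in X]

class ThresholdModel:
    """Predict 1 when the first feature exceeds a fixed threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
    def fit(self, X, y):
        return self  # nothing to learn in this toy version
    def predict(self, X):
        return [int(row[0] > self.threshold) for row in X]

X = [[65], [80], [90]]
y = [0, 1, 1]
for model in (MajorityClassModel(), ThresholdModel(70)):
    model.fit(X, y)
    print(model.predict(X))
```

Because both expose the same fit/predict pair, swapping algorithms means changing one line – which is what makes trying multiple model types cheap.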
Missing data - grades example, geographic data, gender example.
Confusion matrix!
“Making no assumptions” about costs means you are assuming a false positive is as bad as a false negative.
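That asymmetry can be made concrete: build the confusion matrix, then weight false positives and false negatives differently. The 5:1 weighting below is purely illustrative:

```python
def confusion_matrix(actual, predicted):
    """Counts of (true pos, false pos, false neg, true neg)."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

def weighted_loss(actual, predicted, fp_cost=1.0, fn_cost=5.0):
    """Total cost when a false negative (e.g. missing a student who
    needs outreach) is treated as costlier than a false positive."""
    _, fp, fn, _ = confusion_matrix(actual, predicted)
    return fp * fp_cost + fn * fn_cost

actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))  # → (2, 1, 1, 2)
print(weighted_loss(actual, predicted))     # → 6.0
```

Picking the decision threshold that minimizes this weighted loss ties the accuracy question back to the project goal, rather than to a generic error rate.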
Our models should never serve as a gatekeeper to services or access to education – the only case where that happens is an experiment – and you need to get REB approval for that.