Data Science for Higher Ed
Gloria Lau
Manager, Data Science @ LinkedIn
LinkedIn data.
For students*.

*prospective students, current students and recent graduates
WHY?
We have career outcome data to
derive better insights about higher
education
Common questions from user studies
Prospective students:
I want to be a pediatrician. Where should I go to school?
I don’t know what I want but I am an A student. So?
Current students:
Show me the internship / job opportunities.
Should I double / change major?
Recent graduates:
Show me the job opportunities.
Should I consider further education?
The Answer for the type A’s
Show me the career outcome data per school / field of study / degree
The Answer for the exploratory kind
Show me the career outcome data in a form that allows for serendipitous discoveries
→ build me some data products to help me draw insights from aggregate data
→ build me some data products that are delightful
OK! Let’s start building some data
products for students!
type A’s and non type A’s, we have answers for you
Invest in Plumbing
Before your faucets
Data Science for Higher Ed
A case study
From plumbing to fixture.
From standardization to delightful data products.
Standardization
• Standardization is about understanding our data, and building the foundational layer that maps <school_name> to <school_id> so that we can build data products on top
• Entity resolution
• Recognizable entities
• Typeahead
Entity Resolution
• User types in University of California, Berkeley → easy
• User types in UCB → hard / ambiguous
Entity Resolution
• Name features: fuzzy match, edit distance, prefix match, etc.
• Profile features: email, groups, etc.
• Network features: connections, invitations, etc.
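The name features listed above can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: `name_features` and `edit_distance` are hypothetical helper names, the fuzzy score comes from the standard library's `difflib.SequenceMatcher`, and the edit distance is plain Levenshtein distance.

```python
from difflib import SequenceMatcher


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def name_features(typed: str, canonical: str) -> dict:
    """Hypothetical feature set for one (typed name, canonical name) pair."""
    t, c = typed.strip().lower(), canonical.strip().lower()
    return {
        "fuzzy_ratio": SequenceMatcher(None, t, c).ratio(),
        "edit_distance": edit_distance(t, c),
        "prefix_match": c.startswith(t),
    }


canonical = "University of California, Berkeley"
print(name_features("University of California", canonical))
print(name_features("UCB", canonical))
```

An abbreviation such as UCB scores poorly on all three name features, which is one way to see why the profile and network features are needed as well.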
Recognizable entities
• User types in University of California, Berkeley → easy
• User types in UCB → hard / ambiguous / alias not understood
• User types in 東京大学 → harder / canonical name not understood
Recognizable entities
• You don’t know what you don’t know
• Your standardization is only as good as your recognized dataset
• LinkedIn data is very global
Recognizable entities
• IPEDS for US school data
• Crowdsourcing for non-US school + government data
• internal and external with schema spec’ed out
• Alias – bootstrap from member data
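One way to read "bootstrap from member data" is sketched below. It assumes we already have pairs of (raw school string, school_id) from members whose school was resolved by some other signal; that input, the function name `bootstrap_aliases`, and both thresholds are illustrative assumptions, and the sample data is made up.

```python
from collections import Counter, defaultdict


def bootstrap_aliases(observations, min_support=3, min_share=0.8):
    """Keep a raw string as an alias only if it is frequent and points
    overwhelmingly at one school.

    observations: iterable of (raw_name, school_id) pairs (hypothetical input).
    """
    counts = defaultdict(Counter)
    for raw_name, school_id in observations:
        counts[raw_name.strip().lower()][school_id] += 1

    aliases = {}
    for alias, by_school in counts.items():
        school_id, top = by_school.most_common(1)[0]
        total = sum(by_school.values())
        if top >= min_support and top / total >= min_share:
            aliases[alias] = school_id
    return aliases


# Toy data: "uc" is split between two schools, "cal" is too rare.
observations = (
    [("UCB", "berkeley")] * 9 + [("ucb", "boulder")] +
    [("Cal", "berkeley")] * 2 +
    [("UC", "berkeley")] * 3 + [("UC", "davis")] * 3
)
print(bootstrap_aliases(observations))
```

The share threshold is what keeps a genuinely ambiguous string out of the alias table, at the cost of leaving it unresolved.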
Typeahead
• Plug the hole from the front(-end) as soon as you can
• Invest in a good typeahead early on so that you don’t even need to standardize
• Helps standardization rate tremendously
• Make sure you have aliases and localized strings in your typeahead
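The typeahead idea can be sketched as a small in-memory prefix index. This is an assumption-laden toy, not LinkedIn's implementation: the `Typeahead` class and the numeric school IDs are hypothetical. The point it illustrates is that canonical names, aliases and localized strings all carry the same school_id, so a selected suggestion is already standardized.

```python
import bisect


class Typeahead:
    """Sorted-list prefix index over suggestion strings (illustrative only)."""

    def __init__(self):
        self._keys = []     # case-folded suggestion strings, kept sorted
        self._entries = []  # parallel list of (display_text, school_id)

    def add(self, text, school_id):
        key = text.casefold()
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._entries.insert(i, (text, school_id))

    def suggest(self, prefix, limit=5):
        p = prefix.casefold()
        i = bisect.bisect_left(self._keys, p)
        out = []
        # All keys sharing the prefix are contiguous in sorted order.
        while i < len(self._keys) and self._keys[i].startswith(p) and len(out) < limit:
            out.append(self._entries[i])
            i += 1
        return out


ta = Typeahead()
ta.add("University of California, Berkeley", 1)  # canonical name
ta.add("UC Berkeley", 1)                         # alias
ta.add("UCB", 1)                                 # alias
ta.add("University of Tokyo", 2)                 # canonical name
ta.add("東京大学", 2)                             # localized string

print(ta.suggest("uc"))
print(ta.suggest("東京"))
```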
Plumbing? checked
Onto building delightful* data products

*The level of delightfulness is directly correlated to
how good your standardization layer is.
Similar Schools
Serendipitous discoveries. Sideways browse.
Based on career outcome data + some more.
Similar Schools
Similar schools
• Aggregate profile per school based on alumni data
• Industry, job title, job function, company, skills, etc.
• Feature engineering and balancing
• Dot-product of 2 aggregate profiles = school similarity
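The aggregate-profile and dot-product steps above can be sketched as follows. The field names and alumni records are made up, and the L2 normalization is an assumption added here so that school size alone does not drive the score; the slides do not say how features were weighted or balanced.

```python
import math
from collections import Counter


def aggregate_profile(alumni):
    """Build a sparse, L2-normalized profile from alumni records.

    alumni: iterable of dicts mapping a field name (e.g. "industry")
    to a value (hypothetical record shape).
    """
    counts = Counter()
    for person in alumni:
        for field, value in person.items():
            counts[(field, value)] += 1
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {k: c / norm for k, c in counts.items()} if norm else {}


def similarity(profile_a, profile_b):
    """Dot product of two sparse profiles."""
    if len(profile_a) > len(profile_b):
        profile_a, profile_b = profile_b, profile_a  # iterate the smaller one
    return sum(w * profile_b.get(k, 0.0) for k, w in profile_a.items())


school_a = aggregate_profile([
    {"industry": "software", "function": "engineering"},
    {"industry": "software", "function": "product"},
])
school_b = aggregate_profile([
    {"industry": "software", "function": "engineering"},
    {"industry": "finance", "function": "engineering"},
])
print(similarity(school_a, school_b))
```

With normalized profiles the dot product is cosine similarity, so identical outcome distributions score 1 regardless of how many alumni each school has.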
Similar schools – issues
• Observation #1: similarity identified between tiny specialized schools and big research institutions
• Observation #2: similarity identified between non-US specialized schools and big US research institutions
What’s wrong?
Degree bucketization
Similar schools - issues
• Observation: no data
• New community colleges and non-US schools have very sparse data
• Solution: attribute-based similarity
• From IPEDS and crowdsourced data

Kyoritsu Women's University
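The slides do not say how attribute-based similarity was computed. One simple possibility, shown purely as an assumption, is Jaccard overlap between the (attribute, value) pairs of two schools; the attribute names and values below are hypothetical stand-ins for whatever IPEDS or crowdsourced fields are available.

```python
def attribute_similarity(attrs_a: dict, attrs_b: dict) -> float:
    """Jaccard similarity over (attribute, value) pairs.

    Usable when a school has too few alumni for an outcome-based profile.
    """
    a, b = set(attrs_a.items()), set(attrs_b.items())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical attributes for two schools with sparse alumni data.
school_x = {"control": "private", "level": "4-year", "country": "JP"}
school_y = {"control": "private", "level": "4-year", "country": "US"}
print(attribute_similarity(school_x, school_y))
```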
Notable Alumni
Aspirations. Connecting the dots.
Notable Alumni
• Who’s notable?
• Wikipedia match
• School standardization
• Name mapping
• Success stories
Who’s notable – Wikipedia stories

…
Wikipedia stories
• Lightweight school standardization
• Name mapping
• ✓ Name feature ✕ profile feature ✕ network feature
• Even when you are notable, your name isn’t unique
• Crowdsourcing for evaluation
• Profile from LinkedIn vs profile from Wikipedia
Crowdsourcing for evaluation
Are we done? Do we have notable
alumni for all schools?
Same issue as with similar schools: data sparseness
Who’s notable - Success stories
• Many schools don’t have a notable alumni section in Wikipedia
• Success stories based on LinkedIn data
• Features of success
• CXOs at Fortune companies
• Generalizes to high seniority at top companies

But what does it mean to be
• Senior
• A top company
• An alum

They all depend on…
Standardization
• Degree standardization - alumni
• Company standardization
• IBM vs international brotherhood of magicians
• Title & seniority standardization
• founder of the gloria lau franchise vs founder of LinkedIn
• VP in financial sector vs VP in software engineering industry
Evaluation – I know it when I see it
INSIGHTS:
unique & standardized data to describe schools.
similar schools.
notable alumni.

to drive STUDENT DECISIONS

Qcon SF 2013

  • 1. Data Science for Higher Ed Gloria Lau Manager, Data Science @ LinkedIn
  • 2.
  • 3. LinkedIn data. For students*. *prospective students, current students and recent graduates
  • 4. WHY? We have career outcome data to derive better insights about higher education
  • 5. Common questions from user studies Prospective students: I want to be a pediatrician. Where should I go to school? I don’t know what I want but I am an A student. So? Current students: Show me the internship / job opportunities. Should I double / change major? Recent graduates: Show me the job opportunities. Should I consider further education?
  • 6. The Answer for the type A’s Show me the career outcome data per school / field of study / degree
  • 7. The Answer for the exploratory kind Show me the career outcome data in a form that allows for serendipitous discoveries  build me some data products to help me draw insights from aggregate data  build me some data products that are delightful
  • 8. OK! Let’s start building some data products for students! type A’s and non type A’s, we have answers for you
  • 12. Data Science for Higher Ed A case study From plumbing to fixture. From standardization to delightful data products.
  • 13. Standardization • Standardization is about understanding our data, and building the foundational layer that maps <school_name> to <school_id> so that we can build data products on top • Entity resolution • Recognizable entities • Typeahead
  • 14. Entity Resolution • User types in University of California, Berkeley → easy • User types in UCB → hard / ambiguous
  • 15. Entity Resolution • Name feature: fuzzy match, edit distance, prefix match, etc • Profile feature: email, groups, etc • Network feature: connections, invitations, etc
  • 16. Recognizable entities • User types in University of California, Berkeley → easy • User types in UCB → hard / ambiguous / alias not understood • User types in 東京大学 → harder / canonical name not understood
  • 17. Recognizable entities • You don’t know what you don’t know • Your standardization is only as good as your recognized dataset • LinkedIn data is very global
  • 18. Recognizable entities • IPEDS for US school data • Crowdsourcing for non-US school + government data (internal and external, with schema spec’ed out) • Alias – bootstrap from member data
  • 19. Typeahead • Plug the hole from the front(-end) as soon as you can • Invest in a good typeahead early on so that you don’t even need to standardize • Helps standardization rate tremendously • Make sure you have aliases and localized strings in your typeahead
  • 20. Plumbing? Checked. On to building delightful* data products. *The level of delightfulness is directly correlated with how good your standardization layer is.
  • 21. Similar Schools Serendipitous discoveries. Sideways browse. Based on career outcome data + some more.
  • 23. Similar schools • Aggregate profile per school based on alumni data • Industry, job title, job function, company, skills, etc • Feature engineering and balancing • Dot-product of 2 aggregate profiles = school similarity
  • 24. Similar schools – issues • Observation #1: similarity identified between tiny specialized schools and big research institutions • Observation #2: similarity identified between non-US specialized schools and big US research institutions
  • 26. Similar schools - issues • Observation: no data • New community colleges and non-US schools have very sparse data • Solution: attribute-based similarity • From IPEDS and crowdsourced data Kyoritsu Women's University
  • 29. Notable Alumni • Who’s notable? • Wikipedia match (school standardization, name mapping) • Success stories
  • 30. Who’s notable – Wikipedia stories …
  • 31. Wikipedia stories • Lightweight school standardization (✓ name feature, ✕ profile feature, ✕ network feature) • Name mapping (even when you are notable, your name isn’t unique) • Crowdsourcing for evaluation (profile from LinkedIn vs profile from Wikipedia)
  • 33. Are we done? Do we have notable alumni for all schools? Same issue as with similar schools – data sparseness
  • 34. Who’s notable - Success stories • Many schools don’t have a notable alumni section in Wikipedia • Success stories based on LinkedIn data • Features of success: CXOs at Fortune companies; generalizes to high seniority at top companies • But what does it mean to be senior, a top company, an alum? • They all depend on…
  • 35. Standardization • Degree standardization (alumni) • Company standardization: IBM vs international brotherhood of magicians • Title & seniority standardization: founder of the gloria lau franchise vs founder of LinkedIn; VP in financial sector vs VP in software engineering industry
  • 36. Evaluation – I know it when I see it
  • 37. INSIGHTS: unique & standardized data to describe schools. similar schools. notable alumni. to drive STUDENT DECISIONS
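
The similar-schools computation on slide 23 (aggregate profile per school, then a dot product of two profiles) can be sketched roughly as follows. This is a minimal illustration and not LinkedIn's implementation: the field names, the sample alumni, and the L2 normalization step are all assumptions. The normalization is one simple way to keep a large school from dominating purely through headcount; the deck only says "feature engineering and balancing" without giving details.

```python
from collections import Counter
from math import sqrt


def aggregate_profile(alumni):
    """Count (field, value) pairs across alumni and L2-normalize the counts.

    The normalization is an assumption made for this sketch; it makes the
    profile independent of how many alumni a school has.
    """
    counts = Counter()
    for person in alumni:
        for field, value in person.items():
            counts[(field, value)] += 1
    norm = sqrt(sum(c * c for c in counts.values()))
    return {key: c / norm for key, c in counts.items()} if norm else {}


def similarity(p, q):
    """Dot product of two sparse profiles (dicts keyed by feature)."""
    if len(p) > len(q):
        p, q = q, p  # iterate over the smaller profile
    return sum(weight * q.get(key, 0.0) for key, weight in p.items())


# Hypothetical alumni records, for illustration only.
school_a = [
    {"industry": "software", "title": "engineer"},
    {"industry": "software", "title": "designer"},
]
school_b = [{"industry": "software", "title": "engineer"}]
school_c = [{"industry": "healthcare", "title": "nurse"}]

pa, pb, pc = (aggregate_profile(s) for s in (school_a, school_b, school_c))
print(similarity(pa, pb), similarity(pa, pc))
```

With normalized profiles the dot product is a cosine similarity, so schools sharing no features score 0 and a school compared with itself scores 1. Sparse schools (slide 26) still produce near-empty profiles, which is why the deck falls back to attribute-based similarity there.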

Editor's notes

  1. Students and recent graduates are the fastest growing segment at LinkedIn
  2. Princeton’s data
  3. Invest in getting a good dataset globally. Members grow into new markets.
  4. Go after government datasets if you can
  5. Be smart – building out your vocabulary could have unexpected effects on your typeahead performance
  6. Former NFL players turn realtors
  7. Yoga class vs MBA at Stanford – degree standardization
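
The typeahead advice from slide 19 (include aliases and localized strings) together with note 5 (vocabulary growth affects typeahead behaviour) can be illustrated with a small prefix index. This is a hedged sketch only: the class name `SchoolTypeahead`, the school ids, and the alias lists are hypothetical, and a production typeahead would add ranking and a more compact structure such as a trie.

```python
from collections import defaultdict


class SchoolTypeahead:
    """Naive prefix index mapping typed text to standardized school ids."""

    def __init__(self):
        self._index = defaultdict(set)  # normalized prefix -> set of school ids
        self._names = {}                # school id -> canonical display name

    def add(self, school_id, canonical, aliases=()):
        self._names[school_id] = canonical
        # Index the canonical name and every alias or localized string.
        for name in (canonical, *aliases):
            key = name.casefold().strip()
            for end in range(1, len(key) + 1):
                self._index[key[:end]].add(school_id)

    def suggest(self, typed, limit=5):
        ids = sorted(self._index.get(typed.casefold().strip(), ()))
        return [(i, self._names[i]) for i in ids[:limit]]


# Hypothetical ids and alias lists, for illustration only.
typeahead = SchoolTypeahead()
typeahead.add(1, "University of California, Berkeley", aliases=("UCB", "UC Berkeley"))
typeahead.add(2, "The University of Tokyo", aliases=("東京大学", "UTokyo"))
typeahead.add(3, "University of Colorado Boulder", aliases=("UCB",))

print(typeahead.suggest("ucb"))
print(typeahead.suggest("東京"))
```

Because the member picks a suggestion that already carries a school id, no after-the-fact entity resolution is needed for that entry. The ambiguous alias `UCB` returns both candidates, which also shows the effect note 5 warns about: every alias added to the vocabulary enlarges the index and can crowd the suggestion list.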