Whenever you do something that nobody has tried before, you often run into difficulties that, well, nobody has found before! In this talk we walk you through the data, the modelling, the problems we encountered and the solutions we found while working with Graydon on predicting whether different kinds of companies will relocate. For this use case, we analyzed 10 years of data for two million companies using around 100 descriptors / features, and produced a predictive model using decision trees and random forests on Google Cloud Platform.
4. Relocation Prediction use case
Problem:
businesses, schools, hospitals, etc. move locations over time
(growth, bankruptcy, new markets, etc.)
Can we predict if they will relocate?
- To where?
- When?
- Why?
=> For now, we focus only on
relocation probability
5. For businesses we have historical Corporate Data:
- Company size, credit rating, relocation, etc...
=> Can company characteristics predict relocation?
- Useful information for service providers, realtors, city councils,
investors and developers
=> Investigatory POC: 6-week study
- Limit the scope to determine if relocation can be predicted, and
if so, which properties can be a signal
6. (Big) Data
We encountered some challenges:
- Monthly data from branches of 2 million companies, going back
10 years… ~ 300 million rows
- Dispersed data: where/how should it be gathered?
- Monthly data too granular: how to aggregate?
- Client did not have a suitable platform for data handling and
analysis...
7. Data & Modeling Considerations
- High dimensional time series data
- Preserve the temporal granularity to maximize information
- Neural Networks?
- LSTM or CNNs?
- NN design/exploration time > available time
- Simplify data and modeling due to time constraints
8. Preparing the Data
- Step 1: Collect the data on an appropriate platform:
- Set up Google Cloud Platform in one week
- Step 2: Aggregate the data
- From monthly to yearly: predict relocation from yearly aggregates
- Choose how to deal with categorical variables (see the sketch below)
- Subsequent steps: spawn virtual machine(s) on GCP for modeling
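A minimal sketch of the aggregation step in pandas, assuming a hypothetical monthly snapshot table with company_id, snapshot_date, employees, credit_rating, and a relocated_this_month flag (the actual Graydon schema and aggregation rules are not shown in the talk):

    import pandas as pd

    # Hypothetical monthly snapshot table: one row per company per month.
    monthly = pd.read_csv("monthly_snapshots.csv",
                          parse_dates=["snapshot_date"])
    monthly["year"] = monthly["snapshot_date"].dt.year

    # Aggregate monthly rows down to one row per company per year.
    yearly = (monthly
              .groupby(["company_id", "year"])
              .agg(employees_mean=("employees", "mean"),
                   credit_rating_last=("credit_rating", "last"),
                   has_relocated=("relocated_this_month", "max"))
              .reset_index())

    # One way to deal with categorical variables: one-hot encoding.
    yearly = pd.get_dummies(yearly, columns=["credit_rating_last"],
                            prefix="rating")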
9. Summary Statistics
- Final dataset: 75 features from one year and a ‘has_relocated’
target from the following year
- 2 million entries per year
- ~5% relocation (imbalanced dataset)
- Goal: Build a model that predicts ‘has_relocated’ better than the
trivial baseline: always predicting ‘no relocation’ is already 95%
accurate on this imbalanced data (see the sketch below)
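A quick back-of-the-envelope check of why raw accuracy is a misleading goal here:

    # ~5% of companies relocate, so a trivial "never relocates" model
    # already reaches ~95% accuracy; the model has to beat that bar.
    n_total = 2_000_000                   # entries per year (from the slides)
    n_relocated = int(0.05 * n_total)     # ~5% positives
    baseline_accuracy = (n_total - n_relocated) / n_total
    print(baseline_accuracy)              # 0.95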
10. Modeling step 1: Exploring Models
- Apply binary classification algorithms: SVM, logistic regression,
decision trees (DT), random forests (RF)
- Choose models with the best performance: AUC, Cohen’s kappa
- DTs and RFs did best
- Apply sampling techniques to improve the models (see the sketch below)
- Tune model parameters
- Validate
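A sketch of what this exploration loop could look like with scikit-learn and imbalanced-learn, assuming X and y hold the yearly features and ‘has_relocated’ target; the undersampler, parameter grid, and scorers are illustrative choices, not the exact setup from the talk:

    from imblearn.under_sampling import RandomUnderSampler
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.metrics import make_scorer, cohen_kappa_score

    # X, y: yearly feature matrix and 'has_relocated' target (hypothetical).
    # Undersample the majority class to balance the training data.
    X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X, y)

    # Explore DT parameters via grid search with 5-fold CV, scored on AUC.
    grid = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [3, 5, 10, None],
                    "min_samples_leaf": [1, 10, 100]},
        scoring="roc_auc",
        cv=5)
    grid.fit(X_bal, y_bal)

    # Cross-check the tuned tree with Cohen's kappa.
    kappa = cross_val_score(grid.best_estimator_, X_bal, y_bal,
                            scoring=make_scorer(cohen_kappa_score), cv=5)
    print(grid.best_params_, grid.best_score_, kappa.mean())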
11. Modeling step 2: Results
[ROC curve: TPR vs. FPR; AUC = 0.66]
Best DT model produced by undersampling the data, 5-fold CV, and DT
parameters explored via grid search
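For reference, a ROC curve like the one on this slide can be drawn with scikit-learn; model, X_test, and y_test are hypothetical names for a tuned classifier and a held-out split, following the sketch above:

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, roc_auc_score

    # model, X_test, y_test: tuned classifier and held-out split (hypothetical).
    probs = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, probs)

    plt.plot(fpr, tpr, label=f"DT (AUC = {roc_auc_score(y_test, probs):.2f})")
    plt.plot([0, 1], [0, 1], "k--", label="chance")
    plt.xlabel("FPR")
    plt.ylabel("TPR")
    plt.legend()
    plt.show()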
12. Modeling Results: Features
The most important features influencing the ‘has_relocated’ target
were related to:
- Company financial assessments and health
- Company age
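A sketch of how such importances can be read off a fitted tree via scikit-learn’s impurity-based feature_importances_; X, y, and feature_names are hypothetical stand-ins for the yearly dataset:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # X, y, feature_names: yearly features, target, and column names
    # (hypothetical stand-ins for the actual dataset).
    model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
    importances = pd.Series(model.feature_importances_, index=feature_names)
    print(importances.sort_values(ascending=False).head(10))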
14. Validation
How well can yearly models predict the next year’s relocation?
… in general, rather well
[Plot: AUC per year]
15. Validation
How well can yearly models predict the next year’s relocation?
… in general, rather well
… except for 2016 (?)
[Plot: AUC per year]
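A minimal walk-forward loop matching this validation scheme, assuming a hypothetical data dict mapping each (consecutive) year to its (X, y) pair:

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import roc_auc_score

    # data: dict mapping year -> (X, y) for that year (hypothetical,
    # assumes consecutive years). Train on year t, score on year t + 1.
    for year in sorted(data)[:-1]:
        X_train, y_train = data[year]
        X_next, y_next = data[year + 1]
        model = DecisionTreeClassifier(max_depth=5, random_state=0)
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_next, model.predict_proba(X_next)[:, 1])
        print(f"{year} -> {year + 1}: AUC = {auc:.2f}")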
16. Takeaways
- Company properties can be indicative of whether a company will relocate
- Yearly aggregated data is sufficient for high-level indications of
relocation
- More granular modeling (e.g. with NNs) may provide additional
information
- Possible to perform a successful POC on big data within 6 weeks
on GCP
17. Future work
Had we had more time, we would have:
- done full time-series modeling (NN, hierarchical modeling, etc.)
- automated prediction, given company characteristics
- investigated the anomalous year (2016)
- made use of the modeling results