@neal_lathia
computer laboratory: university of cambridge
online   offline
urban
data mining
web

urbanmining.wordpress.com
online
user data + algorithms → relevance ☺
public transport


user data + algorithms → relevance
“smart” cards
1 facilitate payment
2 collect user data
“smart” cards
time-stamped locations,
modality, payments,
user categories


anonymised with
persistent user ids
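
As a rough illustration of the record described above, the sketch below models one card tap and groups taps by the persistent user id. The class name `Tap`, the field names, and the example category labels are all assumptions made for this sketch; the actual dataset schema is not given in the slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Tap:
    """One smart card event (hypothetical schema, not the real dataset layout)."""
    user_id: str                          # anonymised, but persistent across trips
    timestamp: datetime                   # time-stamped location event
    station: str
    modality: str                         # e.g. "tube" or "bus" (assumed labels)
    user_category: str                    # e.g. "adult" or "student" (assumed labels)
    payment_pence: Optional[int] = None   # None when the tap carries no payment


def group_by_user(taps):
    """Group taps by persistent user id, each group sorted by time."""
    grouped = defaultdict(list)
    for tap in taps:
        grouped[tap.user_id].append(tap)
    for user_taps in grouped.values():
        user_taps.sort(key=lambda t: t.timestamp)
    return dict(grouped)
```

The persistent id is what makes per-user histories possible even though the data is anonymised.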
“smart” cards datasets
100% sample, 1 month
~5.1 million people
~78.8 million trips

5% sample, 2 × 83 days
~300k people
~7.7 million trips
[Figure: two panels. "Purchase Geography": PAYG vs Travel Cards. "Mobility Flow": arrivals, Zones 1 to 6.]
using transport data for...

    1 predicting disruption relevance
    2 personalised travel time
    3 fare purchase recommendation
can we use transport data for...

    1 predicting disruption relevance
      i.e., rank station importance correctly?
      (where you will go in the future)
percentile ranking

0.0 (best)
…
0.05 (“those who touch in here also touch in at...”)
...
0.06 (factor in user's history)
...
0.25 (rank stations by popularity)
...
0.5 (random)
…
1.0 (inverse)
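
The scale above can be sketched in code. This is a minimal illustration, assuming percentile ranking means the average normalised position, in a ranked station list, of the stations a user goes on to visit. The function names, the tie-breaking rule, and the toy counts are invented for the example and are not taken from the papers.

```python
from collections import Counter


def percentile_rank(ranking, visited):
    """Mean percentile position (0.0 = top, 1.0 = bottom) of visited stations."""
    if len(ranking) < 2 or not visited:
        raise ValueError("need at least two ranked stations and one visit")
    position = {station: i for i, station in enumerate(ranking)}
    last = len(ranking) - 1
    return sum(position[s] / last for s in visited) / len(visited)


def rank_by_popularity(stations, global_counts):
    """Most visited stations first; ties broken by name."""
    return sorted(stations, key=lambda s: (-global_counts.get(s, 0), s))


def rank_with_history(stations, global_counts, user_counts):
    """The user's own most visited stations first, then global popularity."""
    return sorted(
        stations,
        key=lambda s: (-user_counts.get(s, 0), -global_counts.get(s, 0), s),
    )


# Toy data (made up for illustration).
stations = ["A", "B", "C", "D", "E"]
global_counts = Counter({"A": 5, "B": 4, "C": 3, "D": 2, "E": 1})
user_counts = Counter({"E": 3, "C": 1})
future_visits = ["E"]

by_popularity = percentile_rank(
    rank_by_popularity(stations, global_counts), future_visits
)
by_history = percentile_rank(
    rank_with_history(stations, global_counts, user_counts), future_visits
)
print(by_popularity, by_history)
```

In this toy case the user's habitual station is the least popular one overall, so the popularity ranking puts it last while the history-aware ranking puts it first.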
accurate ranking without

    1 explicitly asking
    2 network topology, rail schedule
using transport data for...

    1 predicting disruption relevance
    2 personalised travel time
can we use transport data for...

    2 predict your travel time
      i.e., time between touch in/out?
mean absolute error (minutes)

0.0 (best)
…
3.13 (“your trips in the past...”)
3.17 (“people who are as familiar as you...”)
3.28 (“people who travel at this time...”)
3.30 (mean time)
...
9.82 (timetabled)
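
For readers unfamiliar with the metric, here is a small sketch of mean absolute error comparing a global mean-time predictor with a per-user history predictor. The trip durations are invented, the function names are hypothetical, and the two predictors are simplified stand-ins for the ideas named on the slide rather than the methods evaluated in the talk.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference, in the same unit as the inputs (minutes)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def global_mean(train):
    """Mean travel time over all training trips for one origin-destination pair."""
    return sum(minutes for _, minutes in train) / len(train)


def user_mean(train, user, fallback):
    """Mean of the user's own past trips; fall back when there is no history."""
    own = [minutes for u, minutes in train if u == user]
    return sum(own) / len(own) if own else fallback


# Toy (user, minutes) trips between one station pair, made up for illustration.
train = [("u1", 20), ("u1", 22), ("u2", 30), ("u2", 32)]
test = [("u1", 21), ("u2", 31)]
actual = [minutes for _, minutes in test]

baseline = global_mean(train)
mae_mean_time = mean_absolute_error([baseline] * len(test), actual)
mae_own_trips = mean_absolute_error(
    [user_mean(train, user, baseline) for user, _ in test], actual
)
print(mae_mean_time, mae_own_trips)
```

The per-user predictor wins here only because the two toy users travel at consistently different speeds, which is the intuition behind "your trips in the past".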
accurate predictions without

    1 explicitly asking
    2 network topology, rail schedule
    3 ongoing disruptions, delays
using transport data for...

    1 predicting disruption relevance
    2 personalised travel time
    3 fare purchase recommendation
[Figure: three panels. "Purchase Behaviour": % purchases by day of week (Mon to Sun), Travel Cards vs PAYG. "Purchase Geography": PAYG vs Travel Cards. "Mobility Flow": arrivals, Zones 1 to 6.]
(a) high regularity in purchases & movements
(b) small increments, short terms
(c) purchase on refused entry?
are people making the right choice?
£200 million
     overspend
(a) failing to predict your movements
(b) failing to match mobility with fares
can we use transport data for...

    3 predict the fares you should buy
      i.e., what will be cheapest?
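
To make "what will be cheapest?" concrete, the sketch below labels a traveller with the cheaper of two options for a period. It assumes a flat pay-as-you-go price per trip and a single travel card price, both hypothetical; real fares vary by zone, time of day, and capping, none of which is modelled here.

```python
def cheapest_fare(num_trips, payg_pence_per_trip, travel_card_pence):
    """Return (label, cost in pence) of the cheaper option for one period.

    Simplifying assumption: every pay-as-you-go trip costs the same.
    """
    payg_cost = num_trips * payg_pence_per_trip
    if payg_cost <= travel_card_pence:
        return "PAYG", payg_cost
    return "Travel Card", travel_card_pence


# Hypothetical prices: 200p per trip, 2500p for a travel card.
print(cheapest_fare(5, 200, 2500))
print(cheapest_fare(20, 200, 2500))
```

Labels produced this way from observed trips could serve as the target that a classifier tries to predict from a traveller's past mobility.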
classification accuracy

0% (worst)
…
77% everyone on pay as you go
80% naïve bayes
…
97% (“people like you should have bought...”)
98% decision trees
100% (oracle)
money saved

£0.0 (worst)
…
£326,447.95 everyone on pay as you go
£393,585.81 naïve bayes
…
£465,822.17 (“people like you...”)
£473,918.38 decision trees
£479,583.91 (oracle)
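
The two scales, classification accuracy and money saved, can be computed together. The sketch below assumes that "money saved" means what travellers actually spent minus what they would have spent following the recommendation; that reading, the record layout, and all amounts are assumptions made for illustration.

```python
def evaluate(users, recommend):
    """Return (accuracy, total pence saved) for a recommendation function."""
    correct = 0
    saved = 0
    for user in users:
        choice = recommend(user)
        best = min(user["cost"], key=user["cost"].get)   # cheapest option
        correct += choice == best
        saved += user["actual_pence"] - user["cost"][choice]
    return correct / len(users), saved


def everyone_on_payg(user):
    """Baseline: always recommend pay as you go."""
    return "PAYG"


def oracle(user):
    """Upper bound: always recommend the truly cheapest option."""
    return min(user["cost"], key=user["cost"].get)


# Toy travellers (made up): cost of each option and what they actually paid.
users = [
    {"cost": {"PAYG": 1000, "Travel Card": 2500}, "actual_pence": 2500},
    {"cost": {"PAYG": 4000, "Travel Card": 2500}, "actual_pence": 4000},
    {"cost": {"PAYG": 1200, "Travel Card": 2500}, "actual_pence": 1200},
    {"cost": {"PAYG": 800, "Travel Card": 2500}, "actual_pence": 800},
]

print(evaluate(users, everyone_on_payg))
print(evaluate(users, oracle))
```

Even in this toy case, a baseline can score well on accuracy while leaving a large share of the possible savings unclaimed, which is why the slides report both numbers.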
“smart” cards
1 facilitate payment
2 collect user data

3 enable powerful,
  personalised
  information systems
using transport data for...

    1 behaviours ~ policy & incentives
    2 community well-being
References
N. Lathia, J. Froehlich, L. Capra. Mining Public Transport Usage for Personalised Intelligent
Transport Systems. In IEEE International Conference on Data Mining. December 2010, Sydney,
Australia.

N. Lathia, C. Smith, J. Froehlich, L. Capra. Individuals Among Commuters: Building
Personalised Transport Information Systems from Fare Collection Systems. Under submission.

N. Lathia, L. Capra. Mining Mobility Data to Minimise Travellers' Spending on Public Transport.
In ACM International Conference on Knowledge Discovery and Data Mining. August 2011. San
Diego, USA.

N. Lathia, L. Capra. How Smart is Your Smart Card? Measuring Travel Behaviours,
Perceptions, and Incentives. In ACM International Conference on Ubiquitous Computing.
September 2011. Beijing, China.

N. Lathia, D. Quercia, J. Crowcroft. The Hidden Image of the City: Sensing Community Well-
Being from Urban Mobility. To Appear, 10th International Conference on Pervasive Computing.
June 2012. Newcastle, UK.

Turning Oyster Cards into Information
