ETA Prediction Challenge @DataHack
A taxi goes from Chinatown to Times Square. How long will it take to arrive?
Taxi challenge by @Final
Data : In this challenge, you are given data on taxi rides in New York, containing information on each ride such as the start and end points, date, time of day, distance, etc.
Goal : Our purpose is to predict the travel time (on a logarithmic scale) of a ride.
The data is split into train and test sets, and we can use both general data on the ride and local data on similar rides from the train set.
Ride Information - Given Dataset
● From / To coordinates (lon, lat)
● Departure timestamp
● Trip distance (road distance)
● Vendor - Taxi company (found not to be important)
● Passenger count (found not to be important)
Data Wizard, expert in big
data processing and
production ready ML.
Googler, Wazer and traffic
analytics expert.
Data Ninja, a pure
professional in every data
aspect from gathering and
exploring to modeling.
Kaggle Master
An innovative ML expert
and programmer, expert in
feature engineering and
selection techniques.
Kaggle Master
The Team
A group of talented and creative world class professionals in ML and Traffic Analytics
Nir Malbin Gad Benram Seffi Cohen
CDS (Chief Data Scientist)
for the Israeli Defense
Forces, and pioneer in ML
ensemble techniques.
Kaggle Master
Daniel Marcous
How it works
Dataset
Train
(Train model on)
Test
(Make prediction on)
Public score
(30%)
Private score
(70%)
Reminders
The Metric
Mean Square Error (log values) / Variance (constant)
sum((y-y')^2) / sum((y-avg(y))^2)
Notes :
● Interesting part :
𝚺((real-pred)^2) ~ Least Squares
● Log values
○ The weight of a mistake goes down for longer rides
○ A mistake is measured by its relative (percentage) error -
a 10-minute mistake on a 20-minute ride
matters the same as
a 20-minute mistake on a 40-minute ride.
{log(a)-log(b)=log(a/b)}
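The metric above can be sketched in a few lines of Python. This is a minimal sketch, not the challenge's scoring code: the function name `normalized_log_mse` and the optional `total_sum_of_squares` argument are hypothetical, and the natural logarithm is an assumption (the base cancels out of the ratio anyway).

```python
import math


def normalized_log_mse(actual, predicted, total_sum_of_squares=None):
    """sum((y - y')^2) / sum((y - avg(y))^2), computed on log travel times."""
    log_actual = [math.log(t) for t in actual]
    log_predicted = [math.log(t) for t in predicted]
    squared_error = sum((a - p) ** 2 for a, p in zip(log_actual, log_predicted))
    if total_sum_of_squares is None:
        # Assumption: when no constant is supplied, use the spread of the given targets.
        mean_log = sum(log_actual) / len(log_actual)
        total_sum_of_squares = sum((a - mean_log) ** 2 for a in log_actual)
    return squared_error / total_sum_of_squares


# Travel times in seconds (made-up values for illustration).
actual = [600.0, 1200.0, 2400.0]
perfect_score = normalized_log_mse(actual, actual)
# Predicting one constant (the geometric mean, 1200 s) for every ride.
baseline_score = normalized_log_mse(actual, [1200.0] * 3)
```

A perfect prediction scores 0 and a constant geometric-mean prediction scores about 1, so lower is better. On the log scale a 30-minute estimate for a 20-minute ride and a 60-minute estimate for a 40-minute ride produce the same error, log(1.5).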
Predicting ETA Using :
1. Ride Information
2. Environment
3. Geography
4. Inferred States
Machine
Learning
Data Shortage
● We Don’t Have
○ Historical speeds
○ Real Time speeds
Data Cleaning
● Box coordinates to NYC (remove 0.0 etc.)
● Remove very long / far rides (>2h/65km)
● Remove unreasonable speed / time-distance ratio
○ Remove 5% anomalies (Top & Bottom)
New feature - Abnormal Ride
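A rough sketch of these cleaning rules follows. The bounding-box values, the field names (`from_lon`, `duration_s`, `distance_km`, ...) and the reading of "5%" as 5% in each speed tail are all assumptions made for illustration. The speed rule is written as a flag rather than a drop, so the same code covers the "Abnormal Ride" feature; dropping is then a filter on that flag.

```python
# Assumed bounding box around New York City (illustrative values, not from the slides).
NYC_BOX = {"lon_min": -74.3, "lon_max": -73.7, "lat_min": 40.5, "lat_max": 41.0}
MAX_DURATION_S = 2 * 3600   # "very long" rides: more than 2 hours
MAX_DISTANCE_KM = 65.0      # "very far" rides: more than 65 km


def in_box(lon, lat, box=NYC_BOX):
    """True when the point lies inside the bounding box (rejects 0.0 / 0.0 etc.)."""
    return box["lon_min"] <= lon <= box["lon_max"] and box["lat_min"] <= lat <= box["lat_max"]


def hard_filter(rides):
    """Keep rides whose endpoints are in the box and whose duration and distance are sane."""
    kept = []
    for ride in rides:
        endpoints_ok = (in_box(ride["from_lon"], ride["from_lat"])
                        and in_box(ride["to_lon"], ride["to_lat"]))
        size_ok = (0 < ride["duration_s"] <= MAX_DURATION_S
                   and 0 < ride["distance_km"] <= MAX_DISTANCE_KM)
        if endpoints_ok and size_ok:
            kept.append(ride)
    return kept


def speed_kmh(ride):
    return ride["distance_km"] / (ride["duration_s"] / 3600.0)


def flag_abnormal(rides, tail=0.05):
    """Return copies of the rides with an `abnormal` flag for the extreme speed tails.

    Assumes a non-empty list; thresholds are simple nearest-rank percentiles.
    """
    speeds = sorted(speed_kmh(r) for r in rides)
    low = speeds[int(tail * (len(speeds) - 1))]
    high = speeds[int((1.0 - tail) * (len(speeds) - 1))]
    return [dict(r, abnormal=not (low <= speed_kmh(r) <= high)) for r in rides]
```

Typical use would be `flag_abnormal(hard_filter(train_rides))`, where `train_rides` is a hypothetical list of ride dictionaries.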
Feature Engineering
Datetime based features
● Month start / end
● Day / Day of week / Hour / 15 Minute interval
● Is weekend / business day
● Is work hour (09:00-17:00)
● Is rush hour (morning / afternoon)
● Is holiday
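The datetime features above could be derived from the departure timestamp roughly as follows. This is a sketch under assumptions: the rush-hour windows (07:00-10:00 and 16:00-19:00), the reading of "month start / end" as the first and last day of the month, and the caller-supplied `holidays` set are illustrative choices, not values taken from the slides.

```python
import calendar
from datetime import date, datetime


def datetime_features(departure, holidays=frozenset()):
    """Build calendar features from a departure datetime.

    `holidays` is a hypothetical set of `date` objects supplied by the caller.
    """
    last_day = calendar.monthrange(departure.year, departure.month)[1]
    is_weekend = departure.weekday() >= 5          # Saturday = 5, Sunday = 6
    is_holiday = departure.date() in holidays
    return {
        "is_month_start": departure.day == 1,
        "is_month_end": departure.day == last_day,
        "day": departure.day,
        "day_of_week": departure.weekday(),
        "hour": departure.hour,
        # 15-minute interval index within the day, 0..95
        "quarter_hour": departure.hour * 4 + departure.minute // 15,
        "is_weekend": is_weekend,
        "is_business_day": not is_weekend and not is_holiday,
        "is_work_hour": 9 <= departure.hour < 17,
        "is_morning_rush": 7 <= departure.hour < 10,      # assumed window
        "is_afternoon_rush": 16 <= departure.hour < 19,   # assumed window
        "is_holiday": is_holiday,
    }


# Arbitrary example date: Monday 4 July 2016, 08:50.
features = datetime_features(datetime(2016, 7, 4, 8, 50), holidays={date(2016, 7, 4)})
```

In the example, the ride falls on a Monday that is also a holiday, so `is_holiday` is true and `is_business_day` is false even though it is not a weekend.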
Location based features
● Coordinate Transformations (Coding directionality)
○ PCA 2 2
○ RBF
● Spherical (geo) distance
● Distance percentile
● Spherical-Trip distance ratio
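The distance features for this slide can be sketched with the haversine formula. Note the substitution: the compass bearing below is only a simple stand-in for "directionality" and is not the PCA or RBF transformation the slide mentions. The function names, field names and the Earth radius of 6371 km are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def location_features(ride):
    """Geo distance, bearing, and the ratio of geo distance to road (trip) distance."""
    geo = haversine_km(ride["from_lat"], ride["from_lon"], ride["to_lat"], ride["to_lon"])
    trip = ride["trip_distance_km"]
    return {
        "geo_distance_km": geo,
        "bearing_deg": bearing_deg(ride["from_lat"], ride["from_lon"],
                                   ride["to_lat"], ride["to_lon"]),
        # Close to 1 for a direct route, small for a winding one; None if distance is missing.
        "geo_to_trip_ratio": geo / trip if trip > 0 else None,
    }


# Approximate coordinates for Chinatown and Times Square; the trip distance is made up.
example = location_features({"from_lat": 40.7158, "from_lon": -73.9970,
                             "to_lat": 40.7580, "to_lon": -73.9855,
                             "trip_distance_km": 6.5})
```

The ratio is a cheap proxy for how indirect the route is: the road distance can never be shorter than the great-circle distance, so values far below 1 suggest detours.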
City based features
● NYC Neighbourhood (pair crossing)
● Distance to points of interest (100X2)
○ Schools / Hospitals / Parks etc.
PCA 2
Weather based features
● Temperature
● Events - Rain / Snow etc.
● Humidity
● Wind
● Visibility
● Min / Max / Avg / std etc.
PCA 2
Inferred Traffic based features
PCA 1
● Assumption :
our data is a representative sample
of NYC's "driving population"
● Crowdedness
○ #rides in X radius
■ 100 / 500 / 1500 / 5000
■ Euclidean / Manhattan
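A crowdedness count of this kind might look like the sketch below. Several things are assumed: the radii are taken to be metres (the slide gives no unit), points are (lat, lon) tuples, distances use a flat-earth approximation that is reasonable at city scale, and the caller has already restricted `other_rides` to rides from a relevant time window.

```python
import math

METERS_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def planar_offsets_m(lat, lon, ref_lat, ref_lon):
    """Approximate east/north offsets in metres from a reference point."""
    dx = (lon - ref_lon) * METERS_PER_DEGREE * math.cos(math.radians(ref_lat))
    dy = (lat - ref_lat) * METERS_PER_DEGREE
    return dx, dy


def crowdedness(point, other_rides, radii=(100, 500, 1500, 5000), metric="euclidean"):
    """Count how many other ride points fall within each radius of `point`."""
    counts = {radius: 0 for radius in radii}
    for lat, lon in other_rides:
        dx, dy = planar_offsets_m(lat, lon, point[0], point[1])
        if metric == "euclidean":
            distance = math.hypot(dx, dy)
        else:  # "manhattan": sum of the east-west and north-south legs
            distance = abs(dx) + abs(dy)
        for radius in radii:
            if distance <= radius:
                counts[radius] += 1
    return counts


pickup = (40.75, -73.99)
nearby = [(40.75, -73.99), (40.752, -73.99), (40.77, -73.99), (40.753, -73.986)]
euclidean_counts = crowdedness(pickup, nearby)
manhattan_counts = crowdedness(pickup, nearby, metric="manhattan")
```

The last point in `nearby` lies diagonally from the pickup, so it falls inside the 500 radius under the Euclidean metric but outside it under the Manhattan metric. The loop is quadratic when run for every ride; a spatial index or grid would be needed at dataset scale.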
News based features
1. Crawling NYTimes
2. Topic Modeling
3. Finding topics correlated with ETA
4. Using top10 correlated topics as features
a. Number of articles on a day for every topic
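Steps 3 and 4 can be sketched as below, assuming crawling and topic modeling have already produced one (day, topic) pair per article. The function names, the input shapes, and the use of Pearson correlation against a daily average of the target are assumptions; the slides do not say which correlation measure was used.

```python
import math
from collections import Counter, defaultdict


def daily_topic_counts(articles):
    """Map each topic to a Counter of {day: number of articles that day}."""
    counts = defaultdict(Counter)
    for day, topic in articles:
        counts[topic][day] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def top_correlated_topics(articles, daily_eta, k=10):
    """Rank topics by |correlation| between daily article counts and daily ETA."""
    days = sorted(daily_eta)
    eta = [daily_eta[day] for day in days]
    counts = daily_topic_counts(articles)
    scores = {topic: pearson([per_day[day] for day in days], eta)
              for topic, per_day in counts.items()}
    return sorted(scores, key=lambda topic: abs(scores[topic]), reverse=True)[:k]


# Made-up toy data: three days, two topics.
articles = ([("d1", "parade")] + [("d2", "parade")] * 2 + [("d3", "parade")] * 3
            + [("d1", "sports")] * 2 + [("d2", "sports")] * 2 + [("d3", "sports")] * 2)
daily_eta = {"d1": 6.0, "d2": 6.5, "d3": 7.0}
best = top_correlated_topics(articles, daily_eta, k=1)
```

The per-day counts of the selected topics (`daily_topic_counts(articles)[topic][day]`) then become one feature column per topic for every ride on that day.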
Results
Caveats
● Timeseries future mixing
● The metric is not necessarily the one that matters most
○ Same weight for positive / negative error
● Crowdedness - assumes that the data is a representative sample of the total car population
○ E.g. 2 times the samples equals 2 times the traffic
● Variance - taken from original validation dataset (constant)
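On the first caveat: with a random split, features built from neighbouring rides can leak information from the future into training. One common way to avoid this, shown here as a sketch and not as what the challenge itself did, is to validate on rides that depart after every training ride. The `departure` field name and the 80/20 fraction are assumptions.

```python
from datetime import datetime, timedelta


def chronological_split(rides, train_fraction=0.8):
    """Split rides so that every validation ride departs after every training ride."""
    ordered = sorted(rides, key=lambda ride: ride["departure"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Ten made-up rides, one per day, deliberately given out of order.
start = datetime(2016, 1, 1, 12, 0)
rides = [{"id": i, "departure": start + timedelta(days=i)} for i in (3, 9, 0, 5, 1, 8, 2, 7, 4, 6)]
train, validation = chronological_split(rides)
```

Here `train` holds the eight earliest rides and `validation` the two latest, so any feature computed from training rides only uses the past relative to the validation rides.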
Public Leader Board
Team                     Score
Team DCountdown          0.150041
Squanchers               0.160468
Noa's stars              0.165869
Aperture Science         0.167958
R-North                  0.175308
TAU Deep Learning Lab    0.182602
Mr Terminal              0.193593
MTG                      0.282009
SuperFish                0.302637
Summary
Machine Learning Approach
Advantages :
● No manual tuning
● No complex heuristics
● Optimise different metrics
● Personalisation
● Taking different ideas and their interactions into account as one
Disadvantages :
● Black Box (~ish)
● Harder to deploy
● Retrain when system changes

Prediction of taxi rides ETA
