SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
Real time data driven applications
Giovanni Lanzani	

	

Data Whisperer
and SQL vs NoSQL databases
Feedback
@gglanzani
Real-time, data driven app?
•No store and retrieve;	

•Store, {transform, enrich, analyse} and retrieve;	

•Real-time: retrieve is not a batch process;	

•App: something your mother could use:	

SELECT attendees
FROM NoSQLMatters
WHERE password = '1234';
Get insight about event impact
Get insight about event impact
Get insight about event impact
Get insight about event impact
Get insight about event impact
Challenges
1. Big Data	

2. Privacy;	

3. Some real-time analysis;	

4. Real-time retrieval.
Is it Big Data?
Is it Big Data?
Everybody talks about it	

Nobody knows how to do it	

Everyone thinks everyone else is doing it, so everyone
claims they’re doing it…	

Dan Ariely
2. Privacy
2. Privacy
3. (Some) real-time analysis
•Harder than it looks;	

•Large data;	

•Retrieval is by giving date, center location +
radius.
4. Real-Time Retrieval
AngularJS python app
REST
Front-end Back-end
JSON
Architecture
JS-1
JS-2
date hour id_activity postcode hits delta sbi
2013-01-01 12 1234 1234AB 35 22 1
2013-01-08 12 1234 1234AB 45 35 1
2013-01-01 11 2345 5555ZB 2 1 2
2013-01-08 11 2345 5555ZB 55 2 2
Data Example
Who has my data?
Who has my data?
•First iteration was a (pre)-POC, less data (3GB vs
500GB);	

•Time constraints;	

•Oeps: everything is a pandas df!
Advantage of “everything is a df”
Pro:	

•Fast!!	

•Use what you know	

•NO DBA’s!	

•We all love CSV’s!
Advantage of “everything is a df”
Pro:	

•Fast!!	

•Use what you know	

•NO DBA’s!	

•We all love CSV’s!	

Contra:	

•Doesn’t scale;	

•Huge startup time;	

•NO DBA’s!	

•We all hate CSV’s!
AngularJS python app
REST
Front-end Back-end Database
JSON
?
If you don’t
Issues?!
•With a radius of 10km, in Amsterdam, you get
10k postcodes.You need to do this in your SQL:	

!
!
!
•Index on date and postcode, but single queries
running more than 20 minutes.
SELECT * FROM datapoints
WHERE
date IN date_array
AND
postcode IN postcode_array;
PostGIS is a spatial database extender for PostgreSQL.
Supports geographic objects allowing location queries:	

SELECT *
FROM datapoints
WHERE ST_DWithin(lon, lat, 1500)
AND dates IN ('2013-02-30', '2013-02-31');
-- every point within 1.5km
-- from (lat, lon) on imaginary dates
Postgres + Postgis (2.x)
Other db’s?
How we solved it
1. Align data on disk by date;	

2. Use the temporary table trick:	

!
!
!
!
3. Lose precision: 1234AB→1234
CREATE TEMPORARY TABLE tmp (postcodes STRING NOT NULL
PRIMARY KEY);
INSERT INTO tmp (postcodes) VALUES postcode_array;
!
SELECT * FROM tmp
JOIN datapoints d
ON d.postcode = tmp.postcodes
WHERE
d.dt IN dates_array;
Take home messages
1. Geospatial problems are hard and queries can be
really slow;	

2. Not everybody has innite resources: be smart
and KISS!	

3. SQL or NoSQL? (Size, schema)
GoDataDriven
We’re hiring / Questions? / Thank you!
@gglanzani	

giovannilanzani@godatadriven.com
Giovanni Lanzani	

Data Whisperer

Weitere ähnliche Inhalte

Was ist angesagt?

FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...Zhenzhong Xu
 
Effective monitoring with statsd - Alexis lê-quôc
Effective monitoring with statsd - Alexis lê-quôcEffective monitoring with statsd - Alexis lê-quôc
Effective monitoring with statsd - Alexis lê-quôcDevopsdays
 
Time and ordering in streaming distributed systems
Time and ordering in streaming distributed systemsTime and ordering in streaming distributed systems
Time and ordering in streaming distributed systemsZhenzhong Xu
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...Anna Ossowski
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017Monal Daxini
 
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSkynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSylvain Kalache
 
Inside Hulu's Data platform (BigDataCamp LA 2013)
Inside Hulu's Data platform (BigDataCamp LA 2013)Inside Hulu's Data platform (BigDataCamp LA 2013)
Inside Hulu's Data platform (BigDataCamp LA 2013)Prasan Samtani
 
Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da...
Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da...Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da...
Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da...Amazon Web Services
 
Event streaming pipeline with Windows Azure and ArcGIS Geoevent extension
Event streaming pipeline with Windows Azure and ArcGIS Geoevent extensionEvent streaming pipeline with Windows Azure and ArcGIS Geoevent extension
Event streaming pipeline with Windows Azure and ArcGIS Geoevent extensionRoberto Messora
 
Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017)
Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017)Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017)
Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017)Grand Parade Poland
 
Scaling Rails for $5.50 per day - October 2014 ATLRUG Presentation
Scaling Rails for $5.50 per day  - October 2014 ATLRUG PresentationScaling Rails for $5.50 per day  - October 2014 ATLRUG Presentation
Scaling Rails for $5.50 per day - October 2014 ATLRUG Presentationjasnow
 
Scaling monitoring with Datadog
Scaling monitoring with DatadogScaling monitoring with Datadog
Scaling monitoring with Datadogalexismidon
 
Running a Massively Parallel Self-serve Distributed Data System At Scale
Running a Massively Parallel Self-serve Distributed Data System At ScaleRunning a Massively Parallel Self-serve Distributed Data System At Scale
Running a Massively Parallel Self-serve Distributed Data System At ScaleZhenzhong Xu
 
20170624 GraphQL Presentation
20170624 GraphQL Presentation20170624 GraphQL Presentation
20170624 GraphQL PresentationMartin Heidegger
 
Druid - DevconTLV X
Druid - DevconTLV XDruid - DevconTLV X
Druid - DevconTLV XYakir Buskilla
 
ResourceSpace: Recent pains and future gains
ResourceSpace: Recent pains and future gainsResourceSpace: Recent pains and future gains
ResourceSpace: Recent pains and future gainsResourceSpace
 
O'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data PipelinesO'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data PipelinesSingleStore
 

Was ist angesagt? (17)

FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
 
Effective monitoring with statsd - Alexis lê-quôc
Effective monitoring with statsd - Alexis lê-quôcEffective monitoring with statsd - Alexis lê-quôc
Effective monitoring with statsd - Alexis lê-quôc
 
Time and ordering in streaming distributed systems
Time and ordering in streaming distributed systemsTime and ordering in streaming distributed systems
Time and ordering in streaming distributed systems
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSkynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
 
Inside Hulu's Data platform (BigDataCamp LA 2013)
Inside Hulu's Data platform (BigDataCamp LA 2013)Inside Hulu's Data platform (BigDataCamp LA 2013)
Inside Hulu's Data platform (BigDataCamp LA 2013)
 
Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da...
Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da...Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da...
Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da...
 
Event streaming pipeline with Windows Azure and ArcGIS Geoevent extension
Event streaming pipeline with Windows Azure and ArcGIS Geoevent extensionEvent streaming pipeline with Windows Azure and ArcGIS Geoevent extension
Event streaming pipeline with Windows Azure and ArcGIS Geoevent extension
 
Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017)
Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017)Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017)
Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017)
 
Scaling Rails for $5.50 per day - October 2014 ATLRUG Presentation
Scaling Rails for $5.50 per day  - October 2014 ATLRUG PresentationScaling Rails for $5.50 per day  - October 2014 ATLRUG Presentation
Scaling Rails for $5.50 per day - October 2014 ATLRUG Presentation
 
Scaling monitoring with Datadog
Scaling monitoring with DatadogScaling monitoring with Datadog
Scaling monitoring with Datadog
 
Running a Massively Parallel Self-serve Distributed Data System At Scale
Running a Massively Parallel Self-serve Distributed Data System At ScaleRunning a Massively Parallel Self-serve Distributed Data System At Scale
Running a Massively Parallel Self-serve Distributed Data System At Scale
 
20170624 GraphQL Presentation
20170624 GraphQL Presentation20170624 GraphQL Presentation
20170624 GraphQL Presentation
 
Druid - DevconTLV X
Druid - DevconTLV XDruid - DevconTLV X
Druid - DevconTLV X
 
ResourceSpace: Recent pains and future gains
ResourceSpace: Recent pains and future gainsResourceSpace: Recent pains and future gains
ResourceSpace: Recent pains and future gains
 
O'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data PipelinesO'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data Pipelines
 

Ähnlich wie Real time data driven applications (SQL vs NoSQL databases)

Real time data driven applications (and SQL vs NoSQL databases)
Real time data driven applications (and SQL vs NoSQL databases)Real time data driven applications (and SQL vs NoSQL databases)
Real time data driven applications (and SQL vs NoSQL databases)GoDataDriven
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about SparkGiivee The
 
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB
 
Sea Amsterdam 2014 November 19
Sea Amsterdam 2014 November 19Sea Amsterdam 2014 November 19
Sea Amsterdam 2014 November 19GoDataDriven
 
PyATL Meetup, Oct 8, 2015
PyATL Meetup, Oct 8, 2015PyATL Meetup, Oct 8, 2015
PyATL Meetup, Oct 8, 2015Roy Russo
 
Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015kingsBSD
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Trending with Purpose
Trending with PurposeTrending with Purpose
Trending with PurposeJason Dixon
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra PerfectSATOSHI TAGOMORI
 
JSON all the way
JSON all the wayJSON all the way
JSON all the wayRonan Berder
 
Games Industry Analytics Forum 2 - Plumbee
Games Industry Analytics Forum 2 - PlumbeeGames Industry Analytics Forum 2 - Plumbee
Games Industry Analytics Forum 2 - PlumbeeGIAF
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_DataNAVER D2
 
Pre-Aggregated Analytics And Social Feeds Using MongoDB
Pre-Aggregated Analytics And Social Feeds Using MongoDBPre-Aggregated Analytics And Social Feeds Using MongoDB
Pre-Aggregated Analytics And Social Feeds Using MongoDBRackspace
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comDamien Krotkine
 
Hack The Future: 10 technical disciplines
Hack The Future: 10 technical disciplinesHack The Future: 10 technical disciplines
Hack The Future: 10 technical disciplinesHack The Future
 
Azure Digital Twins
Azure Digital TwinsAzure Digital Twins
Azure Digital TwinsMarco Parenzan
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDBDenny Lee
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysDemi Ben-Ari
 

Ähnlich wie Real time data driven applications (SQL vs NoSQL databases) (20)

Real time data driven applications (and SQL vs NoSQL databases)
Real time data driven applications (and SQL vs NoSQL databases)Real time data driven applications (and SQL vs NoSQL databases)
Real time data driven applications (and SQL vs NoSQL databases)
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
 
Sea Amsterdam 2014 November 19
Sea Amsterdam 2014 November 19Sea Amsterdam 2014 November 19
Sea Amsterdam 2014 November 19
 
PyATL Meetup, Oct 8, 2015
PyATL Meetup, Oct 8, 2015PyATL Meetup, Oct 8, 2015
PyATL Meetup, Oct 8, 2015
 
Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Trending with Purpose
Trending with PurposeTrending with Purpose
Trending with Purpose
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra Perfect
 
JSON all the way
JSON all the wayJSON all the way
JSON all the way
 
Games Industry Analytics Forum 2 - Plumbee
Games Industry Analytics Forum 2 - PlumbeeGames Industry Analytics Forum 2 - Plumbee
Games Industry Analytics Forum 2 - Plumbee
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data
 
Pre-Aggregated Analytics And Social Feeds Using MongoDB
Pre-Aggregated Analytics And Social Feeds Using MongoDBPre-Aggregated Analytics And Social Feeds Using MongoDB
Pre-Aggregated Analytics And Social Feeds Using MongoDB
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.com
 
Hack The Future: 10 technical disciplines
Hack The Future: 10 technical disciplinesHack The Future: 10 technical disciplines
Hack The Future: 10 technical disciplines
 
Azure Digital Twins
Azure Digital TwinsAzure Digital Twins
Azure Digital Twins
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 

Mehr von GoDataDriven

Streamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature CatalogStreamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature CatalogGoDataDriven
 
Visualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenVisualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenGoDataDriven
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowGoDataDriven
 
Training Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationTraining Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationGoDataDriven
 
My Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerMy Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerGoDataDriven
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo SanchezGoDataDriven
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
How to create a Devcontainer for your Python project
How to create a Devcontainer for your Python projectHow to create a Devcontainer for your Python project
How to create a Devcontainer for your Python projectGoDataDriven
 
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...GoDataDriven
 
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022GoDataDriven
 
MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022GoDataDriven
 
MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022GoDataDriven
 
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022GoDataDriven
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022GoDataDriven
 
AWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanAWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanGoDataDriven
 
The 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesThe 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesGoDataDriven
 
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...GoDataDriven
 
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...GoDataDriven
 
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofSmart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofGoDataDriven
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019GoDataDriven
 

Mehr von GoDataDriven (20)

Streamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature CatalogStreamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature Catalog
 
Visualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenVisualizing Big Data in a Small Screen
Visualizing Big Data in a Small Screen
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
 
Training Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationTraining Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organization
 
My Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerMy Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics Engineer
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
How to create a Devcontainer for your Python project
How to create a Devcontainer for your Python projectHow to create a Devcontainer for your Python project
How to create a Devcontainer for your Python project
 
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
 
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
 
MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022
 
MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022
 
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
 
AWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanAWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de Haan
 
The 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesThe 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven Companies
 
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
 
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
 
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofSmart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 

KĂźrzlich hochgeladen

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

KĂźrzlich hochgeladen (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Real time data driven applications (SQL vs NoSQL databases)

  • 1. GoDataDriven PROUDLY PART OF THE XEBIA GROUP Real time data driven applications Giovanni Lanzani Data Whisperer and SQL vs NoSQL databases
  • 3. Real-time, data driven app? •No store and retrieve; •Store, {transform, enrich, analyse} and retrieve; •Real-time: retrieve is not a batch process; •App: something your mother could use: SELECT attendees FROM NoSQLMatters WHERE password = '1234';
  • 4. Get insight about event impact
  • 5. Get insight about event impact
  • 6. Get insight about event impact
  • 7. Get insight about event impact
  • 8. Get insight about event impact
  • 9. Challenges 1. Big Data 2. Privacy; 3. Some real-time analysis; 4. Real-time retrieval.
  • 10. Is it Big Data?
  • 11. Is it Big Data? Everybody talks about it Nobody knows how to do it Everyone thinks everyone else is doing it, so everyone claims they’re doing it… Dan Ariely
  • 15. •Harder than it looks; •Large data; •Retrieval is by giving date, center location + radius. 4. Real-Time Retrieval
  • 16. AngularJS python app REST Front-end Back-end JSON Architecture
  • 17. JS-1
  • 18. JS-2
  • 19. date hour id_activity postcode hits delta sbi 2013-01-01 12 1234 1234AB 35 22 1 2013-01-08 12 1234 1234AB 45 35 1 2013-01-01 11 2345 5555ZB 2 1 2 2013-01-08 11 2345 5555ZB 55 2 2 Data Example
  • 20. Who has my data?
  • 21. Who has my data? •First iteration was a (pre)-POC, less data (3GB vs 500GB); •Time constraints; •Oeps: everything is a pandas df!
  • 22. Advantage of “everything is a df” Pro: •Fast!! •Use what you know •NO DBA’s! •We all love CSV’s!
  • 23. Advantage of “everything is a df” Pro: •Fast!! •Use what you know •NO DBA’s! •We all love CSV’s! Contra: •Doesn’t scale; •Huge startup time; •NO DBA’s! •We all hate CSV’s!
  • 24. AngularJS python app REST Front-end Back-end Database JSON ? If you don’t
  • 25. Issues?! •With a radius of 10km, in Amsterdam, you get 10k postcodes.You need to do this in your SQL: ! ! ! •Index on date and postcode, but single queries running more than 20 minutes. SELECT * FROM datapoints WHERE date IN date_array AND postcode IN postcode_array;
  • 26. PostGIS is a spatial database extender for PostgreSQL. Supports geographic objects allowing location queries: SELECT * FROM datapoints WHERE ST_DWithin(lon, lat, 1500) AND dates IN ('2013-02-30', '2013-02-31'); -- every point within 1.5km -- from (lat, lon) on imaginary dates Postgres + Postgis (2.x)
  • 28. How we solved it 1. Align data on disk by date; 2. Use the temporary table trick: ! ! ! ! 3. Lose precision: 1234AB→1234 CREATE TEMPORARY TABLE tmp (postcodes STRING NOT NULL PRIMARY KEY); INSERT INTO tmp (postcodes) VALUES postcode_array; ! SELECT * FROM tmp JOIN datapoints d ON d.postcode = tmp.postcodes WHERE d.dt IN dates_array;
  • 29. Take home messages 1. Geospatial problems are hard and queries can be really slow; 2. Not everybody has innite resources: be smart and KISS! 3. SQL or NoSQL? (Size, schema)
  • 30. GoDataDriven We’re hiring / Questions? / Thank you! @gglanzani giovannilanzani@godatadriven.com Giovanni Lanzani Data Whisperer