javier ramirez
@supercoco9
API analytics
with Redis
and BigQuery
javier ramirez @supercoco9 https://teowaki.com api days 14
REST API (Ruby on Rails)
+
Web on top (AngularJS)
Use a
hosted
solution
questions?
data that’s an order of
magnitude greater than
data you’re accustomed
to
Doug Laney
VP Research, Business Analytics and Performance Management at Gartner
data that exceeds the
processing capacity of
conventional database
systems. The data is too big,
moves too fast, or doesn’t fit
the structures of your
database architectures.
Edd Dumbill
program chair for the O’Reilly Strata Conference
Big data is doing a
full scan over 330MM rows,
matching them against a
regexp, and getting the
result (223MM rows) in
just 5 seconds
Javier Ramirez
impressionable teowaki founder
1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time
open source, BSD licensed, advanced
key-value store. It is often referred to as a
data structure server since keys can contain
strings, hashes, lists, sets, sorted sets and
hyperloglogs.
http://redis.io
started in 2009 by Salvatore Sanfilippo @antirez
112 contributors at https://github.com/antirez/redis
twitter
stackoverflow
pinterest
booking.com
World of Warcraft
YouPorn
HipChat
Snapchat
ntopng
LogStash
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)
$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q
SET: 552,028 requests per second
GET: 707,463 requests per second
LPUSH: 767,459 requests per second
LPOP: 770,119 requests per second
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)
$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q
SET: 122,556 requests per second
GET: 123,601 requests per second
LPUSH: 136,752 requests per second
LPOP: 132,424 requests per second
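To make the effect of pipelining concrete, the two runs above can be compared directly. This is a small sketch that uses only the figures quoted in the benchmark output; the dictionary and function names are illustrative.

```python
# Requests per second reported by redis-benchmark in the two runs above.
with_pipelining = {"SET": 552_028, "GET": 707_463, "LPUSH": 767_459, "LPOP": 770_119}
without_pipelining = {"SET": 122_556, "GET": 123_601, "LPUSH": 136_752, "LPOP": 132_424}


def speedup(command):
    """Ratio of pipelined to non-pipelined throughput for one command."""
    return with_pipelining[command] / without_pipelining[command]


if __name__ == "__main__":
    for command in with_pipelining:
        print(f"{command}: {speedup(command):.1f}x")
```

On this machine, sending 16 commands per round trip (-P 16) gives roughly a 4.5x to 5.8x gain, depending on the command.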
Non-intrusive metrics
Capture data really fast.
Then send the data in
the background
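One way to read "capture fast, send later" is a Redis list used as a queue: the request path does a single push, and a background job pops events in batches. The sketch below is an assumption about the shape of such code, not the talk's implementation. The key name, the event fields and the InMemoryList stand-in are hypothetical; the stand-in only mimics the rpush/lpop calls of a redis-py client so the example runs without a server.

```python
import json
import time

QUEUE_KEY = "api:requests"  # hypothetical key name


class InMemoryList:
    """Stand-in exposing the rpush/lpop subset of a redis-py client.

    Swap in redis.Redis() for real use; lpop then returns bytes,
    which json.loads also accepts.
    """

    def __init__(self):
        self._lists = {}

    def rpush(self, key, *values):
        self._lists.setdefault(key, []).extend(values)
        return len(self._lists[key])

    def lpop(self, key):
        items = self._lists.get(key)
        return items.pop(0) if items else None


def capture(client, method, uri, status, duration_ms):
    """Called in the request path: serialize the event and push it once."""
    event = {"ts": time.time(), "method": method, "uri": uri,
             "status": status, "duration_ms": duration_ms}
    client.rpush(QUEUE_KEY, json.dumps(event))


def drain(client, max_items=1000):
    """Called from a background job: pop up to max_items queued events."""
    batch = []
    while len(batch) < max_items:
        raw = client.lpop(QUEUE_KEY)
        if raw is None:
            break
        batch.append(json.loads(raw))
    return batch
```

The request handler only pays for one push; everything slower (compression, upload, loading) happens on the batch returned by drain.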
Redis keeps
everything in
memory
all the time
Gzip to
AWS S3/Glacier
or
Google Cloud Storage
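The archiving step could look like the sketch below: a batch of events is written as one JSON object per line and gzipped, producing bytes that can be stored as a single object. The function names and the one-object-per-line layout are assumptions; the upload itself is left out because it depends on the storage client in use.

```python
import gzip
import json


def compress_batch(events):
    """Serialize events as one JSON object per line and gzip the result."""
    lines = "\n".join(json.dumps(e, sort_keys=True) for e in events) + "\n"
    return gzip.compress(lines.encode("utf-8"))


def decompress_batch(blob):
    """Inverse of compress_batch, useful when re-reading an archive."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```

Keeping the raw archive outside the analytics service is also what makes it possible to keep the history and move to another tool later.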
tools we considered:
Hadoop
Cassandra
Amazon Redshift
...
but...
hard to set up and monitor
expensive cluster
not interactive enough
Our choice:
Google BigQuery
Data analysis as a service
http://developers.google.com/bigquery
Based on “Dremel”
Specifically designed for
interactive queries over
petabytes of real-time
data
loading data
You just send the data in
text (or JSON) format
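As an illustration of the text option, the sketch below renders request records as CSV with a fixed column order. It assumes that "text" here means CSV, and the column list is a hypothetical schema rather than the one used in the talk.

```python
import csv
import io

FIELDS = ["ts", "method", "uri", "status", "duration_ms"]  # hypothetical schema


def to_csv(rows):
    """Render dict rows as CSV text with a fixed column order and no header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

The csv module quotes values that contain commas, so URIs with query strings or commas do not break the column layout.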
SQL
select name from USERS order by date;
select count(*) from users;
select max(date) from USERS;
select sum(total) from ORDERS group by user;
specific extensions for
analytics
within
flatten
nest
stddev
top
first
last
nth
variance
var_pop
var_samp
covar_pop
covar_samp
quantiles
web console screenshot
window functions
our most active user
country segmented traffic
10 requests we should be caching
Correlations.
Not to be mistaken for
causality
Things you always wanted to
try but were too scared to
select count(*) from
publicdata:samples.wikipedia
where REGEXP_MATCH(title, "[0-9]*")
AND wp_namespace = 0;
223,163,387
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
5 most created resources
select uri, count(*) total from
stats where method = 'POST'
group by uri
order by total desc limit 5;
...but
/users/javier/shouts
/users/rgo/shouts
/teams/javier-community/links
/teams/nosqlmatters-cgn/links
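The URIs above show the catch: each one embeds a user or team slug, so grouping by the raw URI yields one row per individual resource instead of one per resource type. A hedged sketch of one way around it is to normalize the path before counting. The rule used here (every second path segment is an identifier) is an assumption that happens to fit these four examples; it is not necessarily how the talk solved it.

```python
from collections import Counter


def normalize(uri):
    """Replace every second path segment, assumed to be an identifier."""
    segments = uri.strip("/").split("/")
    return "/" + "/".join(
        ":id" if index % 2 == 1 else segment
        for index, segment in enumerate(segments)
    )


def most_created(uris, limit=5):
    """Count normalized URIs and return the most frequent ones."""
    return Counter(normalize(u) for u in uris).most_common(limit)
```

With this rule, /users/javier/shouts and /users/rgo/shouts both count towards /users/:id/shouts. The same idea can be applied inside the query with a regular-expression replacement on the uri column.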
5 most created resources
new users per month
SELECT repository_name, repository_language,
repository_description, COUNT(repository_name) as cnt,
repository_url
FROM github.timeline
WHERE type="WatchEvent"
AND PARSE_UTC_USEC(created_at) >=
PARSE_UTC_USEC("#{yesterday} 20:00:00")
AND repository_url IN (
SELECT repository_url
FROM github.timeline
WHERE type="CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >=
PARSE_UTC_USEC('#{yesterday} 20:00:00')
AND repository_fork = "false"
AND payload_ref_type = "repository"
GROUP BY repository_url
)
GROUP BY repository_name, repository_language,
repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
Automation with Apps Script
Read from BigQuery
Create a spreadsheet on Drive
E-mail it every day as a PDF
bigquery pricing
$26 per stored TB
1,000,000 rows => $0.00416 / month (£0.00243 / month)
$5 per processed TB
1 full scan = 160 MB
1 count = 0 MB
1 full scan over 1 column = 5.4 MB
100 GB => $0.05 / month (£0.03)
£0.054307 / month* per 1MM rows
*the first 1 TB every month is free of charge
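The per-unit prices on the slide can be turned into a small calculator. This is a sketch under stated assumptions: the two prices come from the slide, decimal units (1 TB = 1,000,000 MB) are assumed, the function names are illustrative, and the free monthly allowance is ignored.

```python
STORAGE_USD_PER_TB_MONTH = 26.0  # from the slide
QUERY_USD_PER_TB = 5.0           # from the slide
MB_PER_TB = 1_000_000            # decimal units, an assumption


def storage_cost(megabytes):
    """Monthly storage cost in USD for the given amount of stored data."""
    return megabytes / MB_PER_TB * STORAGE_USD_PER_TB_MONTH


def query_cost(megabytes_scanned):
    """Cost in USD of one query that scans the given amount of data."""
    return megabytes_scanned / MB_PER_TB * QUERY_USD_PER_TB


if __name__ == "__main__":
    print(f"storing 160 MB for a month: ${storage_cost(160):.5f}")
    print(f"one full scan of 160 MB:    ${query_cost(160):.6f}")
    print(f"one 5.4 MB column scan:     ${query_cost(5.4):.6f}")
```

With 1MM rows taking 160 MB, storage_cost(160) reproduces the $0.00416 per month quoted for storage. The gap between the full scan and the single-column scan shows why selecting only the columns you need matters.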
1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time
Find related links at
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
Thanks!
Gràcies
Javier Ramírez
@supercoco9

api analytics using Redis, BigQuery and AppsScripts by Javier Ramirez from teowaki (Apidays Mediterranea)

  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7. javier ramirez @supercoco9 https://teowaki.com api days 14 REST API (Ruby on Rails) + Web on top (AngularJS)
  • 10.
  • 11. “Data that’s an order of magnitude greater than data you’re accustomed to.” Doug Laney, VP Research, Business Analytics and Performance Management at Gartner
  • 12. “Data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.” Edd Dumbill, program chair for the O’Reilly Strata Conference
  • 13. “Big data is doing a full scan of 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds.” Javier Ramirez, impressionable teowaki founder
  • 14. 1. non-intrusive metrics 2. keep the history 3. avoid vendor lock-in 4. interactive queries 5. cheap 6. extra ball: real time
  • 15.
  • 16. Redis: open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets and hyperloglogs. http://redis.io Started in 2009 by Salvatore Sanfilippo (@antirez). 112 contributors at https://github.com/antirez/redis
  • 17. Twitter, Stack Overflow, Pinterest, Booking.com, World of Warcraft, YouPorn, HipChat, Snapchat, ntopng, LogStash
  • 18. Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)
        $ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q
        SET: 552,028 requests per second
        GET: 707,463 requests per second
        LPUSH: 767,459 requests per second
        LPOP: 770,119 requests per second
        Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)
        $ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q
        SET: 122,556 requests per second
        GET: 123,601 requests per second
        LPUSH: 136,752 requests per second
        LPOP: 132,424 requests per second
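
To make the benchmark on slide 18 easier to read, the short Python sketch below computes how much faster each command ran with pipelining (`-P 16`) than without it. The request rates are copied from the slide; the helper name `speedup` is made up for this illustration.

```python
# Requests per second, copied from the redis-benchmark output on slide 18.
with_pipelining = {"SET": 552_028, "GET": 707_463, "LPUSH": 767_459, "LPOP": 770_119}
without_pipelining = {"SET": 122_556, "GET": 123_601, "LPUSH": 136_752, "LPOP": 132_424}


def speedup(command: str) -> float:
    """Ratio of pipelined to non-pipelined throughput for one command."""
    return with_pipelining[command] / without_pipelining[command]


speedups = {command: round(speedup(command), 2) for command in with_pipelining}

for command, ratio in speedups.items():
    print(f"{command}: {ratio}x faster with pipelining")
```

On these figures, pipelining gives roughly a 4.5x to 5.8x gain depending on the command.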
  • 19. Non-intrusive metrics: capture data really fast, then send the data in the background
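
The "capture fast, send in the background" idea from slide 19 can be sketched as follows. This is a minimal illustration, not the talk's implementation: an in-process `queue.Queue` stands in for the Redis list, and the class name `MetricsBuffer`, its `record` method, and the `sink` callback are all hypothetical.

```python
import json
import queue
import threading


class MetricsBuffer:
    """Accepts request metrics cheaply and ships them from a background thread."""

    def __init__(self, sink, batch_size=100):
        self._queue = queue.Queue()   # stand-in for a Redis list (assumption)
        self._sink = sink             # callable that receives a list of JSON strings
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, method, uri, status):
        # The request path only pays for a cheap enqueue.
        self._queue.put(json.dumps({"method": method, "uri": uri, "status": status}))

    def _drain(self):
        while True:
            item = self._queue.get()          # blocks until something arrives
            if item is None:                  # None is the shutdown marker
                return
            batch, stop = [item], False
            while len(batch) < self._batch_size:
                try:
                    nxt = self._queue.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            self._sink(batch)                 # slow work happens off the request path
            if stop:
                return

    def close(self):
        self._queue.put(None)
        self._worker.join()


sent = []
buffer = MetricsBuffer(sink=sent.extend, batch_size=100)
for i in range(250):
    buffer.record("POST", f"/users/user{i}/shouts", 201)
buffer.close()
print(len(sent), "events shipped in the background")
```

The design choice is that the code serving the API request never waits on the analytics backend; it only appends to a fast buffer.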
  • 20.
  • 21. Redis keeps everything in memory all the time
  • 22.
  • 23. Gzip to AWS S3/Glacier or Google Cloud Storage
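
As a rough illustration of the archiving step on slide 23, the sketch below packs a batch of events as gzip-compressed, newline-delimited JSON before it would be uploaded. The record format, the function names `pack_events` and `unpack_events`, and the sample events are assumptions for this example; the upload to S3, Glacier or Cloud Storage is left out.

```python
import gzip
import json


def pack_events(events):
    """Serialize events as newline-delimited JSON and gzip the result."""
    lines = "\n".join(json.dumps(event, sort_keys=True) for event in events) + "\n"
    return gzip.compress(lines.encode("utf-8"))


def unpack_events(blob):
    """Inverse of pack_events, useful when restoring archived history."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Hypothetical sample batch: many near-identical API hits.
events = [
    {"method": "POST", "uri": "/users/javier/shouts", "status": 201}
    for _ in range(1000)
]
blob = pack_events(events)
raw_size = sum(len(json.dumps(e, sort_keys=True)) + 1 for e in events)
print(f"{raw_size} bytes raw, {len(blob)} bytes gzipped")
```

Request logs are highly repetitive, so they tend to compress well, which keeps long-term storage of the full history cheap.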
  • 24.
  • 25. Tools we considered: Hadoop, Cassandra, Amazon Redshift, ...
  • 26. but... hard to set up and monitor; expensive cluster; not interactive enough
  • 27. Our choice: Google BigQuery. Data analysis as a service. http://developers.google.com/bigquery
  • 28. Based on “Dremel”. Specifically designed for interactive queries over petabytes of real-time data
  • 29. Loading data: you just send the data in text (or JSON) format
  • 30. SQL
        select name from USERS order by date;
        select count(*) from users;
        select max(date) from USERS;
        select sum(total) from ORDERS group by user;
  • 31. Specific extensions for analytics: within, flatten, nest, stddev, top, first, last, nth, variance, var_pop, var_samp, covar_pop, covar_samp, quantiles
  • 32. web console screenshot
  • 33. window functions
  • 34. our most active user
  • 35. country segmented traffic
  • 36. 10 requests we should be caching
  • 37. Correlations, not to be mistaken for causality
  • 38. Things you always wanted to try but were too scared to
        select count(*) from publicdata:samples.wikipedia where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;
        223,163,387
        Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
  • 39. 5 most created resources: select uri, count(*) total from stats where method = 'POST' group by URI;
  • 40. ...but /users/javier/shouts /users/rgo/shouts /teams/javier-community/links /teams/nosqlmatters-cgn/links
  • 41. 5 most created resources
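
Slides 39 to 41 point at a common problem: grouping by raw URI splits one resource type across many rows, because each URI embeds a user or team name. The Python sketch below shows one way to collapse those names before counting. It is not the query from the talk; the `:id` placeholder, the `normalize` helper and the sample list are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches the owner segment in the URIs shown on slide 40,
# e.g. /users/javier/... or /teams/javier-community/...
_OWNER_SEGMENT = re.compile(r"^/(users|teams)/[^/]+")


def normalize(uri: str) -> str:
    """Replace the user or team name with a placeholder so URIs group together."""
    return _OWNER_SEGMENT.sub(r"/\1/:id", uri)


# Hypothetical sample of POST requests, using the URIs from slide 40.
posts = [
    "/users/javier/shouts",
    "/users/rgo/shouts",
    "/teams/javier-community/links",
    "/teams/nosqlmatters-cgn/links",
    "/users/javier/shouts",
]

top_resources = Counter(normalize(uri) for uri in posts).most_common(5)
for resource, total in top_resources:
    print(resource, total)
```

The same normalization could be applied inside the query with a regular-expression replace, which keeps the raw URIs untouched in storage.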
  • 42. new users per month
  • 43. SELECT repository_name, repository_language, repository_description,
               COUNT(repository_name) as cnt, repository_url
        FROM github.timeline
        WHERE type="WatchEvent"
          AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
          AND repository_url IN (
            SELECT repository_url
            FROM github.timeline
            WHERE type="CreateEvent"
              AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
              AND repository_fork = "false"
              AND payload_ref_type = "repository"
            GROUP BY repository_url
          )
        GROUP BY repository_name, repository_language, repository_description, repository_url
        HAVING cnt >= 5
        ORDER BY cnt DESC
        LIMIT 25
  • 44.
  • 45.
  • 46.
  • 47. Automation with Apps Script: read from BigQuery, create a spreadsheet on Drive, e-mail it every day as a PDF
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53. BigQuery pricing
        $26 per stored TB
          1,000,000 rows => $0.00416 / month (£0.00243 / month)
        $5 per processed TB
          1 full scan = 160 MB
          1 count = 0 MB
          1 full scan over 1 column = 5.4 MB
          100 GB => $0.05 / month (£0.03)
  • 54. £0.054307 / month* per 1MM rows (*the first 1 TB every month is free of charge)
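
The storage figure on slide 53 can be checked with a little arithmetic. The sketch below assumes that 1,000,000 rows occupy about 160 MB (inferred from the slide's "1 full scan = 160 MB") and that 1 TB is 1,000,000 MB. The per-TB prices are the ones quoted on the slide; the function names are hypothetical and the free tier is ignored.

```python
STORAGE_USD_PER_TB = 26.0   # from slide 53
QUERY_USD_PER_TB = 5.0      # from slide 53
MB_PER_TB = 1_000_000       # decimal units (assumption)

FULL_SCAN_MB = 160          # size of 1,000,000 rows, inferred from the slide
ONE_COLUMN_SCAN_MB = 5.4    # from slide 53


def storage_cost_usd(megabytes: float) -> float:
    """Monthly cost of keeping this much data stored."""
    return megabytes / MB_PER_TB * STORAGE_USD_PER_TB


def query_cost_usd(megabytes: float) -> float:
    """Cost of one query that processes this much data."""
    return megabytes / MB_PER_TB * QUERY_USD_PER_TB


monthly_storage = storage_cost_usd(FULL_SCAN_MB)
full_scan = query_cost_usd(FULL_SCAN_MB)
column_scan = query_cost_usd(ONE_COLUMN_SCAN_MB)

print(f"storage for 1MM rows: ${monthly_storage:.5f} / month")
print(f"one full scan:        ${full_scan:.6f}")
print(f"one single-column scan: ${column_scan:.6f}")
```

Under these assumptions the storage result matches the $0.00416 per month quoted on the slide, and the single-column scan illustrates why column-oriented storage makes selective queries so much cheaper than full scans.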
  • 55. 1. non-intrusive metrics 2. keep the history 3. avoid vendor lock-in 4. interactive queries 5. cheap 6. extra ball: real time
  • 56.
  • 57. Find related links at https://teowaki.com/teams/javier-community/link-categories/bigquery-talk Thanks! Gràcies! Javier Ramírez @supercoco9 api days 14