Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Bigdata: Analítica
para tu API con
Redis, AWS y Google
Bigquery
javier ramirez
@supercoco9

Servicios para desarrolladores
expuestos mediante API REST
+
web en AngularJS como
cliente del API
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

solución obvia:
usar un servicio como
3scale o apigee

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

¡gracias!
¿preguntas?
javier ramirez
@supercoco9
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

1. medir sin interferir
2. almacenar el histórico
3. evitar el vendor lock-in
4. consultas interactivas
5. baratito
6. bola extra: tiempo real
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

data that’s an order of
magnitude greater than
data you’re accustomed
to
Doug Laney
VP Research, Business Analytics and Performance Management at Gartner
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

data that exceeds the
processing capacity of
conventional database
systems. The data is too big,
moves too fast, or doesn’t fit
the structures of your
database architectures.
Ed Dumbill
program chair for the O’Reilly Strata Conference
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

bigdata is doing a
fullscan to 330MM rows,
matching them against a
regexp, and getting the
result (223MM rows) in
just 5 seconds
Javier Ramirez
impresionable teowaki founder
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)
$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q

SET: 552,028 requests per second
GET: 707,463 requests per second
LPUSH: 767,459 requests per second
LPOP: 770,119 requests per second
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)
$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q

SET: 122,556 requests per second
GET: 123,601 requests per second
LPUSH: 136,752 requests per second
LPOP: 132,424 requests per second
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

open source, BSD licensed, advanced
key-value store. It is often referred to as a
data structure server since keys can contain
strings, hashes, lists, sets and sorted sets.
http://redis.io
started in 2009 by Salvatore Sanfilippo @antirez
100 contributors at

https://github.com/antirez/redis
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Redis lo guarda
todo
en memoria
todo el rato
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

para qué se usa

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

twitter
Cada time line (800 tweets
por usuario) está en redis
5000 writes per second avg
300K reads per second
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

viacom
Grafo de dependencias entre objetos. Caché “potente”
Redis como cola para trabajos de fondo
Registro de actividad y contadores de visualizaciones
como buffer antes de guardar en mysql
Scripts Lua trabajando en nodos esclavos para
recalcular rankings de popularidad. El nuevo proceso
lleva 1/60th del tiempo que tardaba la versión anterior
en mysql

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

nginx + lua + redis

apache + mruby + redis

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

la solución obvia:
guardar en ficheros de
texto comprimidos en S3
http://aws.amazon.com/s3/

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

fichero de texto tsv
2013-10-15T03:51:25Z

105

GET /teams/107-tw/links

category_guid=111 application/json

93.96.140.216

application/json
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:23.0) Gecko/20100101
Firefox/23.0
Teowaki Client

javier ramirez

@supercoco9

ES

http://teowaki.com

Madrid

codemotion 2013

mejor todavía:
enviar los ficheros a
glacier para reducir el
coste
http://aws.amazon.com/glacier/

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Hadoop (map/reduce)

http://hadoop.apache.org/
started in 2005 by Doug Cutting and Mike Cafarella
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

cassandra
http://cassandra.apache.org/
released in 2008 by facebook.
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

otras soluciones big data:

hadoop+voldemort+kafka
http://engineering.linkedin.com/projects

hbase
http://hbase.apache.org/
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Amazon Redshift

http://aws.amazon.com/redshift/

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Nuestra elección:
google bigquery
http://developers.google.com/bigquery

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Basado en Dremel
Específicamente diseñado
para consultas
interactivas en tiempo
real sobre petabytes
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Columnar storage
Fácil de comprimir
Conveniente para consultas
sobre series largas en una
misma columna
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Analítica de datos como
servicio
Permite ficheros planos o
JSON con jerarquía

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

carga de datos
bq load --nosynchronous_mode
--encoding UTF-8
--field_delimiter 'tab'
--max_bad_records 100
--source_format CSV
api.stats
20131014T11-42-05Z.gz

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

conceptos de bigquery
projects
datasets
tables
jobs

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Screenshot de la consola

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

SQL casi estándar
select
from
join
where
group by
having
order
limit

javier ramirez

@supercoco9

avg
count
max
min
sum
+
*
/
%
http://teowaki.com

&
|
^
<<
>>
~

=
!=
<>
>
<
>= <=
IN
IS NULL
BETWEEN

AND
OR
NOT

codemotion 2013

Funciones variadas
current_date
current_time
now
datediff
day
day_of_week
day_of_year
hour
minute
quarter
year...
javier ramirez

@supercoco9

abs
acos
atan
ceil
floor
degrees
log
log2
log10
PI
SQRT...

http://teowaki.com

concat
contains
left
length
lower
upper
lpad
rpad
right
substr

codemotion 2013

“sql” específico para
analítica
within
flatten
nest

stddev

variance

top
first
last
nth

var_pop
var_samp
covar_pop
covar_samp
quantiles

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

window functions

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

correlaciones

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

A más fotos de Alf, menos fotos
de gatos

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Cosas que siempre quisiste
probar pero no te dejaron
select count(*) from
publicdata:samples.wikipedia

where REGEXP_MATCH(title, "[0-9]*")
AND wp_namespace = 0;

223,163,387
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

SELECT repository_name, repository_language,
repository_description, COUNT(repository_name) as cnt,
repository_url
FROM github.timeline
WHERE type="WatchEvent"
AND PARSE_UTC_USEC(created_at) >=
PARSE_UTC_USEC("#{yesterday} 20:00:00")
AND repository_url IN (
SELECT repository_url
FROM github.timeline
WHERE type="CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >=
PARSE_UTC_USEC('#{yesterday} 20:00:00')
AND repository_fork = "false"
AND payload_ref_type = "repository"
GROUP BY repository_url
)
GROUP BY repository_name, repository_language,
repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25

tráfico por país

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

usuario más activo

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

10 request que deberíamos cachear

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

5 recursos que más se crean

javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

precio redis
2* máquinas (master/slave) en digital
ocean
$10 mensuales
* estas mismas instancias de redis se
usan para todo el resto de procesos en
la app
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

precio s3
$0.095 per GB
un fichero comprimido de 1.6 MB
representa 300K filas
$0.0001541698 / mes
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

precio glacier
$0.01 per GB
un fichero comprimido de 1.6 MB
representa 300K filas
$0.000016 / mes
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

precio bigquery
$80 por TB almacenado
300000 filas => $0.007629392 / mes

$35 por TB procesado
1 full scan = 84 MB
1 count = 0 MB
1 full scan de 1 columna = 5.4 MB
10 GB => $0.35 / mes
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

redis
s3 storage
s3 transfer
glacier transfer
glacier storage
bigquery storage
bigquery queries

$10.0000000000
$00.0001541698
$00.0050000000
$00.0500000000
$00.0000160000
$00.0076293920
$00.3500000000

$10.41 / mes
para 330000 registros
javier ramirez

@supercoco9

http://teowaki.com

codemotion 2013

Si te ha gustado mi presentación, por favor
agradécemelo registrándote en

http://teowaki.com
Es un sitio para desarrolladores, puedes usarlo de
forma gratuíta y yo creo que mola bastante

<3 <3 <3
Javier Ramírez
@supercoco9
codemotion 2013

Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Ähnlich wie Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Ähnlich wie Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013 (20)

Mehr von javier ramirez

Mehr von javier ramirez (20)

Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013