Presentation at El Cubo in Seville in February 2015, on the need for technology companies to use data in their day-to-day work. Great atmosphere!
Add a Data Scientist to your startup… or call it quits!
1. ADD A DATA SCIENTIST TO YOUR STARTUP… OR CALL IT QUITS
Justo Hidalgo
@justohidalgo
2. Hi!
• Co-founder,
– BizDev, Data
• Data Integration and Management, Product Strategy and Innovation
• Ph.D. in Computer Science on Data Integration and Web Automation
• Ergo: Love Data
• @justohidalgo
3. A service to read and discover digital books that works on any device
4. Part I: Why Data Science
10. AARRR
• Acquire: SEO, SEM, campaigns, email, blogs, …
• Activate: landing page, product features, …
• Retain: content (blogs, articles, …), emails, alerts, …
• Refer: campaigns, emails, …
• Get Revenue: shopping cart, subscriptions, lead gen, …
traffic social business
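The AARRR stages can be read as a funnel, where the useful number is the share of users that moves from one stage to the next. A minimal sketch in Python, assuming hypothetical per-stage user counts and a strictly sequential funnel (a simplification; the `funnel_conversion` helper and all numbers are illustrative and not taken from the talk):

```python
from typing import Dict, List, Tuple

# Stage order follows the AARRR model named on the slide.
STAGES: List[str] = ["Acquire", "Activate", "Retain", "Refer", "Get Revenue"]


def funnel_conversion(counts: Dict[str, int]) -> List[Tuple[str, str, float]]:
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    rates = []
    for prev, curr in zip(STAGES, STAGES[1:]):
        # Guard against division by zero when a stage has no users.
        rate = counts[curr] / counts[prev] if counts[prev] else 0.0
        rates.append((prev, curr, rate))
    return rates


# Hypothetical user counts per stage, for illustration only.
example_counts = {
    "Acquire": 10_000,
    "Activate": 4_000,
    "Retain": 1_000,
    "Refer": 200,
    "Get Revenue": 100,
}

for prev, curr, rate in funnel_conversion(example_counts):
    print(f"{prev} -> {curr}: {rate:.1%}")
```

The stage with the lowest rate is a natural first candidate for attention, which fits the idea of only tracking metrics you intend to act on.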
25. Yes, Data Science is a Frankenstein monster… but start working on it…
26. Target role / Tool type
Roles: CEO, Data Businessperson, Data Creative, Data Developer, Data Geek
• Data Acquisition: LabView, MATLAB
• ETL and Data Virtualization: Pentaho Kettle; Denodo, Composite, Microstrategy, Talend; Ab Initio, Apache Flume, Amazon Kinesis, Datastage, IBM
• Web Scraping: Denodo, Kapow; import.io, Scrapy
• Data Processing: Microsoft Excel; IBM Watson Analytics; R; Hive; Numenta
• Data Analysis and Modeling: Azure Data Factory, Denodo Express, Weka; Apache Tika; Berkeley Data Analytics Stack (Spark, Shark, MLbase); Apache UIMA, SciPy for Python
• "Workbench" Tools: Matlab, Octave, R
• Machine Learning / Data Mining: IBM DB2 Intelligent Miner, Oracle DataMining, RapidMiner, DataRPM; Azure Machine Learning, Context Relevant, Orange, Weka; Apache Mahout, H2O.ai, Mallet, MLBase, Prediction.io, Spark MLlib; ScalaNLP
• Natural Language Processing: Cortical.io, HP Autonomy, Oracle Endeca, Smartlogic; Bitext, Luminoso; AlchemyAPI, Apache OpenNLP; NLTK, Stanford CoreNLP
• Interactive Analytics: Azure Stream Analytics, Intercom.io; Apache Chukwa, Apache Storm, BigQuery, CitusDB, ELK stack (Elasticsearch, Logstash, Kibana), HortonWorks, Impala
• Business Analytics: Pentaho, Platfora; Microstrategy, SAP Business Objects, SAS; IBM SPSS, QlikTech, Tableau
• Web/Digital Analytics: Google Analytics, KISSMetrics, MixPanel; Adobe Analytics (Omniture), IBM Digital Analytics (Coremetrics), OpenText Web and Social; Piwik, WebTrekk, Webtrends; Segment.io
Source: http://els2014/metricas
33. • Dashboard for publishers
• Sales analysis and forecast
• Product experience
• Reader behavior
• Marketing
• Research (MSc)
34. Currently:
– Raw user data: 80 GB
– Book info: Hundreds of GB
– Over 1 TB of data
February, 2014:
– 700,000+ registered users in 24symbols.com
– 35,000 new registered users monthly (accelerated growth)
– Over 2,000 publishers, 200,000 books and growing
– New instances per country
➤ Whitelabel with mobile carriers => hundreds of thousands of users per country
➤ Currently: 24symbols.com + 4 projects with mobile carriers + internet.org (Colombia, but many more countries coming)
http://www.flickr.com/photos/qualityandstyle/4628275080/
Ockham’s razor, also spelled Occam’s razor, also called law of economy or law of parsimony, principle stated by William of Ockham (1285–1347/49), a Scholastic, that Pluralitas non est ponenda sine necessitate, “Plurality should not be posited without necessity.” The principle gives precedence to simplicity; of two competing theories, the simpler explanation of an entity is to be preferred. The principle is also expressed as “Entities are not to be multiplied beyond necessity.”
All metrics are ACTIONABLE
If you don’t plan to do anything with a metric, DON’T MEASURE IT!
i.e. Don’t waste your valuable and limited time
Kill useless metrics.
Based on Dave McClure’s AARRR metrics model (http://500hats.typepad.com/500blogs/2007/09/startup-metrics.html)