Modern Thinking: How Big Data and Cognitive Are Changing Marketing Strategy
By: Ismael Yuste, Strategic Cloud Engineer, Google Cloud
Presentation: Introduction to Google's Big Data Solutions
4. Google Data Centers
Google's data centers are the foundation of the entire Google Cloud platform. They provide computing power, storage, memory, and GPUs for our applications. They also host the core of applications such as Gmail, YouTube, and Search.
● Speed
● Low latency
● Operational efficiency
● Energy efficiency
● Use of renewable energy
● Proximity to the user
● Information security
6. Big Data
End-to-end integrated Big Data solutions that let you capture, process, and store data on a single integrated platform. The platform combines cloud-native services with managed open source tools, for both real-time and batch workloads.
Big Data products:
● BigQuery
● Cloud Dataflow
● Cloud Dataproc
● Cloud Datalab
● Cloud Pub/Sub
● Genomics
7. Big Data - BigQuery
Your enterprise data warehouse: fast, economical, and fully managed, for analyzing large datasets.
● Flexible data ingestion.
● Global availability.
● Built-in security and permissions.
● Cost control.
● Highly available.
● Fully integrated.
● Connects with other Google products.
8. Big Data - Cloud Dataflow
A fully managed service and programming model for Big Data processing.
● Integrated resource management.
● On demand.
● Intelligent job execution.
● Autoscaling.
● Unified programming model.
● Open source.
● Monitoring.
● Integration.
● Reliable and consistent processing.
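The "unified programming model" bullet refers to writing the same logic for batch and streaming jobs. As a rough, pure-Python illustration of that idea (this is not the Cloud Dataflow or Apache Beam API), the sketch below sums timestamped events into fixed event-time windows with one function that accepts either a finite list (batch) or a generator (stream). The function name, the 60-second window, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (event_time_seconds, key, value)
Event = Tuple[float, str, int]


def sum_per_window(events: Iterable[Event],
                   window_size: float = 60.0) -> Dict[Tuple[int, str], int]:
    """Sum values per (window_start, key) using fixed, non-overlapping windows."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key, value in events:
        # Each event belongs to the window that contains its own timestamp.
        window_start = int(ts // window_size * window_size)
        totals[(window_start, key)] += value
    return dict(totals)


def as_stream(events: List[Event]) -> Iterator[Event]:
    """Yield events one at a time, standing in for an unbounded source."""
    for event in events:
        yield event


SAMPLE: List[Event] = [
    (5.0, "clicks", 1),
    (42.0, "clicks", 2),
    (61.0, "clicks", 4),
    (75.0, "views", 10),
]

batch_result = sum_per_window(SAMPLE)             # bounded input
stream_result = sum_per_window(as_stream(SAMPLE))  # same logic, streamed input
print(batch_result)
```

The same function handles both inputs because it only iterates. A real streaming engine would also have to deal with late data and emit results before the input ends, which this sketch does not attempt.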
9. Big Data - Cloud Dataproc
Managed Spark and Hadoop service.
● Integrated cluster management.
● Resizable clusters.
● Integration.
● Versioning.
● Management tools.
● Initialization actions.
● Manual or automatic management.
● Flexible virtual machines.
10. Big Data
Datalab. A tool for exploring, analyzing, and visualizing Big Data.
Pub/Sub. A global, real-time service for messaging and data streaming.
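To make the publish/subscribe pattern behind Pub/Sub concrete, here is a minimal in-memory sketch. It is not the Cloud Pub/Sub client API: the `InMemoryBroker` class, the topic names, and the messages are hypothetical, and the real service adds durability, acknowledgements, and global delivery, none of which are modeled here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class InMemoryBroker:
    """Toy broker: each topic fans a message out to all of its subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a callback that receives every message sent to the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = InMemoryBroker()
dashboard: List[str] = []  # one consumer, e.g. a live dashboard
archive: List[str] = []    # another consumer, e.g. long-term storage

broker.subscribe("page-views", dashboard.append)
broker.subscribe("page-views", archive.append)

delivered = broker.publish("page-views", "user=42 path=/home")
undelivered = broker.publish("orders", "order=1")  # topic with no subscribers
print(delivered, undelivered)
```

The publisher never refers to its consumers directly. That decoupling is what lets new consumers attach to a stream without changing the producer.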
11. Big Data
Dataprep. An intelligent data service for exploring, cleaning, and preparing structured or unstructured data for later analysis.
Data Studio. Turns your data into reports and dashboards that are easy to create, easy to share, and fully customizable, from data sources such as BigQuery, Analytics, or YouTube.
12. Data Lifecycle Steps
Ingest
The first stage is to pull in
the raw data, such as
streaming data from
devices, on-premises
batch data, application
logs, or mobile-app user
events and analytics.
Store
After the data has been
retrieved, it needs to be
stored in a format that is
durable and can be easily
accessed.
Process & Analyze
In this stage, the data is
transformed from raw
form into actionable
information.
Explore & Visualize
The final stage is to
convert the results of the
analysis into a format
that is easy to draw
insights from and to
share with colleagues
and peers.
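The four lifecycle stages can be sketched end to end on a laptop. The example below is only an illustration under stated assumptions: the log lines, the field names (`user_id`, `event_type`, `latency_ms`), and the use of an in-memory SQLite table as the store are all made up for the sketch, and none of it uses a Google Cloud service.

```python
import json
import sqlite3

# Hypothetical raw application logs; the last line is deliberately malformed.
RAW_LOG_LINES = [
    '{"user_id": "ana", "event_type": "view", "latency_ms": 120}',
    '{"user_id": "ben", "event_type": "view", "latency_ms": 80}',
    '{"user_id": "ana", "event_type": "purchase", "latency_ms": 310}',
    'not valid json',
]


def ingest(lines):
    """Ingest: parse raw lines, skipping any that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def store(records, conn):
    """Store: persist records in a queryable format (a SQL table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(user_id TEXT, event_type TEXT, latency_ms INTEGER)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (:user_id, :event_type, :latency_ms)",
        list(records),
    )
    conn.commit()


def process(conn):
    """Process & analyze: reduce raw rows to a per-event-type summary."""
    cursor = conn.execute(
        "SELECT event_type, COUNT(*), AVG(latency_ms) "
        "FROM events GROUP BY event_type ORDER BY event_type"
    )
    return cursor.fetchall()


def visualize(summary):
    """Explore & visualize: render the summary as a small text bar chart."""
    return [f"{name:<10}{'#' * count} avg={avg:.0f}ms" for name, count, avg in summary]


conn = sqlite3.connect(":memory:")
store(ingest(RAW_LOG_LINES), conn)
summary = process(conn)
chart = visualize(summary)
print("\n".join(chart))
```

Each stage only depends on the output of the previous one, so any stage could be swapped for a managed service without touching the others.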
14. Typical Big Data Jobs
● Programming
● Resource provisioning
● Performance tuning
● Monitoring
● Reliability
● Deployment & configuration
● Handling growing scale
● Utilization improvements
15. Big Data with Google
Focus on insights.
Not infrastructure.
From batch to real-time.
Programming
Understanding
16. Data & Analytics
Cloud Dataproc. Fully managed Hadoop and Spark with industry-leading performance.
BigQuery. Fully managed data warehouse for large-scale analytics.
Cloud Dataflow. Real-time data pipelines, with an open source SDK via Apache Beam.
17. Separation of Storage and Compute
● Access any storage system from any processing tool
● Keep as much data as you want, economically
● Share data in place, no more FTP and copying
Storage:
● BigQuery Storage (tables)
● Cloud Bigtable (NoSQL)
● Cloud Storage (files)
Processing:
● BigQuery Analytics
● Cloud Dataproc
● Cloud Dataflow
18. 10+ years of Big Data innovation - Open Source
Timeline, 2002 to 2015 (year ticks shown: 2002, 2004, 2005, 2006, 2008, 2010, 2012, 2014, 2015)
Google papers and technologies: GFS, MapReduce, BigTable, Dremel, FlumeJava, Millwheel, PubSub, Dataflow, TensorFlow
Open source: Apache Beam (incubating)
Google Cloud products: BigQuery, Pub/Sub, Dataflow, Bigtable
20. Machine Learning
Google Cloud ML Platform provides modern machine learning services, with pre-trained models and a service for building your own models.
Machine Learning products:
● Cloud Machine Learning
● Vision API
● Speech API
● Natural Language API
● Translation API
● Jobs API
21. Machine Learning - Cloud ML
Machine learning on data of any type and volume.
● Prediction at scale.
● Simple model building.
● Deep learning capabilities.
● Integration.
● HyperTune.
● Managed, scalable service.
● Portable models.
22. Machine Learning - APIs
Vision API. Analyze images with the power of Google.
Speech API. Convert speech to text with the power of the cloud.
23. Machine Learning - APIs
Natural Language API. Draw insights from unstructured text with Cloud ML.
Translation API. Translate on the fly between thousands of language pairs.
24. Machine Learning - APIs
Jobs API. Manage your job portal with Cloud ML.
Cloud Video Intelligence API. Analyze and extract information from your videos.
25. References for staying up to date
Google Cloud Platform Blog
Google Cloud Platform website
GCP Twitter
Google+ GCP Community
GCP Podcast
Google Cloud Platform YouTube channel
26. Use cases
When art meets big data: Analyzing 200,000 items from The Met collection in BigQuery
Today we’re adding a new public dataset to
Google BigQuery: over 200,000 items from The
Metropolitan Museum of Art (aka “The Met”),
representing all its public domain art from a
total of 1.5 million art objects. The Met Museum
Public Domain dataset includes metadata about
each piece of art, along with an image or
images of the artifact. Google and The Met
Museum have been close collaborators for
years through Google Arts & Culture and we’re
incredibly excited to bring the museum's public
dataset to BigQuery.
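As a sketch of what analyzing this public dataset might look like from Python, the example below counts objects per department. Several things here are assumptions rather than details from the announcement: the table path `bigquery-public-data.the_met.objects`, the `department` column, and the `top_departments` helper are illustrative, and running it requires the `google-cloud-bigquery` package, valid credentials, and your own project ID.

```python
# Assumed table and column names; check the dataset schema before relying on them.
MET_QUERY = """
SELECT department, COUNT(*) AS object_count
FROM `bigquery-public-data.the_met.objects`
GROUP BY department
ORDER BY object_count DESC
LIMIT 10
"""


def top_departments(project_id: str):
    """Run the query and return (department, object_count) pairs."""
    # Imported lazily so this module loads even without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    rows = client.query(MET_QUERY).result()  # blocks until the query finishes
    return [(row["department"], row["object_count"]) for row in rows]


if __name__ == "__main__":
    # "my-project" is a placeholder; replace it with a real project ID.
    for department, count in top_departments("my-project"):
        print(f"{department}: {count}")
```

Because the dataset is public, the query reads Google-hosted tables directly; only the query itself is billed to the project you pass in.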
27. Use cases
Traveloka’s journey to stream analytics on Google Cloud Platform
Traveloka is a travel technology company based
in Jakarta, Indonesia, currently operating in six
countries. Founded in 2012 by former Silicon
Valley engineers, its goal is to revolutionize
human mobility.
One of the most strategic parts of our business
is a streaming data processing pipeline that
powers a number of use cases, including fraud
detection, personalization, ads optimization,
cross selling, A/B testing, and promotion
eligibility. That pipeline is also used by our
business analysts for monitoring and
understanding business metrics, both for
historical analysis and in real time.
28. Use cases
Getting Your Feet Wet in the Data Lake: Analytics 360 in BigQuery
Benefits for Data Engineers, Analysts and
Marketers
As a Big Data platform, BigQuery offers benefits
for multiple stages and roles in the Big Data
process:
For marketers and analysts, you can run ad hoc
queries and get the results within minutes or
seconds. The elusive quest for understanding
online and offline attribution, user funnels, and
long-term customer value comes within reach.
For data engineers, BigQuery offers a
tremendous operational benefit, as outlined in
the next section.
29. Use cases
How WePay uses stream analytics for real-time fraud detection using GCP and Apache Kafka
When payments platform WePay was founded in 2008,
MySQL was our only backend storage. It served its purpose
well when data volume and traffic throughput were relatively
low, but by 2016, our business was growing rapidly and they
were growing along with it. Consequently, we started to see
performance degradation to the point where we could no
longer run concurrent queries without a negative impact on
latency.
Clearly, we needed a new stream analytics pipeline for fraud
detection that would give us answers to queries in near-real
time without affecting our main transactional business
system. In this post, I’ll explain how we built and deployed
such a pipeline to production using Apache Kafka and
Google Cloud Platform (GCP) services like Google Cloud
Dataflow and Cloud Bigtable.
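To give a feel for the kind of question a stream analytics pipeline answers in near-real time, here is a small sliding-window "velocity" check in plain Python. It is a hypothetical rule invented for illustration, not WePay's fraud logic, and it does not use Kafka, Cloud Dataflow, or Cloud Bigtable. The class name, the thresholds, and the sample payments are assumptions, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class VelocityCheck:
    """Flag an account when it exceeds max_events within window_seconds."""

    def __init__(self, max_events: int = 3, window_seconds: float = 60.0) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        # Per-account timestamps of recent payments, oldest first.
        self._recent: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def observe(self, account: str, timestamp: float) -> bool:
        """Record one payment; return True if the account now looks suspicious."""
        recent = self._recent[account]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_events


check = VelocityCheck(max_events=3, window_seconds=60.0)
payments = [
    ("acct-1", 0.0), ("acct-1", 10.0), ("acct-2", 12.0),
    ("acct-1", 20.0), ("acct-1", 30.0), ("acct-1", 200.0),
]
flags = [check.observe(account, ts) for account, ts in payments]
print(flags)
```

Keeping only a small per-account window in memory is what lets a check like this run on every event without querying the transactional database.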