SlideShare ist ein Scribd-Unternehmen logo
1 von 53
Downloaden Sie, um offline zu lesen
OpenSistemas
Open Source Innovation for your Business
Apache Spark y como lo usamos en nuestros proyectos
Who am I
Data Solutions Manager at OpenSistemas
Former President of Hispalinux (Spanish
LUG)
Author “La Pastilla Roja” first spanish book
about Free Software
Menú
Un poco de historia: Hadoop
Qué es Apache Spark
Arquitectura Kappa
Algunos casos reales con Spark
Un poco de historia
En principio de los tiempos
Los Papers de Google
Nacimiento del proyecto Hadoop
La era pre Spark
Google publica algunos papers claves:
●
HDFS
●
MapReduce
●
HBASE
La era pre Spark
Nace Hadoop
●
Primera implementación libre
●
Yahoo es el sponsor
●
Se desarrolla uno de los entornos de trabajo
más interesantes y productivos de los años
2000
Batch-Hadoop (MR1)
Batch-MapReduce
Batch-Cascading
Cosas que no terminan de
gustarnos de Hadoop
Algunas cosas de Hadoop no terminan de
convencernos
No todo los problemas se solucionan con
Map Reduce
Hadoop ha generado un montón de
proyectos y tecnologías para resolverlo
Mientras tanto...
Hadoop ha cumplido 10 años
Ha habido muchos cambios tanto en hardware
como en software
In memory Computing
Y de pronto nace:
Apache Spark
Es un motor de analítica paralelizado
Escrito desde cero
Heredando lo mejor de la experiencia
“hadoop”
Probablemente la obra de ingeniería más
bella de los últimos años
Qué es Apache Spark
La primera reseña importante esta pensado
desde el principio para aprovechar al máximo
la memoria
Resiliente y escalable por definición
Qué es Apache Spark
Escrito, pensado e influido por el lenguaje
Scala
Spark, the elevator pitch
Spark streaming
Spark SQL
Cuando cualquier información con un
esquema se puede consultar con SQL
A little context
July 2, 2014 Jay Kreps coined the term Kappa
Architecture in an article for O’reilly Radar
Who is Jay Kreps
Jay has been involved in lots of projects
●
Author of the essay
●
The Log: What every software engineer
should know about real-time data's unifying
abstraction (12/16/2013)
●
https://engineering.linkedin.com/distributed-systems/log-what-every-software-
engineer-should-know-about-real-time-datas-unifying
Jay Kreps
Author of the book: I ♥ Logs
Jay Kreps
Involved with projects as:
●
Apache Kafka
●
Apache Samza
●
Voldemort
●
Azkaban
●
Ex-Linkedin
●
Now co-founder and CEO of Confluent
Lambda Architecture
Look something like this:
https://www.mapr.com/developercentral/lambda-architecture
Lambda Architecture
Batch layer that provides the following
functionality
●
managing the master dataset, an
immutable, append-only set of raw data
●
pre-computing arbitrary query functions,
called batch views
https://www.mapr.com/developercentral/lambda-architecture
Lambda Architecture
Serving layer
●
This layer indexes the batch views so that they can be
queried in ad hoc with low latency
Speed layer
●
This layer accommodates all requests that are subject
to low latency requirements. Using fast and
incremental algorithms, the speed layer deals with
recent data only
Lambda Architecture
Batch layer datasets can be in a distributed filesystem, while
MapReduce can be used to create batch views that can be fed
to the serving layer
The serving layer can be implemented using NoSQL
technologies such as HBase,Apache Druid, etc
Querying can be implemented by technologies such as Apache
Drill or Impala
Speed layer can be realized with data streaming technologies
such as Apache Storm or Spark Streaming
https://www.mapr.com/developercentral/lambda-architecture
Pros of Lambda Architecture
Retain the input data unchanged
●
Think about modeling data transformations, series of
data states from the original input
Lambda architecture take in account the problem of
reprocessing data
●
This happens all the time, the code will change, and
you will need to reprocess all the information. Lots of
reasons and you will need to live with this.
Cons of Lambda Architecture
Maintain the code that need to produce the same result
from two complex distributed system is painful
●
Very different code for MapReduce and Storm/Apache Spark
Not only is about different code, is also about debugging
and interaction with other products like (hive, Oozie,
Cascading, etc)
At the end is a problem about different and diverging
programming paradigms
So what is Kappa Architecture
The proposal of Jay Kreps is so simple
●
Use kafka (or other system) that will let you retain
the full log of the data you need to reprocess
●
When you want to do the reprocessing, start a
second instance of your stream processing job that
starts processing from the beginning of the
retained data, but direct this output data to a new
output table
part II
●
When the second job has caught up, switch
the application to read from the new table
●
Stop the old version of the job, and delete
the old output table
So what is Kappa Architecture
This architecture looks something like this
So what is Kappa Architecture
The first benefit is that only you need to reprocessing
only when you change the code
You can check if the new version is working ok and if
not reverse to the old output table
You can mirror a Kafka topic to HDFS so you are not
limited to the Kafka retention configuration
You have only a code to maintain with an unique
framework
So what is Kappa Architecture
The real advantage is not about efficiency at
all (You will need extra temporarily storage
when reprocessing for example) is allowing
your team to develop, test, debug and
operate their systems on top of a single
processing framework
So what is Kappa Architecture
Is not a silver bullet to solve every problem at
Big Data
Is not a list of prescriptions of technologies.
You can implement with your favorite
frameworks
Is not a rigid set of rules. But helps to maintain
the complex projects simple
What is not Kappa Architecture
We start working with projects with a
complex structure like Linkedin looks at early
stage
That’s very usual
How we use Kappa
Architecture
How we use Kappa
Architecture
We try to refactoring the data flows to fix in
a Kappa Architecture
How we use Kappa
Architecture
How we use Kappa
Architecture
We use Kafka as Stream Data Platform
Instead of Samza we feel more comfortable
with Spark Streaming
At ASPGems we choose Apache Spark as our
Analytics Engine and not only for Spark
Streaming
How we use Kappa
Architecture
At the end, Kappa Architecture is design
pattern for us
We use/clone this pattern in almost our
projects
We have projects of every size, volume of data
or speed needing and fix with the Kappa
Architecture
How we use Kappa
Architecture
Telefónica - MSS
We use KA to calculate near real time KPIs,
SLAs related with the managed security system
We simplify the data flow of the input data
Kafka in the streaming data platform
As MPP we use CassandraDB
Use cases
IOT – OBD II
One of our clients install On Board Devices in the cars
of its customers.
We implement an API to got all the information in real
time and inject the information in Kafka
The business rules are implemented in a CEP running
into Apache Spark Streaming
As MPP we use Elastic Search
Use cases
Insurance company
We implement Kappa Architecture to process
click stream in real time and clustering users
We show content and offers that better fix
users
Use cases
Energy Facility
We implement Kappa Architecture to process and
predict energy consume.
Our customer include energy storage systems and we
got all the information about energy storage (ultra-
capacitors and batteries).
We process this information to calculate the effective
lifetime of the components and its degradation.
Use cases
Questions
Thank you
Juantomás García
juantomas@opensistemas.com
@juantomas
www.opensistemas.com
comercial@opensistemas.com

Weitere ähnliche Inhalte

Was ist angesagt?

Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ IndixRajesh Muppalla
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Spark Summit
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkDataWorks Summit
 
Neo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache KafkaNeo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache Kafkajexp
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streamingdatamantra
 
Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Pactera_US
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaSpark Summit
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...confluent
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnDatabricks
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit
 
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Spark Summit
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at TwitterPrasad Wagle
 
Operationalize Apache Spark Analytics
Operationalize Apache Spark AnalyticsOperationalize Apache Spark Analytics
Operationalize Apache Spark AnalyticsDatabricks
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Databricks
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureDatabricks
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...Spark Summit
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaDatabricks
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 

Was ist angesagt? (20)

Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache Spark
 
Neo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache KafkaNeo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache Kafka
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
 
Tailored for Spark
Tailored for SparkTailored for Spark
Tailored for Spark
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
 
Operationalize Apache Spark Analytics
Operationalize Apache Spark AnalyticsOperationalize Apache Spark Analytics
Operationalize Apache Spark Analytics
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 

Andere mochten auch

El software como acción humana
El software como acción humanaEl software como acción humana
El software como acción humanaOpenSistemas
 
NoSQL: Un Cambio de Paradigma - Apache Cassandra
NoSQL: Un Cambio de Paradigma - Apache CassandraNoSQL: Un Cambio de Paradigma - Apache Cassandra
NoSQL: Un Cambio de Paradigma - Apache CassandraWladimir Cabarcas
 
El futuro Data Driven en e-Learning y RR.HH.
El futuro Data Driven en e-Learning y RR.HH.El futuro Data Driven en e-Learning y RR.HH.
El futuro Data Driven en e-Learning y RR.HH.OpenSistemas
 
Tutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtimeTutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtimeSocialmetrix
 
Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016LibreCon
 
Área de Soporte - OpenSistemas
Área de Soporte - OpenSistemasÁrea de Soporte - OpenSistemas
Área de Soporte - OpenSistemasOpenSistemas
 
Drupal 7. Puesta en producción en sistemas multientorno
Drupal 7. Puesta en producción en sistemas multientornoDrupal 7. Puesta en producción en sistemas multientorno
Drupal 7. Puesta en producción en sistemas multientornoOpenSistemas
 
Área Education - OpenSistemas
Área Education - OpenSistemasÁrea Education - OpenSistemas
Área Education - OpenSistemasOpenSistemas
 
Cómo crear ports en FreeBSD #PicnicCode2015
Cómo crear ports en FreeBSD #PicnicCode2015Cómo crear ports en FreeBSD #PicnicCode2015
Cómo crear ports en FreeBSD #PicnicCode2015OpenSistemas
 
osBrain: una herramienta para la inversión automática en bolsa y mercados de ...
osBrain: una herramienta para la inversión automática en bolsa y mercados de ...osBrain: una herramienta para la inversión automática en bolsa y mercados de ...
osBrain: una herramienta para la inversión automática en bolsa y mercados de ...OpenSistemas
 
Minería de datos para trading automático
Minería de datos para trading automáticoMinería de datos para trading automático
Minería de datos para trading automáticoOpenSistemas
 

Andere mochten auch (13)

El software como acción humana
El software como acción humanaEl software como acción humana
El software como acción humana
 
NoSQL: Un Cambio de Paradigma - Apache Cassandra
NoSQL: Un Cambio de Paradigma - Apache CassandraNoSQL: Un Cambio de Paradigma - Apache Cassandra
NoSQL: Un Cambio de Paradigma - Apache Cassandra
 
El futuro Data Driven en e-Learning y RR.HH.
El futuro Data Driven en e-Learning y RR.HH.El futuro Data Driven en e-Learning y RR.HH.
El futuro Data Driven en e-Learning y RR.HH.
 
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
 
ASPgems - kappa architecture
ASPgems - kappa architectureASPgems - kappa architecture
ASPgems - kappa architecture
 
Tutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtimeTutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtime
 
Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016
 
Área de Soporte - OpenSistemas
Área de Soporte - OpenSistemasÁrea de Soporte - OpenSistemas
Área de Soporte - OpenSistemas
 
Drupal 7. Puesta en producción en sistemas multientorno
Drupal 7. Puesta en producción en sistemas multientornoDrupal 7. Puesta en producción en sistemas multientorno
Drupal 7. Puesta en producción en sistemas multientorno
 
Área Education - OpenSistemas
Área Education - OpenSistemasÁrea Education - OpenSistemas
Área Education - OpenSistemas
 
Cómo crear ports en FreeBSD #PicnicCode2015
Cómo crear ports en FreeBSD #PicnicCode2015Cómo crear ports en FreeBSD #PicnicCode2015
Cómo crear ports en FreeBSD #PicnicCode2015
 
osBrain: una herramienta para la inversión automática en bolsa y mercados de ...
osBrain: una herramienta para la inversión automática en bolsa y mercados de ...osBrain: una herramienta para la inversión automática en bolsa y mercados de ...
osBrain: una herramienta para la inversión automática en bolsa y mercados de ...
 
Minería de datos para trading automático
Minería de datos para trading automáticoMinería de datos para trading automático
Minería de datos para trading automático
 

Ähnlich wie Apache Spark y como lo usamos en nuestros proyectos

Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaItai Yaffe
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache FlinkAKASH SIHAG
 
Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionRUHULAMINHAZARIKA
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with SparkKnoldus Inc.
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkHome
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamDataWorks Summit/Hadoop Summit
 
Analyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkAnalyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkNicola Ferraro
 
Introduction to GCP Data Flow Presentation
Introduction to GCP Data Flow PresentationIntroduction to GCP Data Flow Presentation
Introduction to GCP Data Flow PresentationKnoldus Inc.
 
Introduction to GCP DataFlow Presentation
Introduction to GCP DataFlow PresentationIntroduction to GCP DataFlow Presentation
Introduction to GCP DataFlow PresentationKnoldus Inc.
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfishFei Dong
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014mahchiev
 

Ähnlich wie Apache Spark y como lo usamos en nuestros proyectos (20)

Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Apache spark
Apache sparkApache spark
Apache spark
 
Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_Session
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
Module01
 Module01 Module01
Module01
 
Analyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkAnalyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache Spark
 
Introduction to GCP Data Flow Presentation
Introduction to GCP Data Flow PresentationIntroduction to GCP Data Flow Presentation
Introduction to GCP Data Flow Presentation
 
Introduction to GCP DataFlow Presentation
Introduction to GCP DataFlow PresentationIntroduction to GCP DataFlow Presentation
Introduction to GCP DataFlow Presentation
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfish
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014
 

Mehr von OpenSistemas

OpenSistemas Corporate Presentation
OpenSistemas Corporate PresentationOpenSistemas Corporate Presentation
OpenSistemas Corporate PresentationOpenSistemas
 
Data Platform & Analytics OpenSistemas MSFT Playbook
Data Platform & Analytics OpenSistemas MSFT PlaybookData Platform & Analytics OpenSistemas MSFT Playbook
Data Platform & Analytics OpenSistemas MSFT PlaybookOpenSistemas
 
Proceso de liberación en el marco legal del código abierto - OpenSistemas
Proceso de liberación en el marco legal del código abierto - OpenSistemasProceso de liberación en el marco legal del código abierto - OpenSistemas
Proceso de liberación en el marco legal del código abierto - OpenSistemasOpenSistemas
 
Virtualization - Solaris LDOMs - OpenSistemas
Virtualization - Solaris LDOMs - OpenSistemasVirtualization - Solaris LDOMs - OpenSistemas
Virtualization - Solaris LDOMs - OpenSistemasOpenSistemas
 
CACert - A Community-driven Certification Authority - OpenSistemas
CACert - A Community-driven Certification Authority - OpenSistemasCACert - A Community-driven Certification Authority - OpenSistemas
CACert - A Community-driven Certification Authority - OpenSistemasOpenSistemas
 
Floss leaders - OpenSistemas
Floss leaders - OpenSistemasFloss leaders - OpenSistemas
Floss leaders - OpenSistemasOpenSistemas
 
Business Intelligence and Pentaho Services - OpenSistemas
Business Intelligence and Pentaho Services - OpenSistemasBusiness Intelligence and Pentaho Services - OpenSistemas
Business Intelligence and Pentaho Services - OpenSistemasOpenSistemas
 
easyGTD - product Info
easyGTD - product InfoeasyGTD - product Info
easyGTD - product InfoOpenSistemas
 
easyGTD - presentación producto
easyGTD - presentación productoeasyGTD - presentación producto
easyGTD - presentación productoOpenSistemas
 

Mehr von OpenSistemas (10)

From SF with Love
From SF with LoveFrom SF with Love
From SF with Love
 
OpenSistemas Corporate Presentation
OpenSistemas Corporate PresentationOpenSistemas Corporate Presentation
OpenSistemas Corporate Presentation
 
Data Platform & Analytics OpenSistemas MSFT Playbook
Data Platform & Analytics OpenSistemas MSFT PlaybookData Platform & Analytics OpenSistemas MSFT Playbook
Data Platform & Analytics OpenSistemas MSFT Playbook
 
Proceso de liberación en el marco legal del código abierto - OpenSistemas
Proceso de liberación en el marco legal del código abierto - OpenSistemasProceso de liberación en el marco legal del código abierto - OpenSistemas
Proceso de liberación en el marco legal del código abierto - OpenSistemas
 
Virtualization - Solaris LDOMs - OpenSistemas
Virtualization - Solaris LDOMs - OpenSistemasVirtualization - Solaris LDOMs - OpenSistemas
Virtualization - Solaris LDOMs - OpenSistemas
 
CACert - A Community-driven Certification Authority - OpenSistemas
CACert - A Community-driven Certification Authority - OpenSistemasCACert - A Community-driven Certification Authority - OpenSistemas
CACert - A Community-driven Certification Authority - OpenSistemas
 
Floss leaders - OpenSistemas
Floss leaders - OpenSistemasFloss leaders - OpenSistemas
Floss leaders - OpenSistemas
 
Business Intelligence and Pentaho Services - OpenSistemas
Business Intelligence and Pentaho Services - OpenSistemasBusiness Intelligence and Pentaho Services - OpenSistemas
Business Intelligence and Pentaho Services - OpenSistemas
 
easyGTD - product Info
easyGTD - product InfoeasyGTD - product Info
easyGTD - product Info
 
easyGTD - presentación producto
easyGTD - presentación productoeasyGTD - presentación producto
easyGTD - presentación producto
 

Kürzlich hochgeladen

Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 

Kürzlich hochgeladen (20)

Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 

Apache Spark y como lo usamos en nuestros proyectos

  • 1. OpenSistemas Open Source Innovation for your Business Apache Spark y como lo usamos en nuestros proyectos
  • 2. Who am I Data Solutions Manager at OpenSistemas Former President of Hispalinux (Spanish LUG) Author “La Pastilla Roja” first spanish book about Free Software
  • 3. Menú Un poco de historia: Hadoop Qué es Apache Spark Arquitectura Kappa Algunos casos reales con Spark
  • 4. Un poco de historia En principio de los tiempos Los Papers de Google Nacimiento del proyecto Hadoop
  • 5. La era pre Spark Google publica algunos papers claves: ● HDFS ● MapReduce ● HBASE
  • 6. La era pre Spark Nace Hadoop ● Primera implementación libre ● Yahoo es el sponsor ● Se desarrolla uno de los entornos de trabajo más interesantes y productivos de los años 2000
  • 10. Cosas que no terminan de gustarnos de Hadoop Algunas cosas de Hadoop no terminan de convencernos No todo los problemas se solucionan con Map Reduce Hadoop ha generado un montón de proyectos y tecnologías para resolverlo
  • 11.
  • 12. Mientras tanto... Hadoop ha cumplido 10 años Ha habido muchos cambios tanto en hardware como en software
  • 14. Y de pronto nace: Apache Spark Es un motor de analítica paralelizado Escrito desde cero Heredando lo mejor de la experiencia “hadoop” Probablemente la obra de ingeniería más bella de los últimos años
  • 15. Qué es Apache Spark La primera reseña importante esta pensado desde el principio para aprovechar al máximo la memoria Resiliente y escalable por definición
  • 16. Qué es Apache Spark Escrito, pensado e influido por el lenguaje Scala
  • 17.
  • 19.
  • 20.
  • 21.
  • 22.
  • 24. Spark SQL Cuando cualquier información con un esquema se puede consultar con SQL
  • 25. A little context July 2, 2014 Jay Kreps coined the term Kappa Architecture in an article for O’reilly Radar
  • 26. Who is Jay Kreps Jay has been involved in lots of projects ● Author of the essay ● The Log: What every software engineer should know about real-time data's unifying abstraction (12/16/2013) ● https://engineering.linkedin.com/distributed-systems/log-what-every-software- engineer-should-know-about-real-time-datas-unifying
  • 27. Jay Kreps Author of the book: I ♥ Logs
  • 28. Jay Kreps Involved with projects as: ● Apache Kafka ● Apache Samza ● Voldemort ● Azkaban ● Ex-Linkedin ● Now co-founder and CEO of Confluent
  • 29. Lambda Architecture Look something like this: https://www.mapr.com/developercentral/lambda-architecture
  • 30. Lambda Architecture Batch layer that provides the following functionality ● managing the master dataset, an immutable, append-only set of raw data ● pre-computing arbitrary query functions, called batch views https://www.mapr.com/developercentral/lambda-architecture
  • 31. Lambda Architecture Serving layer ● This layer indexes the batch views so that they can be queried in ad hoc with low latency Speed layer ● This layer accommodates all requests that are subject to low latency requirements. Using fast and incremental algorithms, the speed layer deals with recent data only
  • 32. Lambda Architecture Batch layer datasets can be in a distributed filesystem, while MapReduce can be used to create batch views that can be fed to the serving layer The serving layer can be implemented using NoSQL technologies such as HBase,Apache Druid, etc Querying can be implemented by technologies such as Apache Drill or Impala Speed layer can be realized with data streaming technologies such as Apache Storm or Spark Streaming https://www.mapr.com/developercentral/lambda-architecture
  • 33. Pros of Lambda Architecture Retain the input data unchanged ● Think about modeling data transformations, series of data states from the original input Lambda architecture take in account the problem of reprocessing data ● This happens all the time, the code will change, and you will need to reprocess all the information. Lots of reasons and you will need to live with this.
  • 34. Cons of Lambda Architecture Maintain the code that need to produce the same result from two complex distributed system is painful ● Very different code for MapReduce and Storm/Apache Spark Not only is about different code, is also about debugging and interaction with other products like (hive, Oozie, Cascading, etc) At the end is a problem about different and diverging programming paradigms
  • 35. So what is Kappa Architecture The proposal of Jay Kreps is so simple ● Use kafka (or other system) that will let you retain the full log of the data you need to reprocess ● When you want to do the reprocessing, start a second instance of your stream processing job that starts processing from the beginning of the retained data, but direct this output data to a new output table
  • 36. part II ● When the second job has caught up, switch the application to read from the new table ● Stop the old version of the job, and delete the old output table So what is Kappa Architecture
  • 37. This architecture looks something like this So what is Kappa Architecture
  • 38. The first benefit is that only you need to reprocessing only when you change the code You can check if the new version is working ok and if not reverse to the old output table You can mirror a Kafka topic to HDFS so you are not limited to the Kafka retention configuration You have only a code to maintain with an unique framework So what is Kappa Architecture
  • 39. The real advantage is not about efficiency at all (You will need extra temporarily storage when reprocessing for example) is allowing your team to develop, test, debug and operate their systems on top of a single processing framework So what is Kappa Architecture
  • 40. Is not a silver bullet to solve every problem at Big Data Is not a list of prescriptions of technologies. You can implement with your favorite frameworks Is not a rigid set of rules. But helps to maintain the complex projects simple What is not Kappa Architecture
  • 41. We start working with projects with a complex structure like Linkedin looks at early stage That’s very usual How we use Kappa Architecture
  • 42. How we use Kappa Architecture
  • 43. We try to refactoring the data flows to fix in a Kappa Architecture How we use Kappa Architecture
  • 44. How we use Kappa Architecture
  • 45. We use Kafka as Stream Data Platform Instead of Samza we feel more comfortable with Spark Streaming At ASPGems we choose Apache Spark as our Analytics Engine and not only for Spark Streaming How we use Kappa Architecture
  • 46. At the end, Kappa Architecture is design pattern for us We use/clone this pattern in almost our projects We have projects of every size, volume of data or speed needing and fix with the Kappa Architecture How we use Kappa Architecture
  • 47. Telefónica - MSS We use KA to calculate near real time KPIs, SLAs related with the managed security system We simplify the data flow of the input data Kafka in the streaming data platform As MPP we use CassandraDB Use cases
  • 48. IOT – OBD II One of our clients install On Board Devices in the cars of its customers. We implement an API to got all the information in real time and inject the information in Kafka The business rules are implemented in a CEP running into Apache Spark Streaming As MPP we use Elastic Search Use cases
  • 49. Insurance company We implement Kappa Architecture to process click stream in real time and clustering users We show content and offers that better fix users Use cases
  • 50. Energy Facility We implement Kappa Architecture to process and predict energy consume. Our customer include energy storage systems and we got all the information about energy storage (ultra- capacitors and batteries). We process this information to calculate the effective lifetime of the components and its degradation. Use cases