Real-Time Data Processing Pipeline &
Visualization with Docker, Spark, Kafka
and Cassandra
Roberto G. Hashioka – 2016-10-04 – TIAD – Paris
Personal Information
• Roberto Gandolfo Hashioka
• @rogaha (GitHub) and @rhashioka (Twitter)
• Finance -> Software Engineer
• Growth & Data Engineer at Docker
Summary
• Background / Motivation
• Project Goals
• How to build it?
• DEMO
Background
• Gather data from multiple sources and process it in “real-time”
• Transform raw data into meaningful, useful information that enables a more effective
decision-making process
• Provide more visibility into trends in: 1) user behavior, 2) feature engagement, 3) opportunities
for future investment
• Data transparency and standardization
Project Goals
• Create a data processing pipeline that can handle a large number of events per second
• Automate the development environment: Docker Compose.
• Automate remote machine management: Docker for AWS / Docker Machine.
• Reduce time to market and development ramp-up time: new hires / new features.
Project / Language Stack
How to build it?
• Step 1: Install Docker for Mac/Win and dockerize all the applications
link: https://www.docker.com/products/docker
Example Dockerfile
-----------------------------------------------------------------------------------------------------------
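FROM ubuntu:14.04
MAINTAINER Roberto Hashioka (roberto@docker.com)
# Install nginx from the distribution packages
RUN apt-get update && apt-get install -y nginx
# Replace the default page with a greeting
RUN echo "Hello World! #TIAD" > /usr/share/nginx/html/index.html
EXPOSE 80
# Run nginx in the foreground so the detached container keeps running
CMD ["nginx", "-g", "daemon off;"]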
------------------------------------------------------------------------------------------------------------
$ docker build -t rogaha/web_demotiad2016 .
$ docker run -d -p 80:80 --name web_demotiad2016 rogaha/web_demotiad2016
How to build it?
• Step 2: Define your services stack with a docker-compose file
Docker Compose
# Compose file format version 2 declares containers under "services"
version: "2"
services:
  web:
    build: .
    command: python app.py
    ports:
      - "5000:5000"
    volumes:
      - .:/code
    links:
      - redis
    environment:
      - PYTHONUNBUFFERED=1
  redis:
    image: redis:latest
    command: redis-server --appendonly yes
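The slides do not show the `app.py` that the `web` service runs, so the following is only a sketch of what such a file might look like. It uses the Python standard library to serve a request counter on port 5000, the port the Compose file publishes. The names `HitCounter`, `render`, `make_handler`, and `main` are hypothetical, and the counter lives in memory here, whereas the linked `redis` service suggests the real application keeps its state in Redis.

```python
"""Hypothetical app.py for the `web` service; not the author's actual code."""
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HitCounter:
    """In-memory, thread-safe stand-in for a counter kept in Redis (assumption)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = 0

    def incr(self):
        # Increment and return the new value, similar to Redis INCR.
        with self._lock:
            self._hits += 1
            return self._hits


def render(hits):
    """Build the response body for a given request count."""
    return "Hello World! #TIAD (request {})\n".format(hits)


def make_handler(counter):
    """Create a request handler class bound to the given counter."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = render(counter.incr()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def main(port=5000):
    # Bind to all interfaces so the published port is reachable from the host.
    HTTPServer(("0.0.0.0", port), make_handler(HitCounter())).serve_forever()
```

To start the server from `python app.py`, a call to `main()` would be added at the end of the file. Because `.:/code` is mounted as a volume, edits made on the laptop are visible inside the container without rebuilding the image.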
How to build it?
• Step 3: Test the applications locally from your laptop using containers
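One way to test a locally running stack is a small smoke test that polls the published port until the service answers. The sketch below assumes the `web` service from the Compose example is reachable at `http://localhost:5000`; the function name `wait_for_http` and the timeout values are illustrative choices, not something the talk prescribes.

```python
import http.client
import time
import urllib.request


def wait_for_http(url, timeout=30.0, interval=0.5, request_timeout=2.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Returns True if the service answered in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # The container may still be starting (connection refused,
            # reset, or an error status); retry after a short pause.
            pass
        time.sleep(interval)
    return False
```

After `docker-compose up -d`, calling `wait_for_http("http://localhost:5000/")` returns `True` once the container accepts requests, which makes it a convenient gate before running further checks.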
How to build it?
• Step 4: Provision your remote servers and deploy your containers
How to build it?
• Step 5: Scale your services with Docker swarm
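As an illustration of this step, and assuming the stack runs in Docker's swarm mode (Docker 1.12 or later), services are scaled with `docker service scale NAME=REPLICAS`. The sketch below only builds and runs that command line; the helper names `scale_command` and `scale`, and the service names used in the example, are hypothetical.

```python
import subprocess


def scale_command(replicas):
    """Build a `docker service scale` command from {service_name: replica_count}."""
    if not replicas:
        raise ValueError("at least one service is required")
    args = ["docker", "service", "scale"]
    # Sort by service name so the generated command is deterministic.
    for name, count in sorted(replicas.items()):
        if count < 0:
            raise ValueError("replica count for {!r} must be >= 0".format(name))
        args.append("{}={}".format(name, count))
    return args


def scale(replicas):
    """Run the scale command; this needs a Docker CLI pointed at a swarm manager."""
    subprocess.run(scale_command(replicas), check=True)
```

For example, `scale({"web": 3})` would ask the swarm to run three replicas of a service named `web`. Keeping the command construction separate from its execution makes the logic testable without a Docker daemon.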
DEMO
source code: https://github.com/rogaha/data-processing-pipeline
Open Source Projects Used
• Docker (https://github.com/docker/docker)
• An open platform for distributed applications for developers and sysadmins
• Apache Spark / Spark SQL (https://github.com/apache/spark)
• A fast, in-memory data processing engine. Spark SQL lets you query structured data inside Spark programs, using SQL or the DataFrame API
• Apache Kafka (https://github.com/apache/kafka)
• A fast and scalable pub-sub messaging service
• Apache Zookeeper (https://github.com/apache/zookeeper)
• A distributed configuration service, synchronization service, and naming registry for large distributed systems
• Apache Cassandra (https://github.com/apache/cassandra)
• A scalable, highly available, distributed wide-column NoSQL database
• D3 (https://github.com/mbostock/d3)
• A JavaScript visualization library for HTML and SVG.
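To make the division of labor between these projects concrete, here is a toy, in-process sketch of the data flow they suggest: events are published to a topic (Kafka's role), consumed in micro-batches and aggregated (Spark's role), and the aggregates are upserted into a keyed table (Cassandra's role) that a visualization layer such as D3 could read. Everything here is a standard-library stand-in. None of the class or function names come from the talk or from those projects' APIs, the `user` and `action` event fields are made up, and the exact wiring of the real pipeline is an assumption.

```python
from collections import Counter, deque


class Topic:
    """Stand-in for a Kafka topic: an append-only queue of events."""

    def __init__(self):
        self._events = deque()

    def publish(self, event):
        self._events.append(event)

    def poll(self, max_events):
        """Remove and return up to `max_events` events, oldest first."""
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


def aggregate(batch):
    """Stand-in for a Spark micro-batch job: count events per (user, action)."""
    return Counter((e["user"], e["action"]) for e in batch)


class Table:
    """Stand-in for a Cassandra counter table keyed by (user, action)."""

    def __init__(self):
        self.rows = {}

    def upsert(self, counts):
        # Add each batch's counts to the running totals.
        for key, n in counts.items():
            self.rows[key] = self.rows.get(key, 0) + n


def run_pipeline(topic, table, batch_size=100):
    """Drain the topic in micro-batches; return the number of events processed."""
    processed = 0
    while True:
        batch = topic.poll(batch_size)
        if not batch:
            return processed
        table.upsert(aggregate(batch))
        processed += len(batch)
```

Publishing a few events such as `{"user": "ana", "action": "pull"}` to a `Topic` and calling `run_pipeline(topic, Table())` leaves per-user, per-action totals in `Table.rows`. In the real stack each stand-in would be a separate containerized service, which is what allows them to be scaled independently.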
Thanks!
Questions?
@rhashioka

 

Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra

  • 1. Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra Roberto G. Hashioka – 2016-10-04 – TIAD – Paris
  • 2. Personal Information • Roberto Gandolfo Hashioka • @rogaha (GitHub) and @rhashioka (Twitter) • Finance -> Software Engineer • Growth & Data Engineer at Docker
  • 3. Summary • Background / Motivation • Project Goals • How to build it? • DEMO
  • 4. Background • Gather data from multiple sources and process it in “real time” • Transform raw data into meaningful, useful information that supports more effective decision-making • Provide more visibility into trends in: 1) user behavior 2) feature engagement 3) opportunities for future investments • Data transparency and standardization
  • 5. Project Goals • Create a data processing pipeline that can handle a large number of events per second • Automate the development environment: Docker Compose. • Automate remote machine management: Docker for AWS / Docker Machine. • Reduce time to market / time to development: new hires / new features.
  • 7. How to build it? • Step 1: Install Docker for Mac/Win and dockerize all the applications link: https://www.docker.com/products/docker
  • 8. Dockerfile example
    -----------------------------------------------------------------------------------------------------------
    FROM ubuntu:14.04
    MAINTAINER Roberto Hashioka (roberto@docker.com)
    RUN apt-get update && apt-get install -y nginx
    RUN echo "Hello World! #TIAD" > /usr/share/nginx/html/index.html
    EXPOSE 80
    # Keep nginx in the foreground so the detached container stays up
    CMD ["nginx", "-g", "daemon off;"]
    ------------------------------------------------------------------------------------------------------------
    $ docker build -t rogaha/web_demotiad2016 .
    $ docker run -d -p 80:80 --name web_demotiad2016 rogaha/web_demotiad2016
  • 9. How to build it? • Step 2: Define your services stack with a docker-compose file
  • 10. Docker Compose
    version: "2"
    services:
      web:
        build: .
        command: python app.py
        ports:
          - "5000:5000"
        volumes:
          - .:/code
        links:
          - redis
        environment:
          - PYTHONUNBUFFERED=1
      redis:
        image: redis:latest
        command: redis-server --appendonly yes
  • 11. How to build it? • Step 3: Test the applications locally from your laptop using containers
  • 13. How to build it? • Step 4: Provision your remote servers and deploy your containers
  • 15. How to build it? • Step 5: Scale your services with Docker swarm
  • 17. Open Source Projects Used
    • Docker (https://github.com/docker/docker)
      • An open platform for distributed applications for developers and sysadmins
    • Apache Spark / Spark SQL (https://github.com/apache/spark)
      • A fast, in-memory data processing engine. Spark SQL lets you query structured data as a resilient distributed dataset (RDD)
    • Apache Kafka (https://github.com/apache/kafka)
      • A fast and scalable pub-sub messaging service
    • Apache ZooKeeper (https://github.com/apache/zookeeper)
      • A distributed configuration service, synchronization service, and naming registry for large distributed systems
    • Apache Cassandra (https://github.com/apache/cassandra)
      • A scalable, highly available, distributed wide-column NoSQL database
    • D3 (https://github.com/mbostock/d3)
      • A JavaScript visualization library for HTML and SVG
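
The deck lists the components but does not show the pipeline code itself. As a rough illustration of the processing step that sits between the message queue and the database, here is a minimal pure-Python sketch. It makes no Kafka, Spark or Cassandra calls: the event shape, the function names (`micro_batches`, `aggregate`) and the 5-second window are all assumptions made for this example, not taken from the talk. It only shows the general idea of grouping a stream of events into fixed time windows and counting them per window, which is the kind of result a dashboard could then plot.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (unix_timestamp_seconds, event_name).
Event = Tuple[int, str]


def micro_batches(events: Iterable[Event], batch_seconds: int) -> Dict[int, List[Event]]:
    """Group events into fixed-size time windows keyed by window start."""
    batches: Dict[int, List[Event]] = defaultdict(list)
    for ts, name in events:
        window_start = ts - ts % batch_seconds
        batches[window_start].append((ts, name))
    return dict(batches)


def aggregate(batches: Dict[int, List[Event]]) -> List[Tuple[int, str, int]]:
    """Count events per (window, name).

    Each returned tuple stands in for one row that a real pipeline
    would write to its datastore.
    """
    rows: List[Tuple[int, str, int]] = []
    for window_start in sorted(batches):
        counts = Counter(name for _, name in batches[window_start])
        for name in sorted(counts):
            rows.append((window_start, name, counts[name]))
    return rows


if __name__ == "__main__":
    # Made-up sample events, standing in for messages read from a topic.
    sample = [(100, "pull"), (101, "push"), (104, "pull"), (107, "pull")]
    for row in aggregate(micro_batches(sample, batch_seconds=5)):
        print(row)
```

In a real deployment the list of sample events would be replaced by a consumer reading from a topic, and the returned rows by inserts into a table; both are left out here on purpose, since the talk does not specify them.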