IAC 2024 - IA Fast Track to Search Focused AI Solutions
Data / Streaming / Microservices Platform with DevOps
1. Data / Streaming / Microservices Platform with DevOps
Kidong Lee
mykidong@gmail.com
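2. Typical User Behavior Event Processing Platform
[Architecture diagram; the labels recoverable from the slide are listed below.]
Layers and boxes: Collection, Collector, Data Bus, Unified Log, Stream, Streaming, Sink, Batch, Interactive Query, Data Workflow, Store, Service, API, Admin, Management, Platform, Monitoring, Resource Management, Configuration Management, Service Discovery, Scheduler
Pairings stated on the slide:
- Streaming - Kafka Streams
- Scheduler - Azkaban
- Service Discovery - Consul
- Configuration Management - Ansible
- Resource Management - Nomad, YARN
- Interactive Query - Drill
Other components shown: Hive, Tez, Kafka Connect, InfluxDB, JMXTrans, Kafka, Elasticsearch, Netty, Grafana, Coda Hale Metrics, Spark, Spark MLlib, Tomcat, HDFS, Parquet, Vert.x, gRPC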
3. eCommerce Recommendation Service:
User events (PageView, Cart, Order, etc.) from the commerce site are collected, then processed in real time and in batch with recommendation algorithms to produce recommended items.
Algorithms used: Collaborative Filtering, Item Similarity, etc.
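As an illustration of the Item Similarity idea, here is a minimal sketch, not the platform's actual implementation. It assumes events have already been reduced to (user, item) pairs and scores two items by the cosine similarity of the sets of users who interacted with them. The function names item_similarity and recommend are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarity(events):
    """Build item-to-item cosine similarity from (user_id, item_id) pairs.

    Assumption: each pair means "this user interacted with this item"
    (a PageView, Cart or Order event); event weights are ignored.
    """
    users_by_item = defaultdict(set)
    for user_id, item_id in events:
        users_by_item[item_id].add(user_id)

    sims = defaultdict(dict)
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = len(users_by_item[a] & users_by_item[b])
            if common == 0:
                continue  # no shared users, so no similarity entry
            score = common / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
            sims[a][b] = score
            sims[b][a] = score
    return sims


def recommend(sims, item_id, top_n=3):
    """Return up to top_n items most similar to item_id (ties broken by name)."""
    ranked = sorted(sims.get(item_id, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


events = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"), ("u2", "C"), ("u3", "C")]
sims = item_similarity(events)
print(recommend(sims, "A"))
```

On a real data volume this pairwise loop would be replaced by a distributed job (the deck names Spark for the batch side); the sketch only shows the scoring idea.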
Automated Keyword Search Bidding Service:
User events (PageView, Contact, Cart, Order, KeywordSearch, etc.) from the ad site are collected, then processed in real time and in batch to compute conversions and conversion values.
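A minimal sketch of how conversions and conversion values could be derived from such events. The slides do not describe the attribution rule, so this assumes a simple last-keyword rule: an Order is credited to the most recent KeywordSearch by the same user. The field names user, type, keyword and amount are hypothetical.

```python
from collections import defaultdict


def conversions_by_keyword(events):
    """Count conversions and sum conversion values per keyword.

    Assumptions: events are dicts already sorted by time; an Order is
    attributed to the user's most recent KeywordSearch; orders from users
    with no prior search are not attributed to any keyword.
    """
    last_keyword = {}
    stats = defaultdict(lambda: {"conversions": 0, "value": 0.0})
    for event in events:
        if event["type"] == "KeywordSearch":
            last_keyword[event["user"]] = event["keyword"]
        elif event["type"] == "Order":
            keyword = last_keyword.get(event["user"])
            if keyword is not None:
                stats[keyword]["conversions"] += 1
                stats[keyword]["value"] += event["amount"]
    return dict(stats)


events = [
    {"user": "u1", "type": "KeywordSearch", "keyword": "shoes"},
    {"user": "u2", "type": "KeywordSearch", "keyword": "bags"},
    {"user": "u1", "type": "Order", "amount": 50.0},
    {"user": "u1", "type": "Order", "amount": 30.0},
    {"user": "u3", "type": "Order", "amount": 10.0},
]
print(conversions_by_keyword(events))
```

A bidding component could then raise or lower the bid for a keyword based on its conversion value; that step is outside this sketch.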
Typical Services
4. Collection Layer:
Collects user events and validates them, rejecting invalid messages.
Pushes the valid events to the unified log system, Kafka.
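The validate-then-push step can be sketched as follows. This is an illustration under assumptions: the required fields (eventType, userId, timestamp) and the set of known event types are hypothetical, and publish stands in for whatever Kafka producer call the collector uses.

```python
import json

# Hypothetical event contract; the slides do not specify the real one.
REQUIRED_FIELDS = {"eventType": str, "userId": str, "timestamp": int}
KNOWN_EVENT_TYPES = {"PageView", "Cart", "Order"}


def validate_event(raw):
    """Parse one raw message; return (event, errors). event is None if invalid."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None, ["not valid JSON"]
    if not isinstance(event, dict):
        return None, ["not a JSON object"]

    errors = []
    for field, field_type in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], field_type):
            errors.append(f"wrong type for field: {field}")
    event_type = event.get("eventType")
    if isinstance(event_type, str) and event_type not in KNOWN_EVENT_TYPES:
        errors.append(f"unknown eventType: {event_type}")
    return (event if not errors else None), errors


def collect(raw_messages, publish):
    """Publish valid events via publish(event); return the rejected messages."""
    rejected = []
    for raw in raw_messages:
        event, errors = validate_event(raw)
        if errors:
            rejected.append((raw, errors))
        else:
            publish(event)
    return rejected


sent = []
rejected = collect(
    ['{"eventType": "PageView", "userId": "u1", "timestamp": 1}', "not json"],
    sent.append,  # stand-in for a Kafka producer send
)
print(len(sent), len(rejected))
```

Keeping the rejected messages together with their error lists makes it possible to route them to a separate topic or log for inspection instead of silently dropping them.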
Stream Layer:
In Streaming, Kafka Streams converts JSON events to Avro messages, which are sent to other topics in Kafka.
In Sink, the converted Avro messages from those topics are saved as Parquet on HDFS and written to Elasticsearch with Kafka Connect.
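To make the JSON-to-Avro conversion concrete, here is a minimal sketch of Avro binary encoding for a record with only string and long fields. It is hand-rolled for illustration and is not how the platform does it: the real conversion runs inside Kafka Streams and would normally use an Avro library. The PageView schema and its field names are hypothetical.

```python
import json

# Hypothetical Avro record schema for one event type.
SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "userId", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
}


def encode_long(n):
    """Avro long: zig-zag encode, then emit as a variable-length integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"long": encode_long, "string": encode_string}


def json_to_avro(raw_json, schema=SCHEMA):
    """Encode a JSON event as an Avro record body: fields in schema order."""
    event = json.loads(raw_json)
    return b"".join(
        ENCODERS[field["type"]](event[field["name"]]) for field in schema["fields"]
    )


payload = json_to_avro('{"userId": "foo", "timestamp": 64}')
print(payload.hex())
```

Because the binary form carries no field names, producer and consumer must agree on the schema. That fixed schema is also what allows the sink to write the same messages out as typed Parquet columns.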
Batch Layer:
Spark processes the Parquet data to build the data model.
The final results are loaded into Elasticsearch so that the API can expose them.
Service Layer:
API exposes the results from Elasticsearch.
Admin calls API via gRPC.
Store Layer:
HDFS saves all the data as Parquet.
Elasticsearch saves the final results.
Management Layer:
Monitoring, Service Discovery, Configuration Management, Resource Management, Scheduler, etc.
Platform Layers
5. DevOps Perspective of this Platform
Data Platform
- Hadoop
- Spark
- Hive
- Drill
Streaming Platform
- Kafka
- Kafka Streams
- Kafka Connect
Microservices Platform
- Consul as Service Discovery
- Nomad as Container Orchestrator
- NGINX as Proxy
- Docker
DevOps
- Jenkins as CI / CD
- Ansible as Configuration Management
- Nexus as Docker Private Registry
- Git as Source Control
6. Streaming Platform:
KSQL can be added.
Microservices Platform:
OpenShift Origin can be used as a container orchestrator instead of Nomad.
Istio can be added as a service mesh.
Additional Components in Future
7. DevOps Perspective of this Platform in Future
Data Platform
- Hadoop
- Spark
- Hive
- Drill
Streaming Platform
- Kafka
- Kafka Streams
- Kafka Connect
- KSQL
Microservices Platform
- Istio as Service Mesh
- OpenShift Origin as Container Orchestrator
- Docker
DevOps
- Jenkins as CI / CD
- Ansible as Configuration Management
- Nexus as Docker Private Registry
- Git as Source Control