In this session, we will talk about two of the most promising incubating open source projects, Apache Apex and Apache Geode, and how together they attempt to address the shortcomings of existing big data analytics platforms.
Project Apex is an enterprise-grade, YARN-native big data-in-motion platform that unifies stream processing and batch processing. Apex processes big data in motion in a way that is highly scalable, highly performant, fault-tolerant, stateful, secure, distributed, and easy to operate.
Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency performance with high concurrency processing.
We will also look at some use cases in which these two projects can be used together to form a distributed, fault-tolerant, reliable in-memory data processing layer.
5. Directed Acyclic Graph (DAG)
Application Programming Model
• A Stream is a sequence of data tuples
• An Operator takes one or more input streams, performs computations & emits one or more output streams
• Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
• An Operator has many instances that run in parallel, and each instance is single-threaded
• A Directed Acyclic Graph (DAG) is made up of operators and streams
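The programming model above can be sketched in plain Java. This is only an illustration of operators connected by streams of tuples, not the Apex API: the names `DagSketch`, `Operator`, `connect` and `run` are hypothetical, and everything runs in a single thread rather than as parallel operator instances on a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class DagSketch {
    /** Hypothetical operator: one input port, one output port feeding downstream operators. */
    public static final class Operator<I, O> implements Consumer<I> {
        private final Function<I, List<O>> logic;               // the custom business logic
        private final List<Consumer<O>> downstream = new ArrayList<>();

        public Operator(Function<I, List<O>> logic) {
            this.logic = logic;
        }

        /** Adds a stream from this operator's output to the next operator's input. */
        public <X> Operator<O, X> connect(Operator<O, X> next) {
            downstream.add(next);
            return next;
        }

        /** Processes one input tuple and emits zero or more output tuples. */
        @Override
        public void accept(I tuple) {
            for (O out : logic.apply(tuple)) {
                for (Consumer<O> d : downstream) {
                    d.accept(out);
                }
            }
        }
    }

    /** Builds a three-operator chain (split -> upper-case -> collect) and pushes lines through it. */
    public static List<String> run(List<String> lines) {
        List<String> sink = new ArrayList<>();
        Operator<String, String> splitter =
            new Operator<>(line -> Arrays.asList(line.split("\\s+")));
        Operator<String, String> upper =
            new Operator<>(word -> Arrays.asList(word.toUpperCase()));
        Operator<String, String> collector =
            new Operator<>(word -> { sink.add(word); return new ArrayList<String>(); });

        splitter.connect(upper).connect(collector);
        lines.forEach(splitter);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("big data", "in motion"))); // [BIG, DATA, IN, MOTION]
    }
}
```

Each `Operator` here sees one tuple at a time, which mirrors the single-threaded view an operator instance has of its input stream.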
[Figure: a chain of Operators connected by streams of Tuples, ending in an Output Stream]
12. What are IMDGs?
• IMDGs host data in memory and distribute it across a cluster of commodity servers
• The main access patterns are key/value access, MapReduce, various forms of HPC-like processing, and limited distributed querying and indexing capabilities.
Why are they important?
• Performance – using RAM is faster than using disk.
• Extremely high availability of data - by keeping it in memory and in a highly distributed cluster.
• Data Structure – using a key/value store allows greater flexibility for the application developer: an object store similar in interface to a typical concurrent hash map.
• Scalable Data Partitioning
• Transactional ACID support
In Memory Data Grid - IMDG
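As a rough illustration of key/value access over partitioned data, the sketch below spreads entries across several in-process maps by key hash. It is an assumption-laden toy, not how any particular IMDG is implemented: `GridSketch` and `ownerOf` are hypothetical names, each "node" is just a `ConcurrentHashMap` in the same JVM, and there is no replication, rebalancing or transaction support.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GridSketch<K, V> {
    // Each map stands in for the memory of one server in the cluster (assumption).
    private final List<Map<K, V>> nodes = new ArrayList<>();

    public GridSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new ConcurrentHashMap<>());
        }
    }

    /** Picks the owning node from the key's hash; floorMod keeps the index non-negative. */
    public int ownerOf(K key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    /** Stores the entry on its owning node; the interface mirrors a hash map. */
    public V put(K key, V value) {
        return nodes.get(ownerOf(key)).put(key, value);
    }

    /** Reads the entry from its owning node, or returns null if absent. */
    public V get(K key) {
        return nodes.get(ownerOf(key)).get(key);
    }

    /** Total number of entries across all nodes. */
    public int size() {
        return nodes.stream().mapToInt(Map::size).sum();
    }

    public static void main(String[] args) {
        GridSketch<String, Integer> grid = new GridSketch<>(3);
        grid.put("alice", 1);
        grid.put("bob", 2);
        System.out.println(grid.get("alice") + " " + grid.get("bob") + " " + grid.size()); // 1 2 2
    }
}
```

The caller never needs to know which node holds a key, which is what keeps the interface close to a concurrent hash map.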
15. Geode Features
Core Features
• Linear scalability & latency-minimizing data distribution
• Performance optimized persistence - High availability & durability
• Configurable consistency - region types { partitioned, replicated & local }
• Distributed transactions
• Cluster resilience & failover
Advanced Features
• Server Function Execution - Send computation to data
• Asynchronous Events - Deliver events to a receiver without impacting the write path
• Continuous Queries & Client subscriptions - Useful for refreshing client cache
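The "send computation to data" idea behind server function execution can be sketched in plain Java as follows. This is not Geode's function execution API; `FunctionExecutionSketch`, `executeOnPartitions` and `sumAll` are hypothetical names, and the partitions are local maps standing in for data held on separate servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class FunctionExecutionSketch {
    // Each map stands in for the data hosted by one server (assumption).
    private final List<Map<String, Integer>> partitions = new ArrayList<>();

    public FunctionExecutionSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ConcurrentHashMap<>());
        }
    }

    /** Routes the entry to a partition chosen by key hash. */
    public void put(String key, int value) {
        int owner = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(owner).put(key, value);
    }

    /** Runs fn against each partition's local data; only the per-partition results come back. */
    public <R> List<R> executeOnPartitions(Function<Map<String, Integer>, R> fn) {
        List<R> results = new ArrayList<>();
        for (Map<String, Integer> partition : partitions) {
            results.add(fn.apply(partition));
        }
        return results;
    }

    /** Sums all values by computing a partial sum per partition, then combining the partials. */
    public int sumAll() {
        List<Integer> partialSums = executeOnPartitions(
            p -> p.values().stream().mapToInt(Integer::intValue).sum());
        return partialSums.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        FunctionExecutionSketch grid = new FunctionExecutionSketch(3);
        grid.put("a", 1);
        grid.put("b", 2);
        grid.put("c", 3);
        System.out.println(grid.sumAll()); // 6
    }
}
```

The point of the pattern is that one small result per partition crosses the network instead of every entry.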
17. Application Patterns
• Caching for speed and scale
– Read-through, Write-through, Write-behind
• Geode as the OLTP system of record
– Data in-memory for low latency, on disk for durability
• Parallel compute engine
• Real-time analytics
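The three caching patterns named above (read-through, write-through, write-behind) can be contrasted with a minimal single-threaded sketch. All names here (`CachePatternsSketch`, `putWriteThrough`, `putWriteBehind`, `flush`) are hypothetical, the backing store is a plain map standing in for a database, and a real write-behind queue would be drained asynchronously rather than by an explicit `flush()` call.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class CachePatternsSketch {
    private final Map<String, String> backingStore;            // stands in for a database
    private final Map<String, String> cache = new HashMap<>();
    private final Queue<String> dirtyKeys = new ArrayDeque<>(); // writes not yet in the store
    public int storeReads = 0;                                  // counts cache misses

    public CachePatternsSketch(Map<String, String> backingStore) {
        this.backingStore = backingStore;
    }

    /** Read-through: a miss loads the value from the store and populates the cache. */
    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            storeReads++;
            value = backingStore.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Write-through: the store is updated synchronously, before the call returns. */
    public void putWriteThrough(String key, String value) {
        backingStore.put(key, value);
        cache.put(key, value);
    }

    /** Write-behind: the cache is updated now and the store write is queued for later. */
    public void putWriteBehind(String key, String value) {
        cache.put(key, value);
        dirtyKeys.add(key);
    }

    /** Applies queued writes to the store and returns how many were flushed. */
    public int flush() {
        int flushed = 0;
        String key;
        while ((key = dirtyKeys.poll()) != null) {
            backingStore.put(key, cache.get(key));
            flushed++;
        }
        return flushed;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("a", "1");
        CachePatternsSketch c = new CachePatternsSketch(store);
        c.get("a");                                // miss, loaded from the store
        c.get("a");                                // hit, served from the cache
        c.putWriteBehind("b", "2");
        System.out.println(c.storeReads + " " + store.containsKey("b")); // 1 false
        c.flush();
        System.out.println(store.get("b"));        // 2
    }
}
```

Write-behind keeps the store off the write path at the cost of a window in which the store lags the cache, which is the trade-off behind the low-latency claim.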