2. About me
• Vice President, Products and Strategy @ GigaSpaces
• (ex) Director of Solutions Architecture
• Blogging at http://blog.gigaspaces.com
• @ahodroj
• Email: ali@gigaspaces.com
• Slides at http://slideshare.com/ahodroj
4. Do we need to bridge online transaction processing with real-time operational intelligence?
5. Modern applications: the line is blurred between…
Transactional: essential to operate the business
&
Analytical: turning data into value (insights, diagnosis, decision making)
14. In-Memory Computing 101
Distributed Cache: partitioned cache nodes
In-Memory Data Grid: scale-out system of record
In-Memory Database: scale-up system of record
Distributed Cache:
• Increased capacity
• No support for write-heavy scenarios
• Limited to ID-based reads
• Reads are the only part of the latency path
15. In-Memory Computing 101: In-Memory Data Grid
• Heavy read/write – sharded/partitioned architecture
• Horizontally scalable on commodity HW (or cloud)
• Serves as system of record with querying & transaction semantics
• Requires modifying your application’s data access layer
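To make the sharded/partitioned idea concrete, here is a minimal sketch of key-based routing across in-memory partitions. The class name, the hash-modulo routing rule, and the use of plain maps as partitions are all assumptions for illustration; this is not the GigaSpaces API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a sharded in-memory store; not a GigaSpaces API.
class ShardedStoreSketch {
    // Each map stands in for one grid partition (in practice, a separate node).
    private final List<Map<String, String>> partitions = new ArrayList<>();

    ShardedStoreSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumed routing rule: non-negative hash of the key, modulo the partition count.
    int partitionFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), partitions.size());
    }

    // Writes go to exactly one partition, so write load spreads across nodes.
    void write(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    // Reads use the same routing rule and touch a single partition.
    String read(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    public static void main(String[] args) {
        ShardedStoreSketch store = new ShardedStoreSketch(3);
        store.write("order-1", "NY");
        store.write("order-2", "SF");
        System.out.println(store.read("order-1"));
    }
}
```

Because the application has to supply a routing key for every operation, this style of access is one reason the data access layer needs to change when moving to a data grid.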
16. In-Memory Computing 101: In-Memory Database
• Read/write scalability
• Drop-in SQL database replacement
• Often lacks horizontal scalability (joins)
• Requires replacing your database
30. Just an IMDB Thing…
Approach: Shove it all in one “Big Iron”?
Challenge:
● Nope: your data sources and applications are often distributed.
● In-memory or not, these databases aren’t built for horizontal scale-out
31. Approach: One large In-Memory Data Grid to rule them all?
Challenge:
● Not when your apps require polyglot analytics
● Unless you want to write ML algorithms, MDX engines, etc. from scratch
32. What we needed
Low-latency Scale-Out In-Memory Data Grid
Large-scale distributed analytics framework
Maximize Data-Analytics Locality
Minimize Application Latency
33. Our approach to HTAP
Low-latency Scale-Out In-Memory Data Grid
+
Large-scale distributed analytics framework
41. Data Grid RDD: resilient distributed dataset
• List of parent RDDs – empty
• An array of partitions that the dataset is divided into – an IMDG distributed query returns the partitions and their hosts
• A compute function that runs a computation on a partition – an iterator over that partition's portion of the data
• Optional preferred locations, i.e. hosts for a partition where the data will be loaded – hosts from the distributed query
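The four RDD properties listed on this slide can be modeled in plain Java as follows. This is a sketch only: GridPartition, GridRddSketch, and their methods are hypothetical names that mirror the slide's wording, not Spark or InsightEdge classes, and the partition list is passed in directly instead of coming from a real distributed query.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of one grid partition: its index, the host it lives on, and its rows.
class GridPartition {
    final int index;
    final String host;
    final List<String> rows;

    GridPartition(int index, String host, List<String> rows) {
        this.index = index;
        this.host = host;
        this.rows = rows;
    }
}

// Hypothetical model of a grid-backed RDD; not a Spark or InsightEdge class.
class GridRddSketch {
    private final List<GridPartition> partitions;

    // On the slide, partitions and hosts come from an IMDG distributed query;
    // here they are simply handed to the constructor.
    GridRddSketch(List<GridPartition> partitions) {
        this.partitions = partitions;
    }

    // 1. Parent RDDs: empty, because the grid itself is the data source.
    List<GridRddSketch> dependencies() {
        return Collections.emptyList();
    }

    // 2. The partitions the dataset is divided into.
    List<GridPartition> getPartitions() {
        return partitions;
    }

    // 3. Compute: an iterator over one partition's portion of the data.
    Iterator<String> compute(GridPartition partition) {
        return partition.rows.iterator();
    }

    // 4. Preferred locations: the host that already holds the partition's data.
    List<String> getPreferredLocations(GridPartition partition) {
        return Collections.singletonList(partition.host);
    }

    public static void main(String[] args) {
        GridRddSketch rdd = new GridRddSketch(Arrays.asList(
                new GridPartition(0, "node1", Arrays.asList("a", "b")),
                new GridPartition(1, "node2", Arrays.asList("c"))));
        for (GridPartition p : rdd.getPartitions()) {
            Iterator<String> it = rdd.compute(p);
            while (it.hasNext()) {
                System.out.println(p.index + "@" + rdd.getPreferredLocations(p).get(0) + ": " + it.next());
            }
        }
    }
}
```

Reporting the grid host as the preferred location is what lets a scheduler place the computation next to the data.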
42. Data Grid RDD: one-to-one partition
node 1: Spark executor, Spark Partition #1 → Grid Partition #1
node 2: Spark executor, Spark Partition #2 → Grid Partition #2
node 3: Spark executor, Spark Partition #3 → Grid Partition #3
Direct connection between each Spark partition and its grid partition
Simple, but not enough parallelism for Spark
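A small sketch of why the one-to-one mapping limits parallelism: with one Spark partition per grid partition, the number of tasks equals the number of grid partitions, no matter how many cores are available. The class, method names, and the core count used in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of one-to-one partition planning; not Spark code.
class OneToOnePartitioningSketch {
    // One Spark partition (and therefore one task) per grid partition.
    static List<String> plan(int gridPartitions) {
        List<String> tasks = new ArrayList<>();
        for (int i = 1; i <= gridPartitions; i++) {
            tasks.add("Spark partition #" + i + " -> Grid partition #" + i);
        }
        return tasks;
    }

    // Cores that can be busy at the same time are bounded by the number of tasks.
    static int busyCores(int gridPartitions, int totalCores) {
        return Math.min(plan(gridPartitions).size(), totalCores);
    }

    public static void main(String[] args) {
        int gridPartitions = 3;   // as on the slide
        int totalCores = 12;      // assumed: 3 executors with 4 cores each
        for (String task : plan(gridPartitions)) {
            System.out.println(task);
        }
        System.out.println("Busy cores: " + busyCores(gridPartitions, totalCores) + " of " + totalCores);
    }
}
```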
44. Grid DataFrames: predicate pushdown & column pruning
SELECT SUM(amount)
FROM order
WHERE city = 'NY' AND year > 2012
Aggregation in Spark
Filtering and column pruning in Data Grid
Spark SQL architecture: implementing the DataSource API
• Pushing down predicates to Data Grid
• Leveraging indexes
• Transparent to user
• Enabling support for other languages - Python/R
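The split of work for the query on this slide can be sketched in plain Java: the "grid side" applies the filter and returns only the amount column, and the "Spark side" does the aggregation. Order, PushdownSketch, and the sample rows are hypothetical and do not use the Spark DataSource API; they only illustrate where each step runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical order row; "customer" stands in for columns the query never needs.
class Order {
    final String city;
    final int year;
    final double amount;
    final String customer;

    Order(String city, int year, double amount, String customer) {
        this.city = city;
        this.year = year;
        this.amount = amount;
        this.customer = customer;
    }
}

class PushdownSketch {
    // "Data grid" side: apply the pushed-down predicate and return only the
    // amount column (column pruning), so less data leaves the grid.
    static List<Double> scanAmounts(List<Order> gridRows, Predicate<Order> pushedFilter) {
        List<Double> amounts = new ArrayList<>();
        for (Order o : gridRows) {
            if (pushedFilter.test(o)) {
                amounts.add(o.amount);
            }
        }
        return amounts;
    }

    // "Spark" side: aggregate whatever the grid returned.
    static double sum(List<Double> amounts) {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }

    // SELECT SUM(amount) FROM order WHERE city = 'NY' AND year > 2012
    static double sumNyAfter2012(List<Order> gridRows) {
        Predicate<Order> filter = o -> o.city.equals("NY") && o.year > 2012;
        return sum(scanAmounts(gridRows, filter));
    }

    public static void main(String[] args) {
        List<Order> rows = Arrays.asList(
                new Order("NY", 2013, 10.0, "c1"),
                new Order("NY", 2011, 5.0, "c2"),
                new Order("SF", 2014, 7.0, "c3"),
                new Order("NY", 2015, 2.5, "c4"));
        System.out.println(sumNyAfter2012(rows));
    }
}
```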
54. In-Process HTAP
Read any POJO, JSON document, or transaction as a DataFrame or RDD
Web services/apps can read any DataFrame as POJO
True closed-loop analytics data pipeline
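import com.gigaspaces.annotation.pojo.SpaceClass;

// Marks the POJO as a class of objects stored in the data grid
@SpaceClass
public class Product {
    private String name;
    private String brand;
    private Integer quantity;
    // …
}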
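The POJO-to-DataFrame idea can be illustrated with a tiny round trip between an object and a generic row of named columns. This is only a sketch: ProductPojo is a stand-in for the slide's Product class (with a constructor and getter added for the demo), and PojoRowSketch uses plain reflection and a map as the "row"; it is not how XAP or InsightEdge perform the conversion.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for the slide's Product class; constructor and getter added for the demo.
class ProductPojo {
    private String name;
    private String brand;
    private Integer quantity;

    ProductPojo() { }

    ProductPojo(String name, String brand, Integer quantity) {
        this.name = name;
        this.brand = brand;
        this.quantity = quantity;
    }

    String getName() {
        return name;
    }
}

// Hypothetical helper: treats each declared instance field as one named column.
class PojoRowSketch {
    // POJO -> row (column name -> value).
    static Map<String, Object> toRow(Object pojo) {
        Map<String, Object> row = new TreeMap<>();
        try {
            for (Field f : pojo.getClass().getDeclaredFields()) {
                if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                    continue; // skip compiler- or tool-generated and static fields
                }
                f.setAccessible(true);
                row.put(f.getName(), f.get(pojo));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return row;
    }

    // Row -> POJO: copy each column back into the field of the same name.
    static ProductPojo fromRow(Map<String, Object> row) {
        ProductPojo product = new ProductPojo();
        try {
            for (Map.Entry<String, Object> column : row.entrySet()) {
                Field f = ProductPojo.class.getDeclaredField(column.getKey());
                f.setAccessible(true);
                f.set(product, column.getValue());
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return product;
    }

    public static void main(String[] args) {
        Map<String, Object> row = toRow(new ProductPojo("Laptop", "Acme", 3));
        System.out.println(row);
        System.out.println(fromRow(row).getName());
    }
}
```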
55. Point of Decision HTAP
XAP + InsightEdge deployed on different grid clusters with bi-directional real-time data replication
In-Memory Data Grid
Transactions ↔ Realtime Replication ↔ Analytics
• Scoring models
• Trigger actions
• Events
56. Case Study: Fleet Geo-analytics
Challenge
• Stream data from 1,000s of Taxis
• Actively monitor and generate real-time notifications
• Real-time Route Optimization and Geo-Fencing
Solution
• Leverage unified in-memory data fabric as middleware for
geo-spatial analytics
• Elastically scale stream processing and transactional apps
together
• Location-based tracking, Geo-fencing
Edge components
Data Sources