With the tremendous growth in big data, low latency and high throughput are key requirements for many big data applications. The in-memory technology market is growing rapidly: traditional database vendors are extending their platforms with in-memory capabilities, while others offer in-memory data grids and NoSQL solutions for high performance and scalability. In this talk, we share our point of view on in-memory data grid and NoSQL technology, and on how to build architectures that meet low latency and high throughput requirements. We will share our thoughts and experiences implementing use cases that demand low latency and high throughput with inherent scale-out features.
You will learn how in-memory data grids and NoSQL are used to meet low latency and high throughput needs, and how to choose the in-memory technology that best fits your use case.
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid, In-Memory Data Fabric and NoSQL DB
1. DEMYSTIFYING IN-MEMORY DATA GRID, IN-MEMORY DATA FABRIC AND NOSQL DB
PRADEEP NAIK
See all the presentations from the In-Memory Computing
Summit at http://imcsummit.org
2. SPEAKER INTRODUCTION
Pradeep Naik,
Principal Consultant, CTO Office, Wipro Technologies
Responsible for incubating next generation technology @ Wipro
20+ years of experience in Database Management
Cross-domain IT Solution Architecting and Managing database
solutions in Telecom, Healthcare and Financial Industries
3. ABOUT WIPRO
Attracts the Best Talent | Sustained Growth | Partner to Industry Leaders | Global Leaders | Global Presence
$7.6 Bn revenue in FY 2014-15 (IT Services – $7.06 Bn, IT Products – $0.55 Bn)
1071* active global clients | 161,789* workforce | No.1** leader in the Software & Services industry category | Serving clients in 100+ countries
*Figures based on Q1 results 2015–16 for Global IT Services business
**Global leader in the Software & Services category; Member of Dow Jones Sustainability World Indices – 5th year in a row
Ranked 8th in the Best Companies for Leaders 2015 list in a study conducted by Chally Group
Honored as one of the World's Most Ethical Companies by the Ethisphere Institute for the fourth successive year, 2015
5. WHAT THE BUSINESS DEMANDS
High throughput and low latency are demanded at every stage of a big data use case:
Data Ingestion → Data Storage → Data Processing → Data Access → Data Analytics → Data Visualization
6. TRADE-OFF BETWEEN HIGH THROUGHPUT AND LOW LATENCY
High Throughput: Parallel Computing, Stream Processing, Localized Processing, Eventual Consistency, Auto Scaling
Low Latency: In-Memory Computing, Localized Processing, Parallel Computing
7. IN-MEMORY WORLD
In-Memory Database (IMDB)
NoSQL
In-Memory Data Grid (IMDG)
In-Memory Data Fabric (IMDF)
Workload categories: Operational, Analytical, HTAP (Hybrid Transactional/Analytical Processing)
8. IN-MEMORY DATABASE (IMDB)
Architecture of In-Memory Database
Good ANSI SQL support
Strong support for ACID transactions
Lacks co-located processing
Based on a vertically scalable symmetric multiprocessing (SMP) architecture
Does not support distributed computing
Minimal application changes are needed to upgrade to an IMDB
Cannot work directly with domain objects; users must perform object-to-relational mapping, which typically adds significant performance overhead
The unit of movement is data, not processing
Source: Oracle TimesTen
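The object-to-relational mapping overhead can be sketched in a few lines: every read and write crosses the boundary between the application's domain object and the database's flat row shape, even when the data is entirely in memory. The `Customer` type and its fields below are made up for illustration; this is not code from the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of object-to-relational mapping overhead in an IMDB:
// each write flattens a domain object into column/value pairs, and each
// read rebuilds the object from a result-set-like row.
public class OrmOverhead {
    // Hypothetical domain object the application works with.
    public record Customer(long id, String name, String country) {}

    // Write path: flatten the object into a row for SQL binding.
    public static Map<String, Object> toRow(Customer c) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", c.id());
        row.put("name", c.name());
        row.put("country", c.country());
        return row;
    }

    // Read path: reconstruct the object from a fetched row.
    public static Customer fromRow(Map<String, Object> row) {
        return new Customer((Long) row.get("id"),
                            (String) row.get("name"),
                            (String) row.get("country"));
    }
}
```

The round trip is lossless here, but in a real system each conversion runs on every access, which is the overhead the slide refers to.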
9. NOSQL
Architecture of NoSQL
A distributed data store with an in-memory option
Most commonly used for high-throughput requirements
Read latency in the range of milliseconds to seconds
Low-latency data access is achieved by caching the table/document in memory
Data is always persisted on disk and can be configured to be cached in memory
Achieves high availability via a replication mechanism
Tunable consistency (eventual and immediate)
Limited size of the table that can be cached
NoSQL Cluster
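Tunable consistency is commonly expressed through read/write quorums: with a replication factor N, a write acknowledged by W replicas and a read from R replicas are guaranteed to overlap (immediate consistency) when R + W > N; otherwise reads may be stale (eventual consistency). A minimal sketch of that rule, not tied to any particular NoSQL product:

```java
// Quorum rule behind tunable consistency: a read set of size R and a
// write set of size W out of N replicas must intersect when R + W > N,
// so every read sees the latest acknowledged write.
public class QuorumCheck {
    public static boolean isStronglyConsistent(int n, int r, int w) {
        return r + w > n;
    }
}
```

For example, with N = 3, reading and writing at quorum (R = 2, W = 2) is immediately consistent, while R = 1, W = 1 trades that away for lower latency.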
10. IN-MEMORY DATA GRID (IMDG)
Architecture of In-Memory Data Grid
An IMDG is a data structure that resides completely in memory and is distributed across multiple servers
Fault tolerance: uses a master-master or master-slave topology
A wide variety of data structures are supported to map domain objects
Distributed computing: collocates processing on the cluster node where the data is cached
Distributed concurrency: supports distributed transaction locking
Persistence: supports seamless synchronous read-through and write-through, or asynchronous write-behind, to other data sources
Supports applications with low-latency requirements
Diagram: application servers access an in-memory data grid assembled from the memory of each server (Node1–Node4, holding key-value pairs K1,V1–K4,V4); the grid scales horizontally and persists to a database server via read-through, write-through, or write-behind
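The read-through/write-through persistence pattern described above can be sketched on a single node: reads fall through to the backing store only on a cache miss, and writes go synchronously to both the cache and the store. A real grid applies this per partition across many nodes; here a plain map stands in for the database, and the class is an illustration rather than any grid vendor's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Single-node sketch of IMDG persistence: read-through on miss,
// write-through on update. The backing store stands in for a database.
public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, V> backingStore;

    public ReadThroughCache(Map<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    public V get(K key) {
        // Load from the backing store only when the cache misses.
        return cache.computeIfAbsent(key, backingStore::get);
    }

    public void put(K key, V value) {
        backingStore.put(key, value); // write-through: store first
        cache.put(key, value);
    }
}
```

A write-behind variant would queue the store update asynchronously instead, trading durability lag for write latency.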
11. IN-MEMORY DATA FABRIC (IMDF)
An IMDF is a comprehensive in-memory data platform that includes a data grid, clustering, a compute grid, complex event processing (CEP) and real-time streaming
It is a superset of an IMDG
Supports standard SQL for querying in-memory data, including support for distributed SQL joins
Distributes computations and data processing across multiple computers in a cluster to gain high performance and low latency
Supports multiple execution paths for the same events, executing in parallel on one or more nodes
Works on the underlying concept of an MPP architecture
A converged platform to support multiple use cases
Diagram: application servers submit data, tasks and queries to the in-memory data fabric (data grid, compute grid, streaming/CEP, MapReduce, Hadoop acceleration) running on a distributed cluster with messaging, backed by a file system and NoSQL/RDBMS stores
Architecture of In-Memory Data Fabric
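The MPP idea of sending computation to the data rests on deterministic partitioning: every key maps to exactly one owning node, so a task can be dispatched to where its data already lives. Real fabrics use consistent hashing and partition tables; the sketch below is the simplest possible stand-in, not any fabric's actual routing code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of partition-aware routing behind co-located processing:
// a key deterministically maps to one node, and a scheduler groups
// keys by owning node before dispatching per-node tasks.
public class Partitioner {
    // Route a key to one of nodeCount nodes (0-based node index).
    public static int nodeFor(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Group keys by owning node, as a task scheduler would before
    // running computation over each node's local slice of the data.
    public static Map<Integer, List<String>> assign(List<String> keys, int nodeCount) {
        return keys.stream()
                   .collect(Collectors.groupingBy(k -> nodeFor(k, nodeCount)));
    }
}
```

Because the mapping is deterministic, any node can compute the owner of a key locally, with no central lookup on the hot path.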
12. TECHNOLOGY EVALUATION CRITERIA
High Throughput vs. Low Latency
High Availability
Scalability (Vertical vs. Horizontal)
Distributed disk-based data storage vs. distributed in-memory storage
Co-location of data processing
Distributed transactional ACID support
Eventual consistency vs. strong consistency
Application change impact
Reuse the existing database technology stack vs. migrate to new databases?
Platform support for in-memory computation and storage
Support for flexible data structures
13. SUMMARIZING THE DIFFERENCES
In-Memory Database (IMDB):
Strong ACID support
Structured data
ANSI SQL support
Low-latency performance
Mixed workloads
Leverages the existing database stack
NoSQL:
Flexible data structures (document/columnar/key-value)
Distributed data storage for high volumes of data
Data caching for low read latency (ms)
In-Memory Data Grid (IMDG):
A wide variety of data structures
Distributed in-memory data store
Scalability and fault-tolerance needs
Low-latency performance (ns to ms)
High-performance distributed processing
Data persistence for high availability
In-Memory Data Fabric (IMDF):
Converged platform to support multiple use cases
High-performance distributed parallel processing
Distributed in-memory data store
Co-located data processing
Accelerates the Hadoop ecosystem
14. IN-MEMORY SOLUTIONS – KEY PLAYERS
Technology vendors span four categories: In-Memory Database, In-Memory Data Grid, In-Memory Data Fabric, and NoSQL
16. SEARCH ENGINE OPTIMIZATION
Problem Statement
E-commerce retail stores handling SEO across different browsers, countries and languages usually implement it using an XML sitemap. A few search engines, such as Baidu and Yandex, do not support the XML sitemap implementation of "hreflang", which limits the use of the sitemap tag to address multiple countries/languages.
Ignoring localization causes incorrect search results.
Yandex only supports on-page "hreflang" tags.
17. ECOMMERCE SEARCH ENGINE OPTIMIZATION
Design Decision
The data in the sitemap XML needs to be populated in the head of pages for both canonical URLs and alternate URLs.
For low-latency access, consider caching data from the DB. The data in the DB is not structured and needs to be transformed and aggregated to minimize the size of the cache.
Use a MapReduce framework for the data transformation and aggregation. The challenge was refreshing the cache within 4 hours to avoid serving stale data.
<link rel="alternate" hreflang="en-us" href="https://webstore.online.com/us">
<link rel="alternate" hreflang="en-mx" href="https://webstore.online.com/mx">
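Rendering the on-page tags from the cached locale-to-URL data is straightforward string assembly, as in the sketch below. The locales and URLs are the illustrative ones from the slide; in the described system this data would come from the aggregated cache rather than being hard-coded.

```java
import java.util.Map;

// Sketch of emitting the on-page hreflang alternate tags from cached
// locale -> URL data for one canonical page.
public class HreflangTags {
    public static String linkTag(String hreflang, String href) {
        return "<link rel=\"alternate\" hreflang=\"" + hreflang
             + "\" href=\"" + href + "\">";
    }

    // One tag per locale variant, in the map's iteration order.
    public static String tagsFor(Map<String, String> localeToUrl) {
        StringBuilder sb = new StringBuilder();
        localeToUrl.forEach((locale, url) ->
            sb.append(linkTag(locale, url)).append('\n'));
        return sb.toString();
    }
}
```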
18. ECOMMERCE SEARCH ENGINE OPTIMIZATION
A. Store the data in a NoSQL database such as MongoDB, and run periodic MapReduce jobs for transformation and aggregation before the data is refreshed in the database. The challenge was completing the MapReduce within 4 hours to avoid serving stale data.
B. Supplement the data store with a caching layer to read the data from native memory rather than from disk. The solution would be an IMDB, but we would then need to upgrade the tech stack to implement the MapReduce jobs on the underlying IMDB.
C. Store the data in an IMDG, which can also provide a MapReduce framework, so that the MapReduce jobs can be completed in the stipulated time.
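The aggregation step in option C can be illustrated without any grid-specific API: raw (canonical URL, hreflang) records are grouped so each cache entry holds all locale variants of one page, which is what shrinks the cache versus storing raw rows. The `PageLocale` record is a made-up stand-in for the unstructured DB rows; the real job would run over the grid's MapReduce API rather than a local stream.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the transform/aggregate step: "map" emits
// (canonicalUrl, hreflang) pairs, "reduce" collects them per URL.
public class SitemapAggregator {
    // Hypothetical shape of one raw record from the database.
    public record PageLocale(String canonicalUrl, String hreflang) {}

    public static Map<String, List<String>> aggregate(List<PageLocale> records) {
        return records.stream().collect(Collectors.groupingBy(
                PageLocale::canonicalUrl,
                Collectors.mapping(PageLocale::hreflang, Collectors.toList())));
    }
}
```

On a grid, the same grouping runs in parallel on each node's local partition, which is how the 4-hour refresh window becomes achievable.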
19. ECOMMERCE SEARCH ENGINE OPTIMIZATION
Implemented the solution using an IMDG (Hazelcast)
Two data centers with 5 nodes each
Data is stored and reduced in native memory using the MapReduce feature set/APIs
Configured the near-cache option to fetch the data locally on the server where the URLs are served
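The near-cache idea can be sketched independently of Hazelcast's configuration: each application server keeps a small local copy of recently read entries, so repeated reads of the same URL data never leave the process and only misses go to the remote grid. Below, a plain map stands in for the distributed store, and a counter makes the effect observable; real near caches add eviction and invalidation, which are omitted here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a near cache: a per-JVM local copy in front of the grid.
// Only cache misses trigger a remote read.
public class NearCache<K, V> {
    private final Map<K, V> local = new HashMap<>(); // per-JVM near cache
    private final Map<K, V> grid;                    // stands in for the IMDG
    private int remoteReads = 0;                     // to observe the effect

    public NearCache(Map<K, V> grid) { this.grid = grid; }

    public V get(K key) {
        V v = local.get(key);
        if (v == null) {            // miss: fetch from the grid once
            v = grid.get(key);
            remoteReads++;
            if (v != null) local.put(key, v);
        }
        return v;
    }

    public int remoteReads() { return remoteReads; }
}
```

The trade-off mirrors the talk's consistency theme: near-cached entries can go stale until invalidated, which is acceptable for sitemap data refreshed on a 4-hour cycle.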