With the tremendous growth in big data, low latency and high throughput are key requirements for many big data applications. The in-memory technology market is growing rapidly: traditional database vendors are extending their platforms with in-memory capabilities, while others offer in-memory data grids and NoSQL solutions for high performance and scalability. In this talk, we share our point of view on in-memory data grid and NoSQL technology, and on how to build architectures that meet low latency and high throughput requirements. We will share our thoughts and experiences implementing use cases that demand low latency and high throughput with inherent scale-out features.
You will learn how in-memory data grids and NoSQL are used to meet low latency and high throughput needs, and how to choose the in-memory technology that best fits your use case.
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid, In-Memory Data Fabric and NoSQL DB
1. DEMYSTIFYING IN-MEMORY DATA GRID, IN-MEMORY DATA FABRIC AND NOSQL DB
PRADEEP NAIK
See all the presentations from the In-Memory Computing
Summit at http://imcsummit.org
2. SPEAKER INTRODUCTION
Pradeep Naik,
Principal Consultant, CTO Office, Wipro Technologies
Responsible for incubating next generation technology @ Wipro
20+ years of experience in Database Management
Cross-domain IT Solution Architecting and Managing database
solutions in Telecom, Healthcare and Financial Industries
3. ABOUT WIPRO
Attracts the Best Talent | Sustained Growth | Partner to Industry Leaders | Global Leaders | Global Presence
$7.6 Bn revenue in FY 2014-15 (IT Services – $7.06 Bn, IT Products – $0.55 Bn)
1071* active global clients | 161,789* workforce | No.1** leader in the Software & Services industry category | Serving clients in 100+ countries
*Figures based on Q1 results 2015–16 for Global IT Services business
**Global leader in the Software & Services category; Member of Dow Jones Sustainability World Indices – 5th year in a row
Ranked 8th in the Best Companies for Leaders 2015 list in a study conducted by Chally Group
Honored as one of the World's Most Ethical Companies by the Ethisphere Institute for the fourth successive year, 2015
5. WHAT THE BUSINESS DEMANDS
High throughput and low latency are demanded at every stage of a big data use case:
Data Ingestion → Data Storage → Data Processing → Data Access → Data Analytics → Data Visualization
6. TRADE-OFF BETWEEN HIGH THROUGHPUT AND LOW LATENCY
High Throughput: Parallel Computing, Stream Processing, Localized Processing, Eventual Consistency, Auto Scaling
Low Latency: In-Memory Computing, Localized Processing, Parallel Computing
7. IN-MEMORY WORLD
In-Memory Database (IMDB)
NoSQL
In-Memory Data Grid (IMDG)
In-Memory Data Fabric (IMDF)
Workload categories: Operational, Analytical, HTAP (Hybrid Transactional/Analytical Processing)
8. IN-MEMORY DATABASE (IMDB)
Architecture of In-Memory Database
Good ANSI SQL support
Strong support for ACID transactions
Lacks co-located processing
Based on a vertically scalable symmetric multiprocessing (SMP) architecture
Does not support distributed computing
Minimal application changes are needed to upgrade to an IMDB
Cannot work directly with domain objects; users must perform object-to-relational mapping, which typically adds significant performance overhead
The unit of movement is data, not processing
Source: Oracle TimesTen
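The object-to-relational mapping overhead can be sketched in a few lines: every read and write crosses the boundary between the application's domain object and the database's flat row shape, even when the data is entirely in memory. The `Customer` type and its fields below are made up for illustration; this is not code from the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of object-to-relational mapping overhead in an IMDB:
// each write flattens a domain object into column/value pairs, and each
// read rebuilds the object from a result-set-like row.
public class OrmOverhead {
    // Hypothetical domain object the application works with.
    public record Customer(long id, String name, String country) {}

    // Write path: flatten the object into a row for SQL binding.
    public static Map<String, Object> toRow(Customer c) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", c.id());
        row.put("name", c.name());
        row.put("country", c.country());
        return row;
    }

    // Read path: reconstruct the object from a fetched row.
    public static Customer fromRow(Map<String, Object> row) {
        return new Customer((Long) row.get("id"),
                            (String) row.get("name"),
                            (String) row.get("country"));
    }
}
```

The round trip is lossless here, but in a real system each conversion runs on every access, which is the overhead the slide refers to.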
9. NOSQL
Architecture of NoSQL
A distributed data store with an in-memory option
Most commonly used for high-throughput requirements
Read latency in the range of milliseconds to seconds
Low-latency data access is achieved by caching the table/document in memory
Data is always persisted on disk and can be configured to be cached in memory
Achieves high availability via a replication mechanism
Tunable consistency (eventual and immediate)
Limited size of the table that can be cached
NoSQL Cluster
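Tunable consistency is commonly expressed through read/write quorums: with a replication factor N, a write acknowledged by W replicas and a read from R replicas are guaranteed to overlap (immediate consistency) when R + W > N; otherwise reads may be stale (eventual consistency). A minimal sketch of that rule, not tied to any particular NoSQL product:

```java
// Quorum rule behind tunable consistency: a read set of size R and a
// write set of size W out of N replicas must intersect when R + W > N,
// so every read sees the latest acknowledged write.
public class QuorumCheck {
    public static boolean isStronglyConsistent(int n, int r, int w) {
        return r + w > n;
    }
}
```

For example, with N = 3, reading and writing at quorum (R = 2, W = 2) is immediately consistent, while R = 1, W = 1 trades that away for lower latency.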
10. IN-MEMORY DATA GRID (IMDG)
Architecture of In-Memory Data Grid
An IMDG is a data structure that resides completely in memory and is distributed across multiple servers
Fault tolerance: uses a master-master or master-slave topology
A wide variety of data structures are supported to map domain objects
Distributed computing: collocates processing on the cluster node where the data is cached
Distributed concurrency: supports distributed transaction locking
Persistence: supports seamless synchronous read-through and write-through, or asynchronous write-behind, to other data sources
Supports applications with low-latency requirements
Diagram: application servers access an in-memory data grid assembled from the memory of each server (Node1–Node4, holding key-value pairs K1,V1–K4,V4); the grid scales horizontally and persists to a database server via read-through, write-through, or write-behind
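The read-through/write-through persistence pattern described above can be sketched on a single node: reads fall through to the backing store only on a cache miss, and writes go synchronously to both the cache and the store. A real grid applies this per partition across many nodes; here a plain map stands in for the database, and the class is an illustration rather than any grid vendor's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Single-node sketch of IMDG persistence: read-through on miss,
// write-through on update. The backing store stands in for a database.
public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, V> backingStore;

    public ReadThroughCache(Map<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    public V get(K key) {
        // Load from the backing store only when the cache misses.
        return cache.computeIfAbsent(key, backingStore::get);
    }

    public void put(K key, V value) {
        backingStore.put(key, value); // write-through: store first
        cache.put(key, value);
    }
}
```

A write-behind variant would queue the store update asynchronously instead, trading durability lag for write latency.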
11. IN-MEMORY DATA FABRIC (IMDF)
An IMDF is a comprehensive in-memory data platform that includes a data grid, clustering, a compute grid, complex event processing (CEP) and real-time streaming
It is a superset of an IMDG
Supports standard SQL for querying in-memory data, including support for distributed SQL joins
Distributes computations and data processing across multiple computers in a cluster to gain high performance and low latency
Supports multiple execution paths for the same events, executing in parallel on one or more nodes
Works on the underlying concept of an MPP architecture
A converged platform to support multiple use cases
Diagram: application servers submit data, tasks and queries to the in-memory data fabric (data grid, compute grid, streaming/CEP, MapReduce, Hadoop acceleration) running on a distributed cluster with messaging, backed by a file system and NoSQL/RDBMS stores
Architecture of In-Memory Data Fabric
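The MPP idea of sending computation to the data rests on deterministic partitioning: every key maps to exactly one owning node, so a task can be dispatched to where its data already lives. Real fabrics use consistent hashing and partition tables; the sketch below is the simplest possible stand-in, not any fabric's actual routing code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of partition-aware routing behind co-located processing:
// a key deterministically maps to one node, and a scheduler groups
// keys by owning node before dispatching per-node tasks.
public class Partitioner {
    // Route a key to one of nodeCount nodes (0-based node index).
    public static int nodeFor(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Group keys by owning node, as a task scheduler would before
    // running computation over each node's local slice of the data.
    public static Map<Integer, List<String>> assign(List<String> keys, int nodeCount) {
        return keys.stream()
                   .collect(Collectors.groupingBy(k -> nodeFor(k, nodeCount)));
    }
}
```

Because the mapping is deterministic, any node can compute the owner of a key locally, with no central lookup on the hot path.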
12. TECHNOLOGY EVALUATION CRITERIA
High Throughput vs. Low Latency
High Availability
Scalability (Vertical vs. Horizontal)
Distributed disk-based data storage vs. distributed in-memory storage
Co-location of data processing
Distributed transactional ACID support
Eventual consistency vs. strong consistency
Application change impact
Reuse the existing database technology stack vs. migrate to new databases?
Platform support for in-memory computation and storage
Support for flexible data structures
13. SUMMARIZING THE DIFFERENCES
In-Memory Database (IMDB):
Strong ACID support
Structured data
ANSI SQL support
Low-latency performance
Mixed workloads
Leverages the existing database stack
NoSQL:
Flexible data structures (document/columnar/key-value)
Distributed data storage for high volumes of data
Data caching for low read latency (ms)
In-Memory Data Grid (IMDG):
A wide variety of data structures
Distributed in-memory data store
Scalability and fault-tolerance needs
Low-latency performance (ns to ms)
High-performance distributed processing
Data persistence for high availability
In-Memory Data Fabric (IMDF):
Converged platform to support multiple use cases
High-performance distributed parallel processing
Distributed in-memory data store
Co-located data processing
Accelerates the Hadoop ecosystem
14. IN-MEMORY SOLUTIONS – KEY PLAYERS
Technology vendors span four categories: In-Memory Database, In-Memory Data Grid, In-Memory Data Fabric, and NoSQL
16. SEARCH ENGINE OPTIMIZATION
Problem Statement
E-commerce retail stores handling SEO across different browsers, countries and languages usually implement it using an XML sitemap. A few search engines, such as Baidu and Yandex, do not support the XML sitemap implementation of "hreflang", which limits the use of the sitemap tag to address multiple countries/languages.
Ignoring localization causes incorrect search results.
Yandex only supports on-page "hreflang" tags.
17. ECOMMERCE SEARCH ENGINE OPTIMIZATION
Design Decision
The data in the sitemap XML needs to be populated in the head of pages for both canonical URLs and alternate URLs.
For low-latency access, consider caching data from the DB. The data in the DB is not structured and needs to be transformed and aggregated to minimize the size of the cache.
Use a MapReduce framework for the data transformation and aggregation. The challenge was refreshing the cache within 4 hours to avoid serving stale data.
<link rel="alternate" hreflang="en-us" href="https://webstore.online.com/us">
<link rel="alternate" hreflang="en-mx" href="https://webstore.online.com/mx">
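Rendering the on-page tags from the cached locale-to-URL data is straightforward string assembly, as in the sketch below. The locales and URLs are the illustrative ones from the slide; in the described system this data would come from the aggregated cache rather than being hard-coded.

```java
import java.util.Map;

// Sketch of emitting the on-page hreflang alternate tags from cached
// locale -> URL data for one canonical page.
public class HreflangTags {
    public static String linkTag(String hreflang, String href) {
        return "<link rel=\"alternate\" hreflang=\"" + hreflang
             + "\" href=\"" + href + "\">";
    }

    // One tag per locale variant, in the map's iteration order.
    public static String tagsFor(Map<String, String> localeToUrl) {
        StringBuilder sb = new StringBuilder();
        localeToUrl.forEach((locale, url) ->
            sb.append(linkTag(locale, url)).append('\n'));
        return sb.toString();
    }
}
```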
18. ECOMMERCE SEARCH ENGINE OPTIMIZATION
A. Store the data in a NoSQL database such as MongoDB, and run periodic MapReduce jobs for transformation and aggregation before the data is refreshed in the database. The challenge was completing the MapReduce within 4 hours to avoid serving stale data.
B. Supplement the data store with a caching layer to read the data from native memory rather than from disk. The solution would be an IMDB, but we would then need to upgrade the tech stack to implement the MapReduce jobs on the underlying IMDB.
C. Store the data in an IMDG, which can also provide a MapReduce framework, so that the MapReduce jobs can be completed in the stipulated time.
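The aggregation step in option C can be illustrated without any grid-specific API: raw (canonical URL, hreflang) records are grouped so each cache entry holds all locale variants of one page, which is what shrinks the cache versus storing raw rows. The `PageLocale` record is a made-up stand-in for the unstructured DB rows; the real job would run over the grid's MapReduce API rather than a local stream.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the transform/aggregate step: "map" emits
// (canonicalUrl, hreflang) pairs, "reduce" collects them per URL.
public class SitemapAggregator {
    // Hypothetical shape of one raw record from the database.
    public record PageLocale(String canonicalUrl, String hreflang) {}

    public static Map<String, List<String>> aggregate(List<PageLocale> records) {
        return records.stream().collect(Collectors.groupingBy(
                PageLocale::canonicalUrl,
                Collectors.mapping(PageLocale::hreflang, Collectors.toList())));
    }
}
```

On a grid, the same grouping runs in parallel on each node's local partition, which is how the 4-hour refresh window becomes achievable.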
19. ECOMMERCE SEARCH ENGINE OPTIMIZATION
Implemented the solution using an IMDG (Hazelcast)
Two data centers with 5 nodes each
Data is stored and reduced in native memory using the MapReduce feature set/APIs
Configured the near-cache option to fetch the data locally on the server where the URLs are served
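The near-cache idea can be sketched independently of Hazelcast's configuration: each application server keeps a small local copy of recently read entries, so repeated reads of the same URL data never leave the process and only misses go to the remote grid. Below, a plain map stands in for the distributed store, and a counter makes the effect observable; real near caches add eviction and invalidation, which are omitted here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a near cache: a per-JVM local copy in front of the grid.
// Only cache misses trigger a remote read.
public class NearCache<K, V> {
    private final Map<K, V> local = new HashMap<>(); // per-JVM near cache
    private final Map<K, V> grid;                    // stands in for the IMDG
    private int remoteReads = 0;                     // to observe the effect

    public NearCache(Map<K, V> grid) { this.grid = grid; }

    public V get(K key) {
        V v = local.get(key);
        if (v == null) {            // miss: fetch from the grid once
            v = grid.get(key);
            remoteReads++;
            if (v != null) local.put(key, v);
        }
        return v;
    }

    public int remoteReads() { return remoteReads; }
}
```

The trade-off mirrors the talk's consistency theme: near-cached entries can go stale until invalidated, which is acceptable for sitemap data refreshed on a 4-hour cycle.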