Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
2. © 2015 The Apache Software Foundation. Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are trademarks of The Apache Software Foundation.
In-Memory Computing Platform
Built on
MANDHIR GIDDA
EMEA Solutions Architect
3. Agenda
• GridGain & Apache Ignite Project
• Ignite In-Memory Computing Platform
• Introduction to Clustering
• Data Grid
• Compute Grid, Service Grid & Streaming
• Hadoop & Spark Integration
• GridGain Roadmap
• Q & A
4. © 2016 GridGain Systems, Inc.
What is Apache Ignite
High-performance distributed in-memory platform for computing and transacting on large-scale data sets in near real-time.
5. Apache Ignite Project - Recap
• 2007: First version of GridGain (compute grid)
• Oct. 2014: GridGain contributes Ignite to the ASF
• Aug. 2015: Ignite is the second fastest project to graduate, after Spark
• Today (February 2017) vs. Feb. 2016:
  - 28 more contributors: 88+ contributors
  - Huge development momentum: an estimated 248 years of effort since the first commit in February 2014, vs. 192 years last Feb. [Openhub]
  - 200k more SLOC and 2.5k more commits: 900k+ SLOC and more than 18.5k commits
6. What is GridGain Enterprise Edition?
• A binary build of Apache Ignite™ created by GridGain
• Adds enterprise features for enterprise deployments
• Receives features and bug fixes a few weeks earlier
• Heavily tested
7. Customer Use Cases
Automated Trading Systems
Real-time analysis of trading positions & market risk. High-volume transactions, ultra-low latencies.
Financial Services
Fraud detection, risk analysis, insurance rating and modelling.
Online & Mobile Advertising
Stream processing, geo-targeting, personalisation & segmentation.
Big Data/Visual Analytics
Customer 360 view, real-time analysis of KPIs, up-to-the-second operational BI.
Online Gaming
Real-time back-ends for mobile and massively parallel games.
SaaS Platforms & Apps
High-performance next-generation architectures for Software as a Service application vendors.
Travel & E-Commerce
High-performance next-generation architectures for online hotel booking.
8. What In-Memory Capabilities are Supported?
• HPC
• Machine learning
• Risk analysis
• Grid computing
• HA API Services
• Scalable Middleware
• Web-session clustering
• Distributed caching
• In-Memory SQL
• Real-time Analytics
• Big Data
• Monitoring tools
• Big Data
• Real-time Analytics
• Batch processing
• Distributed In-Memory File System
• Node2Node & Topic-based Messaging
• Fault Tolerance
• Multiple backups
• Cluster groups
• Auto Rebalancing
• Complex event processing
• Event-driven design
• Distributed queues
• Atomic variables
• Dist. Semaphore
10. Definitions and Terminology
An Ignite cluster is a group of Ignite nodes working together to accomplish tasks like distributed compute and caching
An Ignite node is a single Ignite process running in a JVM
Many Ignite nodes can live on one physical server or JVM
Ignite nodes can be Clients or Servers. Nodes can be named and logically grouped into Cluster Groups
[Diagram: Server/VM/Container hosts, each running one or more JVMs, each JVM hosting one or more Ignite nodes]
11. Ignite Clustering
Shared-nothing architecture involves multiple identical nodes forming a cluster with no single master or coordinator
All nodes in a shared-nothing cluster run the exact same processes
Nodes communicate peer-to-peer using message passing
Unlike master-slave, there is no single point of failure or bottleneck = linearly scalable & highly available
Operational efficiency at scale
[Diagram: four Server/VM/Container hosts, each running a JVM with an Ignite node]
12. Ignite Clients & Servers
An Ignite node can be started as a client or a server.
Server nodes participate in caching and computations. Client nodes can also participate in computations.
Client nodes are used for Ignite API operations from the client side, such as cache operations, issuing SQL, transactions, and data streaming.
[Diagram: server nodes (data & compute nodes) with client nodes (client connectors) attached]
15. Data Grid: Cache Modes & Horizontal Scaling
Replicated Cache
All data is replicated to each node
Highest-availability topology
Every data update is propagated to all nodes, so scaling can become an issue
Best when the dataset is small and reads dominate (more than 80% of operations)
16. Data Grid: Cache Modes & Horizontal Scaling
Partitioned Cache
Most scalable distributed topology
The dataset is divided into partitions, which are distributed equally amongst nodes (x TB)
Updates are cheap: each touches one primary partition and, optionally, one or more backup partitions
Reads are expensive if data is not collocated
Near-cache use becomes more relevant
Best for large datasets and frequent updates
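The primary/backup placement described on this slide can be sketched in plain Java. This is an illustration of the idea only, not Ignite's affinity function: the hashing scheme, the class name `PartitionSketch` and the node names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of partitioned placement; NOT Ignite's affinity function.
class PartitionSketch {
    // Hypothetical scheme: hash the key into one of `partitions` buckets.
    static int partitionFor(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Owners of a partition: index 0 is the primary, the rest are backups.
    static List<String> ownersFor(int partition, List<String> nodes, int backups) {
        int copies = Math.min(backups + 1, nodes.size());
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            owners.add(nodes.get((partition + i) % nodes.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-A", "node-B", "node-C");
        int p = partitionFor("customer:42", 1024);
        System.out.println("partition " + p + " -> " + ownersFor(p, nodes, 1));
    }
}
```

An update touches only the owners returned for its key's partition, which is why writes stay cheap as nodes are added, while a replicated cache would have to touch every node.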
17. Š 2016 GridGain Systems, Inc.
⢠Vertical Scale
â OnHeap, OffHeap, OffHeap_Values,
Swap Space
⢠Avoid Java GC Collection
Pauses
⢠Off-Heap Indexes
â OffHeap, OffHeap_Values
⢠Full RAM Utilization
⢠I/O + Network Efficiency
⢠OffHeap to disk or network -> zero
copies. OS optimized
⢠Simple Configuration
Data Grid: Off-Heap Memory
18. Data Grid: External Persistence
• Read-through & Write-through
• Support for Write-behind
• Configurable eviction policies
  - LRU, FIFO, Sorted, TTL
• DB schema mapping wizard:
  - Generates all the XML configuration and Java POJOs
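Read-through, write-through and LRU eviction can be shown together in a small plain-Java sketch. It does not use Ignite's cache store API. The class `ReadWriteThroughSketch`, the `HashMap` standing in for the external database, and the `storeReads` counter are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a local cache in front of a map that stands in for a database.
class ReadWriteThroughSketch {
    final Map<String, String> store = new HashMap<>();   // hypothetical external store
    final LinkedHashMap<String, String> cache;
    int storeReads = 0;                                  // counts read-through loads

    ReadWriteThroughSketch(final int maxEntries) {
        // accessOrder=true keeps entries in least-recently-used order.
        cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;              // LRU eviction
            }
        };
    }

    // Read-through: on a cache miss, load from the store and cache the value.
    String get(String key) {
        String value = cache.get(key);
        if (value == null && store.containsKey(key)) {
            storeReads++;
            value = store.get(key);
            cache.put(key, value);
        }
        return value;
    }

    // Write-through: update the store synchronously, then the cache.
    void put(String key, String value) {
        store.put(key, value);
        cache.put(key, value);
    }
}
```

Write-behind would differ only in `put`: the store update would be queued and flushed asynchronously in batches instead of being applied before the call returns.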
19. Data Grid: Cache APIs
• Predicate-based Scan Queries
• Text Queries based on Lucene indexing
• Query configuration using annotations, Spring XML or simple Java code
• SQL Queries: ANSI-99 Compliant
• Memcached (PHP, Java, Python, Ruby)
• HTTP REST API
• JDBC & ODBC
20. Data Grid: SQL Support (ANSI 99)
• ANSI-99 SQL
• In-Memory Indexes (On- and Off-Heap)
• Automatic Group By, Aggregations, Sorting
• Cross-Cache Joins, Unions
• Uses the local H2 engine
21. © 2017 GridGain Systems, Inc. GridGain Company Confidential
In-Memory SQL Grid (a.k.a. IMDB)
• JDBC and ODBC as connection points
• SQL for data access
• DML for data modification
• DDL for cache and index management
• Advantages
  - GridGain as distributed SQL-based storage
  - No need to rewrite application logic
  - Support for a variety of tools and languages
  - Ad-hoc queries (geospatial)
• Available in Apache Ignite (open source)
[Diagram: Application Layer; GridGain Cluster (ANSI SQL-99, DML, DDL; ODBC/JDBC; ACID Transactions); Data Layer (Persistent Storage)]
22. Data Grid: Transactions
• Fully ACID
• Support for Transactional & Atomic cache modes
• Cross-cache transactions
• Optimistic and Pessimistic concurrency modes with multiple isolation levels
• Deadlock protection (Serializable)
• JTA Integration
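The optimistic mode can be pictured as version checking at commit time, which the speaker notes call "soft-locking using versioning". The sketch below is plain Java and is not Ignite's transaction API. `OptimisticTxSketch`, `Versioned` and the retry loop are assumptions chosen to show the idea on a single key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative optimistic concurrency on one key; not Ignite's transaction API.
class OptimisticTxSketch {
    static final class Versioned {
        final long version;
        final int value;
        Versioned(long version, int value) {
            this.version = version;
            this.value = value;
        }
    }

    final ConcurrentMap<String, Versioned> data = new ConcurrentHashMap<>();

    Versioned read(String key) {
        return data.get(key);
    }

    // Commit succeeds only if the entry is still the exact one that was read.
    boolean commit(String key, Versioned seen, int newValue) {
        return data.replace(key, seen, new Versioned(seen.version + 1, newValue));
    }

    // Retry until the commit wins; assumes the key already exists.
    int add(String key, int delta) {
        while (true) {
            Versioned seen = read(key);
            int updated = seen.value + delta;
            if (commit(key, seen, updated)) {
                return updated;
            }
        }
    }
}
```

A pessimistic transaction would instead take a lock on the key before reading, so the conflict is prevented up front, not detected at commit.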
23. Distributed Java Structures
Use of java.util.concurrent
• Distributed Map (cache)
• Distributed Set
• Distributed Queue
• CountDownLatch
• AtomicLong
• AtomicSequence
• AtomicReference
• Distributed ExecutorService
24. Continuous Queries
• Execute a query and get notified on data changes captured in the filter
• Remote filter to evaluate event and local listener to receive notification
• Guarantees exactly-once delivery of an event
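The filter-then-notify flow can be sketched in plain Java. This is not Ignite's continuous query API. In a real cluster the filter would run on the node that owns the data and the listener on the subscribing node, whereas here both run in one process. `ContinuousQuerySketch` and `subscribe` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;

// Single-process illustration of "remote filter + local listener".
class ContinuousQuerySketch {
    private final Map<String, Integer> cache = new HashMap<>();
    private final List<BiPredicate<String, Integer>> filters = new ArrayList<>();
    private final List<BiConsumer<String, Integer>> listeners = new ArrayList<>();

    // Register a filter (decides which updates matter) and its listener.
    void subscribe(BiPredicate<String, Integer> filter, BiConsumer<String, Integer> listener) {
        filters.add(filter);
        listeners.add(listener);
    }

    // Every update is evaluated once per subscription; matches are delivered once.
    void put(String key, int value) {
        cache.put(key, value);
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).test(key, value)) {
                listeners.get(i).accept(key, value);
            }
        }
    }
}
```

Evaluating the filter where the update happens means only matching events need to travel to the subscriber.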
25. Streaming and CEP
• IgniteDataStreamer - Collocation & Indexing, millions of objects/sec
• StreamReceiver/StreamVisitor
• Sliding Windows for CEP/Continuous Query
• JMS, Kafka, MQTT, Flume, Camel data streamer integrations
• Real-time visual analytics/BI
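A count-based sliding window, one common form of the windows mentioned above, can be sketched in plain Java. It illustrates the concept only and does not use Ignite's streaming classes. `SlidingWindowSketch` is a hypothetical name, and the window here keeps the last N events where a real deployment might window by time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Keeps the most recent `capacity` events and a running sum over them.
class SlidingWindowSketch {
    private final Deque<Integer> window = new ArrayDeque<>();
    private final int capacity;
    private long sum;

    SlidingWindowSketch(int capacity) {
        this.capacity = capacity;
    }

    // Add an event; the oldest event slides out once the window is full.
    void add(int value) {
        window.addLast(value);
        sum += value;
        if (window.size() > capacity) {
            sum -= window.removeFirst();
        }
    }

    int size() {
        return window.size();
    }

    // Average over the events currently inside the window.
    double average() {
        return window.isEmpty() ? 0.0 : (double) sum / window.size();
    }
}
```

A continuous query over such a window sees only recent events, which keeps memory bounded however long the stream runs.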
26. Event Processing using Ignite (SEDA)
• Create chains of event processors & transform an object through various states
• Synchronous or asynchronous execution of remote filters & listeners with thread control
[Diagram: Payment Validator, Payment Verifier and Payment Processor stages, backed by an Ignite Cache]
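The payment chain on this slide can be sketched as three stages, each with its own thread, handing an object from one state to the next. This uses only `java.util.concurrent`, not Ignite's continuous queries or cache. The stage methods and the string state markers are assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Staged pipeline: each stage runs on its own single-thread executor.
class SedaSketch {
    static String validate(String payment) { return payment + ":VALIDATED"; }
    static String verify(String payment)   { return payment + ":VERIFIED"; }
    static String process(String payment)  { return payment + ":PROCESSED"; }

    static String runPipeline(String payment) {
        ExecutorService validator = Executors.newSingleThreadExecutor();
        ExecutorService verifier = Executors.newSingleThreadExecutor();
        ExecutorService processor = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture
                .supplyAsync(() -> validate(payment), validator)   // stage 1
                .thenApplyAsync(SedaSketch::verify, verifier)      // stage 2
                .thenApplyAsync(SedaSketch::process, processor)    // stage 3
                .join();
        } finally {
            validator.shutdown();
            verifier.shutdown();
            processor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runPipeline("payment-1"));
    }
}
```

Giving each stage its own executor is what provides the thread control mentioned above: a slow stage can be given more threads without touching the others.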
28. Data Grid: Tiered Memory & Local Store
• Tiered Memory
  - On-Heap -> Off-Heap -> Swap (volatile)
• Persistent On-Disk Store
• Fast Recovery
• Local Data Reload
• Eliminate network and DB impacts when reloading the in-memory store
29. Ultimate Edition
GridGain Disk Storage
30. Current GridGain Model
Application Layer: Mobile, Cloud/SaaS, Social, IoT, Enterprise Applications
Data Layer: RDBMS, NoSQL, Hadoop
In-Memory Computing Layer: Data Grid, Compute Grid, Streaming, Service Grid, ANSI SQL-99, File System, Spark Integration
Unified API
ACID Transactions
31. Current GridGain Model
Application Layer: Mobile, Cloud/SaaS, Social, IoT, Enterprise Applications
Data Layer: RDBMS, NoSQL, Hadoop
In-Memory Computing Layer: Data Grid, Compute Grid, Streaming, Service Grid, ANSI SQL-99, File System, DML/DDL, Spark Integration
Unified API
ACID Transactions
Data Layer: Disk Storage
32. Ultimate Edition: Feature Set
• All the features of Enterprise Edition, including:
  - Persistent Data Store
  - Ability to query memory and disk together
  - Instantaneous Restarts
  - Full and Incremental Cluster Snapshots
  - Point-In-Time Recovery
• Special license is required
33. Data Grid: DC Replication
• Multiple (up to 32) Data Centres
• Complex Replication Technologies
• Active-Active & Active-Passive
• Smart Conflict Resolution
• Durable Persistent Queues
• Automatic Throttling
• GridGain Enterprise
34. In-Memory Compute Grid & the rest
35. Client-Server vs. Affinity Colocation
[Diagram, client-server: a Client Node, Processing Node 1, Data Node 1 (Data 1) and Data Node 2 (Data 2), with numbered steps 1-4]
[Diagram, affinity colocation: Processing Node 1 (Data 1, Job 1) and Processing Node 2 (Data 2, Job 2), with numbered steps 1-4]
Client-Server:
1. Initial request
2. Fetch data from remote nodes
3. Process entire data set
4. Return to client
Affinity Colocation:
1. Initial request
2. Co-locate processing with data
3. Return partial results
4. Reduce & return to client
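The difference between the two flows is how much data crosses the network. The plain-Java sketch below makes that countable: each inner list stands in for the data held by one node, and the second array element counts items "transferred". The class `ColocationSketch` and the counting convention are assumptions for illustration, and no real networking or Ignite API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Compares items moved by the two flows; returns {sum, itemsTransferred}.
class ColocationSketch {
    // Client-server: fetch every entry to one place, then process the whole set.
    static long[] clientServer(List<List<Integer>> dataNodes) {
        long moved = 0;
        List<Integer> fetched = new ArrayList<>();
        for (List<Integer> node : dataNodes) {
            fetched.addAll(node);
            moved += node.size();          // every entry crosses the network
        }
        long sum = 0;
        for (int v : fetched) {
            sum += v;
        }
        return new long[] {sum, moved};
    }

    // Affinity colocation: compute a partial result where the data lives, then reduce.
    static long[] colocated(List<List<Integer>> dataNodes) {
        long moved = 0;
        long sum = 0;
        for (List<Integer> node : dataNodes) {
            long partial = 0;
            for (int v : node) {
                partial += v;              // runs "on" the owning node
            }
            moved += 1;                    // only the partial result is sent back
            sum += partial;                // reduce step
        }
        return new long[] {sum, moved};
    }

    public static void main(String[] args) {
        List<List<Integer>> data = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5));
        System.out.println(Arrays.toString(clientServer(data)));
        System.out.println(Arrays.toString(colocated(data)));
    }
}
```

Both flows return the same sum. The colocated one transfers one value per node where the client-server one transfers one per entry, and that gap widens as the dataset grows.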
36. In-Memory Compute Grid
• Direct API for MapReduce (ComputeTask, map(), result(), reduce())
• Cron-like Task Scheduling
• State Checkpoints
• Load Balancing
  - Round-robin
  - Random & weighted
  - Probe
• Automatic Failover - FailoverSPI
• Per-node Shared State
• Zero Deployment
  - P2P distributed class loading
  - Inter-node bytecode transfer
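Round-robin load balancing combined with automatic failover can be sketched in a few lines of plain Java. This is not Ignite's load-balancing or failover SPI. `FailoverSketch`, its `Node` interface and the "try each node once" policy are assumptions chosen to keep the example small.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin node selection with failover to the next node on error.
class FailoverSketch {
    // Hypothetical node abstraction: executes a job or throws if the node is down.
    interface Node {
        int execute(int job);
    }

    static int run(List<Node> nodes, int job, AtomicInteger cursor) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            // The shared cursor spreads successive jobs across nodes.
            Node node = nodes.get(Math.floorMod(cursor.getAndIncrement(), nodes.size()));
            try {
                return node.execute(job);
            } catch (RuntimeException e) {
                last = e;                  // fail over: try the next node
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }
}
```

The caller never sees a single node failure. It sees an error only when every node has been tried, which is the behaviour "automatic failover" promises at the task level.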
38. Messaging & Events
• Topic-based messaging
• Ordered & Unordered messages
• Local & Remote message listeners
• Local & Remote event listeners
• Trigger actions from any cluster events or operations
• Query events via IgniteEvents API
• Logging all events is not best practice
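Topic-based messaging reduces to a map from topic to listeners. The single-process sketch below shows that shape in plain Java and is not Ignite's messaging API. `TopicBusSketch`, `listen` and `send` are hypothetical names, and delivery here is synchronous and therefore ordered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-process topic bus: listeners subscribe by topic, senders publish by topic.
class TopicBusSketch {
    private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();

    void listen(String topic, Consumer<String> listener) {
        listeners.computeIfAbsent(topic, t -> new ArrayList<>()).add(listener);
    }

    // Delivers to every listener on the topic; returns how many received it.
    int send(String topic, String message) {
        List<Consumer<String>> targets = listeners.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> listener : targets) {
            listener.accept(message);
        }
        return targets.size();
    }
}
```

In a cluster, listeners may live on remote nodes, and ordered delivery then costs extra coordination, which is presumably why the slide lists ordered and unordered messages as separate options.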
39. In-Memory Service Grid
• Deploy arbitrary user-defined services on the cluster
• Control (SLA) how many service instances are deployed on each cluster or node
• Automatically ensure deployment and fault tolerance
• Singletons on the Cluster
  - Cluster Singleton
  - Node Singleton
  - Key Singleton
• Guaranteed Availability
  - Auto Redeployment in Case of Failures
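A cluster singleton with automatic redeployment can be modelled as "exactly one host at any time, re-chosen when that host fails". The plain-Java sketch below shows only that rule and is not Ignite's service API. `ClusterSingletonSketch` and its "oldest surviving node hosts the service" policy are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Tracks which node hosts the single service instance as nodes join and fail.
class ClusterSingletonSketch {
    private final List<String> aliveNodes = new ArrayList<>();
    private String host;   // node currently running the singleton, or null

    void join(String node) {
        aliveNodes.add(node);
        if (host == null) {
            host = node;                   // first available node deploys the service
        }
    }

    void fail(String node) {
        aliveNodes.remove(node);
        if (node.equals(host)) {
            // Redeploy on the oldest surviving node, if any (hypothetical policy).
            host = aliveNodes.isEmpty() ? null : aliveNodes.get(0);
        }
    }

    String host() {
        return host;
    }
}
```

A node singleton would instead keep one instance per alive node, so a failure removes an instance without moving one.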
40. In-Memory Service Grid
• Resilience - Build an in-memory resilient service layer between your client application and the grid
• Shielding - Only expose application APIs and not direct grid APIs
• Continuations - Call services internally via compute tasks to create service chains
46. Thank You!
www.gridgain.com
https://ignite.apache.org
@gridgain
#gridgain
Thank you for joining us. Follow the conversation.
Author: Mandhir Gidda
Editor's Notes
Collection of logical in-memory components that solve high performance and scalability challenges
When it comes to querying and acting on data, including in Big Data/Fast Data environments, SQL still dominates. And no other database-agnostic in-memory solution handles SQL functionality like the Apache Ignite In-Memory Data Fabric.
application level soft-locking using versioning
deadlock protection (Serializable)
Apache Ignite allows for most of the data structures from the java.util.concurrent framework to be used in a distributed fashion
ComputeTask with service injection
Direct API for Fork/Join
Affinity run of Spark jobs
- Supports multiple infrastructures, including Google Cloud, AWS & Docker