More Related Content Similar to IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire (20) More from In-Memory Computing Summit (20) IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire2. 2© Copyright 2015 Pivotal. All rights reserved. 2© Copyright 2015 Pivotal. All rights reserved.
Evolution of Pivotal Gemfire
Which way might the "Apache Way” take It?
Milind Bhandarkar
milind@ampool.io
CEO, Ampool Inc.
@techmilind
Roman Shaposhnik
rvs@apache.org
Director of Open Source, Pivotal Inc.
@rhatr
3. 3© Copyright 2015 Pivotal. All rights reserved.
Topics
Ÿ GemFire history, architecture, and use cases
Ÿ Geode as the open source core of GemFire
Ÿ Requirements of the modern data infrastructure
Ÿ Butterfly architecture
Ÿ Geode as an engine for in-memory data exchange layer
4. 4© Copyright 2015 Pivotal. All rights reserved.
It’s not the size of
DATAit’s how you use it!
5. 5© Copyright 2015 Pivotal. All rights reserved.
2004 2008 2014
• Massive increase in data
volumes
• Falling margins per
transaction
• Increasing cost of IT
maintenance
• Need for elasticity in
systems
• Financial Services
Providers (every major
Wall Street bank)
• Department of Defense
• Real Time response needs
• Time to market constraints
• Need for flexible data
models across enterprise
• Distributed development
• Persistence + In-memory
• Global data visibility needs
• Fast Ingest needs for data
• Need to allow devices to
hook into enterprise data
• Always on
• Largest travel Portal
• Airlines
• Trade clearing
• Online gambling
• Largest Telcos
• Large mfrers
• Largest Payroll processor
• Auto insurance giants
• Largest rail systems on
earth
Our GemFire Journey Over The Years
6. 6© Copyright 2015 Pivotal. All rights reserved.
Apps at Scale Have Unique Needs
Pivotal GemFire is the distributed, in-memory database for
apps that need:
1. Elastic scale-out performance
2. High performance database capabilities in distributed systems
3. Mission critical availability and resiliency
4. Flexibility for developers to create unique applications
5. Easy administration of distributed data grids
7. 7© Copyright 2015 Pivotal. All rights reserved.
1. Elastic Scale-Out Performance
Nodes
Ops /
SecLinear scalability
Elastic capacity +/-
Latency-minimizing
data distribution
China Railway
Corporation
“The system is operating with solid
performance and uptime. Now, we have a
reliable, economically sound production
system that supports record volumes and
has room to grow”
Dr. Jiansheng Zhu, Vice Director of China Academy
of Railway Sciences
• 4.5 million ticket purchases & 20
million users per day.
• Spikes of 15,000 tickets sold per
minute, 40,000 visits per second.
8. 8© Copyright 2015 Pivotal. All rights reserved.
2. High Performance Database Capabilities
Performance-
optimized
persistence
Configurable
consistency Partitioned Replicated Disabled
Distributed &
continuous queries
Distributed transactions, indexing, archival
Newedge
“Our global deployment of Pivotal
GemFire’s distributed cache gives me a
single version of the trade – resolving
hard-to-test-for synchronization issues
that exist within any globally distributed
business application architecture”
Michael Benillouche, Global Head of Data
Management
9. 9© Copyright 2015 Pivotal. All rights reserved.
3. Mission Critical Availability and Resiliency
Cluster to cluster
WAN connectivity
Cluster resilience
& failover
XX Gire
“We can track and collect money at our
4,000+ kiosks and branches – even without
a reliable Internet connection. GemFire
provides the core data grid and a significant
amount of related functionality to help us
handle this unreliable network problem”
Gustavo Valdez, Chief of Architecture and
Development
• 19 million payment transactions per
month
• 4000+ points of sale with intermittent
Internet connectivity
10. 10© Copyright 2015 Pivotal. All rights reserved.
4. Flexibility for Developers to Create Unique
Apps
• Data Structures:
– User-defined classes
– Documents (JSON)
• Native language support:
– Java, C++, C#
• API’s
– Java: Hashmap
– Memcache
– Spring Data GemFire
– C++ Serialization API’s
• Embedded query authoring
– Object Query Language (OQL)
• Publish & subscribe framework for
continous query & reliable
asynchronous event queues
• App-server embedded functionality:
– Web app session state caching
– L2 Hibernate
11. 11© Copyright 2015 Pivotal. All rights reserved.
What makes it fast?
Ÿ Design center is RAM not HDD
Ÿ Really demanding customers
Ÿ Avoid and minimize, particularly on the critical read/write paths:
– Network hops, copying data, contention, distributed locking, disk seeks, garbage
Ÿ Lots and lots of testing
– Establish and monitor performance baselines
– Distributed systems testing is difficult!
12. 12© Copyright 2015 Pivotal. All rights reserved.
Horizontal Scaling for GemFire (Geode) Reads
With Consistent Latency and CPU
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
0
2
4
6
8
10
12
14
16
18
0
1
2
3
4
5
6
2 4 6 8 10
Speedup
Server
Hosts
speedup
latency
(ms)
CPU
%
13. 13© Copyright 2015 Pivotal. All rights reserved.
GemFire (Geode) 3.5-4.5X Faster Than Cassandra
for YCSB
14. 14© Copyright 2015 Pivotal. All rights reserved. 14© Copyright 2015 Pivotal. All rights reserved.
Apache Geode
(incubating)
The open source core of GemFire
15. 15© Copyright 2015 Pivotal. All rights reserved.
Geode Will Be A Significant Apache Project
Ÿ Over a 1000 person years invested into cutting edge R&D
Ÿ Thousands of production customers in very demanding verticals
Ÿ Cutting edge use cases that have shaped product thinking
Ÿ Tens of thousands of distributed , scaled up tests that can randomize
every aspect of the product
Ÿ A core technology team that has stayed together since founding
Ÿ Performance differentiators that are baked into every aspect of the
product
16. 16© Copyright 2015 Pivotal. All rights reserved.
Geode or GemFire?
Ÿ Geode is a project, GemFire is a product
Ÿ We donated everything but the kitchen sink*
Ÿ More code drops imminent; going forward all development
happens OSS-style (“The Apache Way”)
* Multi-site WAN replication, continuous queries, and native (C/C++) client driver
18. 18© Copyright 2015 Pivotal. All rights reserved.
Why OSS? Why Now? Why Apache?
Ÿ Open Source Software is fundamentally changing buying patterns
– Developers have to endorse product selection (No longer CIO handshake)
– Open source credentials attract the best developers
– Open Source has replaced standards
Ÿ Align with the tides of history
– Customers increasingly asking to participate in product development
– Allow product development to happen with full transparency
Ÿ Apache Way
– “Community over code”
– Use cases far beyond Pivotal Gemfire’s
19. 19© Copyright 2015 Pivotal. All rights reserved. 19© Copyright 2015 Pivotal. All rights reserved.
Beyond Gemfire’s core
20. 20© Copyright 2015 Pivotal. All rights reserved.
Roadmap
Ÿ HDFS persistence
Ÿ Off-heap storage
Ÿ Lucene indexes
Ÿ Spark integration
Ÿ Cloud Foundry service
…and other ideas from the Geode community!
21. 21© Copyright 2015 Pivotal. All rights reserved. 21© Copyright 2015 Pivotal. All rights reserved.
Geode in modern data
infrastructure
22. 22© Copyright 2015 Pivotal. All rights reserved.
Infrastructure is increasingly Scale-Out
25. 25© Copyright 2015 Pivotal. All rights reserved.
One-size-fits-all Data Platform Era is Over
26. 26© Copyright 2015 Pivotal. All rights reserved.
Analytics moving from Batch to Real-Time
27. 27© Copyright 2015 Pivotal. All rights reserved. 27© Copyright 2015 Pivotal. All rights reserved.
What happened?
34. 34© Copyright 2015 Pivotal. All rights reserved.
And, then…
HDFS
ASF Projects FLOSS Projects Pivotal Products
MapReduce
35. 35© Copyright 2015 Pivotal. All rights reserved.
In a blink of an eye…
HDFS
Pig
Sqoop Flume
Coordination and
workflow
management
Zookeeper
Ambari &
Command Center
GemFire XD
Oozie
MapReduce
Hive
Tez
Giraph
Hadoop UI
Hue
SolrCloud
Phoenix
HBase
Crunch Mahout
Spark
Shark
Streaming
MLib
GraphX
HAWQ
SpringXD
MADlib
Hamster
PivotalR
YARN
ASF Projects FLOSS Projects Pivotal Products
42. 42© Copyright 2015 Pivotal. All rights reserved. 42© Copyright 2015 Pivotal. All rights reserved.
The missing building block
43. 43© Copyright 2015 Pivotal. All rights reserved.
Sharing operational data at the speed of RAM
In-Memory Data Exchange Layer
HDFS Isilon…backend archival stores…
Spark HAWQ…frontend processing frameworks…
44. 44© Copyright 2015 Pivotal. All rights reserved.
Spark’s view on how to lock you in
In-Memory Data Exchange Layer
HDFS Isilon…backend archival stores…
HAWQ…frontend processing frameworks…Spark
Spark
45. 45© Copyright 2015 Pivotal. All rights reserved.
HDFS’s view on how to lock you in
In-Memory Data Exchange Layer
HDFS Isilon…backend archival stores…
HAWQ…frontend processing frameworks…Spark
HDFS
46. 46© Copyright 2015 Pivotal. All rights reserved. 46© Copyright 2015 Pivotal. All rights reserved.
How is open source solving this?
47. 47© Copyright 2015 Pivotal. All rights reserved.
Short list of open source contenders
Ÿ Tachyon
Ÿ Infinispan
Ÿ Apache Ignite (incubating)
Ÿ Apache Geode (incubating)
48. 48© Copyright 2015 Pivotal. All rights reserved.
Short list of open source contenders
Ÿ Tachyon
Ÿ Infinispan
Ÿ Apache Ignite (incubating)
Ÿ Apache Geode (incubating)
49. 49© Copyright 2015 Pivotal. All rights reserved.
Geode’s secret sauce
Ÿ Community!
Ÿ Maturity
Ÿ Scalability
Ÿ Building blocks
50. 50© Copyright 2015 Pivotal. All rights reserved. 50© Copyright 2015 Pivotal. All rights reserved.
A few key building blocks
• PDX Serialization
• Asynchronous Events
51. 51© Copyright 2015 Pivotal. All rights reserved.
Fixed or flexible schema?
id name age pet_id
{
id : 1,
name : “Fred”,
age : 42,
pet : {
name : “Barney”,
type : “dino”
}
}
OR
52. 52© Copyright 2015 Pivotal. All rights reserved.
But how to serialize data?
http://blog.pivotal.io/pivotal/products/data-serialization-how-to-run-multiple-big-data-apps-at-once-with-gemfire
53. 53© Copyright 2015 Pivotal. All rights reserved.
Portable Data eXchange
C#, C++, Java, JSON
| header | data |
| pdx | length | dsid | typeId | fields | offsets |
54. 54© Copyright 2015 Pivotal. All rights reserved.
Efficient for queries
SELECT p.name from /Person p WHERE p.pet.type = “dino”
{
id : 1,
name : “Fred”,
age : 42,
pet : {
name : “Barney”,
type : “dino”
}
}
single field
deserialization
55. 55© Copyright 2015 Pivotal. All rights reserved.
Easy to use
Ÿ Access from Java, C#, C++, JSON
Ÿ Domain objects not required
Ÿ Automatic type definition
– No IDL compiler or schema required
– No hand-coded read/write methods
56. 56© Copyright 2015 Pivotal. All rights reserved.
Asynchronous Events – Design Goals
Ÿ High availability
Ÿ Low latency, high throughput
Ÿ Deliver events to a receiver without impacting the write path
57. 57© Copyright 2015 Pivotal. All rights reserved.
Questions?
Ÿ http://geode.incubator.apache.org
Ÿ dev@geode.incubator.apache.org
Ÿ user@geode.incubator.apache.org
Ÿ http://github.com/apache/incubator-geode
58. 58© Copyright 2015 Pivotal. All rights reserved. 58© Copyright 2013 Pivotal. All rights reserved.
59. 59© Copyright 2015 Pivotal. All rights reserved. 59© Copyright 2013 Pivotal. All rights reserved.
Bonus Content
60. 60© Copyright 2015 Pivotal. All rights reserved. 60© Copyright 2013 Pivotal. All rights reserved.
PDX
61. 61© Copyright 2015 Pivotal. All rights reserved.
Distributed type registry
Member A Member B
Distributed Type Definitions
Person p1 = …
region.put(“Fred”, p1);
62. 62© Copyright 2015 Pivotal. All rights reserved.
Distributed type registry
Member A Member B
Distributed Type Definitions
Person p1 = …
region.put(“Fred”, p1);
automatic
definition
63. 63© Copyright 2015 Pivotal. All rights reserved.
Distributed type registry
Member A Member B
Distributed Type Definitions
Person p1 = …
region.put(“Fred”, p1);
automatic
definition
64. 64© Copyright 2015 Pivotal. All rights reserved.
Distributed type registry
Member A Member B
Distributed Type Definitions
Person p1 = …
region.put(“Fred”, p1);
automatic
definition
replicate serialized
data containing typeId
65. 65© Copyright 2015 Pivotal. All rights reserved.
Schema evolution
PDX provides forwards and backwards
compatibility, no code required
Member A Member B
Distributed Type Definitions
v1 v2
Application
#2
Application
#1
v2 objects preserve data
from missing fields
v1 objects use default values
to fill in new fields
66. 66© Copyright 2015 Pivotal. All rights reserved. 66© Copyright 2013 Pivotal. All rights reserved.
Asynchronous Events
67. 67© Copyright 2015 Pivotal. All rights reserved.
Member 3
Member 1
Serial Queues
Member 2
LOL!!put
Primary Queue
Secondary Queue
Enqueue
sup
LOL!!
sup
sup
LOL!!
Replicate
68. 68© Copyright 2015 Pivotal. All rights reserved.
Member 3
Member 1
Member 2
Serial Queues
sup
LOL!!
sup
LOL!!
AsyncEventListener
{
processBatch()
}
LOL!!
sup
Dispatch Events
from Primary
69. 69© Copyright 2015 Pivotal. All rights reserved.
put
Secondary Queue
(Partition 1)
LOL!!
sup
sup
LOL!!
Primary Queue
(Partition 2)
Parallel Queues
Primary Queue
(Partition 1)
LOL!!
sup
sup
LOL!!
Word
Word
70. 70© Copyright 2015 Pivotal. All rights reserved.
LOL!!
sup
sup
LOL!!
Parallel Queues
LOL!!
sup
sup
LOL!!
Word
Word
AsyncEventListener
{
processBatch()
}
AsyncEventListener
{
processBatch()
}
Parallel
Dispatch
71. 71© Copyright 2015 Pivotal. All rights reserved.
“There are only two hard things in Computer Science: cache
invalidation, naming things, and off-by-one errors.”
– the internet
72. 72© Copyright 2015 Pivotal. All rights reserved.
Client driver
Ÿ Intelligent – understands data distribution for single hop
network access
Ÿ Caching – can be configured to locally cache data for even
faster access
Ÿ Events – registerInterest() in keys to receive push
notifications
73. 73© Copyright 2015 Pivotal. All rights reserved.
Client subscriptions
Ÿ Useful for refreshing client cache, pushing events (“topics”)
to multiple clients
Ÿ Highly available & scalable via in-memory replicated queues
Ÿ Events are ordered, at-least-once delivery
Ÿ Durable subscriptions, conflation optional
74. 74© Copyright 2015 Pivotal. All rights reserved.
Client subscriptions
Member A Member B
region.registerInterest(“Fred”);
Client
Client
75. 75© Copyright 2015 Pivotal. All rights reserved.
Client subscriptions
Member A Member B
Person p1 = …
region.put(“Fred”, p1);
region.registerInterest(“Fred”);
Client
Client
76. 76© Copyright 2015 Pivotal. All rights reserved.
Client subscriptions
Member A Member B
Person p1 = …
region.put(“Fred”, p1);
region.registerInterest(“Fred”);
Client
Client
77. 77© Copyright 2015 Pivotal. All rights reserved.
Client subscriptions
Member A Member B
Person p1 = …
region.put(“Fred”, p1);
region.registerInterest(“Fred”);
Client
Client
78. 78© Copyright 2015 Pivotal. All rights reserved.
Client subscriptions
Member A Member B
Person p1 = …
region.put(“Fred”, p1);
region.registerInterest(“Fred”);
Client
Client
79. 79© Copyright 2015 Pivotal. All rights reserved. 79© Copyright 2013 Pivotal. All rights reserved.
Use cases and patterns
80. 80© Copyright 2015 Pivotal. All rights reserved.
“Low touch” Usage Patterns
Simple template for TCServer, TC, App servers
Shared nothing persistence, Global session state
HTTP Session management
Set Cache in hibernate.cfg.xml
Support for query and entity caching
Hibernate L2 Cache plugin
Servers understand the memcached wire protocol
Use any memcached client
Memcached protocol
<bean id="cacheManager"
class="org.springframework.data.gemfire.support.GemfireCacheManager"Spring Cache Abstraction
81. 81© Copyright 2015 Pivotal. All rights reserved.
Application Patterns
Ÿ Caching for speed and scale
– Read-through, Write-through, Write-behind
Ÿ Geode as the OLTP system of record
– Data in-memory for low latency, on disk for durability
Ÿ Parallel compute engine
Ÿ Real-time analytics
82. 82© Copyright 2015 Pivotal. All rights reserved.
Development Patterns
Ÿ Configure the cluster programmatically or declaratively
using cache.xml, gfsh, or SpringDataGemFire
<cache>
<region
name="turbineSensorData"
refid="PARTITION_PERSISTENT">
<partition-‐attributes
redundant-‐copies="1"
total-‐num-‐buckets="43"/>
</region>
</cache>
83. 83© Copyright 2015 Pivotal. All rights reserved.
Development Patterns
Ÿ Write key-value data into a Region using { create | put |
putAll | remove }
– The value can be flat data, nested objects, JSON, …
Region
sensorData
=
cache.getRegion("TurbineSensorData");
SensorKey
key
=
new
SensorKey(31415926,
"2013-‐05-‐19T19:22Z");
//
turbineId,
timestamp
sensorData.put(key,
new
TurbineReading()
.setAmbientTemp(75)
.setOperatingTemp(80)
.setWindDirection(0)
.setWindSpeed(30)
.setPowerOutput(5000)
.setRPM(5));
84. 84© Copyright 2015 Pivotal. All rights reserved.
Development Patterns
Ÿ Read values from a Region by key using { get | getAll }
TurbineReading
data
=
sensorData.get(new
SensorKey(31415926,"2013-‐05-‐19T19:22Z"));
Ÿ Query values using OQL
//
finds
all
sensor
readings
for
the
given
turbine
SELECT
*
from
/TurbineSensorData.entrySet
WHERE
key.turbineId
=
31415926
//
finds
all
sensor
readings
where
the
operating
temp
exceeds
a
threshold
SELECT
*
from
/TurbineSensorData.entrySet
WHERE
value.operatingTemp
>
120
Ÿ Apply indexes to optimize queries
85. 85© Copyright 2015 Pivotal. All rights reserved.
Development Patterns
Ÿ Execute functions to operate on local data in parallel
Ÿ Respond to updates using CacheListeners and Events
Ÿ Automatic redundancy, partitioning, distribution, consistency,
network partition detection & recovery, load balancing, …