SlideShare a Scribd company logo
1 of 23
Download to read offline
INTRODUCING A NON-VOLATILE GENERIC OBJECT
PROGRAMMING MODEL FOR IN-MEMORY COMPUTING
(SPARK) PERFORMANCE IMPROVEMENT
YANPING WANG
APACHE MNEMONIC (INCUBATING) PROJECT LEADER
YANPINGW@APACHE.ORG
STORAGE MEDIA
EVOLUTION IS
CHANGING FUTURE SW
PROGRAMING
STORAGE MEDIA EVOLUTION
 Storage Media
Evolution: NAND
and 3D XPoint are
filling the Speed
and Capacity
gap between
DRAM and Hard
Disk Drives
 Software is facing
the challenge on
how to use them
STORAGE MEDIA EVOLUTION IS CHANGING FUTURE
SW PROGRAMING
FROM IN-MEMORY TO IN-PLACE
 Moving from In-Memory
Computing to In-Place
Computing
 Mnemonic is a
pioneering project to
extend in-memory
computing to in-place
computing using next
generation NVM
storage media.
(Picture was taken from Ken
Gibson’s Storage Industry Summit
January, 20, 2016)
A NON-VOLATILE
PROGRAMMING MODEL
FOR JAVA APPLICATION
DEVELOPERS
WHY JAVA?
• There are 300+ open source initiatives at the ASF. 221 of them are
Java based, accounts for 59.9%.
• Top Level Big Data Projects are Java based projects, include
Hadoop, CloudStack, Spark, Cassandra, Hive, HBase, Flink, Solr…..
• Java related optimization has significantly contributed to Big Data
applications’ performance improvement.
TRADITIONAL JAVA BASED BIG DATA
FRAMEWORKS AND APPLICATIONS
OUR PROPOSAL: OFF-HEAP, LEAVE DATA IN-PLACE
NON-VOLATILE JAVA OBJECT MODEL
• Provides Java programmers a
mechanism to define objects and
structures as Mnemonic-managed
objects and structures.
• Objects identified as @DurableEntity
store their durable fields in volatile/non-
volatile memory space.
• @DurableGetter and @DurableSetter are
used to access those fields.
• Handlers are used to refer their
connected durable object.
• https://docs.oracle.com/javase/tutorial
/java/annotations/basics.html
AN EXAMPLE OF NON-VOLATILE LINKED LIST
(LAZY LOADING)
APACHE MNEMONIC INFRASTRUCTURE
APACHE MNEMONIC STRUCTURE
• Performance oriented architecture
• Unified platform enabling Framework
• Unique non-volatile object model
• Flexible & extensible focal point for
optimization
MNEMONIC SEARCH EXAMPLE
• Traditional search:
SerDes, caching,
pack/unpack, burst
temp objects
creation and
destruction cost can
be as high as 70-80%
• Mnemonic Search: in-
place creation &
update, no SerDes,
load on demand, no
need to normalize
MNEMONIC PROJECT POSITIONING
• Apache Mnemonic (Incubating) is a
Java based non-volatile memory library
(API) for in-place structured data
processing and computing
• Addresses Big Data performance issues
• High cost of Serialization/De-serialization
• Frequent unpredicted long Garbage Collection
pauses
• Burst temporary object creation/destruction
• Frequent high cost JNI call with stack
marshalling
• System paging and kernel structure updating
due to massive data processing
USE CASE EXAMPLE:
SPARK + MNEMONIC
=IN-PLACE MACHINE
LEARNING COMPUTING
NON-VOLATILE RDD CODE FOR SPARK
• ~600 line of Non-Volatile
RDD code is added into
Spark, to alter data flow
from Spark RDD via JVM
heap to Mnemonic.
• Spill RDDs to off-heap and
disk can be avoid.
• RDD data cache and
local checkpoints can also
be avoided when using
Nonvolatile or Durable
objects.
INTERNAL STRUCTURED DATA REPRESENTATION
 Spark experiment uses: 2D Linked List
data structure to store data in
persistent memory and storage.
 N nodes: represent the first level of
linked list.
 V nodes: represent the values of first
level of linked list that still is a linked
list.
 Only N2 and V1 (methods & volatile
fields) are instantiated on heap.
SPARK MLLIB KMEANS
SPARK MLLIB KMEANS PERFORMANCE IMPROVEMENT
• SPARK ML Kmeans implements one of the most
popular clustering algorithms
• Exp1: spark.storage.memoryFraction 0.6
• Exp2: spark.storage.memoryFraction 0.2
CONFIGURATION FOR PERFORMANCE MEASUREMENT
 Platform Software Configuration
 OS: CentOS 7
 JVM: JDK8 update 60
 ParallelOldGC is used for Spark ML throughput
workload Kmeans
 Spark1.5.0
 Platform Hardware
 Haswell EP (72 cores, E5-2699 v3 @ 2.3GHz)
 256GB DDR4-2133 on-board Memory
 Storage media: SATA SSD
 Device Model: INTEL SSDSC2BB016T4
 Serial Number: BTWD442601311P6HGN
 User Capacity: 1.60 TB
 SATA Version is: SATA 3.0, 6.0 Gb/s
 Spark Kmeans Experiments Setting:
– System memory is reduced to 128G via
static dummy ramfile occupancy
– 20 executors with 4 GB JVM heap, 3 cores
– Both of datasets are stored on SSD
– Experiment 1
– spark.storage.memoryFraction 0.6
– 100,000,000 records with vector in 12
dimension (20 partitions)
– Experiment 2
– spark.storage.memoryFraction 0.2
– 100,000,000 records with vector in 6
dimension (20 partitions)
DISCLAIMERS
Apache Mnemonic is an effort undergoing incubation at The Apache Software Foundation
(ASF), sponsored by Apache Incubator PMC. Incubation is required of all newly accepted
projects until a further review indicates that the infrastructure, communications, and
decision making process have stabilized in a manner consistent with other successful ASF
projects. While incubation status is not necessarily a reflection of the completeness or
stability of the code, it does indicate that the project has yet to be fully endorsed by the
ASF.
While Mnemonic is operating within the time period where the application is processing
their durable data, Mnemonic guarantees the persistency once Mnemonic is closed
gracefully.
PROJECT LINKS
Mnemonic Community: dev@mnemonic.incubator.apache.org
Source Code: https://github.com/apache/incubator-mnemonic
JIRA Tracking: https://issues.apache.org/jira/browse/MNEMONIC
Website: http://mnemonic.incubator.apache.org/
Contact:
yanpingw@apache.org; garyw@apache.org

More Related Content

What's hot

RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-MLRedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
Redis Labs
 
RedisConf17 - Home Depot - Turbo charging existing applications with Redis
RedisConf17 - Home Depot - Turbo charging existing applications with RedisRedisConf17 - Home Depot - Turbo charging existing applications with Redis
RedisConf17 - Home Depot - Turbo charging existing applications with Redis
Redis Labs
 
Aerospike: Maximizing Performance
Aerospike: Maximizing PerformanceAerospike: Maximizing Performance
Aerospike: Maximizing Performance
Aerospike, Inc.
 

What's hot (20)

Powering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and PrestoPowering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and Presto
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
 
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
 
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
 
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-MLRedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
 
RedisConf17 - IoT Backend with Redis and Node.js
RedisConf17 - IoT Backend with Redis and Node.jsRedisConf17 - IoT Backend with Redis and Node.js
RedisConf17 - IoT Backend with Redis and Node.js
 
RedisConf17 - Home Depot - Turbo charging existing applications with Redis
RedisConf17 - Home Depot - Turbo charging existing applications with RedisRedisConf17 - Home Depot - Turbo charging existing applications with Redis
RedisConf17 - Home Depot - Turbo charging existing applications with Redis
 
Aerospike: Maximizing Performance
Aerospike: Maximizing PerformanceAerospike: Maximizing Performance
Aerospike: Maximizing Performance
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Introduction to SQream and the IoT environment
Introduction to SQream and the IoT environmentIntroduction to SQream and the IoT environment
Introduction to SQream and the IoT environment
 
Scylla Summit 2018: Scylla 3.0 and Beyond
Scylla Summit 2018: Scylla 3.0 and BeyondScylla Summit 2018: Scylla 3.0 and Beyond
Scylla Summit 2018: Scylla 3.0 and Beyond
 
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
 
Webinar: Building Blocks for the Future of Television
Webinar: Building Blocks for the Future of TelevisionWebinar: Building Blocks for the Future of Television
Webinar: Building Blocks for the Future of Television
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
 
Using Databases and Containers From Development to Deployment
Using Databases and Containers  From Development to DeploymentUsing Databases and Containers  From Development to Deployment
Using Databases and Containers From Development to Deployment
 
Cisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStackCisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStack
 

Viewers also liked

IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
In-Memory Computing Summit
 

Viewers also liked (10)

IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
 
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
 
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
 
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
 
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
 

Similar to IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Programming Model for In-Memory Computing (Spark) Performance Improvement

Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, Whiptail
Internet World
 
Deview 2013 rise of the wimpy machines - john mao
Deview 2013   rise of the wimpy machines - john maoDeview 2013   rise of the wimpy machines - john mao
Deview 2013 rise of the wimpy machines - john mao
NAVER D2
 
QCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AIQCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AI
Lex Yu
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red_Hat_Storage
 
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory ComputingIMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
In-Memory Computing Summit
 
W jak sposób architektura hipekonwergentna cisco simplivity usprawni działani...
W jak sposób architektura hipekonwergentna cisco simplivity usprawni działani...W jak sposób architektura hipekonwergentna cisco simplivity usprawni działani...
W jak sposób architektura hipekonwergentna cisco simplivity usprawni działani...
Pawel Serwan
 

Similar to IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Programming Model for In-Memory Computing (Spark) Performance Improvement (20)

Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
Boosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesBoosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of Techniques
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
 
Ceph
CephCeph
Ceph
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
Virtualizing Apache Spark with Justin Murray
Virtualizing Apache Spark with Justin MurrayVirtualizing Apache Spark with Justin Murray
Virtualizing Apache Spark with Justin Murray
 
Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, Whiptail
 
Deview 2013 rise of the wimpy machines - john mao
Deview 2013   rise of the wimpy machines - john maoDeview 2013   rise of the wimpy machines - john mao
Deview 2013 rise of the wimpy machines - john mao
 
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersNode Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
Představení produktové řady Oracle SPARC S7
Představení produktové řady Oracle SPARC S7Představení produktové řady Oracle SPARC S7
Představení produktové řady Oracle SPARC S7
 
QCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AIQCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AI
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
 
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory ComputingIMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
 
W jak sposób architektura hipekonwergentna cisco simplivity usprawni działani...
W jak sposób architektura hipekonwergentna cisco simplivity usprawni działani...W jak sposób architektura hipekonwergentna cisco simplivity usprawni działani...
W jak sposób architektura hipekonwergentna cisco simplivity usprawni działani...
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
 

More from In-Memory Computing Summit

IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
In-Memory Computing Summit
 

More from In-Memory Computing Summit (14)

IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
 
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
 
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
 
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
 
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
 
IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Li...
IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Li...IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Li...
IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Li...
 
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
 
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
 
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory EasyIMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
 
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
 
Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
Accelerating the Hadoop data stack with Apache Ignite, Spark and BigtopAccelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Programming Model for In-Memory Computing (Spark) Performance Improvement

  • 1. INTRODUCING A NON-VOLATILE GENERIC OBJECT PROGRAMMING MODEL FOR IN-MEMORY COMPUTING (SPARK) PERFORMANCE IMPROVEMENT YANPING WANG APACHE MNEMONIC (INCUBATING) PROJECT LEADER YANPINGW@APACHE.ORG
  • 2. STORAGE MEDIA EVOLUTION IS CHANGING FUTURE SW PROGRAMING
  • 3. STORAGE MEDIA EVOLUTION  Storage Media Evolution: NAND and 3D XPoint are filling the Speed and Capacity gap between DRAM and Hard Disk Drives  Software is facing the challenge on how to use them
  • 4. STORAGE MEDIA EVOLUTION IS CHANGING FUTURE SW PROGRAMING
  • 5. FROM IN-MEMORY TO IN-PLACE  Moving from In-Memory Computing to In-Place Computing  Mnemonic is a pioneering project to extend in-memory computing to in-place computing using next generation NVM storage media. (Picture was taken from Ken Gibson’s Storage Industry Summit January, 20, 2016)
  • 6. A NON-VOLATILE PROGRAMMING MODEL FOR JAVA APPLICATION DEVELOPERS
  • 7. WHY JAVA? • There are 300+ open source initiatives at the ASF. 221 of them are Java based, accounts for 59.9%. • Top Level Big Data Projects are Java based projects, include Hadoop, CloudStack, Spark, Cassandra, Hive, HBase, Flink, Solr….. • Java related optimization has significantly contributed to Big Data applications’ performance improvement.
  • 8. TRADITIONAL JAVA BASED BIG DATA FRAMEWORKS AND APPLICATIONS
  • 9. OUR PROPOSAL: OFF-HEAP, LEAVE DATA IN-PLACE
  • 10. NON-VOLATILE JAVA OBJECT MODEL • Provides Java programmers a mechanism to define objects and structures as Mnemonic-managed objects and structures. • Objects identified as @DurableEntity store their durable fields in volatile/non- volatile memory space. • @DurableGetter and @DurableSetter are used to access those fields. • Handlers are used to refer their connected durable object. • https://docs.oracle.com/javase/tutorial /java/annotations/basics.html
  • 11. AN EXAMPLE OF NON-VOLATILE LINKED LIST (LAZY LOADING)
  • 13. APACHE MNEMONIC STRUCTURE • Performance oriented architecture • Unified platform enabling Framework • Unique non-volatile object model • Flexible & extensible focal point for optimization
  • 14. MNEMONIC SEARCH EXAMPLE • Traditional search: SerDes, caching, pack/unpack, burst temp objects creation and destruction cost can be as high as 70-80% • Mnemonic Search: in- place creation & update, no SerDes, load on demand, no need to normalize
  • 15. MNEMONIC PROJECT POSITIONING • Apache Mnemonic (Incubating) is a Java based non-volatile memory library (API) for in-place structured data processing and computing • Addresses Big Data performance issues • High cost of Serialization/De-serialization • Frequent unpredicted long Garbage Collection pauses • Burst temporary object creation/destruction • Frequent high cost JNI call with stack marshalling • System paging and kernel structure updating due to massive data processing
  • 16. USE CASE EXAMPLE: SPARK + MNEMONIC =IN-PLACE MACHINE LEARNING COMPUTING
  • 17. NON-VOLATILE RDD CODE FOR SPARK • ~600 line of Non-Volatile RDD code is added into Spark, to alter data flow from Spark RDD via JVM heap to Mnemonic. • Spill RDDs to off-heap and disk can be avoid. • RDD data cache and local checkpoints can also be avoided when using Nonvolatile or Durable objects.
  • 18. INTERNAL STRUCTURED DATA REPRESENTATION  Spark experiment uses: 2D Linked List data structure to store data in persistent memory and storage.  N nodes: represent the first level of linked list.  V nodes: represent the values of first level of linked list that still is a linked list.  Only N2 and V1 (methods & volatile fields) are instantiated on heap.
  • 20. SPARK MLLIB KMEANS PERFORMANCE IMPROVEMENT • SPARK ML Kmeans implements one of the most popular clustering algorithms • Exp1: spark.storage.memoryFraction 0.6 • Exp2: spark.storage.memoryFraction 0.2
  • 21. CONFIGURATION FOR PERFORMANCE MEASUREMENT  Platform Software Configuration  OS: CentOS 7  JVM: JDK8 update 60  ParallelOldGC is used for Spark ML throughput workload Kmeans  Spark1.5.0  Platform Hardware  Haswell EP (72 cores, E5-2699 v3 @ 2.3GHz)  256GB DDR4-2133 on-board Memory  Storage media: SATA SSD  Device Model: INTEL SSDSC2BB016T4  Serial Number: BTWD442601311P6HGN  User Capacity: 1.60 TB  SATA Version is: SATA 3.0, 6.0 Gb/s  Spark Kmeans Experiments Setting: – System memory is reduced to 128G via static dummy ramfile occupancy – 20 executors with 4 GB JVM heap, 3 cores – Both of datasets are stored on SSD – Experiment 1 – spark.storage.memoryFraction 0.6 – 100,000,000 records with vector in 12 dimension (20 partitions) – Experiment 2 – spark.storage.memoryFraction 0.2 – 100,000,000 records with vector in 6 dimension (20 partitions)
  • 22. DISCLAIMERS Apache Mnemonic is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. While Mnemonic is operating within the time period where the application is processing their durable data, Mnemonic guarantees the persistency once Mnemonic is closed gracefully.
  • 23. PROJECT LINKS Mnemonic Community: dev@mnemonic.incubator.apache.org Source Code: https://github.com/apache/incubator-mnemonic JIRA Tracking: https://issues.apache.org/jira/browse/MNEMONIC Website: http://mnemonic.incubator.apache.org/ Contact: yanpingw@apache.org; garyw@apache.org