A High Performance Data Platform Using Flash
March 2018
SuperComputing Asia
$48M in Funding
20+ Deployments
Industrial IoT
Financial Services
Telecommunications
Cyber Security
100+ Employees
NA
APAC
EMEA
iguazio
• Disk and Network were still slow
• Network was in general slower than disk
When Hadoop started
Data at scale
Compute
Data
Storage
Distribute data
• Disk is faster than Network
Compute is where data is
Compute
Data
Storage
Distribute compute
• Scalable but inherently batch and slow
• Disk and Network I/O (Map, reduce, shuffle … )
• Runs on JVM
• Not suitable for all types of workloads (real-time, interactive, iterative ML)
• Cannot scale compute independent of storage and vice versa
Big Data Workloads –
Performance Characteristics
Big Data Workload: Disk I/O, Network I/O, Compute (each rated High / Medium / Low)
• Iterative Machine Learning Jobs
• Interactive Analytics at scale
• Lambda & Kappa Architecture at scale
Typically Disk or Network maxes out during a massively parallel job, before Compute
Traditional Database Workload: compare this to a traditional RDBMS, where CPU or Disk I/O maxes out
In-memory and high-speed networking?
• L1 cache reference 0.5 ns
• Branch mispredict 5 ns
• L2 cache reference 7 ns
• Mutex lock/unlock 100 ns
• Main memory reference 100 ns
• Send 2K bytes over 1 Gbps network 20,000 ns
• SSD seek 80,000 ns
• Read 1 MB sequentially from memory 250,000 ns
• Round trip within same datacenter 500,000 ns
• Disk seek 10,000,000 ns
• Read 1 MB sequentially from network 10,000,000 ns
• Read 1 MB sequentially from disk 30,000,000 ns
• Send packet CA->Netherlands->CA 150,000,000 ns
Source: Designs, Lessons and Advice from Building Large Distributed Systems, by Dr Jeff Dean of Google, http://www.slideshare.net/ikewu83/dean-keynoteladis2009-4885081
Optimizing Big Data
Performance
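To make the gaps in the latency list easier to compare, here is a minimal Python sketch that turns a few of the figures into ratios. The numbers are copied from the list above; the dictionary and the `slowdown` helper are illustrative names, not part of the original talk.

```python
# Latency figures copied from the list above, in nanoseconds.
# LATENCY_NS and slowdown() are hypothetical names used only for this sketch.
LATENCY_NS = {
    "Main memory reference": 100,
    "SSD seek": 80_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from network": 10_000_000,
    "Read 1 MB sequentially from disk": 30_000_000,
}


def slowdown(slow: str, fast: str) -> float:
    """Return how many times slower operation `slow` is than operation `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    pairs = [
        ("Disk seek", "SSD seek"),
        ("SSD seek", "Main memory reference"),
        ("Read 1 MB sequentially from disk", "Read 1 MB sequentially from memory"),
    ]
    for slow, fast in pairs:
        print(f"{slow} is {slowdown(slow, fast):.0f}x slower than {fast}")
```

With these figures, an SSD seek sits between main memory and a disk seek: two orders of magnitude faster than the disk, and still several hundred times slower than a memory reference.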
RAM is 10-20x more expensive than Flash per GB
RAM is expensive
SSD prices are falling
Price of Memory Technology 2016/17: cost in $ per GB (raw)
• DRAM 8.5
• NVMe SSD 1.2
• SATA SSD 0.85
• SAS HDD 0.04
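The price figures above can be turned into cost ratios and a rough capacity cost with a few lines of Python. This is a sketch under the assumption that the four chart values map, in order, to DRAM, NVMe SSD, SATA SSD and SAS HDD; the function names and the 1 TB example are illustrative and not from the slides.

```python
# Raw cost in $ per GB, read off the 2016/17 price chart above.
# The mapping of values to media assumes the chart order is preserved.
COST_PER_GB = {
    "DRAM": 8.5,
    "NVMe SSD": 1.2,
    "SATA SSD": 0.85,
    "SAS HDD": 0.04,
}


def cost_ratio(expensive: str, cheap: str) -> float:
    """Return how many times more `expensive` costs per GB than `cheap`."""
    return COST_PER_GB[expensive] / COST_PER_GB[cheap]


def cost_for_capacity(medium: str, capacity_gb: float) -> float:
    """Return the raw media cost in dollars for `capacity_gb` of `medium`."""
    return COST_PER_GB[medium] * capacity_gb


if __name__ == "__main__":
    for flash in ("NVMe SSD", "SATA SSD"):
        print(f"DRAM vs {flash}: {cost_ratio('DRAM', flash):.1f}x per GB")
    # Hypothetical sizing example: 1 TB (1024 GB) of raw capacity.
    for medium in COST_PER_GB:
        print(f"1 TB of {medium}: ${cost_for_capacity(medium, 1024):,.0f}")
```

These are raw media prices only; the sketch ignores servers, controllers, replication and endurance, all of which affect real total cost.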
• NVMe is a software-based standard specifically optimized for SSDs connected through the PCIe interface.
What is NVMe?
• Shorter hardware data access path: directly connected via PCIe, faster than SATA
• Completely redesigned software that bypasses the conventional block layer request queue
• Asynchronous Submission Queue for requests
• Asynchronous Completion Queue
Why is NVMe faster (than SATA)?
Source
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=2CA58A08BFB2821F59A7996DAC8742F1?doi=10.1.1.697.1493&rep=rep1&type=pdf
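The paired submission and completion queues mentioned above can be illustrated with a toy model in Python. This is not the NVMe specification or any driver API: real queues are ring buffers in host memory that the device is notified about through doorbell registers, and every class and field name below is hypothetical. The sketch shows only the asynchronous pattern, in which the host enqueues commands without waiting and collects completions later.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Command:
    command_id: int
    opcode: str        # "read" or "write" in this toy model
    block: int
    data: bytes = b""


@dataclass
class Completion:
    command_id: int
    status: str
    data: bytes = b""


class ToyQueuePair:
    """Hypothetical host-side view of one submission/completion queue pair."""

    def __init__(self) -> None:
        self.submission: deque = deque()
        self.completion: deque = deque()

    def submit(self, command: Command) -> None:
        """Enqueue a command and return immediately, without waiting for the device."""
        self.submission.append(command)

    def reap(self) -> list:
        """Collect whatever completions the device has posted so far."""
        done = list(self.completion)
        self.completion.clear()
        return done


class ToyDevice:
    """Stand-in for an SSD that serves commands from a queue pair."""

    def __init__(self) -> None:
        self.blocks: dict = {}

    def process(self, qp: ToyQueuePair, max_commands: int) -> None:
        """Take up to `max_commands` from the submission queue and post completions."""
        for _ in range(min(max_commands, len(qp.submission))):
            cmd = qp.submission.popleft()
            if cmd.opcode == "write":
                self.blocks[cmd.block] = cmd.data
                qp.completion.append(Completion(cmd.command_id, "ok"))
            elif cmd.opcode == "read":
                payload = self.blocks.get(cmd.block, b"")
                qp.completion.append(Completion(cmd.command_id, "ok", payload))
            else:
                qp.completion.append(Completion(cmd.command_id, "invalid opcode"))


if __name__ == "__main__":
    qp, device = ToyQueuePair(), ToyDevice()
    qp.submit(Command(1, "write", block=7, data=b"abc"))
    qp.submit(Command(2, "read", block=7))   # queued before command 1 completes
    device.process(qp, max_commands=8)
    for completion in qp.reap():
        print(completion)
```

Because submission and completion are decoupled, the host can keep many commands in flight at once, which is the property the bullets above attribute to NVMe.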
[Diagram labels: Micro-Services & Serverless Functions; Unified DB for Real-Time & Analytics; External Events and Sources; Immediate Insights & Actions; External Data; Enrich; Reinforce; Learning]
iguazio
• Combined fresh and historical data
• Immediate Insights
• Rapid time to production
• Supporting common APIs
• Deployment Everywhere
Unified DB for Real-Time & Analytics
iguazio data platform for real time and analytics
Single platform, all in one
• Functions & Micro-Services
• SQL & TS Query APIs
• Analytics & AI Tools
• AWS APIs: S3, Kinesis, DynamoDB
Unified & Real-Time Database Engine: Secure, Intelligent, Multi-model
External Data Lakes & Clouds
Architecture
• In-memory DB performance with Flash economies and density
• Access data concurrently through multiple standard APIs
• Fully integrated PaaS
Local NVMe and OS bypass
[Diagram: Traditional Layered Approach compared with iguazio]
How to take full advantage of NVMe? And Advanced Networking
iguazio © 2016
The tests used the Yahoo! Cloud Serving Benchmark (YCSB), an open-source framework for
evaluating and comparing the performance of multiple types of NoSQL database management
systems, and the de facto industry standard for this purpose.
Performance Results
Independently scaling compute and storage
[Diagram: three architectures compared on CPU/Drive ratio, Performance, Capacity and Smart Rebuild]
• iguazio Scale Out x2: Storage; Data nodes (Processing)
• Scale Out System: Processing + Storage
• Scale Up System: Processing; Storage
Why iguazio?
Very high ingestion rate
Low latency for faster dashboards: in-memory speed at the cost of SSD
Real-time analytics on both fresh and historical data
Scale-out architecture, enabling real-time analytics on large data sets
Platform as a service providing a cloud experience
Business benefits - what’s in it for me?
Increase operational efficiency
Reduce TCO
Faster time to market for new services
Reduce cost of traditional data center operations while constraining growth of expensive cloud services such as Amazon DynamoDB, Kinesis, S3, Redshift and EMR
Bring new services online in less time with greater reliability & security
Optimize business processes
Increase data engineering efficiency
Simplify the overall data pipeline, helping engineering focus on building applications
Thank You
@Santanu_Dey

Building a High Performance Analytics Platform

  • 1. A High Performance Data Platform Using Flash • March 2018 • SuperComputing Asia
  • 2. iguazio • $48M in Funding • 20+ Deployments • 100+ Employees • Industrial IoT • Financial Services • Telecommunications • Cyber Security • NA • APAC • EMEA
  • 3. When Hadoop started • Disk and Network were still slow • Network was in general slower than disk
  • 5. Compute is where data is • Disk is faster than Network • Compute, Data, Storage • Distribute compute
  • 6. Big Data Workloads – Performance Characteristics • Scalable but inherently batch and slow • Disk and Network I/O (map, reduce, shuffle …) • Runs on JVM • Not suitable for all types of workloads (real-time, interactive, iterative ML) • Cannot scale compute independently of storage, or vice versa
  • 7. Big Data Workloads – Performance Characteristics • Table columns: Disk I/O, Network I/O, Compute (each rated High / Medium / Low) • Table rows: Iterative Machine Learning Jobs; Interactive Analytics at scale; Lambda & Kappa Architecture at scale • Typically Disk or Network maxes out during a massively parallel job, before Compute
  • 8. Big Data Workloads – Performance Characteristics • Table columns: Disk I/O, Network I/O, Compute (each rated High / Medium / Low) • Table rows: Iterative Machine Learning Jobs; Interactive Analytics at scale; Lambda & Kappa Architecture at scale; Traditional Database Workload • Typically Disk or Network maxes out during a massively parallel job, before Compute • Compare this to a traditional RDBMS, where CPU or Disk I/O maxes out
  • 9. Big Data Workloads – Performance Characteristics • Table columns: Disk I/O, Network I/O, Compute (each rated High / Medium / Low) • Table rows: Iterative Machine Learning Jobs; Interactive Analytics at scale; Lambda & Kappa Architecture at scale; Traditional Database Workload • In-memory and high-speed networking?
  • 10. Optimizing Big Data Performance • L1 cache reference 0.5 ns • Branch mispredict 5 ns • L2 cache reference 7 ns • Mutex lock/unlock 100 ns • Main memory reference 100 ns • Send 2K bytes over 1 Gbps network 20,000 ns • SSD seek 80,000 ns • Read 1 MB sequentially from memory 250,000 ns • Round trip within same datacenter 500,000 ns • Disk seek 10,000,000 ns • Read 1 MB sequentially from network 10,000,000 ns • Read 1 MB sequentially from disk 30,000,000 ns • Send packet CA->Netherlands->CA 150,000,000 ns • Source: Designs, Lessons and Advice from Building Large Distributed Systems by Dr Jeff Dean of Google, http://www.slideshare.net/ikewu83/dean-keynoteladis2009-4885081
  • 11. RAM is expensive • RAM costs 10-20x more than Flash per GB
  • 12. SSD prices are falling
  • 13. Optimizing Big Data Performance • L1 cache reference 0.5 ns • Branch mispredict 5 ns • L2 cache reference 7 ns • Mutex lock/unlock 100 ns • Main memory reference 100 ns • Send 2K bytes over 1 Gbps network 20,000 ns • SSD seek 80,000 ns • Read 1 MB sequentially from memory 250,000 ns • Round trip within same datacenter 500,000 ns • Disk seek 10,000,000 ns • Read 1 MB sequentially from network 10,000,000 ns • Read 1 MB sequentially from disk 30,000,000 ns • Send packet CA->Netherlands->CA 150,000,000 ns • Source: Designs, Lessons and Advice from Building Large Distributed Systems by Dr Jeff Dean of Google, http://www.slideshare.net/ikewu83/dean-keynoteladis2009-4885081
  • 14. Price of Memory Technology 2016/17 • Cost in $ per GB (raw): DRAM 8.5 • NVMe SSD 1.2 • SATA SSD 0.85 • SAS HDD 0.04
  • 15. What is NVMe? • NVMe is a software-based standard specifically optimized for SSDs connected through the PCIe interface
  • 16. Why is NVMe faster (than SATA)? • Shorter hardware data access path: directly connected via PCIe, which is faster than SATA • Completely redesigned software stack that bypasses the conventional block layer request queue • Asynchronous Submission Queue for requests • Asynchronous Completion Queue • Source: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=2CA58A08BFB2821F59A7996DAC8742F1?doi=10.1.1.697.1493&rep=rep1&type=pdf
  • 17. Unified DB for Real-Time & Analytics • Micro-Services & Serverless Functions • External events and sources • External data • Immediate insights & actions • Enrich Reinforce Learning • Combined fresh data and historical data • Immediate Insights • Rapid time to production • Supporting common APIs • Deployment Everywhere
  • 18. Architecture • iguazio: data platform for real time and analytics • Single platform, All in One • Functions & micro-services • SQL & TS query APIs • Analytics & AI tools • AWS APIs: S3, Kinesis, DynamoDB • Unified & real-time database engine: Secure, Intelligent, Multi-model • External data lakes & clouds • In-mem DB performance with Flash economies and density • Access data concurrently through multiple standard APIs • Fully integrated PaaS
  • 19. How to take full advantage of NVMe? • Local NVMe and OS bypass • Traditional Layered Approach • iguazio
  • 21. Performance Results • The tests were performed using the Yahoo! Cloud Serving Benchmark (YCSB), an open source framework for evaluating and comparing the performance of multiple types of NoSQL database management systems, and the de facto industry standard for this purpose • iguazio © 2016
  • 22. Independently scaling compute and storage • Diagram panels: Scale Up System (Processing, Storage) • Scale Out System (Processing + Storage) • iguazio (Scale Out x2: Data nodes (Processing), Storage) • Each panel is rated on CPU/Drive ratio, Performance, Capacity, Smart Rebuild
  • 23. Why iguazio? • Very high ingestion rate • Low latency for faster dashboards: in-memory speed at the cost of SSD • Real-time analytics on both fresh and historical data • Scale-out architecture, enabling real-time analytics on large data sets • Platform as a service providing cloud experience
  • 24. Business benefits: what's in it for me? • Increase operational efficiency • Reduce TCO • Faster time to market for new services • Reduce cost of traditional data center operations while constraining growth of expensive cloud services such as Amazon DynamoDB, Kinesis, S3, Redshift and EMR • Bring new services online in less time with greater reliability & security • Optimize business processes • Increase data engineering efficiency • Simplify the overall data pipeline, helping engineering to focus on building applications
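
The latency figures on slide 10 and the per-GB prices on slide 14 can be compared directly. The Python sketch below is an illustration and not part of the original deck. It copies a few of the quoted numbers into two tables and computes some ratios between them; the dictionary names and the `ratio` and `summarize` helpers are hypothetical, chosen only for this example.

```python
# Latencies in nanoseconds, copied from the figures quoted on slide 10.
LATENCY_NS = {
    "main memory reference": 100,
    "SSD seek": 80_000,
    "disk seek": 10_000_000,
    "read 1 MB from memory": 250_000,
    "read 1 MB from disk": 30_000_000,
}

# Raw cost in $ per GB (2016/17), copied from the figures quoted on slide 14.
COST_PER_GB = {"DRAM": 8.5, "NVMe SSD": 1.2, "SATA SSD": 0.85, "SAS HDD": 0.04}


def ratio(table, a, b):
    """Return how many times larger table[a] is than table[b]."""
    return table[a] / table[b]


def summarize():
    """Compute a few illustrative ratios (hypothetical helper, not from the deck)."""
    return {
        "disk_seek_vs_ssd_seek": ratio(LATENCY_NS, "disk seek", "SSD seek"),
        "ssd_seek_vs_memory_ref": ratio(LATENCY_NS, "SSD seek", "main memory reference"),
        "disk_read_vs_memory_read": ratio(
            LATENCY_NS, "read 1 MB from disk", "read 1 MB from memory"
        ),
        "dram_vs_nvme_cost": ratio(COST_PER_GB, "DRAM", "NVMe SSD"),
        "dram_vs_sata_ssd_cost": ratio(COST_PER_GB, "DRAM", "SATA SSD"),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}x")
```

On the quoted numbers, an SSD seek is 125 times faster than a disk seek but still 800 times slower than a main memory reference. DRAM costs roughly 7 times as much per GB as NVMe SSD and 10 times as much as SATA SSD.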