Haytham ElFadeel
Researcher in Computer Sciences
Level 300
Agenda
• Introduction
– A glance at scalable systems.
– The available storage solutions.
– The problem with the current solutions.
– The problem with the database.
• The Next Generation of Storage Systems
– Key-value store systems.
– Performance comparison.
• How it works
• Discussion, Q/A
Glance at Scalable Systems
• Scalable systems
– Scalability is the ability to provide better
performance when you add more computing
power.
– The performance gained should be proportional to the
added computing power.
– Examples: Google, Yahoo, Facebook, Amazon, eBay,
Orkut, Google App Engine, etc.
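One common way to make "proportional to the added computing power" measurable is scaling efficiency: the observed speedup divided by the factor by which computing power grew. The function below is a minimal sketch; its name and parameters are hypothetical, and the throughput numbers in the usage line are made up for illustration.

```python
def scaling_efficiency(base_throughput: float, scaled_throughput: float,
                       base_units: int, scaled_units: int) -> float:
    """Observed speedup divided by the growth in computing power.

    1.0 means perfectly linear scaling (doubling the machines doubles
    the throughput); values below 1.0 mean part of the added power is
    lost to coordination, contention, or network overhead.
    """
    speedup = scaled_throughput / base_throughput
    added_power = scaled_units / base_units
    return speedup / added_power


# Hypothetical numbers: 1 machine serves 1,000 req/s, 4 machines serve 3,600 req/s.
print(scaling_efficiency(1000, 3600, 1, 4))
```

An efficiency close to 1.0 as machines are added is what the slide means by a scalable system.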
Glance at Scalable Systems
• Types of scalability
– Vertical scalability: adding resources within the
same logical unit to increase its capacity, for
example adding more CPUs or expanding the storage
or the memory.
– Horizontal scalability: adding multiple logical units of
resources and making them work together as a single
unit. Think of clustering, distributed systems, and
load balancing.
Vertical Scalability vs. Horizontal Scalability
• Vertical scaling
– Limited.
– Hardware only.
• Horizontal scaling
– Not limited.
– Software and hardware.
Vertical Scalability vs. Horizontal Scalability
Haytham ElFadeel Quote:
If you need scalability urgently, vertical scaling
is probably the easiest route, but be aware that
vertical scaling gets more and more expensive as
you grow. And while infinite, linear horizontal
scalability is difficult to achieve, infinite vertical
scalability is impossible.
Vertical Scalability vs. Horizontal Scalability
Haytham ElFadeel Quote:
On the other hand, horizontal scalability
doesn’t require you to buy ever more
expensive hardware. It’s meant to scale
using commodity storage and server solutions.
But horizontal scalability isn’t cheap either:
the application has to be built from the ground up
to run on multiple servers as a single application.
Glance at Scalable Systems
• Facebook
– More than 200,000,000 active users.
– 50,000 photos uploaded per minute.
– The most active social network on the Web.
• Facebook chat
– The main challenge is maintaining users’ online status.
– Load distribution should take users and their
friends into account, to avoid unnecessary traffic between servers.
– Building a system that must scale from the start
to serve 100,000,000 users is really hard.
Glance at Scalable Systems
• Amazon
– More than 10,000,000 transactions every holiday season.
– Reliability of the user’s shopping cart is not
optional.
• Google, Yahoo, Microsoft, Kngine, etc.
– Processing huge amounts of data, more than 1 TB.
– Sorting the index by rank value, which means
sorting more than 1 TB of data.
– Storing the crawled Web pages.
The Available Storage Solutions
• Memory:
– Just a data structure :) (but what about capacity?)
• Disk:
– Text file: { XML, JSON } (bad performance)
– Binary file: { Protocol Buffers, serialized, custom format } (not portable; questions about performance)
– Database: { MySQL, SQL Server, SQLite, Oracle } (bad performance, complex, huge latency)
The Problem with the Database
• Causes
– Old and very complex systems.
– Many wasted features.
– Many steps to process an SQL query.
– They need administration, among other issues.
The Problem with the Database
• Causes
– Old and very complex systems.
• An RDBMS is a very complex system, much like an operating
system:
– Thread scheduling, deadlock monitor, resource manager.
– I/O manager, page manager, execution plan manager.
– Cache manager, memory manager, transaction manager, etc.
• Most DBMS architectures, designs, and algorithms date from
around the 1970s:
– Different hardware and platform properties.
– Old architecture, design, and algorithms.
See reference #1
The Problem with the Database
• Causes
– Many wasted features.
• Today’s systems have very rich feature sets, simply because
vendors assume that ‘one size fits all’:
– CLR types, CLR integration, replication, functions.
– Policies, relations, transactions, stored procedures, ACID, etc.
• You can even call a Web service from SQL Server! All this
makes the database look like a platform and
development environment.
The problem with the Database
• Causes
– Many steps to process a query.
• Parse the query.
• Build the expression tree and resolve the relational
algebra expression.
• Optimize the expression tree.
• Choose the execution plan.
• Execute it.
See references #2, #3
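To make the five steps concrete, here is a deliberately tiny, hypothetical query pipeline. It supports a single query shape and is nothing like a real DBMS optimizer; every function and table name is an illustrative assumption. The point is only that even a point lookup passes through parsing, tree building, optimization, and plan selection before any data is touched, whereas a key-value store answers the same request with one hash lookup.

```python
import re

# Hypothetical toy grammar: only  SELECT <col> FROM <table> WHERE <col> = '<value>'
QUERY_RE = re.compile(
    r"SELECT (\w+) FROM (\w+) WHERE (\w+) = '([^']*)'", re.IGNORECASE)


def parse(sql):
    """Step 1: parse the query text into its parts."""
    match = QUERY_RE.fullmatch(sql.strip())
    if match is None:
        raise ValueError(f"unsupported query: {sql!r}")
    return match.groups()  # (col, table, key_col, value)


def build_tree(col, table, key_col, value):
    """Step 2: relational algebra tree, projection(selection(scan(table)))."""
    return ("project", col, ("select", key_col, value, ("scan", table)))


def optimize(tree, indexes):
    """Step 3: replace selection-over-scan with an index lookup if an index exists."""
    _, col, (_, key_col, value, (_, table)) = tree
    if (table, key_col) in indexes:
        return ("project", col, ("index_lookup", table, key_col, value))
    return tree


def choose_plan(tree):
    """Step 4: a real optimizer costs several candidate plans; here there is one."""
    return tree


def execute(plan, tables, indexes):
    """Step 5: run the plan and return the projected column."""
    _, col, source = plan
    if source[0] == "index_lookup":
        _, table, key_col, value = source
        rows = indexes[(table, key_col)].get(value, [])
    else:
        _, key_col, value, (_, table) = source
        rows = [row for row in tables[table] if row[key_col] == value]
    return [row[col] for row in rows]


def run_sql(sql, tables, indexes):
    plan = choose_plan(optimize(build_tree(*parse(sql)), indexes))
    return execute(plan, tables, indexes)


users = [{"id": "u1", "name": "Ann"}, {"id": "u2", "name": "Bo"}]
tables = {"users": users}
indexes = {("users", "id"): {row["id"]: [row] for row in users}}
print(run_sql("SELECT name FROM users WHERE id = 'u2'", tables, indexes))  # prints ['Bo']

kv = {"u2": "Bo"}   # the key-value equivalent: a single hash lookup
print(kv["u2"])     # prints Bo
```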
The problem with the Database
• Effects
– Bad Performance: Throughput, Resource usage,
Latency.
– Not Scalable.
The problem with the Database
• Effects
– Bad performance: throughput, resource usage,
latency:
• Even a fast DBMS such as MySQL can’t serve more than
about 5,000 queries per second*.
• Add to this the resources consumed and the high latency.
* Depends on the configuration.
The problem with the Database
• Effects
– Not scalable:
• The database is not designed to scale.
• Even if you add a new machine and partition the database, you
will not get an acceptable performance
improvement.
See reference #1
The problem with the Database
The database gives us ACID:
• Atomicity: a transaction is all or nothing.
• Consistency: only valid data is written to the
database.
• Isolation: concurrent transactions behave as if they
ran serially, one after another.
• Durability: once a transaction commits, its writes survive crashes.
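Atomicity ("all or nothing") is the easiest of the four to show in a few lines. The sketch below uses a hypothetical in-memory store that stages writes on a private copy and publishes them only if the whole block succeeds; it illustrates atomicity alone and provides no isolation between threads and no durability.

```python
from contextlib import contextmanager


class TinyStore:
    """Hypothetical in-memory store that illustrates atomicity only
    (no isolation between threads, no durability)."""

    def __init__(self, initial=None):
        self.data = dict(initial or {})

    @contextmanager
    def transaction(self):
        staged = dict(self.data)   # all writes go to a private copy
        yield staged               # if the block raises, the next line never runs
        self.data = staged         # commit: publish every write at once


def transfer(store, src, dst, amount):
    with store.transaction() as tx:
        tx[src] -= amount
        if tx[src] < 0:
            raise ValueError("insufficient funds")  # abort: nothing is applied
        tx[dst] += amount


store = TinyStore({"alice": 100, "bob": 0})
try:
    transfer(store, "alice", "bob", 150)
except ValueError:
    pass
print(store.data)  # {'alice': 100, 'bob': 0}
transfer(store, "alice", "bob", 30)
print(store.data)  # {'alice': 70, 'bob': 30}
```

Copying the whole dictionary per transaction is fine for a sketch; real engines use logs or versioned pages instead.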
The problem with the Database
The problem with ACID is that it gives you too
much: it trips you up when you try to
scale a system across multiple nodes.
Downtime is unacceptable, so your system
needs to be reliable. Reliability requires
multiple nodes to handle machine failures.
To build a scalable system that can handle lots
and lots of reads and writes, you need many
more nodes.
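A quick calculation shows why reliability calls for multiple nodes. Assume, hypothetically, that each machine is unavailable with probability p and that failures are independent (real failures are often correlated). Data stored on n replicas is unreachable only when all n are down at once, which happens with probability p^n:

```python
def availability(p_node_down: float, replicas: int) -> float:
    """Probability that at least one copy of a data item is reachable.

    Assumes each replica is down with probability p_node_down,
    independently of the others (a simplifying assumption).
    """
    return 1.0 - p_node_down ** replicas


# Hypothetical: each machine is down 1% of the time.
for n in (1, 2, 3):
    print(n, availability(0.01, n))
```

With these assumed numbers, three replicas take unavailability from one in a hundred to roughly one in a million.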
The problem with the Database
Once you try to scale ACID across many
machines, you hit problems with network
failures and delays. The algorithms don't work
in a distributed environment at any acceptable
speed.
It’s a dead end
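The slides do not name the algorithms in question; the classic protocol for atomic commit across machines is two-phase commit (2PC). The sketch below is a simplified, in-process simulation with hypothetical class and function names. It ignores timeouts, coordinator crashes, and durable logging, all of which a real 2PC needs. It shows the cost the slide alludes to: every transaction waits on every participant, so one unreachable node aborts (or, in real systems, can block) the whole transaction.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical node holding part of the data touched by a transaction."""
    name: str
    reachable: bool = True
    staged: dict = field(default_factory=dict)
    data: dict = field(default_factory=dict)

    def prepare(self, writes: dict) -> bool:
        # Phase 1 vote: an unreachable node counts as a "no".
        if not self.reachable:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        self.staged = {}


def two_phase_commit(participants, writes_by_node) -> bool:
    """Commit only if every participant votes yes; otherwise abort everywhere.
    Each phase costs at least one network round trip to every node."""
    votes = [p.prepare(writes_by_node.get(p.name, {})) for p in participants]
    if all(votes):
        for p in participants:   # phase 2: commit
            p.commit()
        return True
    for p in participants:       # phase 2: abort
        p.abort()
    return False


a, b = Participant("a"), Participant("b")
print(two_phase_commit([a, b], {"a": {"x": 1}, "b": {"y": 2}}))  # True
b.reachable = False
print(two_phase_commit([a, b], {"a": {"x": 5}, "b": {"y": 6}}))  # False
print(a.data)  # {'x': 1}
```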
The Next Generation of Storage Systems
Long ago, many research teams
and companies discovered that the database is the
main bottleneck:
many wasted features, bad performance, and
not designed for scalable systems.
The Next Generation of Storage Systems
Building large systems on top of a traditional
RDBMS data storage layer is no longer good
enough.
This talk explores the landscape of new
technologies available today to augment your
data layer and improve performance and
reliability.
See reference #4
Key-Value Storage Systems
• Simple data model: just key-value pairs.
• Every value is assigned to a key.
• No complex features such as relations, ACID, or
SQL queries.
• Simple interface:
– Get(key)
– Put(key, value)
– Delete(key) (optional)
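The whole interface fits in a few lines. The class below is a minimal, single-node, in-memory sketch; its method names mirror the slide and are not any particular product's API.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory sketch of the Get/Put/Delete interface."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        """Return the value for key, or None if it is absent."""
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        """Insert or overwrite the value for key."""
        self._data[key] = value

    def delete(self, key: str) -> bool:
        """Optional operation: remove key; return True if it existed."""
        return self._data.pop(key, None) is not None


kv = KeyValueStore()
kv.put("user:1", b"Ann")
print(kv.get("user:1"))  # b'Ann'
```

Because every operation names exactly one key, a request can be routed to a single machine without coordinating with others, which is what makes this model easy to distribute.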
Key-Value Storage Systems
• Designed from the start to scale to hundreds of
machines.
• Designed to be reliable, even if 50% of the
machines crash.
• No extra work is required to add a new machine:
just plug it in and it will work in
harmony with the rest.
• Many open source projects (C++, Java, Lisp).
Key-Value Storage Systems
• Who uses such systems:
– Facebook.
– Google Orkut, Analysis.
– Google Web Crawling.
– Amazon.
– Powerset.
– eBay.
– Kngine.
– Yahoo.
– General use.
– Storage and huge data analysis.
– Transactions and huge data analysis.
Key-Value Storage Systems
You may wonder: can we really live without
relations and ACID?!
– The short answer: absolutely yes.
– The long answer: absolutely yes, but nothing comes for
free.
Key-Value Storage Systems
Now
you should make your decision:
Take the blue pill
And see the truth
Or, Take the red pill
And stay in
wonderland
Key-Value Storage Systems
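Key-value storage systems, like other distributed systems,
are built around the CAP concept:
Consistency: your data is correct all the time.
What you write is what you read.
Availability: you can read and write
your data all the time.
Partition tolerance: if the network splits or one or more
nodes fail, the system still works and becomes consistent
again when the nodes come back online.
The CAP theorem states that a distributed system cannot
guarantee all three at once: during a partition it must give
up either consistency or availability.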
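"Still works, and becomes consistent when the nodes come back online" is usually achieved with eventual consistency. The sketch below shows one simple reconciliation strategy, last-write-wins by timestamp; this is an assumption, since the slide does not say which strategy is used, and Dynamo-style systems use richer mechanisms such as vector clocks (see reference #7). All names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical replica storing (timestamp, value) per key."""
    online: bool = True
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        current = self.data.get(key)
        if current is None or ts > current[0]:   # last write wins
            self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None


def put(replicas, key, value, ts):
    """Stay available: write to whichever replicas are currently reachable."""
    for r in replicas:
        if r.online:
            r.write(key, value, ts)


def anti_entropy(replicas):
    """After nodes return, exchange entries so that every replica converges."""
    for src in replicas:
        for key, (ts, value) in list(src.data.items()):
            for dst in replicas:
                dst.write(key, value, ts)


r1, r2, r3 = Replica(), Replica(), Replica()
put([r1, r2, r3], "cart:42", "book", ts=1)
r3.online = False                            # r3 is cut off
put([r1, r2, r3], "cart:42", "book,pen", ts=2)
print(r3.read("cart:42"))                    # book  (stale while offline)
r3.online = True
anti_entropy([r1, r2, r3])
print(r3.read("cart:42"))                    # book,pen
```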
Key-Value Storage Systems
One Node - Performance Comparison (Web)
• MySQL
– 3,030 sets/second.
– 4,670 gets/second.
• Redis
– 11,200 sets/second. (3.7x MySQL)
– 9,840 gets/second. (2.1x MySQL)
• Tokyo Tyrant
– 9,030 sets/second. (3.0x MySQL)
– 9,250 gets/second. (2.0x MySQL)
See reference #5
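The multipliers in parentheses are simple throughput ratios against the MySQL baseline. The snippet below recomputes them from the figures on the slide as a quick consistency check; the dictionary layout and function name are illustrative only.

```python
# Throughput figures from the one-node comparison above (reference #5).
mysql = {"sets": 3030, "gets": 4670}
others = {
    "Redis": {"sets": 11200, "gets": 9840},
    "Tokyo Tyrant": {"sets": 9030, "gets": 9250},
}


def speedups(baseline, candidate):
    """Return candidate/baseline throughput ratios, rounded to one decimal."""
    return {op: round(candidate[op] / baseline[op], 1) for op in baseline}


for name, numbers in others.items():
    print(name, speedups(mysql, numbers))
# Redis {'sets': 3.7, 'gets': 2.1}
# Tokyo Tyrant {'sets': 3.0, 'gets': 2.0}
```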
Key-Value Storage Systems
Two High-End Nodes - Performance Comparison (Web)
• Redis
– 89,230 sets/second.
– 85,840 gets/second.
Key-Value Storage Systems
One Node - Performance Comparison
• SQL Server
– 2,900 sets/second.
– 3,500 gets/second.
• Vina*
– 10,100 sets/second. (3.5x SQL Server)
– 9,970 gets/second. (2.8x SQL Server)
* Vina: the key-value storage system used inside Kngine.
How It Works
Any key-value storage system consists of two
primary layers:
– Aggregation layer
• Manages the instances, replication, and distribution.
– Storing layer
• One or more disk-based hash tables.
How It Works (Storing Layer)
On the board
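The storing layer was explained on the board. As one possible illustration (an assumption, not necessarily the design of any system named in this talk), the sketch below backs a hash table with disk by appending every record to a data file and keeping an in-memory hash index from each key to the file offset of its latest value, the approach popularized by Bitcask. The class name and record layout are hypothetical.

```python
import os
import struct


class DiskHashTable:
    """Hypothetical disk-backed hash table: an append-only data file plus an
    in-memory index mapping each key to the offset and length of its latest value.
    Record layout on disk: 4-byte key length, 4-byte value length, key, value."""

    HEADER = struct.Struct(">II")

    def __init__(self, path: str) -> None:
        self.path = path
        self.index = {}                  # key -> (value offset, value length)
        open(path, "ab").close()         # create the file if it does not exist
        self._rebuild_index()

    def _rebuild_index(self) -> None:
        """Replay the file on startup; later records override earlier ones."""
        with open(self.path, "rb") as f:
            while True:
                header = f.read(self.HEADER.size)
                if len(header) < self.HEADER.size:
                    break                # end of file (or a torn header)
                key_len, value_len = self.HEADER.unpack(header)
                key = f.read(key_len)
                if len(key) < key_len:
                    break                # torn record at the tail
                offset = f.tell()
                f.seek(value_len, os.SEEK_CUR)
                self.index[key] = (offset, value_len)

    def put(self, key: bytes, value: bytes) -> None:
        with open(self.path, "ab") as f:  # append mode starts at end of file
            f.write(self.HEADER.pack(len(key), len(value)) + key)
            offset = f.tell()
            f.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key: bytes):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        with open(self.path, "rb") as f:  # one seek and one read per lookup
            f.seek(offset)
            return f.read(length)
```

Overwrites append a new record, so the file only grows; a production engine would add compaction, deletes (tombstones), and checksums, all omitted here.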
How It Works (Aggregation Layer)
• Receives the requests.
• Routes each request to the target node.
• Manages partitioning and replicas.
• Partitioning and replication are done with the
consistent hashing algorithm.
On the board
See reference #6
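Consistent hashing is described in references #6 and #9. The sketch below is a minimal ring with virtual nodes; the class name, the 100 virtual nodes per machine, and the use of MD5 as the ring hash are illustrative assumptions. An aggregation layer could call nodes_for(key, replicas) to decide where to route a request and where to place its replicas.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical consistent-hash ring with virtual nodes. Each machine is
    placed on the ring `vnodes` times; a key belongs to the first distinct
    machines found clockwise from the key's hash."""

    def __init__(self, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> machine name

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos not in self._owner:        # skip rare hash collisions
                bisect.insort(self._points, pos)
                self._owner[pos] = node

    def nodes_for(self, key: str, replicas: int = 1):
        """Return up to `replicas` distinct machines responsible for `key`."""
        if not self._points:
            return []
        start = bisect.bisect(self._points, self._hash(key))
        result = []
        for i in range(len(self._points)):    # walk clockwise, wrapping around
            node = self._owner[self._points[(start + i) % len(self._points)]]
            if node not in result:
                result.append(node)
                if len(result) == replicas:
                    break
        return result


ring = ConsistentHashRing()
for name in ("A", "B", "C"):
    ring.add_node(name)
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.nodes_for(k)[0] for k in keys}
ring.add_node("D")                            # "just plug the machine in"
moved = [k for k in keys if ring.nodes_for(k)[0] != before[k]]
print(len(moved), "of", len(keys), "keys moved")
```

When the fourth machine joins, only the keys that now fall in its ranges move, roughly a quarter of them, and every moved key goes to the new machine; the rest of the cluster is untouched.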
Key-Value Storage Systems
• Amazon Dynamo (paper).
• Facebook Cassandra (open source).
• Tokyo Cabinet/Tyrant (open source).
• Redis (open source).
• MongoDB (open source).
Q / A
References
1. The End of an Architectural Era (It’s Time for a Complete Rewrite). Paper.
2. Database Systems. Paul Beynon-Davies. Book.
3. Inside SQL Server engine. MS Press. Book.
4. Drop ACID and Think About Data. Highscalability.com.
5. Redis vs MySQL vs Tokyo Tyrant. Colin Howe’s blog.
6. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. Paper.
7. Dynamo: Amazon’s Highly Available Key-value Store. Paper.
8. Redis and Tokyo Tyrant projects.
9. Consistent Hashing. Tom White’s blog.
Resources
1. High Scalability blog.
Highscalability.com
2. It’s all about innovation blog.
Hfadeel.com/blog.
3. All Things Distributed.
Allthingsdistributed.com
4. Tom White’s blog.
lexemetech.com
Thanks…
Dear all,
All of my presentation content is open source.
Please feel free to use, copy, and redistribute it.

Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Storage Systems for Highly Scalable Systems Presentation

  • 1. Haytham ElFadeel Researcher in Computer Sciences Level 300
  • 2. Agenda • Introduction – A glance at scalable systems. – The available storage solutions. – The problem with the current solutions. – The problem with the database. • The Next Generation of Storage Systems – Key-value storage systems. – Performance comparison. • How It Works • Discussion, Q/A
  • 3. Glance at Scalable Systems • Scalable systems – Scalability is the ability to provide better performance when you add more computing power. – The performance gained should be proportional to the added computing power. – Examples: Google, Yahoo, Facebook, Amazon, eBay, Orkut, Google App Engine, etc.
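One common way to measure "performance gained relative to added computing power" is scaling efficiency: the throughput measured with n nodes divided by n times the single-node throughput. The following is a minimal sketch; the function name and the throughput numbers in the usage line are hypothetical, chosen only to illustrate the arithmetic.

```python
def scaling_efficiency(single_node_throughput: float, nodes: int, measured_throughput: float) -> float:
    """Return measured / ideal throughput, where ideal = nodes * single-node throughput.

    1.0 means perfectly linear scaling; lower values mean each added node
    contributes less than the first one did.
    """
    if nodes < 1 or single_node_throughput <= 0:
        raise ValueError("nodes must be >= 1 and single-node throughput must be positive")
    ideal = nodes * single_node_throughput
    return measured_throughput / ideal


# Hypothetical measurements: 1 node serves 1,000 req/s, 4 nodes serve 3,400 req/s.
print(round(scaling_efficiency(1000, 4, 3400), 2))  # 0.85
```

An efficiency close to 1.0 as nodes are added is what "linear" horizontal scalability means in practice.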
  • 4. Glance at Scalable Systems • Types of scalability – Vertical scalability: adding resources within the same logical unit to increase its capacity, for example adding more CPUs or expanding the storage or memory. – Horizontal scalability: adding multiple logical units of resources and making them work together as a single unit. Think of it as clustering, distribution, and load balancing.
  • 5. Vertical Scalability vs. Horizontal Scalability • Vertical scaling: limited; hardware only. • Horizontal scaling: not limited; software and hardware.
  • 6. Vertical Scalability vs. Horizontal Scalability Haytham ElFadeel Quote: If you need scalability urgently, going with vertical scaling is probably going to be the easiest, but be aware that vertical scaling gets more and more expensive as you grow. And while infinite horizontal linear scalability is difficult to achieve, infinite vertical scalability is impossible.
  • 7. Vertical Scalability vs. Horizontal Scalability Haytham ElFadeel Quote: On the other hand, horizontal scalability doesn’t require you to buy more and more expensive hardware. It’s meant to be scaled using commodity storage and server solutions. But horizontal scalability isn’t cheap either: the application has to be built from the ground up to run on multiple servers as a single application.
  • 8. Glance at Scalable Systems • Facebook – More than 200,000,000 active users. – 50,000 photos uploaded per minute. – The most active social network on the Web. • Facebook Chat – The main challenge is maintaining each user's online status. – Load should be distributed according to users and their friends, to minimize traffic between servers. – Building a system that must scale from the start to serve 100,000,000 users is really hard.
  • 9. Glance at Scalable Systems • Amazon – More than 10,000,000 transactions every holiday season. – The reliability of the user's shopping cart is not optional. • Google, Yahoo, Microsoft, Kngine, etc. – Processing huge amounts of data, more than 1 TB. – Sorting the index by rank value, which means sorting more than 1 TB of data. – Saving the crawled Web pages.
  • 10. The Available Storage Solutions • Memory: – Just a data structure :) • Disk: – Text file: { XML, JSON } – Binary file: { serialized objects, Protocol Buffers, custom format } – Database: { MySQL, SQL Server, SQLite, Oracle }
  • 11. The Available Storage Solutions • Memory: – Just a data structure :) (What about capacity?) • Disk: – Text file: { XML, JSON } (bad performance) – Binary file: { serialized objects, Protocol Buffers, custom format } (not portable; questions about performance) – Database: { MySQL, SQL Server, SQLite, Oracle } (bad performance, complex, huge latency)
  • 12. The Problem with the Database • Causes – Old and very complex system. – Many wasted features. – Many steps to process an SQL query. – Needs administration, and more.
  • 13. The Problem with the Database • Causes – Old and very complex system. • The RDBMS is a very complex system, much like an operating system: – Thread scheduling, deadlock monitor, resource manager. – I/O manager, page manager, execution plan manager. – Cache manager, memory manager, transaction manager, etc. • Most DBMS architectures, designs, and algorithms date from around the 1970s: – Different hardware and platform properties. – Old architecture, design, and algorithms. Please review resource #1
  • 14. The Problem with the Database • Causes – Many wasted features. • Today's systems have very rich feature sets, simply because vendors assume that ‘one size fits all’: – CLR types, CLR integration, replication, functions. – Policies, relations, transactions, stored procedures, ACID, etc. • You can even call a Web service from SQL Server! All this mess makes the database look like a platform and development environment.
  • 15. The problem with the Database • Causes – Many steps to process a query. • Parse the query. • Build the expression tree and resolve the relational algebra expression. • Optimize the expression tree. • Choose the execution plan. • Start execution. Please review resources #2, #3
  • 16. The problem with the Database • Effects – Bad Performance: Throughput, Resource usage, Latency. – Not Scalable.
  • 17. The problem with the Database • Effects – Bad performance: throughput, resource usage, latency: • Even a fast DBMS such as MySQL can’t serve more than about 5,000 queries per second*. • Add to this the resources consumed and the high latency. * Depends on the configuration
  • 18. The problem with the Database • Effects – Not scalable: • The database is not designed to scale. • Even if you add a new machine and partition the database, you will not get an acceptable performance improvement. Please review resource #1
  • 19. The problem with the Database The database gives us ACID: • Atomicity: a transaction is all or nothing. • Consistency: only valid data is written to the database. • Isolation: concurrent transactions behave as if they ran one after another, so the data stays correct. • Durability: once a transaction commits, its changes survive crashes.
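To make atomicity concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module, standing in for any ACID database. The `accounts` table, the account names, and the `transfer` helper are hypothetical. The transfer credits the destination first and then debits the source. When the debit violates a `CHECK` constraint, the whole transaction rolls back, including the credit that had already succeeded.

```python
import sqlite3

# In-memory SQLite database; the "accounts" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            # Credit first on purpose, so a failed debit shows the rollback undoing it.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was rolled back too.
        return False


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: would overdraw alice
print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'alice': 70, 'bob': 30}
```

On a single machine this guarantee is cheap. The following slides argue that it becomes expensive once the rows involved live on different nodes.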
  • 20. The problem with the Database The problem with ACID is that it gives you too much: it trips you up when you try to scale a system across multiple nodes. Downtime is unacceptable, so your system needs to be reliable. Reliability requires multiple nodes to handle machine failures. And to build a scalable system that can handle lots and lots of reads and writes, you need many more nodes.
  • 21. The problem with the Database Once you try to scale ACID across many machines, you hit problems with network failures and delays. The algorithms that enforce ACID don't work in a distributed environment at any acceptable speed. It’s a dead end.
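The slide does not name a specific algorithm. A classic example of the problem is two-phase commit (2PC), the textbook protocol for atomic commit across nodes. Every transaction needs a prepare round trip and a decision round trip to every participant, and a single silent node forces everyone to abort. The sketch below simulates 2PC in one process. `Participant`, `two_phase_commit`, and the `reachable` flag are hypothetical names, and real implementations also log to disk and can block while holding locks.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A simulated node; reachable=False models a crashed or partitioned machine."""
    name: str
    reachable: bool = True
    state: str = "idle"
    log: list = field(default_factory=list)

    def prepare(self, txn_id: str) -> bool:
        if not self.reachable:
            raise ConnectionError(f"{self.name} did not answer")
        self.state = "prepared"  # a real node would lock rows and force a log write here
        self.log.append(("prepared", txn_id))
        return True

    def finish(self, txn_id: str, commit: bool) -> None:
        if self.reachable:
            self.state = "committed" if commit else "aborted"
            self.log.append((self.state, txn_id))


def two_phase_commit(txn_id: str, participants: list) -> bool:
    """Phase 1: every participant must vote yes. Phase 2: broadcast the decision."""
    try:
        decision = all([p.prepare(txn_id) for p in participants])
    except ConnectionError:
        decision = False  # one unreachable node is enough to abort for everyone
    for p in participants:
        p.finish(txn_id, decision)
    return decision


nodes = [Participant("a"), Participant("b"), Participant("c", reachable=False)]
print(two_phase_commit("t1", nodes))  # False
print([p.state for p in nodes])       # ['aborted', 'aborted', 'idle']
```

The more nodes a transaction touches, the more round trips it costs and the more likely it is that one of them is slow or unreachable.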
  • 22. The Next generation of Storage Systems For a long time, many research teams and companies have recognized that the database is the main bottleneck: many wasted features, bad performance, and a design that was not meant for scalable systems.
  • 23. The Next generation of Storage Systems Building large systems on top of a traditional RDBMS data storage layer is no longer good enough. This talk explores the landscape of new technologies available today to augment your data layer to improve performance and reliability. Please review resource #4
  • 24. Key-Value Storage Systems • Simple data model: just key-value pairs. • Every value is assigned to a key. • No complex features such as relations, ACID, or SQL queries. • Simple interface: – Get(key) – Put(key, value) – Delete(key) (optional)
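The whole interface above fits in a few lines. This single-process sketch only illustrates the Get/Put/Delete contract and has no persistence or distribution. The class name `KeyValueStore` and the key format in the usage lines are hypothetical.

```python
from typing import Dict, Generic, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class KeyValueStore(Generic[K, V]):
    """Minimal in-memory key-value store exposing Get/Put/Delete."""

    def __init__(self) -> None:
        self._data: Dict[K, V] = {}

    def get(self, key: K) -> Optional[V]:
        """Return the value stored under key, or None if the key is absent."""
        return self._data.get(key)

    def put(self, key: K, value: V) -> None:
        """Insert the value, or overwrite it if the key already exists."""
        self._data[key] = value

    def delete(self, key: K) -> bool:
        """Remove key; return True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore[str, str]()
store.put("user:42:status", "online")
print(store.get("user:42:status"))     # online
print(store.delete("user:42:status"))  # True
print(store.get("user:42:status"))     # None
```

Because every operation addresses exactly one key, a distributed store can route each request to a single node without any cross-node coordination.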
  • 25. Key-Value Storage Systems • Designed from the start to scale to hundreds of machines. • Designed to stay reliable even if 50% of the machines crash. • No extra work required to add a new machine: just plug it in and it works in harmony with the rest. • Many open-source projects (C++, Java, Lisp).
  • 26. Key-Value Storage Systems • Who uses such systems: – Facebook. – Google Orkut, analysis. – Google Web crawling. – Amazon. – Powerset. – eBay. – Kngine. – Yahoo. – General use. – Storage and huge data analysis. – Transactions and huge data analysis.
  • 27. Key-Value Storage Systems You may wonder: can we really live without relations and ACID? – The short answer: absolutely yes. – The long answer: absolutely yes, but nothing comes for free.
  • 28. Key-Value Storage Systems Now you must make your decision: take the blue pill and see the truth, or take the red pill and stay in Wonderland.
  • 29. Key-Value Storage Systems Key-value storage systems, like other distributed systems, are built around the CAP concept, which says a distributed system cannot guarantee all three of the following at the same time: Consistency: your data is correct all the time; what you write is what you read. Availability: you can read and write your data all the time. Partition tolerance: if the network splits or one or more nodes fail, the system keeps working and becomes consistent again once they come back online.
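Dynamo (resource #7) makes the consistency/availability trade-off tunable. Each key is stored on N replicas, a read waits for R of them, and a write waits for W. With strict quorums, R + W > N means every read set overlaps every write set, so at least one replica consulted by a read holds the latest acknowledged write. Smaller R or W tolerates more unreachable replicas. This is a minimal sketch of that arithmetic only. The function name and the dictionary keys are hypothetical, and Dynamo's sloppy quorums and hinted handoff are not modeled.

```python
def quorum_properties(n: int, r: int, w: int) -> dict:
    """Describe a replication setting with N replicas, read quorum R, and write quorum W."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return {
        # Every read set intersects every write set (strict quorums only).
        "reads_see_latest_write": r + w > n,
        # Writes still succeed with up to N - W replicas unreachable.
        "write_failures_tolerated": n - w,
        # Reads still succeed with up to N - R replicas unreachable.
        "read_failures_tolerated": n - r,
    }


print(quorum_properties(3, 2, 2))
# {'reads_see_latest_write': True, 'write_failures_tolerated': 1, 'read_failures_tolerated': 1}
print(quorum_properties(3, 1, 1))
# {'reads_see_latest_write': False, 'write_failures_tolerated': 2, 'read_failures_tolerated': 2}
```

N=3, R=2, W=2 favors consistency. N=3, R=1, W=1 favors availability and latency, at the cost of possibly stale reads.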
  • 30. Key-Value Storage Systems One Node - Performance Comparison (Web) • MySQL – 3,030 sets/second. – 4,670 gets/second. • Redis – 11,200 sets/second. (3.7x MySQL) – 9,840 gets/second. (2.1x MySQL) • Tokyo Tyrant – 9,030 sets/second. (3.0x MySQL) – 9,250 gets/second. (2.0x MySQL) Please review resource #5
  • 31. Key-Value Storage Systems Two High-End Nodes - Performance Comparison (Web) • Redis – 89,230 sets/second. – 85,840 gets/second.
  • 32. Key-Value Storage Systems One Node - Performance Comparison • SQL Server – 2,900 sets/second. – 3,500 gets/second. • Vina* – 10,100 sets/second. (3.5x SQL Server) – 9,970 gets/second. (2.8x SQL Server) * Vina: key-value storage system used inside Kngine.
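The "Nx" multipliers on the two one-node comparison slides are plain throughput ratios against each slide's baseline, rounded to one decimal. The short script below recomputes them from the figures quoted on those slides. For example, Vina's sets give 10,100 / 2,900 ≈ 3.48. The dictionary layout and the function name are illustrative only.

```python
# Throughput figures (operations/second) copied from the two one-node comparison slides.
baseline = {"MySQL": {"sets": 3030, "gets": 4670}, "SQL Server": {"sets": 2900, "gets": 3500}}
contenders = {
    "Redis": ("MySQL", {"sets": 11200, "gets": 9840}),
    "Tokyo Tyrant": ("MySQL", {"sets": 9030, "gets": 9250}),
    "Vina": ("SQL Server", {"sets": 10100, "gets": 9970}),
}


def speedups() -> dict:
    """Return {system: {operation: ratio to its baseline, rounded to one decimal}}."""
    result = {}
    for system, (base, ops) in contenders.items():
        result[system] = {op: round(ops[op] / baseline[base][op], 1) for op in ops}
    return result


for system, ratios in speedups().items():
    print(system, ratios)
# Redis {'sets': 3.7, 'gets': 2.1}
# Tokyo Tyrant {'sets': 3.0, 'gets': 2.0}
# Vina {'sets': 3.5, 'gets': 2.8}
```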
  • 33. How It Works A typical key-value storage system consists of two primary layers: – Aggregation layer – Storing layer
  • 34. How It Works A typical key-value storage system consists of two primary layers: – Aggregation layer • Manages the instances, replication, and distribution. – Storing layer • One or more disk-based hash tables.
  • 35. How It Works (Storing Layer) On the board
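The storing layer was explained on the board, so its exact design is not recorded here. Below is a minimal sketch of one possible disk-based hash table. It is not Vina's or any named product's implementation. It assumes a single file of fixed-size slots, open addressing with linear probing, fixed maximum key and value sizes, and no deletes, resizing, or crash safety. The class name and sizes are hypothetical.

```python
import hashlib
import os
import struct
import tempfile

KEY_SIZE, VALUE_SIZE = 32, 64                         # assumed fixed maximum sizes, in bytes
SLOT = struct.Struct(f"=?{KEY_SIZE}s{VALUE_SIZE}s")  # slot layout: used flag, key, value


class DiskHashTable:
    """Fixed-capacity hash table kept in one file; collisions use linear probing.

    Simplifications: no deletes, no resizing, no crash safety, and keys/values
    must not end in NUL bytes (slot padding is stripped on read).
    """

    def __init__(self, path: str, num_slots: int = 1024) -> None:
        self.num_slots = num_slots
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(b"\x00" * (SLOT.size * num_slots))  # every slot starts empty
        self.file = open(path, "r+b")

    def _home_slot(self, key: bytes) -> int:
        # Stable hash (unlike Python's salted hash()), so keys map to the same slot after a restart.
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % self.num_slots

    def _probe(self, key: bytes):
        """Yield slot indexes in linear-probing order, starting at the key's home slot."""
        start = self._home_slot(key)
        for step in range(self.num_slots):
            yield (start + step) % self.num_slots

    def _read_slot(self, i: int):
        self.file.seek(i * SLOT.size)
        used, key, value = SLOT.unpack(self.file.read(SLOT.size))
        return used, key.rstrip(b"\x00"), value.rstrip(b"\x00")

    def _write_slot(self, i: int, key: bytes, value: bytes) -> None:
        self.file.seek(i * SLOT.size)
        self.file.write(SLOT.pack(True, key, value))  # struct pads short fields with NUL bytes

    def put(self, key: bytes, value: bytes) -> None:
        if len(key) > KEY_SIZE or len(value) > VALUE_SIZE:
            raise ValueError("key or value too large for a slot")
        for i in self._probe(key):
            used, stored_key, _ = self._read_slot(i)
            if not used or stored_key == key:  # free slot, or overwrite an existing key
                self._write_slot(i, key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key: bytes):
        for i in self._probe(key):
            used, stored_key, value = self._read_slot(i)
            if not used:
                return None  # an empty slot ends the probe chain: the key is absent
            if stored_key == key:
                return value
        return None

    def close(self) -> None:
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "store.dat")
table = DiskHashTable(path, num_slots=8)
table.put(b"user:1", b"Alice")
table.put(b"user:2", b"Bob")
table.put(b"user:1", b"Alicia")  # overwrite in place
table.close()

reopened = DiskHashTable(path, num_slots=8)  # must reuse the same slot count
print(reopened.get(b"user:1"))   # b'Alicia'
print(reopened.get(b"missing"))  # None
reopened.close()
```

Fixed-size slots turn a lookup into a few seeks and reads at computable offsets, with no query parsing or planning step. That contrast with the RDBMS pipeline is the one the earlier slides draw.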
  • 36. How It Works (Aggregation Layer) • Receives the requests. • Routes each request to the target node. • Manages partitioning and replicas. • Partitioning and replication are done with the consistent hashing algorithm. On the board Please review resource #6
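Here is a minimal sketch of consistent hashing as the aggregation layer might use it. Each physical node is hashed to many points ("virtual nodes") on a ring. A key belongs to the first point clockwise from its own hash, and replicas go to the next distinct nodes around the ring, similar to the scheme described in the Dynamo paper (resource #7). The class name, the `vnodes=100` default, and the SHA-256-based positions are assumptions of this sketch, not details from the talk.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes; replicas go to the next distinct nodes clockwise."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual points per physical node (an assumed tuning value)
        self._points = []     # sorted hash positions on the ring
        self._owners = {}     # hash position -> physical node name
        self._nodes = set()
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # Stable 64-bit ring position (Python's built-in hash() is salted per process).
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        self._nodes.add(node)
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._nodes.discard(node)
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            del self._owners[point]
            self._points.pop(bisect.bisect_left(self._points, point))

    def nodes_for(self, key: str, replicas: int = 1) -> list:
        """Walk clockwise from the key's position and collect distinct nodes, primary first."""
        wanted = min(replicas, len(self._nodes))
        if wanted <= 0:
            return []
        result = []
        start = bisect.bisect(self._points, self._hash(key))
        for offset in range(len(self._points)):
            node = self._owners[self._points[(start + offset) % len(self._points)]]
            if node not in result:
                result.append(node)
                if len(result) == wanted:
                    break
        return result


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.nodes_for(k)[0] for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.nodes_for(k)[0] != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # expected to be roughly a quarter
print(ring.nodes_for("user:42", replicas=3))          # primary plus two distinct replica nodes
```

This is why a new machine can "just be plugged in". Only the keys the new node takes over move, roughly 1/(number of nodes) of them, and all of them move to the new node. With naive `hash(key) % n` placement, almost every key would move.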
  • 37. Key-Value Storage Systems • Amazon Dynamo (paper) • Facebook Cassandra (open source) • Tokyo Cabinet/Tyrant (open source) • Redis (open source) • MongoDB (open source)
  • 38. Q / A
  • 39. References 1. The End of an Architectural Era (It’s Time for a Complete Rewrite). Paper. 2. Database Systems, Paul Beynon-Davies. Book. 3. Inside SQL Server engine, MS Press. Book. 4. Drop ACID and Think About Data. Highscalability.com. 5. Redis vs MySQL vs Tokyo Tyrant. Colin Howe’s blog. 6. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. Paper. 7. Dynamo: Amazon’s Highly Available Key-value Store. Paper. 8. Redis, Tokyo Tyrant project. 9. Consistent Hashing. Tom White’s blog.
  • 40. Resources 1. High Scalability blog. Highscalability.com 2. It’s all about innovation blog. Hfadeel.com/blog. 3. All Things Distributed. Allthingsdistributed.com 4. Tom White’s blog. lexemetech.com
  • 41. Thanks… Dear all, all of my presentation content is open source. Please feel free to use, copy, and redistribute it.