
Aerospike Meetup, July 2019 | Big Data Demystified


Building a low-latency (sub-millisecond), high-throughput database that can handle big data AND scale linearly is not easy, but we did it anyway...
In this session we will get to know Aerospike, an enterprise distributed primary key database solution.

- We will introduce Aerospike: basic terms, how it works, and why it is widely used in mission-critical system deployments.
- We will look at the 'magic' behind Aerospike's ability to handle small, medium, and even petabyte-scale data while still guaranteeing predictable sub-millisecond latency.
- We will learn how Aerospike DevOps differs from other solutions on the market, and see how easy it is to run in cloud environments as well as on premises.

We will also run a demo showing a live example of the performance and self-healing technologies the database has to offer.


  1. Extreme Performance, take NoSQL to the Next Level. Zohar Elkayam, Solutions Architect, Aerospike
  2. Hummingbird
     ▪ Takes ~20ms to flap its wings, 50-70 wing flaps every second
     ▪ The smallest bird in the world, weighs less than a penny (2 grams)
     Proprietary & Confidential | All rights reserved. © 2018 Aerospike Inc.
  3. Aerospike
     ▪ ~300K reads/writes every second, 99.9% of them with <1ms latency
     ▪ Unmatched reliability and uptime, deployable anywhere, lowest TCO
  4. About Aerospike
     Company
     • Founded: 2009, Silicon Valley
     • Employees: 100 worldwide
     • Customers: 200+ enterprise
     Key Differentiators
     • Patented Hybrid Memory Architecture
     • Significant data storage benefits at scale
     • High performance with strong consistency
     • Significant TCO reduction (5-30x)
     Products
     • Aerospike Enterprise Edition
       – Enterprise-grade, internet-scale database solution
       – Powers real-time, mission-critical applications and analysis
       – Integrates with Spark, Hadoop, Kafka
     • Aerospike Community Edition
       – For prototyping, testing, evaluating
     Industries: ADTECH, ECOMMERCE, FINANCIAL, NEWTECH, TELCO/MEDIA, GAMING
  5. Agenda
     ▪ From RDBMS to NoSQL
     ▪ Types of NoSQL
     ▪ Introduction to Aerospike
     ▪ Predictable Performance
     ▪ Scale
     ▪ DevOps
     ▪ What About Programming?
     ▪ Editions and Where To Start
  6. From RDBMS to NoSQL
  7. Why NoSQL? The Challenge
     ▪ We want scalable, durable, high-volume, high-velocity, distributed data storage that can handle non-structured data and fit our specific needs
     ▪ An RDBMS is too generic and doesn’t cut it any more: it can do the job, but it is not cost-effective for our usage
  8. The Solution: NoSQL
     ▪ Let’s take some parts of the standard RDBMS out and design the solution for our specific uses
     ▪ NoSQL databases have been around for ages under different names/solutions
     ▪ Over 150 different brands and solutions (http://nosql-database.org/)
  9. Why Would We Choose NoSQL?
     ▪ Some applications need very few database features, but need high scale
     ▪ Desire to avoid data/schema pre-design altogether for simple applications
     ▪ A need for a low-latency, low-overhead API to access data
     ▪ Simplicity: no need for fancy indexing, just fast lookup by primary key
  10. Why NoSQL? (cont.)
      ▪ Developer friendly, DBAs are less needed
      ▪ Agile and schema-less: semi-structured or non-structured
      ▪ Might be in-memory
      ▪ No (or loose) transactions
      ▪ No joins
  11. Types of NoSQL
  12. Basic NoSQL Taxonomy
      [Table with columns Type and Examples; only the Type column is recoverable]
      ▪ Key-Value Store
      ▪ Document Store
      ▪ DWH and MPP Database
      ▪ Graph Store
      ▪ Others
  13. Key Value Store
      ▪ Distributed hash tables: primary key databases
      ▪ Very fast to get a single value
      ▪ Examples:
        – Aerospike
        – Amazon DynamoDB
        – Berkeley DB
        – Redis
  14. Document Store
      ▪ Similar to Key/Value, but the value is a document
      ▪ JSON or something similar, flexible schema
      ▪ Agile technology
      ▪ Examples:
        – MongoDB
        – Couchbase
  15. Relational Data Modeling – Table Centric Schema, 3rd NF
  16. NoSQL Modeling: Record Centric Data Model
      ▪ De-normalization implies duplication of data
        – The required queries dictate the data model
        – No “joins” across tables (no view table generation)
      ▪ Aggregation (multiple data entry) vs. association (single data entry)
        – “Consists of” vs. “related to”
  17. Aerospike: Enterprise-Grade, Real-time Database Platform
  18. NoSQL – Getting to Scale
      Aerospike Delivers Speed at Scale, Predictable Performance, Highest Availability, and Lowest TCO
      [Charts: TCO ($) vs. scale (TB) and speed (TPS) vs. scale (TB), comparing the NoSQL market with Aerospike. Labels: “Significant functional overlap - Commodity DB problem set” (alternative TCO); “Unique Functional Capabilities and High Value Problem Set” (Aerospike TCO)]
  19. What Is Aerospike? High Performance Distributed NoSQL Database
      ▪ Hybrid Memory Architecture: index in DRAM, data on SSD
      ▪ Unlimited key-value pairs, record size up to 8MB
      ▪ Scalar and complex data types
      ▪ Distributed queries on secondary indices (exact match, integer range, geospatial queries)
      ▪ User Defined Functions extend the database
      ▪ Patented Indexed Map-Reduce: distributed queries can be filtered, transformed, aggregated, and reduced
  20. Architecture Delivers Scale
      [Diagram: four storage configurations. In each, an index entry holds digest & tree info, record metadata, and a storage pointer to the record's bins (BIN1, BIN2, BIN3), alongside expiry handling and a write queue; the flash-backed configurations also show defrag.]
      ▪ Hybrid DRAM / Flash: DRAM index, data in flash
      ▪ All Flash: index and data in flash
      ▪ Hybrid PMEM / Flash: PMEM index, data in flash
      ▪ RAM namespace: index and data in memory, with optional persistence (data backed up on rotational disk or flash SSDs)
  21. Attributes of a Hybrid Memory Architecture
      ▪ Attributes: Lowest TCO, Predictable Performance, High Uptime, Low Management
      ▪ Building blocks: Indexes in DRAM, Data on SSD, Massively parallel
  22. Indexes in DRAM, Data on SSD
      ▪ Small amount of DRAM
        – Avoid cost and server sprawl
      ▪ No cache, so no cache misses
        – Predictable, low-latency performance on NVMe/SSD
      ▪ Optimized for SSDs
        – Reads done in parallel
        – Writes done optimally for SSD to reduce wear-and-tear
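A rough sizing sketch for the "small amount of DRAM" point. It assumes the fixed 64-byte index entry that the deck mentions on its hash-partitioning slide, and a replication factor of 2, which is an assumption made here for illustration and not a figure from the slides. The function names are hypothetical.

```python
INDEX_ENTRY_BYTES = 64  # per-record index entry size, as stated later in the deck


def index_dram_bytes(records: int, replication_factor: int = 2) -> int:
    """Cluster-wide DRAM needed for primary index entries only.

    replication_factor=2 is an illustrative assumption: every replica
    of a record carries its own index entry.
    """
    return records * replication_factor * INDEX_ENTRY_BYTES


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    total = index_dram_bytes(1_000_000_000)
    print(f"{total} bytes, about {to_gib(total):.1f} GiB across the cluster")
```

Under these assumptions, one billion records need 128 GB of index DRAM across the whole cluster, regardless of how large the records themselves are on SSD.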
  23. Massively Parallel
      [Chart: % of cluster data per node, with equal shares per node (5% each in one cluster, 25% each in another)]
      ▪ Automatic distribution of data using the Smart Partitions™ algorithm
        – Even amount of data on every node and flash device
        – All hardware used equally
        – Load on all servers is balanced
        – No “hot spots”
        – No config changes as workload or use case changes
      ▪ Smart Clients
        – Single “hop” from client to server
        – Cluster-spanning operations (scan, query, batch) sent to all processing nodes for parallel processing
  24. Even Data Distribution
      Data is distributed evenly across nodes in a cluster using the Aerospike Smart Partitions™ algorithm.
      ▪ Automatic sharding
      ▪ 4096 data partitions
      ▪ Even distribution of
        – Partitions across nodes
        – Records across partitions
        – Data across flash devices
      ▪ Primary and replica partitions
  25. Distributed Hash Based Partitioning
      ▪ Distributed hash table with no hotspots
      ▪ Every key is hashed with RIPEMD160 into an ultra-efficient 20-byte (fixed-length) string
      ▪ Hash + additional (fixed 64 bytes) data forms the index entry in RAM
      ▪ Some bits from the hash value are used to calculate the Partition ID (4096 partitions)
      ▪ Partition ID maps to Node ID in the cluster
      ▪ 1 hop to data
      ▪ Smart Client simply calculates the Partition ID to determine the Node ID
      ▪ No load balancers required
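The lookup chain on this slide (key, 20-byte digest, partition ID, node) can be sketched in a few lines. This is an illustration under stated assumptions, not the client's actual code: the digest input format, the choice of which 12 bits form the partition ID, and the round-robin partition map are all assumptions, and `key_digest`, `partition_id`, and `node_for_key` are hypothetical names. Some Python builds ship without RIPEMD160, so the sketch falls back to SHA-1 (also 20 bytes) purely as a stand-in.

```python
import hashlib

N_PARTITIONS = 4096  # from the slide


def key_digest(set_name: str, user_key: str) -> bytes:
    """Hash a key into a fixed 20-byte digest.

    Assumption: set name and key are simply concatenated as UTF-8.
    """
    data = set_name.encode("utf-8") + user_key.encode("utf-8")
    try:
        hasher = hashlib.new("ripemd160")
    except ValueError:
        # RIPEMD160 is unavailable in this build; SHA-1 is only a stand-in.
        hasher = hashlib.sha1()
    hasher.update(data)
    return hasher.digest()


def partition_id(digest: bytes) -> int:
    """Take 12 bits of the digest (assumed: low bits of the first two bytes)."""
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


def node_for_key(partition_map, set_name: str, user_key: str) -> str:
    """Client-side lookup: no load balancer, one hop to the owning node."""
    return partition_map[partition_id(key_digest(set_name, user_key))]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    # Illustrative round-robin map; a real cluster publishes its own map.
    pmap = [nodes[p % len(nodes)] for p in range(N_PARTITIONS)]
    print(node_for_key(pmap, "users", "user-42"))
```

Because the client holds the partition map and computes the partition ID itself, routing costs one hash and one table lookup.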
  26. Massively Parallel
      ▪ Scaling Up: take full advantage of all the hardware
      ▪ Scaling Out: scale linearly with the number of nodes
  27. Aerospike’s Predictable Performance
      ▪ Performance Built In
        – Written in C with memory-optimized libraries => No garbage collection
        – Continual defragmentation of storage => No compactions
        – Known master for any piece of data => No quorum reads
        – Designed as a distributed database => Networking is a primary consideration
      ▪ Storage Optimizations
        – Writes done to memory buffer => Avoid storage slowdown
        – Storage used in “block” mode => No file system overhead
        – Reads and writes striped across devices => Concurrent use of hardware
      ▪ Smart Clients
        – Single “hop” from client to server
        – Partition map stored on client
        – Automatic load balancing, no external load balancers!
  28. Predictable Performance
      [Diagram: read and write paths. Each server's primary index in DRAM holds digest & tree info, record metadata, and a storage pointer.]
      ▪ Reads: single hop from the client to the owning server; primary index lookup in DRAM, then a read from storage
      ▪ Writes: single hop from the client to the owning server; primary index in DRAM, write to a memory buffer, asynchronous flush to storage
      ▪ Replication: synchronous replica write in a single hop to the replica server, which also writes to its memory buffer and flushes asynchronously
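The write path in the diagram can be mimicked with a toy model: the owning server buffers the write in memory, the replica is written synchronously before the client is acknowledged, and flushing to storage happens later. Every class and function name here is hypothetical, and the dictionaries merely stand in for the DRAM index, the memory buffer, and the SSD.

```python
class ToyNode:
    """A toy server: DRAM index, in-memory write buffer, and 'storage'."""

    def __init__(self, name: str):
        self.name = name
        self.index = {}         # key -> "buffer" or "storage" (stands in for the storage pointer)
        self.write_buffer = {}  # writes land here first
        self.storage = {}       # stands in for the SSD

    def apply_write(self, key, value) -> None:
        self.write_buffer[key] = value
        self.index[key] = "buffer"

    def flush(self) -> None:
        """Move buffered writes to storage (asynchronous in the diagram)."""
        self.storage.update(self.write_buffer)
        for key in self.write_buffer:
            self.index[key] = "storage"
        self.write_buffer.clear()

    def read(self, key):
        location = self.index.get(key)
        if location is None:
            return None
        source = self.write_buffer if location == "buffer" else self.storage
        return source[key]


def client_write(primary: ToyNode, replica: ToyNode, key, value) -> bool:
    """Write to the owning server, then synchronously to the replica, then ack."""
    primary.apply_write(key, value)
    replica.apply_write(key, value)
    return True
```

In this model the client is acknowledged once both memory buffers hold the write, so the latency of the storage device stays off the write's critical path.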
  29. Predictable Performance
      Performance should be predictable irrespective of workload
      ▪ Indexes in DRAM
      ▪ Data on SSD
      ▪ Massively parallel
  30. High Uptime, Low Management
      High Uptime
      • “Shared Nothing” architecture
      • No single points of failure
      • No cascading failures
      • Seamless loss of nodes with self-heal capability
      Low Management
      • Automatic sharding of data
      • No re-tuning of cluster for use-case changes
      • No requirement for caches
      • Smaller number of nodes for easier management
      • “Set and forget” DevOps management
  31. Automatic Rebalancing
      When a node is added or removed, the cluster automatically rebalances:
      1. Cluster discovers the new node via gossip protocol
      2. Paxos vote determines the new data organization
      3. Partition migrations occur
      After migration is complete, the cluster is evenly balanced. Clients keep working during rebalancing.
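To see why adding a node can leave the cluster evenly balanced while moving only a fraction of the partitions, here is a sketch that assigns each of the 4096 partitions to a node by rendezvous (highest-random-weight) hashing. This is a stand-in technique chosen for illustration, not Aerospike's actual algorithm; it skips the gossip and Paxos steps entirely, and all names are hypothetical.

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096


def _score(node: str, pid: int) -> bytes:
    """Deterministic pseudo-random weight for a (node, partition) pair."""
    return hashlib.sha256(f"{node}:{pid}".encode("utf-8")).digest()


def partition_map(nodes):
    """Each partition is owned by the node with the highest score for it."""
    return [max(nodes, key=lambda n: _score(n, pid)) for pid in range(N_PARTITIONS)]


def migrations(old_map, new_map):
    """Partitions whose owner changed: (partition id, old owner, new owner)."""
    return [
        (pid, old, new)
        for pid, (old, new) in enumerate(zip(old_map, new_map))
        if old != new
    ]


if __name__ == "__main__":
    before = partition_map(["A", "B", "C"])
    after = partition_map(["A", "B", "C", "D"])
    moves = migrations(before, after)
    print("partitions per node:", dict(Counter(after)))
    print("partitions migrated:", len(moves))
```

With this scheme a partition changes owner only when the new node wins it, so existing nodes never shuffle partitions among themselves and each ends up with roughly a quarter of the total.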
  32. XDR Topologies
      ▪ Star Replication
      ▪ Simple Active-Passive
      ▪ Simple Active-Active
      ▪ More Complex Topology
  33. XDR Architecture
      [Diagram labels: Each node in the cluster; Distributed clusters]
  34. XDR Failure Handling
      ▪ Node failure in a cluster: nodes with replica data will continue
      ▪ Link failure between clusters: XDR keeps track of link failures and of the data to be shipped over that link. It will recover when the link comes up.
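The link-failure behaviour described here (remember what still has to be shipped, catch up when the link returns) can be illustrated with a small queue. This is only a conceptual sketch; `ToyXdrLink` and its methods are hypothetical and do not reflect how XDR is actually implemented.

```python
from collections import deque


class ToyXdrLink:
    """Tracks records that still need shipping to a remote cluster."""

    def __init__(self, send):
        self.send = send        # callable that ships one record to the remote cluster
        self.up = True
        self.pending = deque()  # records not yet shipped, in write order

    def ship(self, record) -> None:
        """Queue a record and ship it right away if the link is up."""
        self.pending.append(record)
        self._drain()

    def link_down(self) -> None:
        self.up = False

    def link_up(self) -> None:
        """Link recovered: ship everything that accumulated meanwhile."""
        self.up = True
        self._drain()

    def _drain(self) -> None:
        while self.up and self.pending:
            self.send(self.pending[0])
            self.pending.popleft()  # drop only after a successful send
```

A record leaves the queue only after `send` returns, so a failure in mid-shipment leaves it pending for the next attempt.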
  35. What About Programming?
  36. How Data is Organized
  37. Aerospike is a Primary Key Database
      ▪ Objects stored in Aerospike are called records
      ▪ Every record is uniquely identified by the 3-tuple (namespace, set, primary-key)
      ▪ A record contains one or more bins
      ▪ A bin holds the value of a supported data type: integer, double, string, bytes, list, map, geospatial
      ▪ Record layout: (namespace, set, primary-key) | EXP | LUT | GEN | BIN1 | BIN2
        – EXP: Expiration Timestamp
        – LUT: Last Update Time
        – GEN: Generation
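The record layout on this slide maps naturally onto a small in-memory model: a dictionary keyed by the (namespace, set, primary-key) tuple, with each record carrying EXP, LUT, GEN, and its bins. The sketch below is a teaching model with hypothetical names (`ToyStore`, `put`, `get`), not the Aerospike client API. The choice that a write merges into existing bins, and the explicit `now` parameter, are assumptions made to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str, Any]  # (namespace, set, primary-key)


@dataclass
class Record:
    bins: Dict[str, Any] = field(default_factory=dict)
    exp: Optional[float] = None  # EXP: expiration timestamp (None = never)
    lut: float = 0.0             # LUT: last update time
    gen: int = 0                 # GEN: generation, bumped on every write


class ToyStore:
    def __init__(self) -> None:
        self._records: Dict[Key, Record] = {}

    def put(self, key: Key, bins: Dict[str, Any], now: float,
            ttl: Optional[float] = None) -> Record:
        """Write bins into the record, updating its metadata."""
        record = self._records.setdefault(key, Record())
        record.bins.update(bins)  # assumption: writes merge into existing bins
        record.lut = now
        record.gen += 1
        record.exp = now + ttl if ttl is not None else None
        return record

    def get(self, key: Key, now: float) -> Optional[Record]:
        """Return the record, or None if it is missing or expired."""
        record = self._records.get(key)
        if record is None or (record.exp is not None and record.exp <= now):
            return None
        return record
```

For example, two `put` calls on `("test", "users", "u1")` leave one record with generation 2 and the bins of both writes.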
  38. What About Datatypes?
      ▪ Aerospike is a row-oriented distributed database
      ▪ Rows (records) contain one or more columns (bins)
      ▪ Similar to an RDBMS with primary-key table lookups
      ▪ Single record transactions
      ▪ Namespaces can be configured for strong consistency
      Aerospike term: RDBMS term
      ▪ Namespace: Tablespace or Database
      ▪ Set: Table
      ▪ Record: Row
      ▪ Bin: Column
      Bin types: Integer, Double, String, Bytes, List (Unordered, Ordered), Map (Unordered, K-Ordered, KV-Ordered), GeoJSON
  39. List and Map
      ▪ Common type framework
      ▪ Native language bindings
      ▪ Internal msgpack format (efficient)
      ▪ C for performance
      ▪ Data layout “in record” (copy on write)
      ▪ List
        – Push, Pop (number, collection)
        – Store any type as entry (including list/map)
        – Select range by position
        – Integrated with secondary index
      ▪ Sorted Map
        – Select by key or value (1st level only)
        – Selected ranges
  40. CDT: Map Operations
      Map operations supported by the server. Method names in the clients might be different.
      ▪ set_type() (unordered, k-ordered or kv-ordered)
      ▪ add(), add_items(), increment(), decrement()
      ▪ clear()
      ▪ remove_by_key(), remove_by_index(), remove_by_rank()
      ▪ remove_by_key_interval(), remove_by_index_range()
      ▪ remove_by_value_interval(), remove_by_rank_range(), remove_all_by_value()
      ▪ remove_all_by_key_list(), remove_all_by_value_list()
      ▪ size()
      ▪ get_by_key(), get_by_index(), get_by_rank()
      ▪ get_by_key_interval(), get_by_index_range()
      ▪ get_by_value_interval(), get_by_rank_range(), get_all_by_value()
      ▪ get_all_by_key_list(), get_all_by_value_list()
  41. CDT: List Operations
      List operations supported by the server. Method names in the clients might be different.
      ▪ set_type() (unordered, ordered)
      ▪ append(), append_items(), insert(), insert_items(), set()
      ▪ increment()
      ▪ sort(), clear(), size()
      ▪ remove_by_index(), remove_by_index_range()
      ▪ remove_by_rank(), remove_by_rank_range()
      ▪ remove_by_value(), remove_by_value_interval(), remove_all_by_value()
      ▪ remove_all_by_value_list(), remove_by_value_rel_rank_range()
      ▪ get_by_index(), get_by_index_range()
      ▪ get_by_rank(), get_by_rank_range()
      ▪ get_by_value(), get_by_value_interval(), get_all_by_value()
      ▪ get_all_by_value_list(), get_by_value_rel_rank_range()
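The operation names above distinguish between index (position in the list) and rank (position in value order), which is easy to mix up. The plain-Python functions below borrow four of the slide's names to illustrate the intended meaning on an unordered list of integers. They are not the server or client implementation, and two details are assumptions: a value interval is treated as half-open (begin inclusive, end exclusive), and negative ranks count from the largest value.

```python
def get_by_index(values, index):
    """Element at a list position (negative counts from the end)."""
    return values[index]


def get_by_rank(values, rank):
    """Element at a position in value order: rank 0 is the smallest,
    rank -1 the largest."""
    return sorted(values)[rank]


def get_by_rank_range(values, rank, count):
    """`count` elements in value order, starting at `rank`."""
    ordered = sorted(values)
    start = rank if rank >= 0 else max(len(ordered) + rank, 0)
    return ordered[start:start + count]


def get_by_value_interval(values, begin, end):
    """Elements with begin <= value < end, in list order (assumed half-open)."""
    return [v for v in values if begin <= v < end]


if __name__ == "__main__":
    scores = [50, 10, 40, 20]
    print(get_by_index(scores, 0))              # first stored element
    print(get_by_rank(scores, 0))               # smallest element
    print(get_by_rank_range(scores, -2, 2))     # two largest elements
    print(get_by_value_interval(scores, 20, 50))
```

A typical use is a leaderboard: `get_by_rank_range(scores, -3, 3)` returns the top three scores without the application sorting the list itself.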
  42. Aerospike Database: Licensing, Limitations (Enterprise vs. Community)
      ▪ Aerospike Server License Type: Commercial License (Enterprise); AGPL (Community)
      ▪ Aerospike Client License Type: Apache v2 (Enterprise); Apache v2 (Community)
      ▪ Binaries: Tested & Verified (Enterprise); Available (Community)
      ▪ Transactions or Queries per Second, max: Unlimited (Enterprise); Unlimited (Community)
      ▪ Namespaces, max [2]: 32 (Enterprise); 2 (Community)
      ▪ Objects per Namespace per Node, max [2]: 32 Billion (Enterprise); 4 Billion (Community)
      ▪ Cost of Development Servers: Free with Commercial License (Enterprise); Free (Community)
      ▪ Subscriptions: Licensed by volume of unique data managed and active production clusters (Enterprise); Free (Community)
      Source: aerospike.com/products/product-matrix/
      [2] See Known Limitations for more details
  43. Resources
      ▪ aerospike.com/download/
      ▪ aerospike.com/lp/aerospike-community-edition/
  44. Time for Q&A!
  45. Thank You! zelkayam@aerospike.com
