Advanced Non-Relational Schemas
For Big Data
by Victor Smirnov
Non-Relational Schema
● Is just a data structure
● That uses some Memory Model
● Typically, Key->Value mapping
● Where Key is an Integer ID
● And Value is an arbitrary array of a limited size or
memory block
● It's assumed that operations on memory blocks
are atomic.
Storage Options
Partial (Prefix) Sums Tree
● Given a sequence S[0, N) = s0...sN-1 of non-negative integers
● Sum(i) returns X = s0+s1+...+si.
● FindLT(X) returns the position i of the largest Sum(i) < X
● FindLE(X) is the same, but with Sum(i) <= X
● We can also define range versions: Sum(i, j) and FindLT(j, X)
● All operations run in O(log N) time.
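The operations above can be sketched as follows. This is a minimal illustration, not the structure from the slides: it swaps in a Fenwick (binary indexed) tree, which also answers Sum and FindLT/FindLE in O(log N). The class and method names (`PrefixSumTree`, `add`, `find_lt`, `find_le`) and the convention of returning -1 when no position qualifies are assumptions made for the example.

```python
class PrefixSumTree:
    """Fenwick (binary indexed) tree over non-negative integers (illustrative)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.add(i, v)

    def add(self, i, delta):
        """Add delta to s_i in O(log N)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i):
        """Return s_0 + ... + s_i (inclusive, as on the slide)."""
        total = 0
        i += 1
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def _count_prefixes(self, x, strict):
        """Count positions i whose Sum(i) is < x (strict) or <= x."""
        pos, remaining = 0, x
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n:
                node = self.tree[nxt]
                if node < remaining or (not strict and node == remaining):
                    pos = nxt
                    remaining -= node
            step >>= 1
        return pos

    def find_lt(self, x):
        """Largest i with Sum(i) < x, or -1 if there is none (assumed convention)."""
        return self._count_prefixes(x, strict=True) - 1

    def find_le(self, x):
        """Largest i with Sum(i) <= x, or -1 if there is none (assumed convention)."""
        return self._count_prefixes(x, strict=False) - 1
```

The descent in `_count_prefixes` relies on the values being non-negative, so the prefix sums never decrease and a single top-down pass is enough.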
Packing Perfect Balanced Tree into an Array
Some Performance Bits
[Figure: PackedTree random read performance, 1 million random reads. X axis: memory block size, Kb (1 to 262144, with L1, L2, L3 and RAM regions marked). Y axis: performance, operations/sec (0 to 5e+07). Series: PackedTree<BigInt> with 2 children; PackedTree<BigInt> with 32 children; std::set<BigInt> with 2 children.]
Dynamic Vector
● An ordered sequence of elements (bytes, integers, strings)
of size N
● Access(i) is O(log N)
● Insert(i, value) is O(log N)
● Delete(i) is O(log N)
● We can also define batch operations:
● Insert(i, value[])
● Delete(i, j)
● Split(i); Merge(AnotherVector);...
Dynamic Vector
Dynamic Vector Operations
● FindLT(i) returns the block B that contains position i, and the
offset j of i within B
● Access(i) is O(log N)
● Insert(i, value) and Delete(i) are also O(log N)
because the tree is balanced.
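A rough sketch of the block lookup follows. It is an illustration under simplifying assumptions, not the balanced tree from the slides: block sizes are kept in a flat Python list and scanned on every call, so the lookup here is linear in the number of blocks, whereas a partial sums tree over the block sizes would make it O(log N). `BlockedVector`, `max_block` and `_locate` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate


class BlockedVector:
    """Sequence stored as small blocks; block sizes act as the partial sums."""

    def __init__(self, max_block=4):
        self.max_block = max_block  # hypothetical block capacity
        self.blocks = [[]]

    def __len__(self):
        return sum(len(b) for b in self.blocks)

    def _locate(self, i):
        """Return (block index, offset in block) for position i.

        Stands in for FindLT on a partial sums tree of block sizes.
        """
        ends = list(accumulate(len(b) for b in self.blocks))
        b = bisect_right(ends, i)
        if b == len(self.blocks):  # i == len(self): use the end of the last block
            b -= 1
        start = ends[b] - len(self.blocks[b])
        return b, i - start

    def access(self, i):
        b, j = self._locate(i)
        return self.blocks[b][j]

    def insert(self, i, value):
        b, j = self._locate(i)
        block = self.blocks[b]
        block.insert(j, value)
        if len(block) > self.max_block:  # split an overflowing block in two
            half = len(block) // 2
            self.blocks[b:b + 1] = [block[:half], block[half:]]

    def delete(self, i):
        b, j = self._locate(i)
        del self.blocks[b][j]
        if not self.blocks[b] and len(self.blocks) > 1:
            del self.blocks[b]  # drop a block that became empty
```

Because every block holds at most `max_block` elements, inserts and deletes only shift data inside one small block, which is the property the tree-based version exploits.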
File System: Map<ID, Vector<T>>
● Maps ID to Vector<T>
● Merge all values into one large Dynamic Vector, in ID
order
● Create separate “index” sequence from pairs <ID, Offset>
in ID order
● We can represent this “index” sequence as two partial
sums trees, one for ID and one for Offset
● We can merge both trees into one because they have
exactly the same structure: a multi-index balanced partial
sums tree.
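The layout described above can be sketched with plain lists. This is a simplified illustration with several assumptions: IDs are distinct non-negative integers, the index stores ID gaps and vector lengths (so that prefix sums recover the IDs and the offsets), and prefix sums are recomputed by scanning instead of being read from a tree. The function names `build_map` and `lookup` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def build_map(entries):
    """entries: dict {ID: list of values}. Returns (id_deltas, sizes, data)."""
    id_deltas, sizes, data = [], [], []
    prev_id = 0
    for key in sorted(entries):
        id_deltas.append(key - prev_id)  # ID gaps: prefix sums recover the IDs
        sizes.append(len(entries[key]))  # lengths: prefix sums give the offsets
        data.extend(entries[key])        # all values merged in ID order
        prev_id = key
    return id_deltas, sizes, data


def lookup(id_deltas, sizes, data, key):
    """Return the vector stored under key, or None if key is absent."""
    ids = list(accumulate(id_deltas))    # a partial sums tree would avoid this scan
    k = bisect_left(ids, key)
    if k == len(ids) or ids[k] != key:
        return None
    offset = sum(sizes[:k])              # prefix sum over the second index stream
    return data[offset:offset + sizes[k]]
```

Both index streams have one entry per ID, which is why a single tree with two sums per node can serve them together.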
Map<ID, Vector<T>>
Sharing Tree Structures
● Tree structure sharing saves both space and time:
SPMD principle (single program, multiple data)
● We can align partial sum trees with different structures
using interpolation (padding with zeroes)
● We can merge the index and data streams of
Map<ID, Vector<T>> into one multi-stream tree.
● When merging the trees, we try to fit index pairs and their
corresponding data into the same leaf node of the
multi-stream tree.
Multistream Tree Node Layout
Multistream Balanced Tree
ACID
● Atomic block operations are not enough
● Even simple tree update affects several blocks
● So, ACID is mandatory for advanced non-relational
schemas
● We can get ACID for free with Multi-Version
Concurrency Control (MVCC)
● We need Version History over data blocks
● Where each transaction is a version.
Transaction History via MVCC
Version History Implementation
● Version History maps a pair <ID, Version> to the ID of the real
data block for that ID and version
● We have Map<ID, Vector<Version, ID>>
● We can turn it into a Version History by sorting each
Vector<Version, ID> (less space, slower)
● Or by creating an additional partial sums tree index on top of it
(more space, but much faster)
● We can do it in just one multi-stream balanced tree
● MVCC requires some other data structures, but they can be
designed by analogy.
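The "sorted vector" variant can be sketched as below. This is only an illustration: it assumes versions are integers on a single linear history (no branches), that block IDs are integers, and that a lookup should return the block written by the latest version not newer than the requested one. `VersionHistory`, `put` and `resolve` are hypothetical names.

```python
from bisect import bisect_right


class VersionHistory:
    """Maps (ID, Version) to the ID of the data block visible at that version."""

    def __init__(self):
        self.history = {}  # logical ID -> sorted list of (version, block ID)

    def put(self, logical_id, version, block_id):
        entries = self.history.setdefault(logical_id, [])
        entries.append((version, block_id))
        entries.sort()  # keep each Vector<Version, ID> sorted by version

    def resolve(self, logical_id, version):
        """Return the block from the latest version <= version, or None."""
        entries = self.history.get(logical_id, [])
        # (version, inf) sorts after every entry that has this version number.
        k = bisect_right(entries, (version, float("inf")))
        return entries[k - 1][1] if k else None
```

A reader at version 5 therefore keeps seeing the block written at version 4 even after a writer adds version 9, which is the isolation property MVCC provides.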
Concurrency Handling
● Version History is a complicated data structure
● Concurrent access to it must be restricted
● Split the whole Version History into shards
● And shard blocks by ID to reduce lock contention on Version History
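One way to picture sharding by ID is a fixed set of shards, each guarded by its own lock. The sketch below is an assumption-heavy illustration: the modulo placement rule, the shard count, and the names `ShardedHistory`, `put` and `get` are all hypothetical, and each shard is a plain dictionary instead of a real Version History.

```python
import threading


class ShardedHistory:
    """Version records split across shards, one lock per shard."""

    def __init__(self, shard_count=8):
        self.shards = [({}, threading.Lock()) for _ in range(shard_count)]

    def _shard(self, block_id):
        # Assumed placement rule: integer block ID modulo the shard count.
        return self.shards[block_id % len(self.shards)]

    def put(self, block_id, version, physical_id):
        table, lock = self._shard(block_id)
        with lock:  # only writers to the same shard contend here
            table.setdefault(block_id, {})[version] = physical_id

    def get(self, block_id, version):
        table, lock = self._shard(block_id)
        with lock:
            return table.get(block_id, {}).get(version)
```

Threads that touch blocks in different shards never wait on each other, which is the contention reduction the slide refers to.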
Distributed Storage and Processing
● MVCC is very Raft/Paxos-friendly
● Because of Version History and MVCC
● So we can join storage nodes into Raft groups
● And join Raft groups into larger groups with 2PC
● Using a split/merge model to map data to nodes.
Bonus Slides
Searchable Bitmaps
● rank1(n) = number of ones in [0, n)
● select1(i) = position of i-th 1 in the bitmap
● rank0(n) = number of zeroes in [0, n)
● select0(i) = position of i-th 0 in the bitmap
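The four operations can be illustrated with a precomputed table of prefix counts. This is a minimal sketch, not the structure shown on the following slides: it uses O(N) extra space and a binary search for select. It assumes that select counts from 1 (select1(1) is the first 1 bit) and returns -1 when there is no such bit; both conventions are choices made for the example.

```python
class Bitmap:
    """Bitmap with rank/select via a prefix-count table (illustrative)."""

    def __init__(self, bits):
        self.bits = list(bits)
        # ones_before[n] = number of 1 bits in [0, n)
        self.ones_before = [0]
        for b in self.bits:
            self.ones_before.append(self.ones_before[-1] + b)

    def rank1(self, n):
        return self.ones_before[n]

    def rank0(self, n):
        return n - self.ones_before[n]

    def _select(self, i, rank):
        """Position of the i-th matching bit (i counted from 1), or -1."""
        lo, hi = 0, len(self.bits)
        if i < 1 or rank(hi) < i:
            return -1
        # Find the smallest n with rank(n) >= i; the i-th bit sits at n - 1.
        while lo < hi:
            mid = (lo + hi) // 2
            if rank(mid) >= i:
                hi = mid
            else:
                lo = mid + 1
        return lo - 1

    def select1(self, i):
        return self._select(i, self.rank1)

    def select0(self, i):
        return self._select(i, self.rank0)
```

Select is the inverse of rank: for any valid i, rank1(select1(i) + 1) equals i.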
Searchable Bitmap: Structure
Searchable Bitmaps: Views
LOUDS Tree
LOUDS Tree: Parent()
Wavelet Tree
● Searchable sequence [0...N) for large alphabets
● Rank(i, s) returns number of symbols s in [0, i)
● Select(k, s) returns position i of k-th symbol s
● Insert(i, s), Delete(i), Access(i): insert, remove and
access the symbol at position i, respectively
● All these operations have O(log N) time complexity
● By mapping numbers to symbols we can perform the
following lookup operations: >, >=, <, <=, <> in O(log N)
time.
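A static wavelet tree is enough to illustrate Access, Rank and the "<" lookup. The sketch below makes several assumptions: symbols are integers, the sequence is non-empty and fixed (Insert, Delete and Select are omitted), and each node keeps a plain prefix-count list instead of a searchable bitmap. `WaveletTree`, `zeros` and `count_less` are hypothetical names.

```python
class WaveletTree:
    """Static wavelet tree over integer symbols in [lo, hi] (illustrative)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf: a single symbol value, or an empty range
        mid = (lo + hi) // 2
        # zeros[i] = how many of the first i symbols go to the left child
        self.zeros = [0]
        for s in seq:
            self.zeros.append(self.zeros[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def access(self, i):
        """Return the symbol at position i."""
        node = self
        while node.left is not None:
            if node.zeros[i + 1] - node.zeros[i]:  # symbol i went left
                i, node = node.zeros[i], node.left
            else:
                i, node = i - node.zeros[i], node.right
        return node.lo

    def rank(self, i, s):
        """Number of occurrences of symbol s in positions [0, i)."""
        if s < self.lo or s > self.hi:
            return 0
        node = self
        while node.left is not None:
            if s <= (node.lo + node.hi) // 2:
                i, node = node.zeros[i], node.left
            else:
                i, node = i - node.zeros[i], node.right
        return i

    def count_less(self, i, x):
        """Number of symbols strictly less than x in positions [0, i)."""
        node, total = self, 0
        while node.left is not None:
            if x <= (node.lo + node.hi) // 2:
                i, node = node.zeros[i], node.left
            else:
                total += node.zeros[i]  # everything on the left is < x
                i, node = i - node.zeros[i], node.right
        return total + (i if node.lo < x else 0)
```

Each query walks one root-to-leaf path, so its cost is proportional to the tree height, which is logarithmic in the alphabet range here. The other comparison lookups (<=, >, >=) follow from `count_less` and `rank` by subtraction.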
Wavelet Tree: Structure
Wavelet Tree: Rank
Wavelet Tree: Inverted Index
Inverted Index Lookup
Thanks!
More details are at:
https://bitbucket.org/vsmirnov/memoria/wiki/MemoriaForBigData
