Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Poster for NSDI
1. PlanetenWachHundNetz: Locality-Aware MapReduce for Peer-to-Peer
Robust Systems Group
Vitus Lorenz-Meyer, Eric Freudenthal University of
Texas at El Paso
Other Approaches
From distributed databases
Key-based Tree
Routing based on node keys
• Children of node n with key k are
nodes with keys closest to k with n-1
prefix bits in common with k
• Node n is its own child!
Aggregation tree Stochastically
balanced
Motivation Prefix tree
• Efficient reduction of data from self-
organized (P2P) sources
Problem
• Dynamic membership
• Static tree (for parallel prefix) Non-existent node 001 (A)
inappropriate •Not locality aware!
•Only 1 global tree.
Problems w/ Known Techniques
Finger-table based tree
• Do not localize network traffic 000 takes the role • Tree edges selected from nodes
• Branching factor not controlled of its parent 001 finger tables (towards query root).
• State lost when parent nodes fail/leave
Paths to 111 Non-optimized
• Details on right side of figure fan-out!
Message arrives
Our Goals at closest node 000 (B)
• Exploit locality, minimize remote
communication Resulting Aggre-
gation tree
• Efficient parallelism: Control of Message ends up at self:
branching factor No more nodes -> discard
• Minimal state-loss when nodes
enter/leave
• Self-healing continuous queries
Implementation •Unique tree for every query
•Locality awareness from DHT
• Left-tree
•but does not force locality
• Node role determined by key •Challenge: what to do if nodes
• Locality/clustering: Coral or Oasis enter/leave
Goal: Efficient Aggregation Tree
Our Solution: Key-based MapReduce (KMR)
Overview Finding nodes
Locality-aware Clustering
MapReduce (Google)
• Associate aggregation tree with a operation-specific root key. n = key of some node
Characteristics t = root-id of aggregation tree
Generalized Parallel Prefix Coral (locality-aware DsHT)
π = n’s position among leaves (from left)
• Rooted at host whose key is closest to root key
• Associative & communitive • π=n⊕t
Coral partitions its members in • One tree per query, distributes load over all nodes for multiple
operations mapped to an key of n’s parent at height h: π / 2h ⊕ t
clusters based on connection queries at a time
aggregation tree latency • Nodes that are roots of sub-trees are their own left-children all the
To find parent, n searches for smallest height such
way down to their appropriate place among leaves.
that parent is in the tree.
• Google‘s map-reduce framework Aggregation tree at each • Each node is leaf and root of the sub-tree at depth d if it has the d’th
Data is mapped to low-dimensional
Collecting data
bit of root-ID flipped; node may also substitute for missing parents.
cluster
tuples • Set of nodes and MR-specific root-key fully defines a unique tree (complete
• Each node reduces messages from children, sends result(s) to parent
•
Tuples are recursively combined Build cluster aggregation trees knowledge not required: lookup parent, sibling, children)
•
using an associative reduction A single member of a cluster-tree
Challenges Approach
algorithm that emits a (summary) serves as aggregation
tuple representative • Find children DHT lookup
• Find parent DHT lookup
Example: Challenges • Establishing new tree at exit/departure DHT lookup (key-space determines topology)
• Counting occurrence of a particular
• Choosing sub-cluster
word in a document:
Building the Tree
representatives
Provide map algorithm that
• Constructing aggregation tree for
emits the tuple ‘(1)’ for every Send build-message to all siblings down the tree
sub-cluster
Continuous Queries
occurrence Each node forwards build-message to all its siblings all the way down
• Avoiding inefficient setup broadcast
Provide a reduce algorithm • Recall that node is its own child: siblings include its children!
Maintaining tree consistency under churn
that sums these tuples
Non-existing parent
• Each node periodically checks roles of self and immediate relatives
• Closest node fulfills its role
• Churn (membership change) must result in tree-node role change
• Therefore, search for parent will discover this node
• Change propagated by DHT finger tables
Non-existing child
• If child not in DHT, then node is a leaf
Challenges Approach
Finding children
• Use DHT • What roles to check when periodically check (lookup) self, children,
parent only when its assuming role
parent → child, old → new
• Who notifies who