Behavioral Simulations in MapReduce 
Guozhang Wang, Marcos Vaz Salles, 
Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, 
Johannes Gehrke, Walker White 
Cornell University 
What are Behavioral Simulations? 
• Simulations of individuals that interact to 
create emerging behavior in complex systems 
• Application Areas 
– Traffic networks 
– Ecology systems 
– Sociology systems 
– etc 
Why Behavioral Simulations? 
• Traffic 
– Congestion cost $87.2 billion in the U.S. in 2007 
– More people killed by air pollution than by accidents 
– Detailed models: micro-simulators do not scale to NYC! 
• Ecology 
– Hard to scale to large fish schools or locust swarms
Challenges of Behavioral Simulations 
• Easy to program → not scalable 
– Examples: Swarm, Mason 
– Typically one thread per agent, lots of contention 
• Scalable → hard to program 
– Examples: TRANSIMS, DynaMIT (traffic), GPU implementation of fish simulation (ecology) 
– Hard-coded models, compromise level of detail 
Can we do better?
Our Contribution 
• A new simulation platform that combines: 
– Ease of programming 
• Program simulations in State-Effect pattern 
• BRASIL: Scripting language for domain scientists 
– Scalability 
• Execute simulations in the MapReduce model 
• BRACE: Special-purpose MapReduce engine 
Talk Outline 
• Motivation 
• Ease of Programming 
– Program Simulations in State-Effect Pattern 
– BRASIL 
• Scalability 
– Execute Simulations in MapReduce Model 
– BRACE 
• Experiments 
• Conclusion 
A Running Example: Fish Schools 
• Adapted from Couzin et al., Nature 2005 
• Fish Behavior 
– Avoidance: if too close, repel other fish 
– Attraction: if seen within range, attract other fish
A Running Example: Fish Schools 
• Concurrency: agents are concurrent within a tick 
• Interactions: agents continuously interact 
• Spatial Locality: agents have limited visibility 
• Time-stepping: agents proceed in ticks 
[Figure: a fish with its two visibility ranges, α and ρ]
Classic Solutions for Concurrency 
• Preempt conflicts → locking 
• Rollback in case of conflicts → optimistic concurrency control 
• Problems: 
– Strong interactions → many conflicts 
• Either lots of lock contention 
• Or lots of rollbacks 
– Does not scale well
State-Effect Pattern 
• Programming pattern to deal with 
concurrency 
• Follows time-stepped model 
• Core Idea: Make all actions inside of a tick 
order-independent 
States and Effects 
• States: 
– Snapshot of agents at the beginning of the tick 
• position, velocity vector 
• Effects: 
– Intermediate results from interaction, used to calculate new states 
• sets of forces from other fish
Two Phases of a Tick 
• Query: capture agent interaction 
– Read states → write effects 
– Each effect set is associated with a combinator function 
– Effect writes are order-independent 
• Update: refresh world for next tick 
– Read effects → write states 
– Reads and writes are totally local 
– State writes are order-independent 
[Figure: a tick consists of a query phase followed by an update phase]
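The role of the combinator function can be illustrated with a small Python sketch. It is not BRASIL or BRACE code: the `Effect` class and the `combined` helper are hypothetical names introduced here, and the sketch assumes the combinator is commutative and associative (such as sum or min), which is what makes effect writes order-independent.

```python
from itertools import permutations


class Effect:
    """A hypothetical effect field whose writes are merged by a combinator."""

    def __init__(self, combinator, identity):
        self.combinator = combinator
        self.value = identity

    def write(self, v):
        # Merge the new value into the accumulated one instead of overwriting.
        self.value = self.combinator(self.value, v)


def combined(writes, combinator, identity):
    """Apply a sequence of effect writes and return the combined value."""
    effect = Effect(combinator, identity)
    for w in writes:
        effect.write(w)
    return effect.value


if __name__ == "__main__":
    writes = [3, -1, 4]
    # Every ordering of the same writes yields the same combined effect.
    sums = {combined(p, lambda a, b: a + b, 0) for p in permutations(writes)}
    mins = {combined(p, min, float("inf")) for p in permutations(writes)}
    print(sums, mins)
```

Integer values are used so that the sums are exactly equal in every order; with floating-point values the results would agree only up to rounding.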
A Tick in State-Effect 
• Query 
– For fish f in visibility α: 
• Write repulsion to f’s effects 
– For fish f in visibility ρ: 
• Write attraction to f’s effects 
• Update 
– new velocity = combined repulsion + combined attraction + old velocity 
– new position = old position + old velocity
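The tick described on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the world is one-dimensional with integer positions, the ranges `ALPHA` and `RHO` and the unit-sized forces are made up for the example, and none of the names come from BRASIL or BRACE.

```python
from dataclasses import dataclass

ALPHA, RHO = 2, 5  # hypothetical avoidance and attraction ranges


@dataclass(frozen=True)
class Fish:
    x: int  # position (state)
    v: int  # velocity (state)


def sign(d):
    return (d > 0) - (d < 0)


def query(states, order):
    """Query phase: read states only, write effects combined by sum."""
    repulsion = [0] * len(states)
    attraction = [0] * len(states)
    for i in order:
        for j, other in enumerate(states):
            if j == i:
                continue
            d = other.x - states[i].x
            if abs(d) <= ALPHA:
                repulsion[j] += sign(d)    # push fish j away from fish i
            elif abs(d) <= RHO:
                attraction[j] -= sign(d)   # pull fish j toward fish i
    return repulsion, attraction


def update(states, repulsion, attraction):
    """Update phase: each fish reads only its own effects and old state."""
    return [
        Fish(x=f.x + f.v, v=repulsion[k] + attraction[k] + f.v)
        for k, f in enumerate(states)
    ]


def tick(states, order=None):
    order = list(range(len(states))) if order is None else order
    return update(states, *query(states, order))


if __name__ == "__main__":
    school = [Fish(0, 1), Fish(1, 0), Fish(4, -1)]
    print(tick(school))
    # Processing the fish in a different order gives the same next state.
    print(tick(school, order=[2, 0, 1]) == tick(school))
```

Because the query phase never modifies states and the effects are merged by sum, the order in which fish are processed does not change the outcome of the tick.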
BRASIL (Big Red Agent SImulation Language) 
• High-level language for domain scientists 
• Object-oriented style 
• Programs specify behavior logic of 
individual agents
Fish in BRASIL 
class Fish { 
  // The fish location & velocity (x) 
  public state float x : x + vx; #range[-1,1]; 
  public state float vx : vx + rand() + avoidx / count * vx; 
  // Used to update our velocity (x) 
  private effect float avoidx : sum; 
  private effect int count : sum; 
  /** The query-phase for this fish. */ 
  public void run() { 
    // Use "forces" to repel fish too close 
    foreach(Fish p : Extent<Fish>) { 
      p.avoidx <- 1 / abs(x - p.x); 
      ... 
      p.count <- 1; 
    } 
  } 
}
Fish in BRASIL 
• Syntax enforces state-effect pattern 
• Translates to Monad Algebra 
• Can reuse classic DB optimization techniques 
P = <1: π_1 ∘ π_p, 2: π_2> ∘ PAIRWITH_2 ∘ σ_{π_1 = π_2} ∘ π_key ∘ GET ∘ π_x 
E1 = <1: π_1 ∘ π_p, 2: ρ(avoidx), 3: 1 / (π_1 ∘ π_x - P)> 
E2 = <1: π_1 ∘ π_p, 2: ρ(count), 3: 1> 
B = <1: π_1, 2: π_2, 3: π_2 ⊕ (E1 ∘ SNG) ⊕ (E2 ∘ SNG)> 
F = <1: π_1, 2: π_2, 3: <1: π_1 ∘ x_p(π_2) ∘ PAIRWITH_p, 2: π_2, 3: π_3> ∘ FLATMAP(B ∘ π_3)> 
• Details of translation in our VLDB 2010 paper
Talk Outline 
• Motivation 
• Ease of Programming 
– Program Simulations in State-Effect Pattern 
– BRASIL 
• Scalability 
– Execute Simulations in MapReduce Model 
– BRACE 
• Experiments 
• Conclusion
How to Scale to Millions of Fish? 
• Use multiple nodes in a cluster of 
machines for large simulation scenarios 
• Need to efficiently parallelize 
computations of state-effect pattern
State-Effect Revisited 
• Agent partitioning with replications across nodes 
• Communicate new states before next tick’s query phase 
• Communicate effect assignments before update phase 
[Figure: one tick: Query (state → effects) → Communicate Effects → Update (effects → new state) → Communicate New State]
From State-Effect to Map-Reduce 
• Map1 (tick t): distribute data 
• Reduce1 (tick t): assign effects (partial) 
• Map2 (tick t): forward data 
• Reduce2 (tick t): aggregate effects 
• Map1 (tick t+1): update, redistribute data 
[Figure: one tick: Query (state → effects) → Communicate Effects → Update (effects → new state) → Communicate New State]
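The mapping from a tick to two map-reduce passes can be sketched in plain Python. This is an in-memory illustration, not the BRACE engine: the partition width `WIDTH`, the visibility `RANGE`, the one-dimensional integer positions, and the toy effect (a count of neighbors) are all assumptions made for the example, and the update step that the next Map1 would perform is left out.

```python
from collections import defaultdict

WIDTH, RANGE = 10, 2  # hypothetical partition width and visibility range


def owner(x):
    """Partition that owns position x."""
    return x // WIDTH


def map1(agents):
    """Distribute data: send each agent to its owner and to partitions that can see it."""
    for aid, x in agents.items():
        for part in {owner(x - RANGE), owner(x), owner(x + RANGE)}:
            yield part, (aid, x)


def reduce1(part, members):
    """Query phase on one partition: owned agents assign (partial) effects."""
    for aid, x in members:
        if owner(x) != part:
            continue  # replicas are only read, they do not run their query here
        for bid, y in members:
            if bid != aid and abs(x - y) <= RANGE:
                yield bid, 1  # toy effect: one more neighbor sees agent bid


def map2(effect):
    """Forward each effect assignment, keyed by the target agent."""
    return effect


def reduce2(aid, values):
    """Aggregate all effect assignments for one agent with the sum combinator."""
    return aid, sum(values)


def run_query_round(agents):
    partitions = defaultdict(list)
    for part, record in map1(agents):
        partitions[part].append(record)
    grouped = defaultdict(list)
    for part, members in partitions.items():
        for bid, value in map(map2, reduce1(part, members)):
            grouped[bid].append(value)
    return dict(reduce2(aid, values) for aid, values in grouped.items())


if __name__ == "__main__":
    agents = {"a": 8, "b": 9, "c": 11, "d": 25}
    print(run_query_round(agents))
```

In this example agent `b` (owned by partition 0) assigns an effect to agent `c` (owned by partition 1), which is why the second pass is needed to forward and aggregate effects.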
BRACE (Big Red Agent Computation Engine) 
• Special-purpose MapReduce engine for behavioral simulations 
• Basic Optimizations 
– Keep data in main memory 
– Do not checkpoint every iteration 
• Optimizations based on Spatial Properties: 
– Collocate tasks 
– Minimize communication overhead
Spatial Partitioning 
• Partition simulation space into regions, each 
handled by a separate node 
Communication Between Partitions 
• Owned Region: agents in it are owned by the node 
• Visible Region: agents in it are not owned, but need to be seen by the node 
• Only need to communicate with neighbors to 
– refresh states 
– forward assigned effects 
[Figure: owned and visible regions of a partition, animated through state communication, query, effect communication, and update]
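The distinction between owned and visible regions can be sketched for a one-dimensional partition. The function names, the half-open interval convention, and the sample values are assumptions for illustration only and are not taken from BRACE.

```python
def classify(x, lo, hi, visibility):
    """Classify position x for a node that owns the half-open interval [lo, hi)."""
    if lo <= x < hi:
        return "owned"
    if lo - visibility <= x < hi + visibility:
        return "visible"
    return "hidden"


def split_agents(agents, lo, hi, visibility):
    """Return (owned, visible) agents for one node; hidden agents are dropped."""
    owned, visible = {}, {}
    for aid, x in agents.items():
        kind = classify(x, lo, hi, visibility)
        if kind == "owned":
            owned[aid] = x
        elif kind == "visible":
            visible[aid] = x
    return owned, visible


if __name__ == "__main__":
    agents = {"a": 3.0, "b": 10.5, "c": -1.0, "d": 14.0}
    print(split_agents(agents, lo=0, hi=10, visibility=2))
```

Only the agents in the visible band have to be exchanged with neighboring nodes, which is why communication stays local to adjacent partitions.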
Effect Inversion 
• In case of local effects only, can save one round of communication in each tick 
• With non-local effects: Map1 t (distribute data) → Reduce1 t (assign effects, partial) → Map2 t (forward data) → Reduce2 t (aggregate effects) 
• Without non-local effects: Map1 t (distribute data) → Reduce1 t (assign and aggregate effects)
Effect Inversion Is Always Possible 
• Theorem: Every behavioral simulation written 
in BRASIL that uses non-local effects can be 
rewritten to an equivalent simulation that 
uses local effects only 
– Proof in the VLDB 2010 paper
Intuition of Effect Inversion Theorem 
[Figure: non-local effect writes within range α are replaced by non-local state reads within range 2α plus local effect writes]
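The intuition can be checked on a toy example in Python. This is a sketch under assumptions, not the construction from the VLDB 2010 proof: positions are one-dimensional, and the made-up effect is that each agent i writes 1/n_i to every neighbor within α, where n_i is the number of i's neighbors. Because the written value depends on i's own neighborhood, the inverted version needs to read states within 2α.

```python
def neighbors(i, xs, radius, visible=None):
    """Indices k != i within `radius` of agent i, optionally limited to `visible`."""
    candidates = range(len(xs)) if visible is None else visible
    return [k for k in candidates if k != i and abs(xs[k] - xs[i]) <= radius]


def nonlocal_effects(xs, alpha):
    """Each agent i writes into the effects of other agents (non-local writes)."""
    eff = [0.0] * len(xs)
    for i in range(len(xs)):
        near = neighbors(i, xs, alpha)
        for j in near:
            eff[j] += 1.0 / len(near)
    return eff


def inverted_effects(xs, alpha):
    """Each agent j writes only its own effect, reading states within 2 * alpha."""
    eff = [0.0] * len(xs)
    for j in range(len(xs)):
        # Everything agent j is allowed to read after inversion (includes j itself).
        view = [k for k in range(len(xs)) if abs(xs[k] - xs[j]) <= 2 * alpha]
        for i in neighbors(j, xs, alpha, view):
            # Recompute locally what agent i would have written to j.
            n_i = len(neighbors(i, xs, alpha, view))
            eff[j] += 1.0 / n_i
    return eff


if __name__ == "__main__":
    xs = [0.0, 0.4, 0.9, 1.5, 3.0]
    print(nonlocal_effects(xs, alpha=0.7))
    print(inverted_effects(xs, alpha=0.7))
```

Any neighbor of i lies within α of i, and i lies within α of j, so every state needed to recompute i's write is within 2α of j. The price of saving a communication round is the larger read range.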
Talk Outline 
• Motivation 
• Ease of Programming 
– Program Simulations in State-Effect Pattern 
– BRASIL 
• Scalability 
– Execute Simulations in MapReduce Model 
– BRACE 
• Experiments 
• Conclusion
Experimental Setup 
• BRACE prototype 
– Grid partitioning 
– KD-Tree spatial indexing 
– Basic load balancing 
• Hardware: Cornell WebLab Cluster (60 nodes, 
2xQuadCore Xeon 2.66GHz, 4MB cache, 16GB 
RAM) 
Implemented Simulations 
• Traffic Simulation 
– Best-effort reimplementation of MITSIM lane 
changing and car following 
– Large segment of highway 
• Bacteria Simulation 
– Simple artificial society simulation 
• Fish School Simulation 
– Model of collective animal motion by Couzin et al., 
Nature, 2005 
Scalability: Traffic 
• Scale up the size of the highway with the number of nodes 
• The notch in the plot is a consequence of the multi-switch architecture
Optimization: Bacteria 
• 16-node with indexing and effect inversion 
• 10,000 epochs of bacteria simulation 
Load Balancing: Fish 
• 16-node with load balancing turned on 
• Fish simulation of two independent schools that swim 
in opposite directions 
Conclusions 
• Behavioral simulations can have huge impact, but need to be run at large scale 
• New programming environment for behavioral simulations 
– Easy to program: simulations in the state-effect pattern → BRASIL 
– Scalable: state-effect pattern in a special-purpose MapReduce engine → BRACE 
• We are moving to simulate NYC!
Thank you!
Backup Slides 
Ongoing Work 
Bringing Simulations to the Cloud 
• Scientists want to run their simulations in the cloud 
• Can we use the cloud? 
– Large and variable latency → latency compensation techniques 
– Large number of unreliable nodes → low-overhead checkpoints 
– Money as new optimization metric → think about how to allocate tasks
MITSIM: Single-Node Traffic Simulator 
Stockholm, Sweden
An Observation about Parallelism 
• A query phase in one tick is a spatial join 
Agent states: 
Id   X     Y     vX   vY 
1    1.5   1.5   0    0 
2    1.2   1.3   0    0 
3    2.3   2.4   0    0 
…    …     …     …    … 
Resulting effects: 
Id   aX     aY     C. 
1    2.1    3.9    1 
2    -3.3   -5.0   2 
…    …      …      … 
• Behavioral simulations are large, iterated spatial joins 
• We can parallelize iterated spatial joins in MapReduce!
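A query phase viewed as a spatial self-join can be sketched in Python with a uniform grid. The function name, the join radius of 0.5, and the choice of a grid (rather than the KD-tree index used in the BRACE prototype) are assumptions for illustration; the three positions are the ones listed on the slide, and the output is not meant to reproduce the slide's effect values.

```python
from collections import defaultdict
from math import floor, hypot


def spatial_self_join(points, radius):
    """Return {id: [ids of other points within `radius`]} using a uniform grid."""
    # Bucket every point into a square cell whose side equals the join radius.
    grid = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[(floor(x / radius), floor(y / radius))].append(pid)

    result = {pid: [] for pid in points}
    for pid, (x, y) in points.items():
        cx, cy = floor(x / radius), floor(y / radius)
        # Any partner within `radius` must lie in the same or an adjacent cell.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in grid.get((cx + dx, cy + dy), ()):
                    ox, oy = points[other]
                    if other != pid and hypot(x - ox, y - oy) <= radius:
                        result[pid].append(other)
    return result


if __name__ == "__main__":
    fish = {1: (1.5, 1.5), 2: (1.2, 1.3), 3: (2.3, 2.4)}
    print(spatial_self_join(fish, radius=0.5))
```

Running such a join once per tick, and feeding its aggregated output back in as the next tick's input, is the iterated spatial join the slide refers to.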
Compiling BRASIL 
• BRASIL translates to Monad Algebra 
BRASIL source: 
foreach(Fish p : Extent<Fish>) { 
  p.avoidx <- 1 / abs(x - p.x); 
  p.count <- 1; 
} 
Monad Algebra: 
P = <1: π_1 ∘ π_p, 2: π_2> ∘ PAIRWITH_2 ∘ σ_{π_1 = π_2} ∘ π_key ∘ GET ∘ π_x 
E1 = <1: π_1 ∘ π_p, 2: ρ(avoidx), 3: 1 / (π_1 ∘ π_x - P)> 
E2 = <1: π_1 ∘ π_p, 2: ρ(count), 3: 1> 
B = <1: π_1, 2: π_2, 3: π_2 ⊕ (E1 ∘ SNG) ⊕ (E2 ∘ SNG)> 
F = <1: π_1, 2: π_2, 3: <1: π_1 ∘ x_p(π_2) ∘ PAIRWITH_p, 2: π_2, 3: π_3> ∘ FLATMAP(B ∘ π_3)> 
Equivalent SQL-style query: 
SELECT f2.id, SUM(1 / ABS(f1.x - f2.x)), COUNT(*) 
FROM Fish f1, Fish f2 
WHERE f1.x < f2.x + 1 
  AND f1.x > f2.x - 1 
  AND f1.id <> f2.id  -- skip self-pairs to avoid division by zero 
GROUP BY f2.id 
• Can reuse classic DB optimization techniques 
– indexing, join reordering 
• Details of translation in our VLDB 2010 paper

Java Programming :Event Handling(Types of Events)
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 

Behavioral Simulations in MapReduce

  • 1. Behavioral Simulations in MapReduce Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White Cornell University 1
  • 2. What are Behavioral Simulations? • Simulations of individuals that interact to create emerging behavior in complex systems • Application Areas – Traffic networks – Ecology systems – Sociology systems – etc 2
  • 3. Why Behavioral Simulations? • Traffic – Congestion cost $87.2 billion in the U.S. in 2007 – More people killed by air pollution than accidents – Detailed models: micro-simulators do not scale to NYC! • Ecology – Hard to scale to large fish schools or locust swarms
  • 4. Challenges of Behavioral Simulations • Easy to program → not scalable – Examples: Swarm, Mason – Typically one thread per agent, lots of contention • Scalable → hard to program – Examples: TRANSIMS, DynaMIT (traffic), GPU implementation of fish simulation (ecology) – Hard-coded models, compromise level of detail
  • 5. Challenges of Behavioral Simulations • Easy to program → not scalable – Examples: Swarm, Mason – Typically one thread per agent, lots of contention • Scalable → hard to program – Examples: TRANSIMS, DynaMIT (traffic), GPU implementation of fish simulation (ecology) – Hard-coded models, compromise level of detail • Can we do better?
  • 6. Our Contribution • A new simulation platform that combines: – Ease of programming • Program simulations in State-Effect pattern • BRASIL: Scripting language for domain scientists – Scalability • Execute simulations in the MapReduce model • BRACE: Special-purpose MapReduce engine 5
  • 7. Talk Outline • Motivation • Ease of Programming – Program Simulations in State-Effect Pattern – BRASIL • Scalability – Execute Simulations in MapReduce Model – BRACE • Experiments • Conclusion 6
  • 8. A Running Example: Fish Schools • Adapted from Couzin et al., Nature 2005 7 • Fish Behavior – Avoidance: if too close, repel other fish – Attraction: if seen within range, attract other fish
  • 9. A Running Example: Fish Schools • Adapted from Couzin et al., Nature 2005 7 α • Fish Behavior – Avoidance: if too close, repel other fish – Attraction: if seen within range, attract other fish
  • 10. A Running Example: Fish Schools • Adapted from Couzin et al., Nature 2005 7 α ρ • Fish Behavior – Avoidance: if too close, repel other fish – Attraction: if seen within range, attract other fish
  • 11. A Running Example: Fish Schools • Time-stepping: agents proceed in ticks • Concurrency: agents are concurrent within a tick • Interactions: agents continuously interact • Spatial Locality: agents have limited visibility
  • 12. Classic Solutions for Concurrency • Preempt conflicts → locking • Rollback in case of conflicts → optimistic concurrency control • Problems: – Strong interactions → many conflicts • Either lots of lock contention • Or lots of rollbacks – Does not scale well
  • 13. State-Effect Pattern • Programming pattern to deal with concurrency • Follows time-stepped model • Core Idea: Make all actions inside of a tick order-independent 10
  • 14. States and Effects • States: – Snapshot of agents at the beginning of the tick • position, velocity vector 11 • Effects: – Intermediate results from interaction, used to calculate new states • sets of forces from other fish α ρ
  • 16. Two Phases of a Tick • Query: capture agent interaction – Read states → write effects – Each effect set is associated with a combinator function – Effect writes are order-independent • Update: refresh world for next tick – Read effects → write states – Reads and writes are totally local – State writes are order-independent
  • 17. A Tick in State-Effect • Query – For fish f in visibility α: • Write repulsion to f’s effects – For fish f in visibility ρ: • Write attraction to f’s effects • Update – new velocity = combined repulsion + combined attraction + old velocity – new position = old position + old velocity
  • 22. Fish in State-Effect • Query – For fish f in visibility α: • Write repulsion to f’s effects – For fish f in visibility ρ: • Write attraction to f’s effects • Update – new velocity = combined repulsion + combined attraction + old velocity – new position = old position + old velocity 14
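
The query and update phases on slides 17 to 22 can be sketched in Python as follows. This is a minimal one-dimensional illustration under stated assumptions, not BRASIL or BRACE code: the `Fish` class, the `tick` function, the force formulas, and the values of `ALPHA`, `RHO`, and `PULL` are all hypothetical and chosen only to make the pattern concrete.

```python
from dataclasses import dataclass

ALPHA = 1.0  # avoidance radius (assumed value)
RHO = 3.0    # attraction radius (assumed value)
PULL = 0.1   # attraction strength (assumed value)


@dataclass
class Fish:
    x: float                 # state: position
    vx: float = 0.0          # state: velocity
    repulsion: float = 0.0   # effect, combined by sum
    attraction: float = 0.0  # effect, combined by sum


def tick(school):
    """Advance every fish in `school` by one tick, in place."""
    # Query phase: read states (x, vx) only, write effects only.
    for me in school:
        for other in school:
            gap = other.x - me.x
            dist = abs(gap)
            if other is me or dist == 0.0:
                continue
            away = gap / dist  # +1.0 if `other` is to my right, else -1.0
            if dist < ALPHA:
                other.repulsion += away / dist   # push `other` away from me
            elif dist < RHO:
                other.attraction -= away * PULL  # pull `other` toward me

    # Update phase: read own effects and old state, write own new state.
    for fish in school:
        new_vx = fish.repulsion + fish.attraction + fish.vx
        new_x = fish.x + fish.vx  # uses the old velocity, as on the slide
        fish.x, fish.vx = new_x, new_vx
        fish.repulsion = fish.attraction = 0.0  # effects live for one tick
```

Because the query phase never writes a state and every effect is combined with a sum, the order in which fish are visited should not change the outcome (up to floating-point rounding), which is the property the state-effect pattern relies on.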
  • 23. BRASIL (Big Red Agent SImulation Language) 15 • High-level language for domain scientists • Object-oriented style • Programs specify behavior logic of individual agents
  • 24. Fish in BRASIL
      class Fish {
        // The fish location & velocity (x)
        public state float x : x + vx; #range[-1,1];
        public state float vx : vx + rand() + avoidx / count * vx;
        // Used to update our velocity (x)
        private effect float avoidx : sum;
        private effect int count : sum;
        /** The query-phase for this fish. */
        public void run() {
          // Use "forces" to repel fish too close
          foreach(Fish p : Extent<Fish>) {
            p.avoidx <- 1 / abs(x - p.x);
            ...
            p.count <- 1;
          }
        }
      }
  • 28. Fish in BRASIL • Syntax enforces state-effect pattern • Translates to Monad Algebra • Can reuse classic DB optimization techniques • Details of translation in our VLDB 2010 paper
      P = <1:Π1 ∘ Πp, 2:Π2> ∘ PAIRWITH2 ∘ σ(Π1 = Π2) ∘ Πkey ∘ GET ∘ Πx
      E1 = <1:Π1 ∘ Πp, 2:ρ(avoidx), 3:1 / (Π1 ∘ Πx - P)>
      E2 = <1:Π1 ∘ Πp, 2:ρ(count), 3:1>
      B = <1:Π1, 2:Π2, 3:Π2 ⊕ (E1 ∘ SNG) ⊕ (E2 ∘ SNG)>
      F = <1:Π1, 2:Π2, 3:<1:Π1 ∘ xp(Π2) ∘ PAIRWITHp, 2:Π2, 3:Π3> ∘ FLATMAP(B ∘ Π3)>
  • 29. Talk Outline 17 • Motivation • Ease of Programming – Program Simulations in State-Effect Pattern – BRASIL • Scalability – Execute Simulations in MapReduce Model – BRACE • Experiments • Conclusion
  • 30. How to Scale to Millions of Fish? 18 • Use multiple nodes in a cluster of machines for large simulation scenarios • Need to efficiently parallelize computations of state-effect pattern
  • 31. State-Effect Revisited • Agent partitioning with replications across nodes • Communicate new states before next tick’s query phase • Tick: Query (state → effects), Communicate Effects, Update (effects → new state), Communicate New State
  • 32. State-Effect Revisited • Agent partitioning with replications across nodes • Communicate new states before next tick’s query phase • Communicate effect assignments before update phase • Tick: Query (state → effects), Communicate Effects, Update (effects → new state), Communicate New State
  • 37. From State-Effect to Map-Reduce • Map1 (tick t): Distribute data • Reduce1 (tick t): Assign effects (partial) • Map2 (tick t): Forward data • Reduce2 (tick t): Aggregate effects • Map1 (tick t+1): Update, Redistribute data • Tick: Query (state → effects), Communicate Effects, Update (effects → new state), Communicate New State
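
The mapping on slide 37 can be illustrated with a small in-memory sketch of the query phase of one tick. Everything here is an assumption made for illustration and is not the BRACE implementation: the world is one-dimensional, partitions are equal-width strips, the only effect is "how many fish can see me" (combined by sum), and the names `map1`, `reduce1`, `map2`, `reduce2`, `PART_WIDTH`, `N_PARTS`, and `VIS` are hypothetical.

```python
from collections import defaultdict

PART_WIDTH = 10.0  # width of each owned region (assumed)
N_PARTS = 3        # number of partitions / nodes (assumed)
VIS = 2.0          # visibility range of an agent (assumed)


def part_of(x):
    """Index of the partition whose owned region contains x (clamped)."""
    return min(max(int(x // PART_WIDTH), 0), N_PARTS - 1)


def map1(agents):
    """Distribute data: send each agent to its owner and to every
    partition whose owned region lies within VIS of the agent."""
    out = defaultdict(list)
    for aid, x in agents.items():
        owner = part_of(x)
        for p in range(part_of(x - VIS), part_of(x + VIS) + 1):
            out[p].append((aid, x, p == owner))
    return out


def reduce1(part_agents):
    """Assign effects (partial): owned agents run their query and may
    write effects to agents owned by another partition."""
    partial = []
    for aid, x, owned in part_agents:
        if not owned:
            continue
        for bid, bx, _ in part_agents:
            if bid != aid and abs(bx - x) <= VIS:
                partial.append((bid, 1))  # "one more fish sees bid"
    return partial


def map2(partials, agents):
    """Forward data: route each partial effect to the target's owner."""
    out = defaultdict(list)
    for bid, value in partials:
        out[part_of(agents[bid])].append((bid, value))
    return out


def reduce2(effects):
    """Aggregate effects with the sum combinator."""
    totals = defaultdict(int)
    for bid, value in effects:
        totals[bid] += value
    return dict(totals)


def query_tick(agents):
    """Run Map1, Reduce1, Map2, Reduce2 for one tick; return the
    aggregated effect per agent id (agents with no effect are absent)."""
    parts = map1(agents)
    partials = [e for p in parts.values() for e in reduce1(p)]
    totals = {}
    for effects in map2(partials, agents).values():
        totals.update(reduce2(effects))
    return totals
```

The update step is left out; on the slide it runs in Map1 of tick t+1, which would read these aggregated effects, write new states, and redistribute the agents.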
  • 38. BRACE (Big Red Agent Computation Engine) • Special-purpose MapReduce engine for behavioral simulations • Basic Optimizations – Keep data in main memory – Do not checkpoint every iteration • Optimizations based on Spatial Properties: – Collocate tasks – Minimize communication overhead
  • 39. Spatial Partitioning • Partition simulation space into regions, each handled by a separate node 23
  • 40. Communication Between Partitions • Owned Region: agents in it are owned by the node 24 Owned
  • 41. Communication Between Partitions • Visible Region: agents in it are not owned, but need to be seen by the node 25 Owned Visible
  • 43. Communication Between Partitions • Visible Region: agents in it are not owned, but need to be seen by the node 25 Owned Visible State Communication
  • 44. Communication Between Partitions • Visible Region: agents in it are not owned, but need to be seen by the node 25 Owned Visible Query
  • 45. Communication Between Partitions • Visible Region: agents in it are not owned, but need to be seen by the node 25 Owned Visible Effect communication
  • 46. Communication Between Partitions • Visible Region: agents in it are not owned, but need to be seen by the node 25 Owned Visible Update
  • 47. Communication Between Partitions • Visible Region: agents in it are not owned, but need to be seen by the node 26 Owned Visible
  • 48. Communication Between Partitions • Visible Region: agents in it are not owned, but need to be seen by the node • Only need to communicate with neighbors to – refresh states – forward assigned effects
  • 49. Effect Inversion • In case of local effects only, can save one round of communication in each tick • Map1 (tick t): Distribute data • Reduce1 (tick t): Assign effects (partial) • Map2 (tick t): Forward data • Reduce2 (tick t): Aggregate effects
  • 50. Effect Inversion • In case of local effects only, can save one round of communication in each tick • Map1 (tick t): Distribute data • Reduce1 (tick t): Assign and Aggregate effects (do not have non-local effects)
  • 51. Effect Inversion Is Always Possible • Theorem: Every behavioral simulation written in BRASIL that uses non-local effects can be rewritten to an equivalent simulation that uses local effects only – Proof in the VLDB 2010 paper
  • 52. Intuition of Effect Inversion Theorem α Non-local Effect Writes
  • 53. Intuition of Effect Inversion Theorem • α: Non-local Effect Writes • 2α: Non-local State Reads + Local Effect Writes
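
A small sketch may help with the intuition on slides 52 and 53. It is only an illustration under assumed rules, not the construction used in the VLDB 2010 proof: each fish splits one unit of "force" equally among the neighbors it sees within `ALPHA`. In the first version every fish writes into its neighbors' effects; in the inverted version every fish writes only its own effect, but has to re-evaluate each neighbor's query, which means reading states up to `2 * ALPHA` away. That wider read radius is presumably what the 2α label on the slide refers to. The names `neighbors`, `nonlocal_writes`, and `local_writes` are hypothetical.

```python
ALPHA = 1.0  # visibility radius (assumed value)


def neighbors(i, xs, radius=ALPHA):
    """Indices of all fish other than i within `radius` of fish i."""
    return [j for j in range(len(xs))
            if j != i and abs(xs[j] - xs[i]) <= radius]


def nonlocal_writes(xs):
    """Original form: fish i writes into the effects of its neighbors."""
    effects = [0.0] * len(xs)
    for i in range(len(xs)):
        seen = neighbors(i, xs)
        for j in seen:
            effects[j] += 1.0 / len(seen)  # non-local effect write
    return effects


def local_writes(xs):
    """Inverted form: fish j writes only its own effect. To know what
    neighbor i would have sent, j recomputes i's neighborhood, which
    reads states as far as 2 * ALPHA away from j."""
    effects = [0.0] * len(xs)
    for j in range(len(xs)):
        for i in neighbors(j, xs):  # distance is symmetric: i sees j too
            share = 1.0 / len(neighbors(i, xs))  # non-local state read
            effects[j] += share                  # local effect write
    return effects
```

For positions `[0.0, 0.8, 1.6, 2.4]` both functions return the same effects, while `local_writes` needs one fewer round of communication in a distributed setting because no effect has to be forwarded to another node.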
  • 54. Talk Outline 30 • Motivation • Ease of Programming – Program Simulations in Time-stepped Pattern – BRASIL • Scalability – Execute Simulations in Dataflow Model – BRACE • Experiments • Conclusion
  • 55. Experimental Setup • BRACE prototype – Grid partitioning – KD-Tree spatial indexing – Basic load balancing • Hardware: Cornell WebLab Cluster (60 nodes, 2xQuadCore Xeon 2.66GHz, 4MB cache, 16GB RAM) 31
  • 56. Implemented Simulations • Traffic Simulation – Best-effort reimplementation of MITSIM lane changing and car following – Large segment of highway • Bacteria Simulation – Simple artificial society simulation • Fish School Simulation – Model of collective animal motion by Couzin et al., Nature, 2005 32
  • 57. Scalability: Traffic • Scale up the size of the highway with the number of the nodes • Notch consequence of multi-switch architecture 33
  • 58. Optimization: Bacteria • 16-node with indexing and effect inversion • 10,000 epochs of bacteria simulation 34
  • 59. Load Balancing: Fish • 16-node with load balancing turned on • Fish simulation of two independent schools that swim in opposite directions 35
  • 60. Conclusions • Behavioral Simulations can have huge impact, but need to be run at large-scale • New programming environment for behavioral simulations – Easy to program: Simulations in the state-effect pattern → BRASIL – Scalable: State-effect pattern in special-purpose MapReduce Engine → BRACE • We are moving to simulate NYC!
  • 61. Conclusions • Behavioral Simulations can have huge impact, but need to be run at large-scale • New programming environment for behavioral simulations – Easy to program: Simulations in the state-effect pattern → BRASIL – Scalable: State-effect pattern in special-purpose MapReduce Engine → BRACE • We are moving to simulate NYC! • Thank you!
  • 63. Ongoing Work: Bringing Simulations to the Cloud • Scientists want to run their simulations in the cloud • Can we use the cloud? – Large and variable latency → latency compensation techniques – Large number of unreliable nodes → low-overhead checkpoints – Money as new optimization metric → think about how to allocate tasks
  • 64. MITSIM: Single-Node Traffic Simulator • Stockholm, Sweden
  • 71. An Observation about Parallelism • A query phase in one tick is a spatial join • Behavioral simulations are large, iterated spatial joins • We can parallelize iterated spatial joins in MapReduce!
      Id   X     Y     vX   vY
      1    1.5   1.5   0    0
      2    1.2   1.3   0    0
      3    2.3   2.4   0    0
      …

      Id   aX     aY     C.
      1    2.1    3.9    1
      2    -3.3   -5.0   2
      …
  • 72. Compiling BRASIL • BRASIL translates to Monad Algebra
      foreach(Fish p : Extent<Fish>) {
        p.avoidx <- 1 / abs(x - p.x);
        p.count <- 1;
      }
  • 73. Compiling BRASIL • BRASIL translates to Monad Algebra
      P = <1:Π1 ∘ Πp, 2:Π2> ∘ PAIRWITH2 ∘ σ(Π1 = Π2) ∘ Πkey ∘ GET ∘ Πx
      E1 = <1:Π1 ∘ Πp, 2:ρ(avoidx), 3:1 / (Π1 ∘ Πx - P)>
      E2 = <1:Π1 ∘ Πp, 2:ρ(count), 3:1>
      B = <1:Π1, 2:Π2, 3:Π2 ⊕ (E1 ∘ SNG) ⊕ (E2 ∘ SNG)>
      F = <1:Π1, 2:Π2, 3:<1:Π1 ∘ xp(Π2) ∘ PAIRWITHp, 2:Π2, 3:Π3> ∘ FLATMAP(B ∘ Π3)>
  • 74. Compiling BRASIL • BRASIL translates to Monad Algebra
      Select f1.id, sum(1 / abs(f1.x - f2.x)), count(*) from Fish f1, Fish f2 where f1.x < f2.x + 1 and f1.x > f2.x - 1 group by f1.id
  • 75. Compiling BRASIL • BRASIL translates to Monad Algebra • Can reuse classic DB optimization techniques – indexing, join reordering • Details of translation in our VLDB 2010 paper
      Select f1.id, sum(1 / abs(f1.x - f2.x)), count(*) from Fish f1, Fish f2 where f1.x < f2.x + 1 and f1.x > f2.x - 1 group by f1.id
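
The spatial-join view on slides 74 and 75 can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions rather than what BRASIL compiles to: the `Fish` table and its `id` and `x` columns follow the slide, while the `query_phase` helper, the `f1.id <> f2.id` predicate (so a fish does not repel itself), the `abs` call, and the `GROUP BY` clause are additions made here so that the query runs and returns one row per fish.

```python
import sqlite3


def query_phase(positions, radius=1.0):
    """Compute (id, avoidx, count) per fish with a band self-join.

    `positions` is an iterable of (id, x) pairs. Fish with no neighbor
    inside `radius` do not appear in the result.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Fish (id INTEGER PRIMARY KEY, x REAL)")
    conn.executemany("INSERT INTO Fish VALUES (?, ?)", positions)
    rows = conn.execute(
        """
        SELECT f1.id,
               SUM(1.0 / ABS(f1.x - f2.x)) AS avoidx,
               COUNT(*) AS cnt
        FROM Fish f1 JOIN Fish f2
          ON f2.x > f1.x - ? AND f2.x < f1.x + ? AND f1.id <> f2.id
        GROUP BY f1.id
        ORDER BY f1.id
        """,
        (radius, radius),
    ).fetchall()
    conn.close()
    return rows
```

Calling `query_phase([(1, 0.0), (2, 0.5), (3, 3.0)])` joins fish 1 and 2 with each other and leaves fish 3 without a partner. Written this way, an index on `x` or a different join order becomes a decision for the query optimizer, which is the reuse of database techniques the slide points to.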
  • 76. A Tick in Fish Simulation • Query – For fish f in visibility α: • Write repulsion to f’s effects – For fish f in visibility ρ: • Write attraction to f’s effects • Update α ρ – new velocity = combined repulsion + combined attraction + old velocity – new position = old position + old velocity 42
  • 81. A Tick in Fish Simulation • Query – For fish f in visibility α: • Write repulsion to f’s effects – For fish f in visibility ρ: • Write attraction to f’s effects • Update – new velocity = combined repulsion + combined attraction + old velocity – new position = old position + old velocity 43

Editor's Notes

  1. Welcome to my talk. As the title says, today I am going to talk about building BEHAVIORAL SIMULATIONS with MapReduce. This is joint work between the Cornell Database Group and the School of Civil and Environmental Engineering.
  2. So first of all, what are Behavioral Simulations, which are also called AGENT-BASED SIMULATIONS, used for? An example simulation for traffic networks is shown on the right. In this simulator, every single vehicle is an AGENT with its own behavior logic. For example, vehicles brake when the light is red or the car in front is too near, and they change lanes to pass cars in front that are too slow. Most of these behaviors arise from the INTERACTIONS between agents. The screenshot only shows a small portion of the network, and you can IMAGINE that in a HUGE network with MILLIONS of vehicles, many interesting and complex traffic patterns will emerge, and people can use those observations to better understand the traffic network. Besides this, behavioral simulations can also be applied to many other complex systems composed of individuals, such as fish schools and pedestrian crowds.
  3. Understanding such systems with behavioral simulations is very important. Take traffic AGAIN, for example. Traffic congestion cost 87 BILLION dollars in the US in 2007 alone. In addition, in terms of gas emissions, studies have indicated that MORE people are killed by air pollution from vehicles than by traffic accidents. So, to UNDERSTAND traffic congestion and ESTIMATE vehicle emissions, traffic engineers have created many simulators that include various detailed driver behavior models. You have just seen one example on the last slide. These so-called MICRO-simulations can capture congestion and estimate emissions much more precisely than macro models based on equations of network flows. HOWEVER, the bad news is that no current traffic micro-simulator can scale to a large scenario like New York City. And we see the SAME SITUATION in ecology. People develop simulation models to study the behavior of animal swarms. However, it is VERY HARD to scale such simulations to swarms of millions of individuals.
  4. So WHY is this the case? The main reason is the conflict between programmability and scalability. Today, there are already some behavioral simulation platforms, such as Swarm from the Santa Fe Institute and Mason from George Mason University, which offer programming tools that make development easy. To do that, these platforms usually run individual agents in different threads, and use message passing between threads to capture agent interactions. However, because of the heavy interaction between agents, THIS LEADS TO LOTS of contention, which prevents them from scaling up. On the other hand, to achieve high scalability, scientists have coded a lot of ad-hoc simulators for their specific domain applications. In those simulators, models are usually HARD-CODED, and most of the time their complexity needs to be compromised for better performance. In short, scientists must choose either programmability or scalability, but not both. (click) So, the question is, can we do better?
  6. At Cornell we answer this question by building a new simulation platform for domain scientists that provides both EASY PROGRAMMABILITY AND SCALABILITY. The goal is that scientists can easily CODE and MODIFY their simulation models, hit a button, and then get a scalable and efficient implementation that is ready to run. For EASE OF PROGRAMMING, we let simulation scientists express simulation models in the STATE-EFFECT PATTERN. This is a programming pattern that we have developed for NON-PLAYER CHARACTERS in computer games. But as we will show later, it is also very common in many simulation models. The platform gives domain scientists a high-level language, called BRASIL, that lets them easily program in the STATE-EFFECT PATTERN. We can also translate programs written in BRASIL into an algebraic form, so that we can automatically optimize them. For SCALABILITY, we execute this state-effect pattern in MAPREDUCE over a cluster of computers. To get REALLY HIGH performance, our platform also provides a special-purpose MapReduce engine optimized for behavioral simulations, called BRACE.
  7. The rest of my talk will be organized around these two points. After that I will show you some experiments with our platform, and then conclude.
  8. As a RUNNING EXAMPLE I will use a simulation model of fish schools that was published in NATURE in 2005. Suppose we have a bunch of FISH IN THE OCEAN here on the left. Each fish has two behavior rules: AVOIDANCE and ATTRACTION. AVOIDANCE works like this: each fish, like this BLACK ONE HERE, looks around and repels fish that are too close to it by sending a repulsion force; here, with this distance ALPHA, it will send repulsion to these two fish. For ATTRACTION, fish look for other fish within a larger distance RHO and send ATTRACTION FORCES to ALL OF them. Both the repulsion and the attraction forces influence the fish velocities. And they do this again and again in every iteration of the simulation. So this simple model balances PERSONAL SPACE with SCHOOLING BEHAVIOR.
  11. This model also reveals MANY CHARACTERISTICS that are common to behavioral simulations. The first observation is that time is discretized into ticks. At each tick, agents execute their behavior logic simultaneously. MOREOVER, there are STRONG INTERACTIONS among agents within a tick. In our fish example, fish are always interacting by sending REPULSION and ATTRACTION forces to each other at EVERY TICK. Finally, these simulations tend to have SPATIAL LOCALITY. Agents usually do not have GLOBAL knowledge of the simulation world; instead, they only act in response to nearby neighbors within their visibility range. For example, this black fish HERE only cares about the fish within distance RHO.
  12. Turning to the second point, agent concurrency within a tick: there are already several CLASSIC SOLUTIONS. You can either PREEMPT conflicts, using locking or synchronization, or use OPTIMISTIC CONCURRENCY CONTROL and roll back in case of conflicts. The Swarm and Mason platforms we mentioned before use such solutions. The problem is that these simulations have such STRONG INTERACTIONS that there will be many CONFLICTS, and we get either LOTS of contention or LOTS of rollbacks, which leads to poor scalability.
  13. So, our solution to this problem is the STATE-EFFECT PATTERN. In this pattern, we still follow a TIME-STEPPED MODEL, but we introduce PROGRAMMING RESTRICTIONS to make all actions inside of a tick ORDER-INDEPENDENT.
  14. To explain the state-effect PATTERN, let me first explain what states and effects are. STATES represent the SNAPSHOT of the world as of the beginning of the tick. The snapshot is composed of the state of every agent in the simulation. EFFECTS are sets of intermediate values PER AGENT, generated from agent interactions, which are necessary to calculate the new states. In the fish example, EFFECTS are the sets of REPULSION forces and ATTRACTION forces from other fish. (click) For example, the two GRAY FISH here within distance ALPHA send REPULSION forces to this BLACK FISH, and the other three within RHO but farther than ALPHA send ATTRACTION forces to this black fish. And the black fish collects a SET of these forces as its EFFECTS. As these effects are SETS, it doesn’t matter in which ORDER the ATTRACTION or REPULSION forces are sent to the black fish; ALL that matters is that the black fish gets ALL of these forces.
  16. We also divide the tick into TWO PHASES: the QUERY phase and the UPDATE phase. The QUERY phase captures all the agent interactions: it takes the state of the world and computes effects for ALL agents. The UPDATE phase then takes those EFFECTS and updates the agent states for the next tick. In the QUERY phase, we associate each effect with an order-independent COMBINATOR function to collect the set. So each effect variable does not actually STORE the whole set, but instead keeps the aggregate of the set. For example, we could use VECTOR ADDITION as a combinator function to sum all the forces sent to a fish. As a result, in the UPDATE phase, each agent only needs to read its OWN state and effects to compute its new state, so the WRITES TO AGENT EFFECTS AND STATES can also be done in ANY ORDER.
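The combinator idea can be sketched in a few lines of Python. This is an illustration under assumptions: the `Effect` class and `aggregate` helper are hypothetical names, and integer force components are used so that addition is exactly order-independent (floating-point addition is only approximately so).

```python
import random

def vector_add(a, b):
    """Combinator: component-wise addition of two 2-D vectors."""
    return (a[0] + b[0], a[1] + b[1])

class Effect:
    """An effect variable that keeps only the running aggregate, not the set."""
    def __init__(self, combinator, identity):
        self.combinator = combinator
        self.value = identity

    def assign(self, contribution):
        # Fold the new contribution into the aggregate.
        self.value = self.combinator(self.value, contribution)

def aggregate(contributions):
    effect = Effect(vector_add, (0, 0))
    for c in contributions:
        effect.assign(c)
    return effect.value

forces = [(1, 0), (0, 2), (-3, 1), (2, 2)]
shuffled = forces[:]
random.shuffle(shuffled)  # deliver the same forces in a different order
assert aggregate(forces) == aggregate(shuffled) == (0, 5)
```

Because the combinator is commutative and associative, any delivery order gives the same aggregate, which is what removes the need for locks or rollbacks inside a tick.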
  17. NOW let’s look at an EXAMPLE of how to process a tick in our FISH SIMULATION in the STATE-EFFECT PATTERN. First, in the QUERY phase, our GRAY FISH here, for example, write REPULSION and ATTRACTION effects to THE BLACK FISH. Using the vector addition combinator, at the end of the QUERY phase we have the aggregated effects collected by the black fish. (click) Then in the UPDATE phase, the black fish computes its NEW VELOCITY by summing the COMBINED REPULSION and ATTRACTION with the OLD VELOCITY. The fish’s NEW POSITION is obtained from the OLD POSITION and the OLD VELOCITY. All the other GRAY FISH do the same in the query and update phases, and get their new position and velocity states. Once they have the new states, they can update their positions with the new velocity. Then the next tick begins, and so on. Over consecutive ticks you will see a school of moving fish.
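The tick just described can be sketched as follows. This is a simplified illustration, not the platform's code: positions are one-dimensional, forces have unit magnitude, and the radii are hypothetical values.

```python
from dataclasses import dataclass

ALPHA, RHO = 1.0, 3.0  # hypothetical avoidance and attraction radii

@dataclass(frozen=True)
class Fish:
    x: float  # state: position (1-D for brevity)
    v: float  # state: velocity

def query(snapshot):
    """Query phase: read only the snapshot, return one summed effect per fish."""
    effects = [0.0] * len(snapshot)
    for i, sender in enumerate(snapshot):
        for j, receiver in enumerate(snapshot):
            if i == j:
                continue
            d = receiver.x - sender.x
            if abs(d) <= ALPHA:
                # Repulsion pushes the receiver away from the sender
                # (ties at d == 0 are pushed in the positive direction).
                effects[j] += 1.0 if d >= 0 else -1.0
            elif abs(d) <= RHO:
                # Attraction pulls the receiver toward the sender.
                effects[j] += -1.0 if d > 0 else 1.0
    return effects

def update(snapshot, effects):
    """Update phase: each fish reads only its own old state and its own effect."""
    return [Fish(x=f.x + f.v, v=f.v + e) for f, e in zip(snapshot, effects)]

def tick(snapshot):
    return update(snapshot, query(snapshot))
```

For example, `tick([Fish(0.0, 0.0), Fish(0.5, 0.0), Fish(2.5, 1.0)])` moves the third fish by its old velocity while its new velocity already reflects the two attraction forces it received.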
  22. So, to make it EASY for domain scientists to write simulations in the STATE-EFFECT PATTERN, we created a new language called BRASIL. BRASIL is an object-oriented language which allows domain scientists to PROGRAM the behavior of INDIVIDUAL agents.
  23. Now, to give an example of BRASIL, we use the AVOIDANCE logic of the fish simulation. As you can see, in BRASIL, agents are defined in classes. And EVERY variable of EVERY class is tagged as either STATE or EFFECT. For example, the fish’s X position here is a STATE. We associate an UPDATE RULE with each state field; in this case it calculates the new X position from the OLD POSITION and the OLD VELOCITY. In addition, we MAY associate a VISIBILITY range with a given STATE variable. This is the parameter ALPHA that we have seen before. Now consider an EFFECT variable, AVOIDX, that collects all REPULSION forces from other fish in the X dimension. We associate a COMBINATOR function with this effect. In this case, we use SUM.
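The actual BRASIL code is on the slide and is not reproduced in these notes, so the following is only a Python analogue of the same idea: every field is tagged as state or effect, state fields carry an update rule and an optional visibility range, and effect fields carry a combinator. All names here (`StateField`, `EffectField`, `FISH_SCHEMA`, and so on) are hypothetical and do not reflect BRASIL syntax.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class StateField:
    update_rule: Callable               # (old_state, effects) -> new value
    visibility: Optional[float] = None  # optional visibility range

@dataclass
class EffectField:
    combinator: Callable   # order-independent binary function
    identity: float = 0.0  # starting value of the aggregate

ALPHA = 1.0  # hypothetical visibility range

FISH_SCHEMA = {
    # New x is computed from the old position and the old velocity.
    "x": StateField(update_rule=lambda s, e: s["x"] + s["vx"], visibility=ALPHA),
    # New velocity adds the combined repulsion collected in avoidx.
    "vx": StateField(update_rule=lambda s, e: s["vx"] + e["avoidx"]),
    # avoidx collects repulsion forces in the x dimension, combined with sum.
    "avoidx": EffectField(combinator=lambda a, b: a + b),
}

def combine(schema, name, contributions):
    """Fold all contributions to one effect field with its combinator."""
    field = schema[name]
    total = field.identity
    for c in contributions:
        total = field.combinator(total, c)
    return total

def apply_update(schema, state, effects):
    """Run every state field's update rule against the old state and effects."""
    return {name: f.update_rule(state, effects)
            for name, f in schema.items() if isinstance(f, StateField)}
```

Tagging fields this way is what lets a compiler know which reads and writes are legal in each phase.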
  28. So we have seen how to express these simulations of INDIVIDUALS in the state-effect pattern. Now let’s see how to SCALE the state-effect pattern to large scenarios, which are actually the really interesting scenarios for behavioral simulations.
  29. As you may be thinking, REALLY LARGE scenarios exceed the capacity of a SINGLE node. So we want to use SEVERAL nodes in a cluster of machines to run those scenarios. As we said before, simulations written in BRASIL can be compiled into the monad algebra, so the idea is to map this algebraic computation further onto a dataflow computation. And once we are TALKING ABOUT DATAFLOWS, we know that we can parallelize them using the MapReduce model.
  32. Looking at this extended STATE-EFFECT pattern, you may already get a feeling for how it could be EASILY expressed in MAPREDUCE. So let’s do it: First, we have a MAPPER that DISTRIBUTES data across partitions. The data is passed to a reducer that executes the QUERY phase and does the EFFECT ASSIGNMENTS. Note that these assignments may be PARTIAL, as some of them may be to agents in OTHER partitions. So we need to forward these EFFECTS through a mapper to another REDUCER, which AGGREGATES all EFFECTS. Now we have all the EFFECTS AGGREGATED. So we can give them to another mapper, which executes the UPDATE PHASE for each agent as a key-value pair and distributes the data again for a NEW TICK. With this implementation, each tick has two map and reduce passes. However, the second map and reduce are only used to forward and aggregate non-local effects.
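The two map and reduce passes can be sketched as an in-memory Python pipeline. This is a toy under stated assumptions, not BRACE: agents are `(id, x, v)` tuples on a line, the only behavior is a unit repulsion within radius `R`, each agent is replicated to both adjacent partitions (valid only while `R <= CELL`), and all function names are hypothetical.

```python
from collections import defaultdict

CELL = 10.0  # partition width (hypothetical)
R = 2.0      # visibility radius, assumed <= CELL

def shuffle(pairs):
    """Group (key, value) pairs by key, as a MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def map_distribute(agent):
    """Map 1: send an agent to its owner partition and replicate to neighbors."""
    _, x, _ = agent
    p = int(x // CELL)
    yield p, ("own", agent)
    yield p - 1, ("replica", agent)
    yield p + 1, ("replica", agent)

def reduce_query(partition, records):
    """Reduce 1: query phase. Owned agents emit (possibly partial) effects."""
    visible = [a for _, a in records]
    for role, (aid, x, v) in records:
        if role != "own":
            continue
        yield aid, ("state", (aid, x, v))
        for bid, bx, _ in visible:
            if bid != aid and abs(bx - x) <= R:
                yield bid, ("effect", 1.0 if bx >= x else -1.0)

def map_forward(key, value):
    """Map 2: forward records unchanged, keyed by receiving agent id."""
    yield key, value

def reduce_aggregate(aid, values):
    """Reduce 2: combine all effects addressed to one agent (sum combinator)."""
    state = next(val for tag, val in values if tag == "state")
    total = sum(val for tag, val in values if tag == "effect")
    return state, total

def map_update(state, effect):
    """Update phase; in an iterated run this would fuse with map_distribute."""
    aid, x, v = state
    return (aid, x + v, v + effect)

def run_tick(agents):
    stage1 = shuffle(kv for a in agents for kv in map_distribute(a))
    partial = [kv for p, recs in stage1.items() for kv in reduce_query(p, recs)]
    stage2 = shuffle(kv for k, v in partial for kv in map_forward(k, v))
    aggregated = [reduce_aggregate(aid, vals) for aid, vals in stage2.items()]
    return sorted(map_update(s, e) for s, e in aggregated)
```

Note that the second shuffle carries only state records and effect contributions, which is why it disappears once all effect assignments are local.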
  37. So, we have built BRACE, a SPECIAL-PURPOSE MapReduce runtime for behavioral simulations. In BRACE we have TWO types of optimizations to get performance. BASIC OPTIMIZATIONS are the ones we would ALWAYS do for an iterated MapReduce model. For example, writing data to disk at every MapReduce pass DOES NOT MAKE ANY SENSE, so we do not do it. The more interesting optimizations are those that exploit many PROPERTIES of behavioral simulations, such as the SPATIAL PROPERTIES. And I will show some of them in the next few slides.
  38. First, we use a SPATIAL PARTITIONING function to partition the data; for example, we can use a grid partitioning scheme, as we show in this slide. Each partition is assigned to a NODE in the cluster. (QUICK CHANGE)
  39. We call this partition the owned region of the node. In other words, the node controls all the agents falling in this partition and computes their query and update phases. In this example, the green partition is the owned region, and the three agents falling in this region are assigned to the node.
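A grid partitioning function of this kind can be sketched in Python as below. The square cells, the cell size, and the names `owner_cell` and `partition_agents` are assumptions for illustration; the notes do not say how BRACE maps cells to nodes, so that step is left out.

```python
import math
from collections import defaultdict

def owner_cell(x, y, cell_size):
    """Grid cell (column, row) whose owned region contains the point (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def partition_agents(positions, cell_size):
    """Group agent ids by the grid cell that owns them."""
    owned = defaultdict(list)
    for agent_id, (x, y) in positions.items():
        owned[owner_cell(x, y, cell_size)].append(agent_id)
    return dict(owned)
```

Each cell covers a half-open square, so an agent exactly on a boundary belongs to exactly one cell.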
  40. However, during the query phase, the node may also need to know about agents that do not belong to it, because some of its owned agents may need to query them. Therefore we define a larger region, shown as the BLUE region in this example, as its VISIBLE REGION. During the query phase, some owned agents here may read these visible agents’ states, and even assign values to their effects, which we call NON-LOCAL effects. Note that non-local effects are not only the effects assigned to agents owned by other nodes, but the effects assigned to ANY agent OTHER than the assigning agent itself. The problem is that as long as there are non-local effect assignments, some of them could go to agents on different nodes. Therefore, nodes need to communicate with each other to REPLICATE the visible agent states at the beginning of the query phase, shown as the blue agents in this figure. At the END of the query phase, they need to communicate again to aggregate those non-local effect assignments. However, because of the spatial partitioning, a node only needs to communicate with its SPATIAL NEIGHBORS. This is one benefit we get from the spatial properties of behavioral simulations.
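As a rough sketch of the replication step, the function below computes which other grid cells need a copy of an agent's state. It assumes, for illustration only, that a cell's visible region is its owned square extended by the visibility radius `r` in each axis direction; the name `replica_cells` is hypothetical.

```python
import math

def replica_cells(x, y, r, cell_size):
    """Cells other than the owner whose visible region contains (x, y)."""
    own = (math.floor(x / cell_size), math.floor(y / cell_size))
    cols = range(math.floor((x - r) / cell_size), math.floor((x + r) / cell_size) + 1)
    rows = range(math.floor((y - r) / cell_size), math.floor((y + r) / cell_size) + 1)
    cells = {(c, rw) for c in cols for rw in rows}
    cells.discard(own)  # the owner already holds the agent
    return cells
```

An agent in the middle of its cell is replicated nowhere, and when `r` is no larger than the cell size the result only ever contains adjacent cells, which is why a node talks only to its spatial neighbors.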
  48. So if the simulation does not have any non-local effect assignments, we can remove these two tasks, and then each tick will have only one communication phase. Therefore, given a simulation with NON-LOCAL EFFECTS, it would be GREAT if we could transform it into a simulation with the same semantics but with LOCAL EFFECTS ONLY.
  50. In fact, we can show that IN BRASIL such a transformation is ALWAYS possible. Let me give you the INTUITION behind this.
  51. Suppose we start with a simulation in which our gray fish HERE do NON-LOCAL effect writes to our BLACK fish. In order to COMPUTE these non-local writes, the GRAY fish have to execute some PIECE OF CODE. The core idea of effect inversion is to TAKE THIS PIECE OF CODE and have it executed by the BLACK fish INSTEAD. The black fish then computes the effect assignments that the gray fish WOULD have computed and performs them as LOCAL assignments. HOWEVER, the piece of code now running in the black fish may need to READ STATE from the gray fish. MOREOVER, it may need to read state that the GRAY fish can read, namely from the BLUE fish HERE. That means the BLACK fish now needs TWICE its original visibility range, TWO ALPHA. So we PAY for saving one round of communication with LARGER VISIBILITY.
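Effect inversion can be illustrated with a small Python sketch. It is a toy, not the BRASIL transformation itself: positions are one-dimensional, and the force a fish sends is assumed to scale with how crowded that fish's own neighborhood is, so that computing it requires reading the sender's neighbors. All names are hypothetical.

```python
ALPHA = 1.0  # hypothetical visibility range

def neighbors(i, xs, radius):
    """Indices of the fish within `radius` of fish i."""
    return [j for j in range(len(xs)) if j != i and abs(xs[j] - xs[i]) <= radius]

def push(i, j, xs):
    """Force fish i sends to fish j; it reads state within ALPHA of fish i."""
    crowd = len(neighbors(i, xs, ALPHA))
    return crowd * (1.0 if xs[j] >= xs[i] else -1.0)

def query_nonlocal(xs):
    """Original form: each sender i writes into the effect of receiver j."""
    effects = [0.0] * len(xs)
    for i in range(len(xs)):
        for j in neighbors(i, xs, ALPHA):
            effects[j] += push(i, j, xs)  # non-local write
    return effects

def query_inverted(xs):
    """Inverted form: each receiver j runs the senders' code and writes locally."""
    effects = [0.0] * len(xs)
    for j in range(len(xs)):
        for i in neighbors(j, xs, ALPHA):
            effects[j] += push(i, j, xs)  # local write, but reads i's neighbors
    return effects

xs = [0.0, 0.8, 1.6, 4.0]
assert query_nonlocal(xs) == query_inverted(xs)
```

In the inverted form, the fish at 0.0 evaluates `push(1, 0, xs)`, which counts the fish at 1.6. That fish is farther than ALPHA from 0.0 but within two ALPHA, which is the doubled visibility range paid for the saved communication round.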
  53. So I have shown how to express these behavioral simulations of COMPLEX SYSTEMS in the STATE-EFFECT PATTERN, and shown how to SCALE UP the state-effect pattern in BRACE. Let me show you now some EXPERIMENTS.
  54. We implemented a PROTOTYPE of BRACE and ran our experiments in the CORNELL WEB LAB CLUSTER. It’s a cluster of 60 nodes, each with 16GB of RAM.
  55. We implemented several simulation models in BRASIL. First, a traffic simulation that is a best-effort reimplementation of the MITSIM lane changing and car following models. MITSIM is a state-of-the-art SINGLE-NODE microsimulator and has very detailed models. We simulate here a large patch of HIGHWAY and measure LANE CHANGING behavior. We also implemented a simulation that you may think of as an environment with a bunch of BACTERIA that reproduce until they take the WHOLE SPACE. But then when it gets too crowded they start killing each other. So the simulation reaches a STATIONARY, UNIFORM distribution. The third simulation I will show today is the FISH simulation we have been talking about THROUGHOUT THE TALK.
  56. In this graph, I am scaling the NUMBER OF NODES on the X axis and plotting it against the THROUGHPUT in AGENT TICKS per SECOND. As we scale the number of nodes, we also scale the SIZE OF THE HIGHWAY being simulated. So it is a SCALE-UP, not a SPEED-UP experiment. Basically, we see that BRACE has ALMOST LINEAR scalability in this simulation. The NOTCH in the graph is a consequence of the IP ROUTING architecture of the WEB LAB CLUSTER, as some nodes are in a subnet that is externally accessible and have HIGHER LATENCY to access.
  57. With the BACTERIA simulation, I show you a 16-NODE experiment illustrating the benefits of INDEXING and EFFECT INVERSION. In this case, either optimization brought a gain of about 20% to performance. And in addition, you can see that the gains of the two techniques can ADD UP.
  58. Finally, I want to show you an experiment on the effect of our SPATIAL PARTITIONING through periodic REPARTITIONING. In this simulation, we have a big FISH SCHOOL that breaks up into TWO PARTS that swim in OPPOSITE DIRECTIONS. So if we keep the original partitioning, eventually TWO NODES will be simulating ALL of the agents. Now if we do REPARTITIONING every once in a while, then we can utilize ALL 16 NODES in the cluster.
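  To make the load-balancing argument concrete, here is a small sketch. It does not describe how BRACE chooses its partitions. It assumes a one-dimensional space and a simple equal-count rule for picking new boundaries, and all names and positions are hypothetical.

```python
import bisect


def load_per_node(xs, boundaries):
    """Count the agents that fall into each partition defined by `boundaries`."""
    counts = [0] * (len(boundaries) + 1)
    for x in xs:
        counts[bisect.bisect_right(boundaries, x)] += 1
    return counts


def balanced_boundaries(xs, num_nodes):
    """Pick boundaries so that each node gets roughly the same number of agents."""
    ordered = sorted(xs)
    n = len(ordered)
    return [ordered[(k * n) // num_nodes] for k in range(1, num_nodes)]


# A school that has split into two groups swimming in opposite directions.
xs = [-10.0, -9.5, -9.0, -8.5, 8.5, 9.0, 9.5, 10.0]

static_load = load_per_node(xs, [-5.0, 0.0, 5.0])                  # original partitioning
rebalanced_load = load_per_node(xs, balanced_boundaries(xs, 4))    # after repartitioning
print(static_load, rebalanced_load)
```

  With the original boundaries only the two outer nodes have any work, while the recomputed boundaries spread the same agents evenly over all four.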
  59. NOW let me WRAP UP the talk. I have shown you what BEHAVIORAL SIMULATIONS are. These are simulations of COMPLEX SYSTEMS that can have a lot of impact on ECONOMIC and ENVIRONMENTAL issues. However, they need to SCALE to LARGE SCENARIOS. For that I have shown you a NEW PROGRAMMING ENVIRONMENT for behavioral simulations. Our PLATFORM is both EASY TO PROGRAM and SCALABLE. To make it EASY TO PROGRAM, we have shown how to express simulations in the STATE-EFFECT PATTERN and code them in BRASIL. To make it SCALABLE, we have shown how to translate STATE-EFFECT into MAPREDUCE and how to optimize this in a SPECIAL-PURPOSE MapReduce runtime called BRACE. THANK YOU for your attention, I am open to questions.
  61. Given these results on the WEBLAB cluster, we thought: LET’S SCALE BRACE UP TO BILLIONS OF FISH IN THE CLOUD!!! We have done all of these OPTIMIZATIONS and they work so well in our PRIVATE CLUSTER. So we took BRACE and ran it on Amazon EC2. The performance was TERRIBLE!! Here are some reasons: First, we observed very LARGE and VARIABLE latency. This problem is ameliorated with Amazon’s new service, but it is still QUITE BAD (WE MEASURED). So we are working on LATENCY COMPENSATION TECHNIQUES to deal with this problem. The cloud also has a large number of UNRELIABLE nodes, so you absolutely need some mechanism for FAULT TOLERANCE. We are working here on very LOW OVERHEAD checkpointing techniques so that we avoid wasting cycles as much as possible. Finally, WE DO NOT LIKE TO SPEND MONEY! So we are looking at ways to allocate tasks in the cloud so as to run the simulation as CHEAPLY as possible.
  62. The design of the BRACE parallel runtime comes from the state-effect pattern. In the query phase, agents access data from other agents that fall within their visibility range. If we put the agents as rows in a table, we can see that the query phase can be handled much like a spatial join in the database literature. Take the previous fish example: in the query phase, every fish can first submit a spatial join query that pairs itself with every one of its neighbor fish, and then use the specified combinator as the aggregation function to generate the effect values, grouping by fish id. Here aX and aY are the aggregated effect values, and C is the count of how many neighbor agents are queried. Therefore, since every tick of the simulation has one query phase, we can characterize the whole behavioral simulation as iterated spatial joins. By doing so we can represent the simulation computation as a sequence of iterated spatial join operators, and further model it as a dataflow program. We know that such a dataflow program can be parallelized in the MapReduce model, and we will show how in the next slide. ------- Big leap!
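  A minimal sketch of the query phase as a spatial self-join followed by a group-by aggregation. It is not the BRACE implementation. It assumes agents are rows with `id`, `x` and `y`, a nested-loop join instead of an index, and a made-up effect (the vector pointing away from each neighbor) combined by summation. The output columns aX, aY and C follow the slide, while the function names are hypothetical.

```python
from collections import defaultdict


def spatial_self_join(agents, radius):
    """Yield every ordered pair (a, b) of distinct agents within `radius`."""
    for a in agents:
        for b in agents:
            if a["id"] != b["id"]:
                dx, dy = b["x"] - a["x"], b["y"] - a["y"]
                if dx * dx + dy * dy <= radius * radius:
                    yield a, b


def query_phase(agents, radius):
    """Group the joined pairs by agent id and aggregate into (aX, aY, C)."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for a, b in spatial_self_join(agents, radius):
        row = acc[a["id"]]
        row[0] += a["x"] - b["x"]  # aX: summed x component, away from neighbor
        row[1] += a["y"] - b["y"]  # aY: summed y component, away from neighbor
        row[2] += 1                # C: number of neighbors joined
    return {agent_id: tuple(row) for agent_id, row in acc.items()}


agents = [
    {"id": 1, "x": 0.0, "y": 0.0},
    {"id": 2, "x": 1.0, "y": 0.0},
    {"id": 3, "x": 5.0, "y": 5.0},
]
joined = query_phase(agents, radius=2.0)
print(joined)
```

  Running one such join per tick gives the iterated spatial join view of the simulation. Agent 3 has no neighbor within the radius, so it produces no output row here.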
  69. What makes BRASIL even cooler is that it is not only EASY TO PROGRAM, it also compiles to a DATABASE algebra, the MONAD algebra. So to give you a simple example, the query phase you have seen before can actually be translated into a spatial self join. This means that we can APPLY all sorts of DATABASE optimizations such as indexing, join reordering, etc., to improve the performance. For the details of our translation to the MONAD algebra, check our VLDB 2010 paper coming soon.
  73. NOW let’s look at an EXAMPLE of a tick in our FISH SIMULATION. First our GRAY FISH here, in the QUERY phase, will write REPULSION and ATTRACTION effects to all fish within range ALPHA and RHO, respectively. So using vector addition as the combination function, at the end of the QUERY phase we have the aggregated effects collected from other fish. We then execute the UPDATE phase. In this phase, we calculate the fish’s NEW VELOCITY by summing the COMBINED REPULSION, the COMBINED ATTRACTION and the OLD VELOCITY. The fish’s NEW POSITION is obtained by summing the OLD POSITION with the OLD VELOCITY. After that all the fish will update to their new positions with the new velocity. And then the next tick begins, and SO ON.
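  The tick just described can be written down as a short sketch. This is plain Python rather than BRASIL. It assumes two-dimensional vectors stored as tuples, unnormalized repulsion and attraction vectors, and that attraction applies only to fish farther than ALPHA but within RHO. The `Fish` class and the helper functions are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Fish:
    pos: tuple                      # state
    vel: tuple                      # state
    repulsion: tuple = (0.0, 0.0)   # effect, combined by vector addition
    attraction: tuple = (0.0, 0.0)  # effect, combined by vector addition


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])


def tick(school, alpha, rho):
    # Query phase: states are read-only, effects are only accumulated.
    for fish in school:
        fish.repulsion = (0.0, 0.0)
        fish.attraction = (0.0, 0.0)
    for me in school:
        for other in school:
            if other is me:
                continue
            d = math.dist(me.pos, other.pos)
            if d <= alpha:   # too close: push the other fish away from me
                other.repulsion = add(other.repulsion, sub(other.pos, me.pos))
            elif d <= rho:   # in range: pull the other fish toward me
                other.attraction = add(other.attraction, sub(me.pos, other.pos))
    # Update phase: each fish reads only its own state and effects.
    for fish in school:
        old_vel = fish.vel
        fish.vel = add(add(fish.repulsion, fish.attraction), old_vel)
        fish.pos = add(fish.pos, old_vel)


a = Fish(pos=(0.0, 0.0), vel=(1.0, 0.0))
b = Fish(pos=(0.5, 0.0), vel=(0.0, 0.0))
tick([a, b], alpha=1.0, rho=3.0)
print(a.pos, a.vel, b.pos, b.vel)
```

  Because the query phase only accumulates effects and the update phase only touches a fish's own state, the order in which fish are processed within a phase does not change the result.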