This document discusses parallel and distributed databases. Distributed databases can improve scalability, performance and resilience to failures. However, distributing a database also introduces complexity, as communication is needed between distributed components and failures become more common. The document describes different types of distributed database architectures and how data can be distributed and queries processed in a parallel manner. Maintaining consistency across distributed nodes is challenging and usually requires techniques like two-phase commit for distributed transactions.
3. Why distribute a database
Scalability and performance
Resilience to failures
[Figure: throughput versus data size]
4. Why distribute a database
Data is already distributed
Or needs to be distributed
Data is in multiple systems
5. Why not distribute a database
You must earn your complexity!
Communication needed
Must build a complex infrastructure
Unpredictable latencies must be masked
More types of failures
More components to fail
Network failures
Congestion, timeouts
More complex planning
Communication cost plus I/O cost
May have to deal with heterogeneity
Different types of systems
Different schemas, possibly incompatible
Different administrative domains
16. How to distribute the data?
Server 1 Server 2 Server 3 Server 4
Bike     $86    6/2/07   636353
Chair    $10    6/5/07   662113
Couch    $570   6/1/07   424252
Car      $1123  6/1/07   256623
Lamp     $19    6/7/07   121113
Bike     $56    6/9/07   887734
Scooter  $18    6/11/07  252111
Hammer   $8000  6/11/07  116458
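17. How to distribute the data?
Hash partitioning
Hash() is applied to the key of each (key,value) pair to choose a server
Range partitioning
(key,value) pairs with key <= X go to one server, pairs with key > X go to another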
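The two schemes on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the deck's implementation. The function names, the choice of SHA-256 as the hash, the use of the item name as hash key and the price as range key, and the split points 20, 100 and 1000 are all assumptions made for the example. Only the four servers and the sample rows come from the slides.

```python
import bisect
import hashlib

NUM_SERVERS = 4  # the slides show four servers

def hash_partition(key: str, num_servers: int = NUM_SERVERS) -> int:
    """Pick a server index by hashing the key (hypothetical helper).

    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

def range_partition(key, boundaries) -> int:
    """Pick a server index by comparing the key to sorted split points.

    With boundaries [X], keys <= X map to server 0 and keys > X map to
    server 1, matching the "<= X" / "> X" split on the slide.
    """
    return bisect.bisect_left(boundaries, key)

# Sample rows from the slides: (item, price in dollars).
rows = [("Bike", 86), ("Chair", 10), ("Couch", 570), ("Car", 1123),
        ("Lamp", 19), ("Bike", 56), ("Scooter", 18), ("Hammer", 8000)]

PRICE_BOUNDARIES = [20, 100, 1000]  # assumed split points, three for four servers

for item, price in rows:
    by_hash = hash_partition(item)
    by_range = range_partition(price, PRICE_BOUNDARIES)
    print(f"{item:8} ${price:<5} hash -> server {by_hash + 1}, "
          f"range -> server {by_range + 1}")
```

Under these assumptions, both Bike rows land on the same server with hash partitioning, because they share a key, whereas range partitioning groups rows with similar prices. Hashing tends to spread keys evenly but scatters adjacent values, so a range query may have to touch every server. Range partitioning keeps adjacent values together but can overload one server if the data is skewed.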
18. How to distribute the data?
Server 1 Server 2 Server 3 Server 4
Bike, Chair, Couch, Car, Lamp, Bike, Scooter, Hammer
$86, $10, $570, $1123, $19, $56, $18, $8000
6/2/07, 6/5/07, 6/1/07, 6/1/07, 6/7/07, 6/9/07, 6/11/07, 6/11/07
636353, 662113, 424252, 256623, 121113, 887734, 252111, 116458
36. Conclusion
Parallelism and distribution very useful
Performance
Fault tolerance
Scale
But complex!
Rethink lots of aspects of the system
Must earn the complexity