This is a set of slides from the Claremont Report on Database Research; see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a more in-depth talk by Eric A. Brewer. (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: In Depth Talk (Eric A. Brewer)
An outside view…
Prof. Eric A. Brewer
UC Berkeley
Intel Research (until July)

Thought about it…
Most of my wish list hasn’t changed much
SIGMOD 97 keynote about search
CIDR 2003 keynote about new areas that don’t fit DBMS well
So, some review, some new stuff

Proposal: Layered Database
Pros:
Enable new database-like things
Faster innovation for components
Many parallel experiments (like Linux)
Should be public domain ideally
Cons:
Hard to ensure global properties
But those that care will get them…
Closest is Berkeley DB (?)

Example: Search Engines
No use of database technology
Things that would have been helpful:
High availability and replication
Atomic version vectors
Tools for new declarative languages
Join machinery
Not needed:
Complex locks, Query Optimization
Transactions, Redo, Undo
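
The "atomic version vectors" item on the search-engine slide is not elaborated in the talk. As a rough illustration only, here is a minimal Python sketch of the usual version-vector operations (increment, compare, merge), assuming the textbook definition of one counter per replica. The names `increment`, `compare`, and `merge` and the dict representation are assumptions made for this sketch, not anything from the slides, and making the updates atomic across replicas is deliberately left out.

```python
from typing import Dict

# Hypothetical representation: replica id -> number of updates seen from it.
VersionVector = Dict[str, int]


def increment(vv: VersionVector, replica: str) -> VersionVector:
    """Return a copy of vv with this replica's counter advanced by one."""
    out = dict(vv)
    out[replica] = out.get(replica, 0) + 1
    return out


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if a has seen every update that b has seen."""
    return all(a.get(r, 0) >= n for r, n in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify two vectors as equal, ordered, or concurrent."""
    a_ge, b_ge = dominates(a, b), dominates(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "concurrent"  # neither history contains the other: a conflict


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum: the vector after reconciling both histories."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}
```

Two replicas that each accept one write independently compare as "concurrent", which is the case a replicated index would have to reconcile; after `merge`, the result compares as newer than either input.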

Example: Scientific Computing
Uses databases, but not a good fit
Data often stored in files
Most operators are outside the DBMS
Database is an expensive replicated file system (in/out but no joins)
Things that layered system might provide:
Multi-version storage system
New operators
Tools for new declarative languages

Other Misfits
Bioinformatics:
Wrong operators
Need error propagation
Versioning, read mostly
App Servers:
Session state, session migration
App server will be a small database
Workflow
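
To make the "multi-version storage system" item for scientific computing a little more concrete, here is a toy Python sketch of an append-only store in which every write creates a new version and older versions stay readable. The class name `MultiVersionStore`, its methods, and the single global version counter are all assumptions for illustration; the slides do not describe an interface, and a real layer would also need persistence and concurrency control.

```python
from typing import Any, Dict, List, Optional, Tuple


class MultiVersionStore:
    """Toy append-only store: every write adds a version, nothing is overwritten."""

    def __init__(self) -> None:
        self._clock = 0  # monotonically increasing version number
        self._history: Dict[str, List[Tuple[int, Any]]] = {}

    def put(self, key: str, value: Any) -> int:
        """Append a new version of key and return its version number."""
        self._clock += 1
        self._history.setdefault(key, []).append((self._clock, value))
        return self._clock

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the latest value of key at or before version as_of."""
        for version, value in reversed(self._history.get(key, [])):
            if as_of is None or version <= as_of:
                return value
        raise KeyError(key)

    def versions(self, key: str) -> List[int]:
        """List the version numbers recorded for key, oldest first."""
        return [v for v, _ in self._history.get(key, [])]
```

A read-mostly workload can then pin a version number and keep reading a consistent snapshot with `get(key, as_of=v)` while newer results are appended.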

So what happened?
Accepted: one size does not fit all…
Couldn’t get much traction on layered database
Built our own from scratch
Stasis, Rusty Sears
Open source, could be something special
But big picture largely unchanged
Too hard to explore the fun spaces
But layering DID happen!
But whole database is now just transactional storage

Directions I’d like to see…
Integrated notion of statistics
Store the noise (rather than clean it)
Create cleaner views
Core probabilistic queries
Move away from update-in-place
Many inputs are sacred (e.g. science)
Transactional versioning
Provenance & annotation
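
One possible reading of "store the noise, create cleaner views" is sketched below in Python: raw observations are kept untouched, a derived view summarizes them with a mean and a standard error, and a crude probabilistic predicate is evaluated against the view. The sample data, the function names `cleaner_view` and `probably_greater`, and the two-standard-error threshold are assumptions chosen for this sketch, not something the slides specify.

```python
import math
import statistics
from typing import Dict, List, Tuple

# Raw observations are kept as-is: (entity, measured value). Values are made up.
raw: List[Tuple[str, float]] = [
    ("sensor_a", 20.1), ("sensor_a", 19.8), ("sensor_a", 20.4),
    ("sensor_b", 35.0), ("sensor_b", 31.0),
]


def cleaner_view(rows: List[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Derive entity -> (mean, standard error) without modifying the raw rows."""
    groups: Dict[str, List[float]] = {}
    for entity, value in rows:
        groups.setdefault(entity, []).append(value)
    view: Dict[str, Tuple[float, float]] = {}
    for entity, values in groups.items():
        mean = statistics.fmean(values)
        # Standard error needs at least two samples; otherwise leave it unknown.
        if len(values) > 1:
            stderr = statistics.stdev(values) / math.sqrt(len(values))
        else:
            stderr = float("nan")
        view[entity] = (mean, stderr)
    return view


def probably_greater(view: Dict[str, Tuple[float, float]],
                     a: str, b: str, z: float = 2.0) -> bool:
    """Crude test: is a's mean above b's by more than z combined standard errors?"""
    (mean_a, se_a), (mean_b, se_b) = view[a], view[b]
    return mean_a - mean_b > z * math.sqrt(se_a ** 2 + se_b ** 2)
```

Because the view is recomputed from the raw rows, a different cleaning rule can be applied later without having lost any of the original input.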

Directions (2)
Better integration into PL
BASE semantics (not just ACID)
Repeated automatic extraction
Web crawlers do this
Much of MapReduce workload
Need to integrate with versioning, provenance, statistics
Import is a continuous process, not an event

Many Core
Hard to get any performance benefit for I/O bound applications
Main memory DB??
Limited by off-chip bandwidth
Need dataflow optimizations on/off chip

Backup

1) Layering enables competition
Examples from OS community:
X86, SPEC benchmarks, Virtual machines
SCSI disks, RAID, NAS
Routers, Firewalls, Proxies
Some layers commodities (raw disks)
Some layers innovative (replication)
Always have unexpected uses

2) Many more experiments
Centralized planning tries very few things
Layering enables many more bets
Also enables VC funding
Ex: IP layer, ASICs => networking startups
Enables niche markets (lower cost of entry)
Easier path for XML, bio, spatial, ….
Most bets fail, but some succeed

3) Reduces Time to Market
Lower cost of entry
More important:
Just good enough!
Few global properties in early versions
The web, search engines, even e-commerce
P2P
WebMethods
Global properties added over time!
Ugly but fast wins the race…

Claims
If you can’t control, then enable
This is the lesson from OS work for CIDR
Unix, TCP enabled the web
Neither attempted to control usage
HTTP in turn enabled P2P
DB research suffers from “Albatross 9i”
Artifact hides the enabling technology
CIDR exists for this reason

Conclusions
Can’t control (or predict) the future…
better to enable broad innovation
Control
Make global properties tractable
But limits innovation
A public domain layered database:
Would enable more innovation
Allow a broader range of properties

Rate of Innovation
Claim: layering increases innovation
1) Enables competition
2) Many more experiments
3) Reduces time to market