28. How was Voldemort born? References: 1) http://www.slideshare.net/bhupeshbansal/hadoop-user-group-jan2010 2) http://www.slideshare.net/adorepump/voldemort-nosql
Example: member data. It does not make sense to repeatedly join positions, emails, groups, etc. Explain joins. How to model this better in Java? A JSON-like data model.
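A minimal sketch of what a JSON-like member record could look like, assuming a single value per member key. All field names and values are hypothetical and only illustrate the idea of storing positions, emails, and groups together instead of joining them at read time.

```python
import json

# Hypothetical member record: every field name and value here is illustrative.
member = {
    "id": 42,
    "name": "Jane Doe",
    "emails": ["jane@example.com", "jd@example.org"],
    "positions": [
        {"title": "Engineer", "company": "Acme", "start_year": 2007},
    ],
    "groups": [101, 205],
}

# One serialized value per key: a single lookup returns everything
# the profile page needs, so no joins are required at read time.
key = str(member["id"])
value = json.dumps(member)

# Reading the record back yields the same nested structure.
restored = json.loads(value)
```

The trade-off is that data shared between members is duplicated or referenced by id rather than joined.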
Statistical learning as the ultimate agile development tool (Peter Norvig), “business logic” through data rather than code
No joins: across data domains because of APIs, within data domains because of performance. Natural operation: getAll(id…). Latency: if you want to call 30 services on your main pages, they had better be quick (30 * 20 ms = 600 ms).
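The latency arithmetic and the batched lookup can be sketched as follows. This is an illustration only: `InMemoryStore` and `get_all` are hypothetical stand-ins for a key-value client, and the latency function assumes the calls are made one after another.

```python
def sequential_latency_ms(num_calls, per_call_ms):
    """Total latency when service calls are made one after another."""
    return num_calls * per_call_ms


class InMemoryStore:
    """Hypothetical stand-in for a key-value store client."""

    def __init__(self, data):
        self._data = dict(data)

    def get_all(self, keys):
        """Fetch several keys in one call; missing keys are simply omitted."""
        return {k: self._data[k] for k in keys if k in self._data}


store = InMemoryStore({1: "alice", 2: "bob", 3: "carol"})
found = store.get_all([1, 2, 99])
page_latency = sequential_latency_ms(30, 20)
```

Here `page_latency` is the 600 ms from the note above, and `found` holds only the two keys that exist.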
- Strong consistency: all clients see the same view, even in the presence of updates.
- High availability: all clients can find some replica of the data, even in the presence of failures.
- Partition tolerance: the system properties hold even when the system is partitioned.
High availability is the mantra for websites. It is better to deal with inconsistencies, because their primary need is to scale well to allow for a smooth user experience.
Hashing: why do we need it? Basic problem: clients need to know which data lives where. There are many ways of solving it: central configuration, or hashing.
Linear hashing works. The issue is a dynamic cluster: when you add new slots, the key hash to node ID mapping changes for a lot of entries.
Consistent hashing preserves the key-to-node mapping for most of the keys and changes only the minimal amount needed.
How to do it? The number of partitions is arbitrary. Each node is allocated many partitions (better load balancing and fault tolerance), typically a few hundred to a few thousand. The key-to-partition mapping is fixed; only ownership of partitions can change.
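The fixed-partition scheme described in these notes can be sketched as below. This is a minimal illustration under stated assumptions, not Voldemort's actual implementation: the partition count of 512, the use of MD5 as the hash, the round-robin initial layout, and every function name are choices made for the example.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 512  # assumption: any fixed value from hundreds to thousands


def partition_for(key):
    """Fixed key -> partition mapping; it never changes when nodes join."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # MD5 is a stand-in hash
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def initial_ownership(nodes):
    """Spread partitions round-robin so each node owns many partitions."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(ownership, new_node):
    """Give the new node a fair share of partitions; the rest stay put."""
    counts = Counter(ownership.values())
    target = NUM_PARTITIONS // (len(counts) + 1)
    updated = dict(ownership)
    moved = 0
    for p in sorted(updated):
        if moved == target:
            break
        owner = updated[p]
        if counts[owner] > target:  # only take from nodes above the fair share
            updated[p] = new_node
            counts[owner] -= 1
            moved += 1
    return updated


def node_for(key, ownership):
    """Route a key: hash to its fixed partition, then look up the owner."""
    return ownership[partition_for(key)]
```

With three nodes and these assumptions, adding a fourth node reassigns a quarter of the partitions, and keys in every other partition keep their node. Only the ownership map changes; `partition_for` is untouched.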
Give an example of reads and writes with vector clocks. Pros and cons vs. Paxos and 2PC. The user can supply a strategy for handling cases where v1 and v2 are not comparable.
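A small sketch of vector clock comparison, assuming clocks are plain dictionaries from node name to counter. The function names and the `resolve` hook are hypothetical; they only illustrate how two versions can be ordered, or found concurrent and passed to a user-supplied strategy.

```python
def increment(clock, node):
    """Return a copy of the clock with the given node's counter bumped."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    b_behind = any(b.get(n, 0) < a.get(n, 0) for n in nodes)
    if a_behind and b_behind:
        return "concurrent"  # neither version descends from the other
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "equal"


def resolve(v1, v2, strategy):
    """Pick the later version, or defer to the caller's strategy on conflict.

    v1 and v2 are (value, clock) pairs; strategy is user-supplied.
    """
    order = compare(v1[1], v2[1])
    if order == "concurrent":
        return strategy(v1, v2)
    return v2 if order == "before" else v1


# A write through node A, then two independent writes through B and C.
base = increment({}, "A")
via_b = increment(base, "B")
via_c = increment(base, "C")
```

Here `base` precedes both later versions, while `via_b` and `via_c` are not comparable, which is the case the user-supplied strategy has to handle.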
A fancy way of doing optimistic locking.
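To make the optimistic locking analogy concrete, here is a minimal sketch that uses a single integer version per key instead of a full vector clock. That simplification, and all class and function names, are assumptions for illustration.

```python
class ObsoleteVersionError(Exception):
    """Raised when a write is based on a version that is no longer current."""


class VersionedStore:
    """Hypothetical in-memory store that rejects writes based on stale reads."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        """Return (value, version); a missing key is (None, 0)."""
        return self._data.get(key, (None, 0))

    def put(self, key, value, expected_version):
        """Write only if nobody else has written since expected_version."""
        _, current = self.get(key)
        if current != expected_version:
            raise ObsoleteVersionError(key)
        self._data[key] = (value, current + 1)


def apply_update(store, key, update_fn, max_retries=3):
    """Read, modify, write; on a version conflict, re-read and try again."""
    for _ in range(max_retries):
        value, version = store.get(key)
        try:
            store.put(key, update_fn(value), version)
            return True
        except ObsoleteVersionError:
            continue
    return False
```

No lock is held between the read and the write; a conflicting writer is detected only at write time, and the loser retries.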
Very simple APIs. No range scans. No iterator on the key set / entry set: performance is very hard to fix. There are plans to provide such an iterator.
Explain partitions. Make things fast by removing slow things, not by tuning. HTTP client is not performant. Separate caching layer.
Transfer time: 30 minutes. Can max out a gigabit network, so be careful.