HDFS as a storage system
Great file system
Works very well for MapReduce
Great adoption in enterprises
Example: Need to store all my customer documents
A few million customers each with a few thousand documents
Don’t need a directory structure
Need REST API as the primary access mechanism
Simple access semantics
Very large scale (billions of documents)
Wide range of object sizes
A file system forces you to think in terms of files and directories.
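To make the example requirements concrete, here is a minimal sketch of the flat, key-based access such an application wants. The FlatObjectStore class and its method names are hypothetical, invented for illustration. It is an in-memory toy, not an HDFS or REST API; it only shows that put/get/delete by (customer, document name) needs no directory structure.

```python
from typing import Dict, List, Tuple


class FlatObjectStore:
    """Hypothetical in-memory store: objects are addressed by
    (customer_id, document_name), with no directory hierarchy."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, customer_id: str, name: str, data: bytes) -> None:
        # Overwrites any existing object stored under the same key.
        self._objects[(customer_id, name)] = data

    def get(self, customer_id: str, name: str) -> bytes:
        # Raises KeyError if the object does not exist.
        return self._objects[(customer_id, name)]

    def delete(self, customer_id: str, name: str) -> None:
        # Deleting a missing object is a no-op.
        self._objects.pop((customer_id, name), None)

    def list_names(self, customer_id: str) -> List[str]:
        # Sorted names of all documents held for one customer.
        return sorted(n for (c, n) in self._objects if c == customer_id)


store = FlatObjectStore()
store.put("customer-42", "invoice-2015-01.pdf", b"%PDF...")
document = store.get("customer-42", "invoice-2015-01.pdf")
```

In a real deployment each of these calls would map to an HTTP verb on a REST endpoint; that mapping is left out here because the source does not specify it.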
Two important questions
What is the partitioning scheme?
What does a storage container look like?
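As an illustration of what a partitioning scheme has to decide, the sketch below maps an object key to one of a fixed number of partitions by hashing it. Hash partitioning is an assumption chosen purely for illustration, not necessarily the scheme this talk goes on to propose; partition_for and NUM_PARTITIONS are hypothetical names.

```python
import hashlib

NUM_PARTITIONS = 16  # hypothetical partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an object key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # A stable hash (unlike Python's built-in hash()) gives the same
    # placement across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


index = partition_for("customer-42/invoice-2015-01.pdf")
```

A scheme like this spreads keys evenly but scatters one customer's documents across partitions; partitioning on the customer ID alone would keep them together at the cost of uneven partition sizes.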