This is a talk on a fundamental approach to thinking about scalability, and how Hadoop, HBase, and Lucene are enabling companies to process amazing amounts of data. It's also about how Social Media is making the traditional RDBMS irrelevant.
8. Social Media and Scaling
•Scalability Matters Now.
•SM produces large, complex data
•Anyone can collect the web
•Make a Twitter in a few days
•Easy to get TBs of data
•Big Data enabling new fields for companies
45. Avoiding Impedance Mismatch
•Most problems can be divided into High or Low latency
•Get a lot of data eventually, or a little now
•MapReduce vs. Sharding/Indexing
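A minimal sketch of the two sides of that split, in plain Python rather than Hadoop or Lucene. The sample posts, the field names, the two-shard layout, and every function name here are assumptions made up for illustration. The batch side scans every record to answer an aggregate question; the indexed side prepares sharded inverted indexes once so a single term can be looked up quickly.

```python
import zlib
from collections import Counter, defaultdict

# Hypothetical sample data; the field names are illustrative only.
POSTS = [
    {"id": "a1", "post_dt": "20090718", "body": "iron maiden rules"},
    {"id": "b2", "post_dt": "20090718", "body": "maiden voyage"},
    {"id": "c3", "post_dt": "20090719", "body": "hadoop rules"},
]

NUM_SHARDS = 2  # assumed shard count for the sketch


def posts_per_day(posts):
    """High latency, lots of data: touch every record, then aggregate."""
    return Counter(post["post_dt"] for post in posts)


def shard_for(doc_id):
    """Route a document to a shard with a deterministic hash of its ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def build_sharded_index(posts):
    """Build one small inverted index (term -> doc IDs) per shard."""
    shards = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for post in posts:
        index = shards[shard_for(post["id"])]
        for term in post["body"].split():
            index[term].add(post["id"])
    return shards


def search(shards, term):
    """Low latency, a little data: ask each shard's index and merge the hits."""
    hits = set()
    for index in shards:
        hits |= index.get(term, set())
    return sorted(hits)
```

The batch function gets slower as the data grows but can answer any question about all of it. The index answers only the questions it was built for, but each lookup stays cheap.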
52. Hadoop + MR
•Special: Crunch web-scale data fast
•Sacrifice: Low-Latency, Transactions, Random Access, Updates
•Structure: Chunked flat files
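The chunked-flat-file idea can be sketched in a few lines of plain Python. This is not the Hadoop API: the chunk size, the word-count job, and all function names are assumptions chosen for illustration. Each chunk is mapped independently, which is what lets a cluster work on many chunks at once, and nothing here reads or updates a single record in place.

```python
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Cut a flat file (a list of lines) into fixed-size chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for line in chunk for word in line.lower().split()]


def shuffle(mapped_chunks):
    """Shuffle step: group the emitted values by key across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, chunk_size=2):
    """Run the whole job; in a cluster the map calls would run in parallel."""
    chunks = split_into_chunks(lines, chunk_size)
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_groups(shuffle(mapped))
```

Calling `word_count(["Iron Maiden rules", "Maiden rules", "Hadoop"])` processes two chunks and merges their partial results into one dictionary of counts.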
53. Structured Processing Cluster
[Diagram; labels recoverable from the figure: Unstructured, Structured Analysis, Enriched Data, Store in HBase, Store in Search Indexing, Hadoop Cluster, HBase Records, Sharded Lucene Index (shown twice)]
54. Document Structure
ContentID: 00BAC189
Title: Iron Maiden Rules
Body: I think Janick Gers is an amazing guitarist blah blah
PostDT: 20090718
ParentID: 0FDEADBEEF
Permalink: www.roadtofailure.com/post?=20
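One way to hold such a record in code is sketched below. The `Document` class, its snake_case field names, and the `parse_document` helper are hypothetical, and reading PostDT as a YYYYMMDD date is an assumption based on the sample value. The sample text is the record shown on the slide.

```python
from dataclasses import dataclass
from datetime import date, datetime

SAMPLE = """\
ContentID: 00BAC189
Title: Iron Maiden Rules
Body: I think Janick Gers is an amazing guitarist blah blah
PostDT: 20090718
ParentID: 0FDEADBEEF
Permalink: www.roadtofailure.com/post?=20
"""


@dataclass(frozen=True)
class Document:
    content_id: str
    title: str
    body: str
    post_dt: date
    parent_id: str
    permalink: str


def parse_document(text):
    """Parse 'Key: value' lines into a Document (splits on the first colon)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return Document(
        content_id=fields["ContentID"],
        title=fields["Title"],
        body=fields["Body"],
        # Assumption: PostDT is a YYYYMMDD date string.
        post_dt=datetime.strptime(fields["PostDT"], "%Y%m%d").date(),
        parent_id=fields["ParentID"],
        permalink=fields["Permalink"],
    )
```

`parse_document(SAMPLE)` returns an immutable record whose `parent_id` can link a post back to the content it replies to.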