2. Menu
Who am I?
Early adopters of Hadoop
Next generation use cases
Changing big data architectures
Art of the possible
My request
Questions
Appetizer
Main
Dessert
3. Who am I?
Google, Software Engineer
Personalized Search
Personalized Recommendations
WibiData, CTO
Real-time Personalization Platform
Customer Use Cases
42. What does this all mean?
The real value is in next-generation “action” use cases
The architecture for “action” is different
Design for your problem, since you don’t know the art of the possible.
Requirements first, then technology
43. My Request
Stop building faster data warehouses.
You already understand your data.
Turn your understanding into action.
This talk is really about blind spots. I believe there are three that are ultimately keeping many of you from “tapping the true value of Hadoop.”
How are we going to store all the information on the internet?
Google File System (GFS)
How are we going to analyze it?
MapReduce (MR)
How are we going to do something with it?
BigTable (BT)
“I, too, need to store large amounts of data!”
These are technology companies
The followers in this wave are in other businesses, but they need technology to move forward
They waited to see if these technologies would really work
Three things the follower does not see:
New use cases from a few early adopters
What changes about the architectures to support new use cases
Where the early adopters are ultimately going
I don’t actually mean the use cases that are way out there.
I mean the very next ones: the ones early adopters are doing now, and the ones you should be doing next (this year or next year)
We all know product recommendations
Recommendations are not just for products.
Recommend content
Recommend people
Recommend actions
Auto-complete
Recommendations within search
Personalized search results
Search within the enterprise
Optimizing experiences on each channel
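To make “recommendations are not just for products” concrete, here is a minimal sketch of one simple technique, item-to-item co-occurrence (“people who engaged with X also engaged with Y”). It is an illustration under stated assumptions, not the method any particular early adopter used: items are plain string IDs and `build_cooccurrence` and `recommend` are hypothetical names, so the same code ranks articles, people, or actions.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(sessions):
    """Count how often each ordered pair of items shows up in the same session."""
    co = defaultdict(Counter)
    for items in sessions:
        # De-duplicate so repeat interactions within one session count once.
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(co, item, k=3):
    """Return up to k items most often seen alongside `item` (ties broken by ID)."""
    # .get avoids creating an empty entry for items we have never seen.
    neighbors = co.get(item, Counter())
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical sessions of content views; IDs could just as well be people or actions.
sessions = [
    ["article:a", "article:b"],
    ["article:a", "article:b", "article:c"],
    ["article:a", "article:c"],
    ["article:b", "article:d"],
]
co = build_cooccurrence(sessions)
print(recommend(co, "article:a"))  # ['article:b', 'article:c']
```

Because nothing in the code knows what an item is, swapping `"article:..."` IDs for `"user:..."` or `"action:..."` IDs turns the same counts into people or action recommendations.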
The key ingredients here are:
Data consolidation (get everything in one place so it is accessible)
Experimentation (try different things on live traffic)
Rapid iteration (optimize by making changes quickly)
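Experimentation on live traffic usually starts with assigning each user to a variant. A common approach, shown here as a minimal sketch, is deterministic hash-based bucketing: the same user always lands in the same variant, and no assignment table has to be stored. The function name, experiment name, and variant labels are hypothetical, and SHA-256 is just one reasonable choice of stable hash.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant of an experiment."""
    # Including the experiment name in the key means different experiments
    # split traffic independently, while each user's assignment stays stable.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# Example: route a request to the control or treatment homepage layout.
variant = assign_variant("user-42", "homepage-layout", ["control", "treatment"])
print(variant)
```

Rapid iteration then amounts to changing the `variants` list or starting a new experiment name, with no migration of stored assignments.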
You should, too. At the very least, you should start doing “traditional BI” on big data.
Next generation use cases are in two categories:
Analysis: Now that we have data, and it is consolidated, let’s ask more questions.
Action: Now that we have data, and it is consolidated, let’s put it to work.
Followers (early majority) are at the Understand phase. Early adopters are going deep into Understand, or moving on to Act.
I really want to talk about the last phase. What are the key ingredients?
Early adopters are changing their system architectures:
They are adding new-age tools
They are removing and replacing outdated systems
They are restructuring and shuffling components
Review the difference between building on understanding and moving into action.
You got data delivered back into the application, but did you include any of the key ingredients?
Let’s focus on the early adopters who migrated into action. What have they done?
We have already added the KVStore, HBase, to connect data back to the frontends.
We can add a stream processing engine to get real-time.
We can use the Lambda architecture to get useful properties, such as immutable data sources to which we only ever make incremental additions.
What does it look like to go through this process of “going deep” into action?
Add room for a stream processing system (Storm, Samza)
Add a query layer on top to join the results from the batch layer with those from the speed layer
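The batch layer, speed layer, and query layer can be sketched in a few lines of in-memory Python. This is a minimal illustration, assuming a page-view count as the metric; in a real deployment the batch layer would run on Hadoop, the speed layer on a stream processor such as Storm or Samza, and the merged views would be served from a store such as HBase, none of which is modeled here. All function and class names are hypothetical.

```python
from collections import Counter


def batch_view(master_log):
    """Batch layer: recompute counts from the full, immutable, append-only log."""
    return Counter(event["page"] for event in master_log)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["page"]] += 1


def query(page, batch, speed):
    """Query layer: merge the batch result with the real-time increments."""
    return batch.get(page, 0) + speed.counts.get(page, 0)


master_log = [{"page": "home"}, {"page": "home"}, {"page": "pricing"}]
batch = batch_view(master_log)
speed = SpeedLayer()
speed.on_event({"page": "home"})  # arrives after the batch run
print(query("home", batch, speed))  # 3
```

For brevity, the sketch omits discarding speed-layer counts once a later batch run covers the same events. Note also that the counting logic lives in two places (`batch_view` and `SpeedLayer.on_event`), with the merge in a third (`query`), so changing the metric means touching all three.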
You got data delivered back into the application, but did you include any of the key ingredients?
To make a change to something you need to edit the batch layer, the speed layer, and potentially the query that joins the two.
You don’t have enough data to see the future of where people are going.
What’s next?
I don’t know how to quantify the business value. I’ll leave that to Gartner.
But I hope that I can convince you that:
The intrinsic value of each phase is greater than the previous. What good is collecting data if you don’t do anything with it? What good is it if you don’t understand it?
The realized value to the business at each phase is even more extreme than what I’ve shown here. What good is understanding unless you do something with it? You can do something with it as a human being, but many more decisions are now made by machines, not humans.
How long does this take?
Testing (that is, experiment design, development, and deployment) is the bottleneck.
Why are you spending so much money working on increasing the speed of these other phases?
What you would design to solve the first three phases (up to understanding) is different from what you would build to solve “action.”
We don’t know what’s coming next. Design for your problem, and do so without blindly following the early adopters. Instead, start with your requirements, and design with purpose.