Here at the conference we’re talking about data science. But before we can appreciate the changes happening in data science, we must first talk about data. Data is doubling every two years. The fast-growing volume, variety and velocity of data are overwhelming traditional systems and approaches. A revolutionary approach is required to leverage this data. And with this new technology, data science as we know it is undergoing tremendous change.
What is the source of this data growth? While structured data growth has been relatively modest, the growth in unstructured data has been exponential.
Source of statistic: http://link.springer.com/chapter/10.1007/978-3-642-39146-0_2
The database/datastore landscape is evolving to meet the new requirements. 2009 was the inflection point. The shift is toward NoSchema systems, in which applications control structure. Developers are being empowered, and they are voting for the agility offered by these systems.
In the early days of this revolution we sacrificed the query language, and with it the ability to leverage the knowledge and tools available to millions of people. We’re changing that with a distributed SQL engine. But when we do that, we have to keep in mind that this transition to a NoSchema world happened for a reason, and we don’t want to reintroduce the centralized, DBA-managed schema.
IT-driven = months of delay, unnecessary work (data is no longer relevant, etc.)
The so-what needs to be conveyed. Why does it matter that it’s not needed.
6 months -> 3 months -> 3 months -> day zero
So imagine now what you can get…
Data Agility is needed for Business Agility
>>> Stand still during slide, move in at the punchline (why does this matter to YOU)
Let’s do a quick product walkthrough.
CSV vs JSON formatting
It’s 10pm in Vegas and I Want Good Hummus!
CSV vs JSON formatting
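To make the CSV vs JSON point concrete on the slide, a minimal sketch can help. The records below are made up for illustration (they are not from any real dataset), and the helper names `to_csv` and `to_json_lines` are hypothetical. The idea is only to show that CSV flattens everything into strings, while JSON keeps the field names and the nested list with each record.

```python
import csv
import io
import json

# Hypothetical sample records, invented for illustration only.
records = [
    {"name": "Cafe A", "city": "Las Vegas", "categories": ["Mediterranean", "Hummus"]},
    {"name": "Diner B", "city": "Las Vegas", "categories": ["Burgers"]},
]


def to_csv(rows):
    """Flatten records into CSV; the nested list has to be packed into one cell."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "city", "categories"])
    for r in rows:
        writer.writerow([r["name"], r["city"], "|".join(r["categories"])])
    return buf.getvalue()


def to_json_lines(rows):
    """Emit one self-describing JSON object per line; nesting is preserved."""
    return "\n".join(json.dumps(r) for r in rows)


csv_text = to_csv(records)
json_text = to_json_lines(records)

# Reading back: CSV yields plain strings, JSON returns the original structure.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
json_rows = [json.loads(line) for line in json_text.splitlines()]

print(csv_rows[0]["categories"])   # a single string cell
print(json_rows[0]["categories"])  # a real list
```

The so-what for the audience: with CSV the reader has to know out of band that `categories` is a pipe-separated list, while the JSON record carries its own structure.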
Distributed query engine
Any Drillbit can accept the request
Driver drillbit
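If the audience wants intuition for "any node can accept the request", a toy model may help. This is a sketch under heavy assumptions and not Drill's actual code: the `Node` and `Cluster` classes, the `submit` method, and the in-memory data are all hypothetical. It only illustrates the idea that whichever node receives the query acts as the driver, fans the work out to every node, and merges the partial results.

```python
import random


class Node:
    """Toy stand-in for one query daemon holding a local slice of the data."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def run_fragment(self, predicate):
        # Each node filters only its own local rows.
        return [r for r in self.rows if predicate(r)]


class Cluster:
    """Toy cluster in which any node may be the entry point for a query."""

    def __init__(self, nodes):
        self.nodes = nodes

    def submit(self, predicate, entry=None):
        # The node that accepts the request becomes the driver for this query.
        driver = entry if entry is not None else random.choice(self.nodes)
        results = []
        for node in self.nodes:
            results.extend(node.run_fragment(predicate))
        # The driver merges the partial results and returns them to the client.
        return driver.name, sorted(results)


cluster = Cluster([
    Node("node-1", [1, 2]),
    Node("node-2", [3, 4]),
    Node("node-3", [5, 6]),
])

# The answer is the same no matter which node accepts the request.
for entry in cluster.nodes:
    driver_name, rows = cluster.submit(lambda r: r % 2 == 0, entry=entry)
    print(driver_name, rows)
```

The point to land: there is no single master the client must find first, so the entry point is not a bottleneck in this model.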