Both Hadoop as a technology and the data lake as a concept have only recently come into use in the enterprise context. As a result, companies often find it hard to distinguish between media hype and the added value that can actually be realized. Whether companies succeed in exploiting the benefits is the question examined by this study on the current state and future development of big data and its use cases in companies. With more than 380 participants worldwide and a broad spread of industries, the present study "Hadoop and Data Lakes" is one of the largest surveys dedicated specifically to the challenges of data analysis with Hadoop.
18. The BARC Survey "Hadoop and Data Lakes"
Over 380 participants
Broad coverage of different industries and company sizes
Global survey, 2nd edition
User survey on the status quo and progress of Hadoop:
Relevance of Hadoop and data lakes
Benefits and challenges
Usage scenarios
Status quo of Hadoop and data lakes
Participants by company size: up to 250 employees 23%; 250 to 2,500 employees 33%; more than 2,500 employees 45%
Participants by industry: services 24%; manufacturing 22%; banking 16%; IT 14%; retail 9%; public sector 6%; other 9%
19. 7 Findings from the Study
1. Hadoop: a trend technology with high potential
2. BICC and data science teams drive Hadoop and data lake projects
3. Implementation relies mainly on commercial tools and Hadoop distributions
4. A clear case for Hadoop: customer intelligence and predictive analytics
5. Hadoop delivers great analytical benefit
6. Hadoop enables use cases that could not be implemented before
7. The biggest challenges are a lack of know-how and uncertainty about usage
Audio track: The data lake is the best answer to all our analytical and operational questions; I need a data lake
Thank you Philip
So what is our view on this? Well, data drives business and being data driven has become a business imperative.
It’s all driven by the fact that what once was a cost to be managed is now a source of competitive advantage and new revenue streams. To drive this advantage, organizations are actively working to gather more data: instrumenting applications, platforms, and physical devices to create more of it, and storing it over a longer time horizon.
Data is now a strategic asset, and you need a strategy for it.
And the amount of data at our disposal is massive.
Points from the infographic:
The digital universe (DU) will grow from 4.4 ZB to 44 ZB by 2020
90% of all data is unstructured
25% would be valuable if tagged; less than 1% is
30 billion connected things
90% of data is less than 2 years old
And you can do great things with this data besides running businesses. Data impacts people, lives, science... The impossible becomes possible.
It can help destroy human trafficking…
It helps fight child sexual exploitation in the Thorn project by accelerating victim identification, disrupting platforms that facilitate abusive behavior and deterring predators
It helps improve healthcare whether that’s in research on cancer, detecting sepsis earlier or by giving premature babies better chances of survival through improved pain management
And it will take us far. The Orion multi-purpose crew vehicle may take us to Mars one day, yet until that time, things need to be tested very carefully and rigorously. Hundreds of Mb/s of sensor and telemetry data in testing alone.
With data, we can do great things
And there are two sides to the data conversation.
The left-hand side is essentially the 3 Vs, what everyone has been talking about for some time now. Big data.
"It will soon be technically feasible & affordable to record and store everything."
We can finally become the equivalent of hoarders, only in digital fashion. Plane spotters.
Yet storing data for storing data’s sake is neither use nor ornament.
So more important, arguably, is the right-hand side and HOW we use that data.
It’s not just about the size, variety, etc. but the complexity of the ways we can manipulate it and in so doing, uncover some really profound things.
And especially for analytics, this is a big deal because we get much more granular insight.
Much more data and brute force help build better models.
All of this happens faster. Together, it’s enough of a quantitative difference to make a qualitative difference.
And the prize is great. Businesses are looking at advantages and use cases around:
40% decision making
37% business development
36% customer analysis
34% predictive analytics
33% process optimization
http://viralcocktail.com/people-imagined-what-life-would-be-like-in-the-year-2000-and-were-so-far-off/8/
Translating this to the boardroom level, there are a number of initiatives that this wealth of data is very well suited to. For these, data drives tremendous business value.
The three areas of opportunity within businesses generally are:
Customer and Channel – How do I build a 360-degree picture of my customer to deliver new revenue streams?
Data-Driven Products – How can I build better data-driven products and services, at lower cost?
Security, Risk, and Compliance – How do we meet compliance regulations and preserve data security to minimize our corporate risk profile?
So let’s look at each of these in turn.
All of these initiatives have one thing in common: they need a different approach to dealing with data, and a different scale on which it must be consumed, stored, and analysed. Beyond current systems.
It needs a modern data architecture as well as a strategy.
Yet as Philip already hinted, you need a platform that’s uniquely suited to handle the demands that this then places on it. Hadoop is just that. Actually, not just Hadoop; more like Hadoop and its related open source projects.
It has the extreme performance and efficiency that lets you handle the volume, variety and velocity of the data, irrespective of how much you throw at it.
On the other hand, it also provides the agility to gain insight from that data. All of that data. It enables the self-service and democratization of data access and analytics that so many organizations and departments pine for.
And that’s why Hadoop and associated open source projects work so brilliantly well.
They are:
Open source and run on industry-standard hardware, which makes them extremely cost-effective
Scalable to petabyte level: on-prem, in the cloud, as a mixture, anything
Extremely flexible, handling multiple data types and processing engines. It’s not just about SQL.
But it’s not just technology that enables all this. This is a digital transformation. This is a change process too.
As with so much, all good things come in threes as Philip already showed.
Getting the right architecture is one thing; you also need the right stakeholders and skills in your team, and you need to adopt an agile approach.
Let’s talk more about an agile, iterative approach. The goal is to exploit the technical underpinnings of the big data platform: a platform that allows flexibility in capture and interpretation.
So the question is, how best to employ this? Continuous iteration.
These are the 3 key steps to being agile.
Collect, Create and Manage: Figure out what data you have and what data you need. Tag it so only the right people can see it.
Collect
Collect the familiar, the new, the never seen, the always dropped.
No need to worry up front about how to use it; just start using it and make it available to any and all frameworks.
Document upfront to make downstream and future analysis easier.
Understand that quality can be built iteratively, too.
Create
Find the gaps, no matter the type, as you learn more.
Integration can come with iterations, so focus on what value new sources can bring.
Don’t forget that your business creates lots of data outside the data warehouse.
For B2B contracts, be explicit about capturing and/or asking for, delivering, and using data.
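To make the "tag it" and "document upfront" points concrete, here is a minimal sketch of a dataset catalog in Python. It is an illustration only: the `DatasetEntry` and `Catalog` classes, the field names, and the sample datasets are all hypothetical and not part of Hadoop or any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record; all field names here are illustrative assumptions."""
    name: str
    description: str  # documented up front to ease downstream analysis
    source: str       # where the data was collected from
    access_tags: set = field(default_factory=set)  # groups allowed to see it


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse undocumented data so documentation happens at collection time.
        if not entry.description.strip():
            raise ValueError(f"dataset {entry.name!r} needs a description")
        self._entries[entry.name] = entry

    def visible_to(self, user_groups):
        # A dataset is visible if the user shares at least one access tag.
        groups = set(user_groups)
        return sorted(
            name for name, e in self._entries.items() if e.access_tags & groups
        )


catalog = Catalog()
catalog.register(DatasetEntry(
    "web_clicks", "Raw clickstream events", "web logs",
    {"marketing", "data_science"},
))
catalog.register(DatasetEntry(
    "payroll", "Monthly payroll export", "HR system", {"hr"},
))

print(catalog.visible_to(["data_science"]))  # prints ['web_clicks']
```

In a real platform the tags would be enforced by the platform's own security layer rather than by application code; the sketch only shows the idea of attaching documentation and access tags at the moment data is collected.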
Explore and Analyze: Now you have many tools hitting the same dataset. Continue to add new tools and new applications and watch the value grow. Start with a somewhat limited scope, a single dataset, for a team; get familiar and go deep. Enrich your data. Get experience and momentum. Build grassroots advocacy.
Understand the data and its usage better, and find the probable linkages to other data sets (identity resolution) to lay the groundwork for the future. Extend enrichment.
Fuse data sets (and possibly even teams?) together to find intersections and correlations, and to uncover the real "unknown unknowns." Move from enriched to refined to derived data (the latter is data that would not exist without the former: wholly new, yet separate and distinct from its predecessors).
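As a rough illustration of fusing two data sets to find intersections and correlations, the sketch below joins two keyed data sets on a shared customer ID and computes a Pearson correlation over the overlap. The data, the customer IDs, and the helper names `fuse` and `pearson` are made up for this example; at data lake scale the same idea would run on a distributed engine rather than in plain Python.

```python
from math import sqrt

# Hypothetical per-customer measurements from two separately collected datasets.
web_visits = {"c1": 12, "c2": 3, "c3": 7, "c4": 20}
purchases = {"c2": 1, "c3": 2, "c4": 6, "c5": 4}


def fuse(left, right):
    """Inner-join two keyed datasets, keeping only the shared keys."""
    shared = sorted(left.keys() & right.keys())
    return [(key, left[key], right[key]) for key in shared]


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


fused = fuse(web_visits, purchases)
visits = [row[1] for row in fused]
bought = [row[2] for row in fused]

print([row[0] for row in fused])  # the intersection: ['c2', 'c3', 'c4']
print(pearson(visits, bought))    # correlation over the shared customers
```

The fused rows are an example of derived data in the sense used above: they exist only because the two source data sets were brought together.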
Operationalize: Move data closer to users so they can impact the business. Launch embedded, smart applications to deliver insights to customers and business users.
Operationalize
Bring data and insight to all workflows in the business. Integrate into the very decision-making, at every step. Take advantage of the longitudinal analytics afforded by the platform: past, present, and future-looking analytics, simultaneously. Data is brought to and sought by those who use it, simultaneously.