The document discusses how an Enterprise Data Lake (EDL) provides a more effective solution for enterprise BI and analytics than traditional enterprise data warehouses (EDW). It argues that an EDL allows enterprises to retain all datasets, serve ad-hoc requests with no latency or development time, and offer a low-cost, low-maintenance solution that supports direct analytics and reporting on data stored in its native format. The document promotes EDL as a mainstream solution that should be part of every mid-sized and large enterprise's standard IT stack.
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Raj Babu of AgileIss
1. DATA LAKE - RE BIRTH OF ENTERPRISE DATA THINKING
MAKING BIG DATA MEANINGFUL FOR ALL ENTERPRISES
WWW.AGILEISS.COM
Making Big Data Meaningful for All
By Raj Babu
Raj@AgileiSS.com
HADOOP IS NOT FOR A SELECT FEW, BUT FOR ALL ENTERPRISES
2. About Agile iSS
Agile iSS is a BI & Analytics services company serving clients in Big Data, Data Lake, BI, BI on Cloud, and BI/Analytics as a Service.
Our goal is to make Big Data meaningful for all enterprises.
We focus on helping clients upgrade their current EXPENSIVE, ineffective BI solutions built on older technology to POWERFUL, EFFECTIVE BI & ANALYTICS solutions with a lower TCO.
3. ENTERPRISE DATA LAKE (EDL)
I have just two goals for my 25-minute presentation today. I want to convince you that:
• Big Data is not a solution only for the select few enterprises that have hundreds of TBs, or even ZBs, of data. Big Data through an Enterprise Data Lake (EDL) is now mainstream and should be part of the standard IT stack for all mid-sized and large enterprises.
• EDL makes Enterprise BI systems more Agile, Nimble, Economical & Valuable.
4. Why is an Enterprise Data Lake (based on Big Data and NoSQL technology) combined with traditional BI a significantly more effective Enterprise BI & Analytics solution than its predecessor, the EDW, which has been tried and has failed over the last two decades?
5. Why Did EDW Fail?
If you Google “Challenges with EDW”, you get complaints like these:
• Takes too long to get anything done.
• BI is too expensive to build and manage, and is never on the schedule the business wants.
• Our BI team and system can't implement changes fast.
• Overcomplicated architecture.
• Our BI can't do anything ad hoc: it needs requirements, design, architecture, and ETL for everything, and it never gets done after all.
• Our BI is always incomplete; it never has all the data we need.
• Our BI is not suitable for ad-hoc analytics.
6. Why EDW Failed and EDL Is Taking Over
It is extremely expensive, and practically impossible, to gather requirements, design, build ETL for, and store all the data needed in an EDW and its Data Marts (DM). EDWs and Data Marts are optimized for analysis by processing and storing only subsets of the available datasets.
An EDL is designed to “RETAIN ALL DATASETS”. This is the single most powerful feature of an EDL, because we can never know the complete future scope of the datasets needed for analytics.
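To make “retain all datasets” concrete, here is a minimal sketch of a raw-zone landing step in Python. It copies each source file into the lake unchanged, under a hypothetical `raw/<source>/ingest_date=YYYY-MM-DD/` layout, so nothing is filtered out at load time. The local filesystem stands in for HDFS or object storage, and the function name `land_raw_file` and the directory layout are assumptions for illustration, not part of any product.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def land_raw_file(src: Path, lake_root: Path, source_system: str,
                  ingest_date: Optional[date] = None) -> Path:
    """Copy one source file into the lake's raw zone, byte for byte.

    Hypothetical layout: <lake_root>/raw/<source_system>/ingest_date=YYYY-MM-DD/<file>.
    No parsing, filtering, or schema is applied, so the file keeps its native format.
    """
    ingest_date = ingest_date or date.today()
    target_dir = lake_root / "raw" / source_system / f"ingest_date={ingest_date.isoformat()}"
    target_dir.mkdir(parents=True, exist_ok=True)  # create partition folders as needed
    target = target_dir / src.name
    shutil.copy2(src, target)  # copies content (and metadata); never transforms it
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        src = tmp_path / "orders.csv"
        src.write_text("order_id,amount\n1,19.99\n")
        landed = land_raw_file(src, tmp_path / "lake", "erp", date(2015, 1, 1))
        print(landed.relative_to(tmp_path).as_posix())
```

Partitioning by ingest date keeps every delivery side by side, so a later analysis can reach back to data that nobody asked for when it was loaded.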
7. Why Does EDL Clearly Win over EDW?
• Serves ad-hoc requests with no latency and no development.
• Inexpensive and low-maintenance to manage, as there is little or no build effort.
• Minimal development team involvement, unless the data is needed in a Data Mart.
• All data is in the Data Lake.
• Can do ad hoc: no SDLC is needed to access any new data. No more waiting; it is the perfect place to offload all new and ad-hoc requests.
• In an EDL, neither ETL nor a database is needed for reporting or analytics.
• Offers a perfect solution with no heavy-duty ETL.
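The claim that ad-hoc requests need no ETL or database rests on schema-on-read: structure is applied when the data is queried, not when it is loaded. The Python sketch below answers one ad-hoc question (total amount per region) directly from raw newline-delimited JSON records. The record fields (`region`, `amount`) and the helper name `total_by_field` are hypothetical; in a Hadoop-based EDL this role would typically be played by a query engine such as Hive or Spark SQL rather than hand-written code.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def total_by_field(raw_lines: Iterable[str], key: str, value: str) -> Dict[str, float]:
    """Ad-hoc aggregation over raw JSON lines, applying the 'schema' at read time.

    Records that are not JSON objects, or that lack either field, are skipped
    instead of failing the query, since raw lake data is not cleansed up front.
    """
    totals: Dict[str, float] = defaultdict(float)
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the raw file
        record = json.loads(line)
        if isinstance(record, dict) and key in record and value in record:
            totals[str(record[key])] += float(record[value])
    return dict(totals)


if __name__ == "__main__":
    raw = [
        '{"region": "West", "amount": 120.5, "channel": "web"}',
        '{"region": "East", "amount": 80}',
        '{"region": "West", "amount": 30}',
        '{"customer": "no region or amount here"}',
    ]
    print(total_by_field(raw, "region", "amount"))
```

Running it prints `{'West': 150.5, 'East': 80.0}`. The extra `channel` field and the record without a region are simply ignored by this query, yet both remain in the lake for the next one.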
8. What Is a Data Lake?
If you Google “Data Lake”, you get results like the following.
From Wiktionary:
data lake: A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing “Big Data”.
From TechTarget:
A data lake is a large object-based storage repository that holds data in its native format until it is needed.
Etymology:
Pentaho CTO James Dixon is credited with coining the term "data lake" in a blog entry.
9. What Is a Data Lake? (Continued)
As quoted in Wiktionary, Pentaho CTO James Dixon described it in his blog entry:
"If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."
10. What a Data Lake Has to Offer
[Figure: EDL image by PwC. Labels: EDL, ODS, Warm Archive; ETL; Data Marts. Annotation: “In here all kinds of analytics happen: 85% analytics, 15% prototype reporting.”]
11. Is EDL a Product or a Tool?
EDL is really a reference architecture for an Enterprise BI solution that uses Hadoop-based Big Data as its foundation.
Many leading DB vendors now see EDL as a clear winner and are incorporating it into their offerings, calling it a Data Hub.
12. Enterprise Data Lake (EDL) On-Premise Reference Architecture for BI & Analytics
[Figure: reference architecture diagram. Components: Enterprise Data (Traditional and Big Data sources), ETL, Meta Data, Data Lake on Hadoop (Hortonworks, Cloudera, MapR), Analytics & Data Scientist, Direct Analytics & Reporting, Data Marts.]
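In this architecture, only the subset of lake data that a Data Mart actually needs goes through ETL; everything else stays in the lake for direct analytics. As a hedged illustration, the Python sketch below selects and types two fields from raw JSON lines and loads them into a table, using the standard-library `sqlite3` module as a stand-in for the Data Mart database. The table name `sales_mart`, the field names, and the helper `load_mart` are hypothetical.

```python
import json
import sqlite3
from typing import Iterable


def load_mart(raw_lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Project and type a subset of raw lake records into a Data Mart table.

    Only records with both 'region' and 'amount' are loaded; all other fields
    stay in the lake, untouched, for future ad-hoc use. Returns the row count.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_mart (region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    rows = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if isinstance(record, dict) and "region" in record and "amount" in record:
            rows.append((str(record["region"]), float(record["amount"])))
    # Parameterized bulk insert of the curated rows only.
    conn.executemany("INSERT INTO sales_mart (region, amount) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    raw = [
        '{"region": "West", "amount": 120.5, "channel": "web"}',
        '{"region": "East", "amount": 80}',
        '{"customer": "no region or amount here"}',
    ]
    print(load_mart(raw, conn))
    print(conn.execute(
        "SELECT region, SUM(amount) FROM sales_mart GROUP BY region ORDER BY region"
    ).fetchall())
```

The design choice mirrors the slide: the mart gets a small, typed, query-friendly table, while the lake keeps the full raw records, so a new reporting need means a new projection rather than a new ingestion project.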
15. Your EDL Can Be Any of the Following
• A central enterprise data repository (ODS, Data Hub)
• A staging source for all systems
• A warm, active data archive/vault
• A Hadoop data warehouse
16. Your EDL Can Serve the Following
• Anyone and everyone who is impatient to get their hands on data
• Those who can't give requirements but wanted the reports yesterday
• Those who have no patience for ETL or report development
• Analytics and Data Science teams
• The ETL team, for staging
• Cost avoidance: no need to buy DB capacity to store all data in the BI database
• Workloads where the data volume is too high to process through a regular DB
17. Who Supports the Data Lake or Data Hub?
18. Explore EDL - There Is Nothing to Lose
With an EDL there is no need for expensive ETL and databases, or for the long delays associated with your BI & Analytics platform.