Grab some coffee and enjoy 
the pre-show banter before 
the top of the hour!
“The Inevitable Shift: How Big Data 
Impacts Enterprise Architecture” 
RoundTable Webcast | April 9, 2014
Host 
Eric Kavanagh 
CEO, The Bloor Group 
@eric_kavanagh eric.kavanagh@bloorgroup.com
Big Data Information Architecture
✓ Exploratory Webcast: January 22, 2014
✓ Roundtable Webcast: April 9, 2014
Findings Webcast: June 25, 2014
#BigDataArch
Analysts 
Robin Bloor 
Chief Analyst, The Bloor Group 
Richard Winter 
President & Founder, WinterCorp 
Mike Ferguson 
Managing Director, Intelligent Business Strategies
BIG DATA
Hadoop as the Data Reservoir
Big Data and the Data Reservoir
BDIA: The Story So Far 
Robin Bloor, Ph.D.
Big Data – A Poorly Defined Term 
WHAT IS BIG DATA?
• Traditional data
• Business data
• Log file data
• Operational data
• Mobile data
• Location data
• Social network data
• Public data
• Commercial databases
• Streaming data
• Internet of Things
Atoms and Molecules 
The ATOM of data has 
become the EVENT 
A TRANSACTION is a 
MOLECULE of ATOMIC 
EVENTS
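The atom/molecule idea can be pictured as a small data model. The sketch below is only an illustration: the `Event` and `Transaction` types, their fields, and the sample values are assumptions, not something defined in the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """An atomic event: one thing that happened at one moment."""
    timestamp: float  # seconds since some epoch (assumed unit)
    source: str       # e.g. "web" or "sensor" (hypothetical labels)
    kind: str         # e.g. "add_to_cart" or "payment" (hypothetical)


@dataclass
class Transaction:
    """A 'molecule': a time-ordered group of atomic events."""
    txn_id: str
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        # Keep events ordered by time so the transaction reads as a flow.
        self.events.append(event)
        self.events.sort(key=lambda e: e.timestamp)

    def duration(self) -> float:
        """Elapsed time between the first and the last event."""
        if not self.events:
            return 0.0
        return self.events[-1].timestamp - self.events[0].timestamp


order = Transaction("order-1")
order.add(Event(12.0, "web", "payment"))
order.add(Event(10.0, "web", "add_to_cart"))
print([e.kind for e in order.events])  # ['add_to_cart', 'payment']
print(order.duration())                # 2.0
```

Storing the events rather than only the finished transaction is what makes a flow-oriented store useful: the same events can later be regrouped in other ways.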
The Traffic Cop (Events)
Atoms and Molecules 
DATA FLOW 
is becoming a driving factor 
This suggests the need 
for a 
DATA RESERVOIR
Hadoop as the Data Reservoir
Big Data and the Data Reservoir
The Workload Paradigm Shift 
u Previously, we viewed 
database workloads as 
an i/o optimization 
problem 
u With analytics the 
workload is a very 
variable mix of i/o and 
calculation 
u No databases were built 
precisely for this – not 
even Big Data databases
The Big Data Applications 
It’s pretty much 
all about 
BI & ANALYTICS
The Biological System 
u Our human control system 
works at different speeds: 
• Almost instant reflex 
• Swift response 
• Considered response 
u Organizations will gradually 
implement similar control 
systems 
u This suggests a data-flow-based 
architecture 
u The EDW is memory
The Corporate Biological System 
u Right now this division into 
two different data flows is 
already occurring 
u Currently we can distinguish 
between: 
• Real-time/Business-time 
applications 
• Analytical applications 
u We should build specific 
architectures for this
WINTERCORP
Big Data Information Architecture
Bloor Group Roundtable
Richard Winter
WinterCorp
April 2014
THE LARGE SCALE DATA MANAGEMENT EXPERTS
Big Data and the Data Reservoir 
From Robin’s charts:
It’s About the Platforms & Their Roles 
• Data Warehouse 
• Data Mart 
• Data Refinery 
• Data Landing Zone 
• Data Discovery 
• Graph Analytics 
• Etc. 
WINTERCORP: THE LARGE SCALE DATA MANAGEMENT EXPERTS
© 2010, 2012, 2013, 2014 Winter Corporation, Belmont MA. All Rights Reserved.
Data Refining Example 
Data from Turbines 
Data Refining Example 
Data Management Requirements 
1. Hundreds of TB or more of data per week 
2. Raw data life: few hours to a few days 
3. Challenge: find the important events or trends quickly 
4. Massive analysis problem 
5. When analyzing, read entire files 
6. Keep only the significant data 
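The refining pattern in the list above (read the whole batch, keep only what matters, discard the rest) can be sketched as follows. This is a minimal illustration under stated assumptions: the `(turbine_id, vibration)` record layout and the "more than k standard deviations from the batch mean" rule are hypothetical stand-ins for whatever domain logic a real refinery would apply.

```python
from statistics import mean, pstdev
from typing import Iterable, List, Tuple

Reading = Tuple[str, float]  # (turbine_id, vibration): hypothetical schema


def refine(readings: Iterable[Reading], k: float = 2.0) -> List[Reading]:
    """Scan a whole batch of raw readings and keep only the outliers.

    A reading counts as 'significant' here if it lies more than k
    population standard deviations from the batch mean. That rule is an
    assumption chosen for illustration only.
    """
    batch = list(readings)  # requirement 5: read the entire file
    if not batch:
        return []
    values = [value for _, value in batch]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every reading is identical, nothing stands out
    # Requirement 6: keep only the significant data.
    return [(tid, v) for tid, v in batch if abs(v - mu) > k * sigma]


raw = [("t1", 1.0), ("t2", 1.1), ("t3", 0.9),
       ("t4", 1.0), ("t5", 1.0), ("t6", 9.0)]
significant = refine(raw)
print(significant)  # [('t6', 9.0)]
```

Because the raw data lives only hours to days, the output of a step like this is usually the only part that moves on to longer-lived platforms.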
Business Example 
Enterprise Data Warehouse 
Enterprise Data Warehouse 
Data Management Requirements 
1. Data volume 
a. TB to PB – all retained for at least five years 
b. Continual growth of data and workload 
2. Data sources: hundreds to thousands 
a. Data sources change their feeds frequently 
b. New data sources are frequent 
3. Challenges 
a. Data must be correct 
b. Data must be integrated 
4. Typical enterprise data lifetime: decades 
5. Analytic application lifetime: years 
6. Many thousands of data users (10^4 to 10^6)
7. Hundreds of analytic applications 
8. Thousands of one time analyses 
9. Tens of thousands of complex queries 
Some Platform Examples 
Requirement                      Platform
Data Refinery                    Hadoop
Complex SQL Query                Data Warehouse
Enforce/Manage Business Rules    Data Warehouse
Intensive Batch Processing       Hadoop
Simple Data Mart                 Multiple Options
Data Discovery                   New Category
Integrated Data                  Data Warehouse
Data Landing Zone                Hadoop
Document Store                   Multiple Options
Stream Processing                Multiple Options
ETL                              Multiple Options
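The requirement/platform pairings above can be captured as a simple lookup, which is handy when cataloguing workloads. The sketch below copies the pairings from the slide; the function names `recommend` and `requirements_for` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List

# Pairings as listed on the slide.
PLATFORM_FOR: Dict[str, str] = {
    "Data Refinery": "Hadoop",
    "Complex SQL Query": "Data Warehouse",
    "Enforce/Manage Business Rules": "Data Warehouse",
    "Intensive Batch Processing": "Hadoop",
    "Simple Data Mart": "Multiple Options",
    "Data Discovery": "New Category",
    "Integrated Data": "Data Warehouse",
    "Data Landing Zone": "Hadoop",
    "Document Store": "Multiple Options",
    "Stream Processing": "Multiple Options",
    "ETL": "Multiple Options",
}


def recommend(requirement: str) -> str:
    """Return the platform the slide pairs with a requirement."""
    return PLATFORM_FOR.get(requirement, "Not covered by the slide")


def requirements_for(platform: str) -> List[str]:
    """Invert the lookup: which requirements point at a platform?"""
    return sorted(r for r, p in PLATFORM_FOR.items() if p == platform)


print(recommend("Complex SQL Query"))  # Data Warehouse
print(requirements_for("Hadoop"))
# ['Data Landing Zone', 'Data Refinery', 'Intensive Batch Processing']
```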
Understand the Platform Cost Tradeoffs 
• Cost tradeoffs can be surprising – platform cost is not 
always the driver 
• Requires a total cost framework & systematic 
approach 
• “Big Data: What Does it Really Cost?” 
wintercorp.com/tcod-report
Data Platforms 
A Changing Picture 
• Categories are not settled
• Data Warehouse has a continuing, major role 
• Hadoop has a major role 
• Everything else is in flux 
Big Data Information Architecture 
Mike Ferguson 
Managing Director 
Intelligent Business Strategies 
Bloor Group Big Data Roundtable 
April 2014 
Twitter: @mikeferguson1
For Many Years The Traditional Data Warehouse and BI 
Environment Has Been Used For Analysis & Reporting 
[Diagram labels: operational systems; data integration / DQ; DW (data warehouse & data marts); BI tools platform; reports & analytics; web portal; employees, partners, customers]
However There Are New Types of Data That Businesses 
Now Want to Analyse 
§ Web data
• Clickstream data, e-commerce logs
• Social network data, e.g., Twitter
§ Semi-structured data, e.g., e-mail, XML, JSON
§ Unstructured content
• How much is TEXT worth to you?
§ Sensor data
• Temperature, light, vibration, location, liquid flow, pressure, RFIDs
§ Vertical industries' structured transaction data
• E.g., telecom call data records, retail
Source: Analytics: The Real-World Use of Big Data, Said Business School Oxford and IBM
The Impact of Big Data – We Now Have Different 
Platforms Optimised For Different Analytical Workloads 
Big Data workloads now mean we require multiple platforms for analytical processing 
[Diagram: platforms and the analytical workloads they are optimised for]
• Streaming data: real-time stream processing & decision management
• Hadoop data store (advanced analytics on multi-structured data): investigative analysis, data refinery
• NoSQL DBMS, e.g. graph DB: graph analysis
• DW appliance / analytical RDBMS (advanced analytics on structured data): data mining, model development
• EDW, DW & marts (data warehouse RDBMS): traditional query, reporting & analysis
• MDM (CRUD on customer, product, asset master data): master data management
Hadoop Is A Platform At The Heart of Big Data Analytics 
– There Are Multiple Ways To Access Hadoop 
[Diagram labels: BI tools / apps; SQL; Pig Latin scripts; Java MapReduce application; APIs to HDFS, HBase, Cascading; vendor SQL-on-Hadoop engine (indexes, partitions); WebHDFS (an HTTP interface to HDFS with REST APIs); MapReduce framework; Hadoop 2.0 YARN; HDFS files]
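Of the access paths on this slide, WebHDFS is the easiest to show in a few lines, because a request is just an HTTP URL of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs and does not contact a cluster. The host name, port, and file path are hypothetical, and `webhdfs_url` is a helper invented for this illustration, not part of any Hadoop client library.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL for an absolute HDFS path.

    `op` is a WebHDFS operation name such as OPEN, LISTSTATUS or
    GETFILESTATUS; extra keyword arguments become query parameters.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode and file. Port 50070 was a common NameNode HTTP
# port in Hadoop 2.x; treat it as an assumption about your cluster.
url = webhdfs_url("namenode.example.com", 50070,
                  "/data/weblogs/part-0000", "OPEN")
print(url)
# http://namenode.example.com:50070/webhdfs/v1/data/weblogs/part-0000?op=OPEN
```

On a real cluster the URL could be fetched with any HTTP client; an `OPEN` request is typically answered with a redirect to a DataNode that serves the file contents.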
Popular Hadoop Use Cases 
§ Hadoop as a data refinery 
• Offloading data integration from a DW 
§ Hadoop for investigative analysis in an analytical sandbox 
§ Hadoop as an on-line data warehouse archive
The Hadoop Data Refinery 
[Diagram labels: sources (CRM, ERP, SCM, Ops; XML, JSON; web logs; social; web; cloud); ELT processing; targets (EDW, graph DBMS, NoSQL DB, analytical DBMS, DW appliance, data marts); insights]
A Centralised Hadoop Based Data Refinery is One Way to 
Scale at Reduced Cost 
Data Hub: Consume, Clean, Integrate, Analyse And Provision
Data From Hadoop To Any Analytical Platform
[Diagram labels: sources (RDBMS, social, cloud, files, office docs, web logs, web services, sensors, feeds); staging area / landing zone; sandbox for exploratory analysis; ELT processing with generated MapReduce ELT jobs; targets (EDW, DW & marts, DW appliance for advanced analytics on structured data, NoSQL DB e.g. graph DB); business insight]
Sometimes analysts refer to this as a Data Refinery.
What is the purpose of the data refinery?
Is it to process un-modelled data or all data?
Investigative Analysis Can Be Done In A Hadoop Sandbox 
[Diagram: inputs to the Hadoop sandbox, where data scientists perform investigative / exploratory analysis to produce new business insight]
• Multi-structured data: click stream web log data; customer interaction data; social interaction data (e.g. Twitter, Facebook); sensor data; rich media data (video, audio); external web content; documents; internal web content; seismic data (oil & gas)
• Master data from the MDM system (CRUD): product, asset, customer
• Historical data: archived DW data from the EDW and marts
Joining Big Data With Master Data During Exploratory 
Analysis Can Produce Insight for Competitive Advantage 
[Diagram: big data + master data = business value created]
• Big data inputs: streaming data; graph data (NoSQL DB, e.g. graph DB); multi-structured data (sentiment); sensor data
• Master data (CRUD): customer, product, asset
• Business value created: customer sentiment & product sentiment; customer online behaviour; prospects & influencers; field service optimization; risk management; asset performance
New Insights Can Be Added Into A DW To Enrich What You 
Already Know 
[Diagram labels: operational systems; DI; DW; web logs, social, web and cloud data; sandbox; data scientists; new insights]
e.g. deriving insight from social web sites for sentiment analytics
Alternatively, New Insights In Hadoop Can Be Integrated With A
DW Using Data Virtualization To Provide Enriched Information
[Diagram labels: OLTP systems; DI; DW; web logs, social, web and cloud data; sandbox; data scientists; new insights; SQL on Hadoop; data virtualisation]
e.g. deriving insight from social web sites for sentiment analytics
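The idea on this slide, one logical view that joins warehouse data with insights held elsewhere, can be imitated on a laptop. The sketch below is only an analogy under stated assumptions: two in-memory SQLite databases stand in for the DW and for a SQL-on-Hadoop engine, and a temporary view plays the role of the data virtualisation layer. The table names, columns, and values are hypothetical.

```python
import sqlite3

# "main" stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for SQL on Hadoop.
conn.execute("ATTACH DATABASE ':memory:' AS hadoop")

conn.execute("CREATE TABLE main.customer (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE hadoop.sentiment (customer_id INTEGER, score REAL)")
conn.executemany("INSERT INTO main.customer VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO hadoop.sentiment VALUES (?, ?)",
                 [(1, 0.5), (1, 1.0), (2, -0.5)])

# A TEMP view may reference tables in any attached database, so it can
# act as the single virtual table that consumers query.
conn.execute("""
    CREATE TEMP VIEW enriched_customer AS
    SELECT c.customer_id, c.name, AVG(s.score) AS avg_sentiment
    FROM main.customer AS c
    LEFT JOIN hadoop.sentiment AS s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
""")

rows = conn.execute(
    "SELECT name, avg_sentiment FROM enriched_customer ORDER BY customer_id"
).fetchall()
print(rows)  # [('Ann', 0.75), ('Bob', -0.5)]
```

Consumers see only `enriched_customer`; no data is copied between the two stores, which is the property the slide attributes to data virtualisation.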
Using Hadoop As A Data Archive Means Data Can Be Kept 
On-line, Analysed And Still Integrated With Data In The DW 
[Diagram labels: OLTP systems; DI; DW; archive unused data or data older than n years; archived data; SQL on Hadoop; data virtualisation; new insights]
Real-time Data From NoSQL DBMSs Can Also Be Joined To 
DW Data Using Data Virtualization 
[Diagram labels: OLTP systems; DI; DW; web logs, social and sensor data; NoSQL DB (column family DB, document DB) holding nested data; data virtualisation; real-time insights]
Nested data like JSON needs to be handled by the data virtualisation server
Investigative Analysis Can Be Done In A Graph DBMS 
– New Insight Can Also Come From Graph Analysis 
[Diagram: inputs to the graph DBMS, where data scientists perform investigative / exploratory analysis to produce new business insight]
• Structured data: master data from the MDM system (CRUD): product, asset, customer
• Multi-structured data
SQL Access To Big Data - Options
• SQL access to big data in Hadoop
• SQL access to big data in an analytical RDBMS
• SQL access to streaming data in motion
• SQL access to big data via data virtualisation (data virtualisation server, DW)
• SQL access to a combination of the above
SQL on Hadoop Challenges 
– Multi-structured Data May Need to Be Analysed 
{
  "firstName": "Wayne",
  "lastName": "Rooney",
  "age": 25,
  "address": {
    "streetAddress": "21 Sir Matt Busby Way",
    "city": "Manchester",
    "country": "England",
    "postalCode": "M1 6DY"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "0161-123-1234"
    },
    {
      "type": "mobile",
      "number": "07779-123234"
    }
  ]
}
JSON data: SQL??
Text data: SQL??
Image data: SQL??
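For the JSON case, one common answer to the "SQL??" question is to flatten the document into relational shapes before querying it. The sketch below shows one way this might be done, using the record from the slide: nested objects become dotted column names and arrays of objects become child rows. The `flatten` function and the dotted naming scheme are assumptions made for illustration, not a feature of any particular SQL-on-Hadoop engine.

```python
import json
from typing import Any, Dict, List, Tuple

RECORD = """
{"firstName": "Wayne", "lastName": "Rooney", "age": 25,
 "address": {"streetAddress": "21 Sir Matt Busby Way",
             "city": "Manchester", "country": "England",
             "postalCode": "M1 6DY"},
 "phoneNumbers": [{"type": "home", "number": "0161-123-1234"},
                  {"type": "mobile", "number": "07779-123234"}]}
"""

Row = Dict[str, Any]


def flatten(doc: Row, prefix: str = "") -> Tuple[Row, Dict[str, List[Row]]]:
    """Split one JSON object into a flat parent row plus child tables.

    Nested objects turn into dotted columns (address.city). Arrays are
    assumed to hold objects and become child tables keyed by array name.
    """
    row: Row = {}
    children: Dict[str, List[Row]] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            sub_row, sub_children = flatten(value, prefix=f"{name}.")
            row.update(sub_row)
            children.update(sub_children)
        elif isinstance(value, list):
            children[name] = [dict(item) for item in value]
        else:
            row[name] = value
    return row, children


person, child_tables = flatten(json.loads(RECORD))
print(person["address.city"])                            # Manchester
print([p["type"] for p in child_tables["phoneNumbers"]])  # ['home', 'mobile']
```

The parent row maps onto one table and `phoneNumbers` onto a second, so ordinary SQL joins can take over from there. Text and image data have no such direct mapping, which is the harder half of the slide's question.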
SQL on Hadoop Challenges 
– Multi-structured Data May Need to Be Analysed 
Web log data: SQL??
Tab-delimited file data: SQL??
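Tab-delimited files are the friendlier case: the structure is already there and only the column names and types are missing. A minimal sketch of applying a schema at read time follows. The column layout and the sample lines are hypothetical, since the slide shows no actual file contents.

```python
import csv
import io
from typing import Iterator

# Hypothetical column layout; the slides do not define one.
COLUMNS = ("timestamp", "client_ip", "path", "status")

SAMPLE = (
    "2014-04-09T10:00:00\t10.0.0.1\t/home\t200\n"
    "2014-04-09T10:00:02\t10.0.0.2\t/missing\t404\n"
)


def read_rows(text: str) -> Iterator[dict]:
    """Attach column names at read time and cast status to an integer."""
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        row = dict(zip(COLUMNS, fields))
        row["status"] = int(row["status"])
        yield row


# Rough equivalent of: SELECT path FROM log WHERE status >= 400
errors = [r["path"] for r in read_rows(SAMPLE) if r["status"] >= 400]
print(errors)  # ['/missing']
```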
Hadoop Storage Is Independent of Any SQL Engine Accessing 
HDFS - Multiple SQL Engines Can Coexist On The Same Data 
[Diagram: several SQL engines over shared Hadoop storage. Source: Hortonworks]
§ Key points about Hadoop
• It is possible to have MULTIPLE SQL engines on the same data
• Different SQL engines run on different Hadoop frameworks (M/R, Tez, Spark) or on no framework at all, i.e. they access HDFS or HBase data directly
Relational DBMS / Hadoop Integration – Several Vendors Have 
Integrated RDBMS with Hadoop to Run Analytics 
[Diagram labels: SQL, XQuery; relational DBMS; external / polymorphic table function(s); HDFS / HBase / Hive]
• Allows joins across data in a single RDBMS and Hadoop
• RDBMS optimizer handles transparent access to external analytical platforms on behalf of the user
• RDBMS and Hadoop could be deployed on the same hardware cluster or on different hardware clusters
Product examples:
• CitusDB
• Exasol EXAPowerlytics
• IBM PureData System for Analytics and DB2 HDFS clients
• Oracle HDFS Client
• Pivotal HAWQ PXF
• Teradata SQL-H
Self-Service Access To Big Data Via Data Virtualization
[Diagram labels: business analyst; self-service BI; self-service data discovery & visualisation or dashboard server; data virtualization and optimization; personal & office data; predictive models; DW; transaction systems; data management tools (ETL, DQ, etc.)]
Product examples: Cirro, Cisco, Denodo, Informatica Data Services, ScleraDB
BUT what about optimization?
Can the data virtualisation server push down analytics to underlying platforms to make them do the work?
Conclusions - People In Different Roles In The Analytical 
Landscape Need to Work Together To Deliver Value 
[Diagram: roles across the sandbox, analytical and operational environments]
• Data Scientist (sandbox): exploratory analysis; model producer
• Business Analyst (analytical): data discovery & visualisation; model consumer; information producer (build reports; build and publish dashboards)
• Business Manager / Operations Worker (operational): information consumer; decision maker; action taker
Thank You! 
www.intelligentbusiness.biz 
mferguson@intelligentbusiness.biz 
Twitter: @mikeferguson1 
Tel/Fax (+44)1625 520700
ROUNDTABLE 
DISCUSSION
Questions? 
#BigDataArch 
or 
USE THE Q&A
THANK 
YOU! 
Image on Slide 53 borrowed from http://www.apieceofmonologue.com/2012/08/stanley-kubrick-film-photography-design.html
REGISTER FOR BDIA WEBCASTS AT: 
http://insideanalysis.com/research/big-data-information-architecture

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Foundation for Success: How Big Data Fits in an Information Architecture

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. “The Inevitable Shift: How Big Data Impacts Enterprise Architecture” RoundTable Webcast | April 9, 2014
  • 3. Host: Eric Kavanagh, CEO, The Bloor Group. @eric_kavanagh, eric.kavanagh@bloorgroup.com
  • 4. Big Data Information Architecture: Exploratory Webcast, January 22, 2014; Roundtable Webcast, April 9, 2014; Findings Webcast, June 25, 2014. #BigDataArch ✓ ✓
  • 5. Analysts: Robin Bloor, Chief Analyst, The Bloor Group; Richard Winter, President & Founder, WinterCorp; Mike Ferguson, Managing Director, Intelligent Business Strategies
  • 7. Hadoop as the Data Reservoir
  • 8. Big Data and the Data Reservoir
  • 9. BDIA: The Story So Far. Robin Bloor, Ph.D.
  • 10. Big Data – A Poorly Defined Term. WHAT IS BIG DATA? Traditional data; Business data; Log file data; Operational data; Mobile data; Location data; Social network data; Public data; Commercial databases; Streaming data; Internet of Things
  • 11. Atoms and Molecules: The ATOM of data has become the EVENT. A TRANSACTION is a MOLECULE of ATOMIC EVENTS.
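
The atom-and-molecule analogy on slide 11 can be pictured in code. The following is a minimal sketch, not anything taken from the deck: the class names `Event` and `Transaction`, their fields, and the sample order are all hypothetical, chosen only to show a transaction as a grouping of individual events.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """An 'atom': one timestamped fact. Field names are illustrative."""
    timestamp: float
    kind: str
    payload: dict


@dataclass
class Transaction:
    """A 'molecule': the events that together make up one business action."""
    transaction_id: str
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        self.events.append(event)

    def duration(self) -> float:
        """Seconds between the earliest and latest event (0.0 if fewer than two)."""
        if len(self.events) < 2:
            return 0.0
        times = [e.timestamp for e in self.events]
        return max(times) - min(times)


# Hypothetical example: three events that form one order transaction.
order = Transaction("order-1")
order.add(Event(100.0, "cart_checkout", {"items": 2}))
order.add(Event(101.5, "payment_authorised", {"amount": 40}))
order.add(Event(103.0, "order_confirmed", {}))
```

Storing the events rather than only the finished transaction is what makes a flow-oriented store (the "data reservoir" of the following slides) useful: the same events can later be regrouped in ways the original application did not anticipate.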
  • 12. The Traffic Cop (Events)
  • 13. Atoms and Molecules: DATA FLOW is becoming a driving factor. This suggests the need for a DATA RESERVOIR.
  • 14. Hadoop as the Data Reservoir
  • 15. Big Data and the Data Reservoir
  • 16. The Workload Paradigm Shift
      • Previously, we viewed database workloads as an i/o optimization problem
      • With analytics the workload is a very variable mix of i/o and calculation
      • No databases were built precisely for this – not even Big Data databases
  • 17. The Big Data Applications: It’s pretty much all about BI & ANALYTICS
  • 18. The Biological System
      • Our human control system works at different speeds:
          • Almost instant reflex
          • Swift response
          • Considered response
      • Organizations will gradually implement similar control systems
      • This suggests a data-flow-based architecture
      • The EDW is memory
  • 19. The Corporate Biological System
      • Right now this division into two different data flows is already occurring
      • Currently we can distinguish between:
          • Real-time/Business-time applications
          • Analytical applications
      • We should build specific architectures for this
  • 20. WINTERCORP: The Large Scale Data Management Experts. Big Data Information Architecture, Bloor Group Roundtable. Richard Winter, WinterCorp, April 2014
  • 21. Big Data and the Data Reservoir. From Robin’s charts:
  • 22. It’s About the Platforms & Their Roles
      • Data Warehouse
      • Data Mart
      • Data Refinery
      • Data Landing Zone
      • Data Discovery
      • Graph Analytics
      • Etc.
      © 2012, 2013, 2014 Winter Corporation, Belmont MA. All rights reserved.
  • 23. Data Refining Example: Data from Turbines
  • 24. Data Refining Example: Data Management Requirements
      1. Hundreds of TB or more of data per week
      2. Raw data life: few hours to a few days
      3. Challenge: find the important events or trends quickly
      4. Massive analysis problem
      5. When analyzing, read entire files
      6. Keep only the significant data
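
Requirements 5 and 6 on slide 24 (read entire files, keep only the significant data) describe a reduce-then-discard pattern. The sketch below illustrates it under stated assumptions: the `(turbine_id, vibration)` record layout, the function name `significant_readings`, and the use of a z-score threshold as the test for "significant" are all hypothetical choices, not the method described in the webcast.

```python
from statistics import mean, pstdev
from typing import Iterable, List, Tuple

# Hypothetical record layout: (turbine_id, vibration_reading).
Reading = Tuple[str, float]


def significant_readings(readings: Iterable[Reading],
                         z_threshold: float = 2.0) -> List[Reading]:
    """Scan the whole batch once, then keep only readings whose value lies
    more than z_threshold standard deviations from the batch mean."""
    data = list(readings)              # full pass: "read entire files"
    values = [value for _, value in data]
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:                     # all readings identical: nothing stands out
        return []
    return [(tid, v) for tid, v in data if abs(v - mu) / sigma > z_threshold]


# Hypothetical batch: nine ordinary readings and one outlier.
raw = [("t1", 1.0)] * 9 + [("t2", 10.0)]
kept = significant_readings(raw)
```

Only `kept` would be loaded into a longer-lived store; the raw batch can then expire within the few hours to few days the slide allows.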
  • 25. Business Example: Enterprise Data Warehouse
  • 26. Enterprise Data Warehouse: Data Management Requirements
      1. Data volume
          a. TB to PB – all retained for at least five years
          b. Continual growth of data and workload
      2. Data sources: hundreds to thousands
          a. Data sources change their feeds frequently
          b. New data sources are frequent
      3. Challenges
          a. Data must be correct
          b. Data must be integrated
      4. Typical enterprise data lifetime: decades
      5. Analytic application lifetime: years
      6. Many thousands of data users (10^4 – 10^6)
      7. Hundreds of analytic applications
      8. Thousands of one time analyses
      9. Tens of thousands of complex queries
  • 27. Some Platform Examples
      Requirement                      Platform
      Data Refinery                    Hadoop
      Complex SQL Query                Data Warehouse
      Enforce/Manage Business Rules    Data Warehouse
      Intensive Batch Processing       Hadoop
      Simple Data Mart                 Multiple Options
      Data Discovery                   New Category
      Integrated Data                  Data Warehouse
      Data Landing Zone                Hadoop
      Document Store                   Multiple Options
      Stream Processing                Multiple Options
      ETL                              Multiple Options
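
The requirement-to-platform pairs on slide 27 amount to a lookup table. As a small illustration, they can be held in a dictionary; the pairs are copied from the slide, while the names `PLATFORM_BY_REQUIREMENT` and `suggest_platform` and the fallback string are hypothetical.

```python
# Requirement -> platform pairs as listed on slide 27 (keys lower-cased).
PLATFORM_BY_REQUIREMENT = {
    "data refinery": "Hadoop",
    "complex sql query": "Data Warehouse",
    "enforce/manage business rules": "Data Warehouse",
    "intensive batch processing": "Hadoop",
    "simple data mart": "Multiple Options",
    "data discovery": "New Category",
    "integrated data": "Data Warehouse",
    "data landing zone": "Hadoop",
    "document store": "Multiple Options",
    "stream processing": "Multiple Options",
    "etl": "Multiple Options",
}


def suggest_platform(requirement: str) -> str:
    """Return the slide's platform for a requirement, ignoring case and
    surrounding whitespace; unknown requirements get a fallback string."""
    key = requirement.strip().lower()
    return PLATFORM_BY_REQUIREMENT.get(key, "Not covered on the slide")
```

For instance, `suggest_platform("Data Landing Zone")` yields the slide's answer for that row. A real selection would also weigh the cost tradeoffs raised on the next slide, which a flat lookup cannot express.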
  • 28. Understand the Platform Cost Tradeoffs
      • Cost tradeoffs can be surprising – platform cost is not always the driver
      • Requires a total cost framework & systematic approach
      • “Big Data: What Does it Really Cost?” wintercorp.com/tcod-report
  • 29. Data Platforms: A Changing Picture
      • Categories are not settled
      • Data Warehouse has a continuing, major role
      • Hadoop has a major role
      • Everything else is in flux
  • 30. Big Data Information Architecture. Mike Ferguson, Managing Director, Intelligent Business Strategies. Bloor Group Big Data Roundtable, April 2014. Twitter: @mikeferguson1
  • 31. For Many Years The Traditional Data Warehouse and BI Environment Has Been Used For Analysis & Reporting
      [Diagram labels: Operational systems; web; Data Integration / DQ; DW (data warehouse & data marts); BI Tools Platform; Reports & analytics; Portal; Employees, Partners, Customers]
  • 32. However There Are New Types of Data That Businesses Now Want to Analyse
      • Web data
          • Clickstream data, e-commerce logs
          • Social networks data, e.g., Twitter
      • Semi-structured data, e.g., e-mail, XML, JSON
      • Unstructured content
          • How much is TEXT worth to you?
      • Sensor data
          • Temperature, light, vibration, location, liquid flow, pressure, RFIDs
      • Vertical industries structured transaction data
          • E.g. Telecom call data records, retail
      Source: Analytics: The Real-World Use of Big Data, Saïd Business School Oxford and IBM
  • 33. The Impact of Big Data – We Now Have Different Platforms Optimised For Different Analytical Workloads
      Big Data workloads now mean we require multiple platforms for analytical processing
      [Diagram labels, platforms: Streaming data; Hadoop data store (advanced analytics, multi-structured data); Data Warehouse RDBMS (EDW, DW & marts); NoSQL DBMS, e.g. graph DB; DW Appliance / Analytical RDBMS (advanced analytics, structured data); MDM (C R U D: Cust, Prod, Asset)]
      [Diagram labels, workloads: Real-time stream processing & decision management; Investigative analysis, data refinery; Traditional query, reporting & analysis; Graph analysis; Data mining, model development; Master data management]
  • 34. Hadoop Is A Platform At The Heart of Big Data Analytics – There Are Multiple Ways To Access Hadoop
      [Diagram labels: BI Tools / Apps; SQL; Vendor SQL on Hadoop engine; Java MapReduce APIs to HDFS, HBase, Cascading; Pig Latin scripts; MapReduce application; MapReduce framework; Hadoop 2.0 YARN; webHDFS (an HTTP interface to HDFS, has REST APIs); HDFS (files, indexes, index partitions)]
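
Slide 34 lists webHDFS, the HTTP/REST interface to HDFS, as one access path. WebHDFS requests follow the pattern `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs and sends nothing over the network; the helper name `webhdfs_url`, the host name, the port, and the paths are hypothetical, and the right port depends on the cluster's configuration.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str,
                params: Optional[Dict[str, str]] = None) -> str:
    """Build a WebHDFS REST URL for an absolute HDFS path.

    `op` is a WebHDFS operation name such as OPEN, LISTSTATUS or
    GETFILESTATUS; `params` holds extra query parameters, e.g. user.name.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **(params or {})})
    # quote() percent-encodes unsafe characters but leaves "/" intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical cluster and paths, for illustration only.
read_url = webhdfs_url("namenode.example.com", 50070,
                       "/logs/web/2014-04-09.log", "OPEN")
list_url = webhdfs_url("namenode.example.com", 50070, "/data/raw",
                       "LISTSTATUS", {"user.name": "analyst"})
```

Because the interface is plain HTTP, any BI tool or script that can issue a GET request can read HDFS files this way without Hadoop client libraries installed.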
  • 35. Popular Hadoop Use Cases
      • Hadoop as a data refinery
          • Offloading data integration from a DW
      • Hadoop for investigative analysis in an analytical sandbox
      • Hadoop as an on-line data warehouse archive
  • 36. The Hadoop Data Refinery
      [Diagram labels: CRM, ERP, SCM, Ops; XML, JSON; Web logs; social; web; cloud; ELT processing; EDW; Data marts; Graph DBMS; Analytical DBMS; DW Appliance; NoSQL DB; insights]
  • 37. A Centralised Hadoop Based Data Refinery is One Way to Scale at Reduced Cost
      Data Hub - Consume, Clean, Integrate, Analyse And Provision Data From Hadoop To Any Analytical Platform
      Sometimes analysts refer to this as a Data Refinery.
      What is the purpose of the data refinery? Is it to process un-modelled data or all data?
      [Diagram labels: RDBMS; social; Cloud; Files; office docs; Web logs; web services; sensors; feeds; Staging area / landing zone; ELT processing; Generated MapReduce ELT jobs; sandbox; Exploratory analysis; EDW; DW & marts; mart; NoSQL DB, e.g. graph DB; DW Appliance; Advanced Analytics (structured data); business insight]
  • 38. Investigative Analysis Can Be Done In A Hadoop Sandbox
      [Diagram labels, multi-structured data: Click stream web log data; Customer interaction data; Social interaction data (e.g. Twitter, Facebook); Sensor data; Rich media data (video, audio); External web content; Documents; Internal web content; Seismic data (oil & gas)]
      [Diagram labels, other: Historical data (archived DW data); master data from the MDM System (C R U D: Product, Asset, Customer); EDW; mart; sandbox; Investigative / Exploratory Analysis; Data Scientists; new business insight]
  • 39. Joining Big Data With Master Data During Exploratory Analysis Can Produce Insight for Competitive Advantage
      [Diagram labels: Streaming Data, Graph Data (NoSQL DB, e.g. graph DB), and Multi-Structured data + Master Data (C R U D: customer, product, asset) = Business Value Created: customer sentiment & product sentiment; customer online behaviour; prospects & influencers; sensor data for field service optimization; risk mgm’t; asset performance]
  • 40. New Insights Can Be Added Into A DW To Enrich What You Already Know
      [Diagram labels: Web logs; social; web; cloud; sandbox; Data Scientists; new insights; Operational systems; DI; DW]
      e.g. deriving insight from social web sites for sentiment analytics
  • 41. Alternatively New Insights In Hadoop Can Be Integrated With A DW Using Data Virtualization To Provide Enriched Information
      [Diagram labels: Web logs; social; web; cloud; sandbox; Data Scientists; new insights; SQL on Hadoop; OLTP systems; DI; DW; Data Virtualisation]
      e.g. deriving insight from social web sites for sentiment analytics
  • 42. Using Hadoop As A Data Archive Means Data Can Be Kept On-line, Analysed And Still Integrated With Data In The DW
      [Diagram labels: OLTP systems; DI; DW; new insights; Archive unused data or data > n years; Archived data; SQL on Hadoop; Data Virtualisation]
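
Slide 42's rule, "archive unused data or data > n years", reduces to a date cutoff. Here is a minimal sketch under stated assumptions: the row layout with a `last_used` date and the function names `archive_cutoff` and `split_for_archive` are hypothetical, and a real warehouse would apply the rule per table or partition rather than row by row in Python.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]  # hypothetical row: must contain a "last_used" date


def archive_cutoff(today: date, n_years: int) -> date:
    """Date n_years before `today`; 29 February falls back to 28 February
    when the target year is not a leap year."""
    try:
        return today.replace(year=today.year - n_years)
    except ValueError:
        return today.replace(year=today.year - n_years, day=28)


def split_for_archive(rows: Iterable[Row], today: date,
                      n_years: int) -> Tuple[List[Row], List[Row]]:
    """Return (keep_in_dw, move_to_archive): rows last used before the
    cutoff go to the archive, everything else stays in the warehouse."""
    cutoff = archive_cutoff(today, n_years)
    keep: List[Row] = []
    archive: List[Row] = []
    for row in rows:
        (archive if row["last_used"] < cutoff else keep).append(row)
    return keep, archive


# Hypothetical rows, evaluated as of the webcast date with n = 3.
rows = [
    {"id": 1, "last_used": date(2010, 1, 1)},
    {"id": 2, "last_used": date(2011, 4, 9)},
    {"id": 3, "last_used": date(2013, 6, 1)},
]
keep, archive = split_for_archive(rows, today=date(2014, 4, 9), n_years=3)
```

In the architecture on the slide, the archived rows stay queryable through SQL on Hadoop and can still be joined to warehouse data through the data virtualisation layer.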
  • 43. Real-time Data From NoSQL DBMSs Can Also Be Joined To DW Data Using Data Virtualization
      [Diagram labels: Web logs; social; sensors; NoSQL DB (Column Family DB, Document DB); nested data; OLTP systems; DI; DW; Data Virtualisation; real-time insights]
      Nested data like JSON needs to be handled by the data virtualisation server
  • 44. Investigative Analysis Can Be Done In A Graph DBMS – New Insight Can Also Come From Graph Analysis
      [Diagram labels: Multi-structured data; Structured data; master data from the MDM System (C R U D: Product, Asset, Customer); Graph DBMS; Investigative / Exploratory Analysis; Data Scientists; new business insight]
  • 45. SQL Access To Big Data - Options
      • SQL access to big data in Hadoop
      • SQL access to big data in an analytical RDBMS
      • SQL access to streaming data in motion
      • SQL access to big data via data virtualisation (data virtualisation server, DW)
      • SQL access to a combination of the above
  • 46. SQL on Hadoop Challenges – Multi-structured Data May Need to Be Analysed
      JSON data: { "firstName": "Wayne", "lastName": "Rooney", "age": 25, "address": { "streetAddress": "21 Sir Matt Busby Way", "city": "Manchester", "country": "England", "postalCode": "M1 6DY" }, "phoneNumbers": [ { "type": "home", "number": "0161-123-1234" }, { "type": "mobile", "number": "07779-123234" } ] }
      Text data: SQL??
      Image data: SQL??
      JSON data: SQL??
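
Slide 46 asks how SQL can reach nested JSON. One common answer is to flatten each document into relational-style rows before querying. The sketch below does that for the slide's sample record using only Python's standard `json` module; the function name `to_rows`, the `address_` column prefix, and the use of first and last name as a join key are hypothetical choices, not the behaviour of any particular SQL-on-Hadoop engine.

```python
import json
from typing import Dict, List, Tuple

# The sample document from slide 46, with straight quotes.
DOC = """
{ "firstName": "Wayne", "lastName": "Rooney", "age": 25,
  "address": { "streetAddress": "21 Sir Matt Busby Way", "city": "Manchester",
               "country": "England", "postalCode": "M1 6DY" },
  "phoneNumbers": [ { "type": "home", "number": "0161-123-1234" },
                    { "type": "mobile", "number": "07779-123234" } ] }
"""


def to_rows(doc: Dict) -> Tuple[Dict, List[Dict]]:
    """Flatten one person document into a 'person' row plus one
    'phone' row per entry in the phoneNumbers array."""
    # Scalar top-level fields become columns as they are.
    person = {k: v for k, v in doc.items() if not isinstance(v, (dict, list))}
    # The nested address object becomes prefixed columns on the same row.
    for key, value in doc.get("address", {}).items():
        person[f"address_{key}"] = value
    # The array becomes a child table keyed by the person's name.
    phones = [
        {"firstName": doc["firstName"], "lastName": doc["lastName"], **phone}
        for phone in doc.get("phoneNumbers", [])
    ]
    return person, phones


person_row, phone_rows = to_rows(json.loads(DOC))
```

The nested object maps cleanly onto extra columns, but the array forces a second table and a join, which is the kind of structural decision the "SQL??" labels on the slide are pointing at.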
  • 47. SQL on Hadoop Challenges – Multi-structured Data May Need to Be Analysed
      Web log data: SQL??
      Tab delimited file data: SQL??
  • 48. Hadoop Storage Is Independent of Any SQL Engine Accessing HDFS - Multiple SQL Engines Can Coexist On The Same Data
      Storage is independent of any SQL engine
      Key points about Hadoop:
      • It is possible to have MULTIPLE SQL engines on the same data
      • Different SQL engines run on different Hadoop frameworks (M/R, Tez, Spark) or on no framework at all, i.e. directly access HDFS or HBase data
      Source: Hortonworks
  • 49. Relational DBMS / Hadoop Integration – Several Vendors Have Integrated RDBMS with Hadoop to Run Analytics
      [Diagram labels: SQL, XQuery; Relational DBMS; External / Polymorphic table function(s); HDFS / HBase / Hive]
      • Allows join across data in a single RDBMS and Hadoop
      • RDBMS optimizer handles transparent access to external analytical platforms on behalf of the user
      • RDBMS and Hadoop could be deployed on the same hardware cluster or on different hardware clusters
      Product examples: CitusDB; Exasol EXAPowerlytics; IBM PureData System for Analytics and DB2 HDFS clients; Oracle HDFS Client; Pivotal HAWQ PXF; Teradata SQL-H
  • 50. Self-Service Access To Big Data Via Data Virtualization
      [Diagram labels: Business analyst; Self-Service BI; Self-service Data Discovery & Visualisation or Dashboard Server; Data Virtualization and Optimization; personal & office data; Predictive models; DW; Transaction systems; Data Management Tools (ETL, DQ, etc.)]
      Product examples: Cirro, Cisco, Denodo, Informatica Data Services, ScleraDB
      BUT what about optimization? Can the data virtualisation server push down analytics to underlying platforms to make them do the work?
  • 51. Conclusions - People In Different Roles In The Analytical Landscape Need to Work Together To Deliver Value
      [Diagram labels, Analytical to Operational: Data Scientist (exploratory analysis in a sandbox; model producer); Business Analyst (data discovery & visualisation; information producer: build reports, build and publish dashboards); Business Manager / Operations Worker (model consumer; information consumer; decision maker; action taker)]
  • 52. Thank You! www.intelligentbusiness.biz, mferguson@intelligentbusiness.biz, Twitter: @mikeferguson1, Tel/Fax (+44)1625 520700
  • 55. THANK YOU! Image on Slide 53 borrowed from http://www.apieceofmonologue.com/2012/08/stanley-kubrick-film-photography-design.html. REGISTER FOR BDIA WEBCASTS AT: http://insideanalysis.com/research/big-data-information-architecture