Bramhope
a modern data architecture for BI
© the DataShed Limited 2015
a (not so) long time ago in a galaxy far, far away…
The most complex system you had to handle was an old AS/400
Publishing data weekly was adequate for most users
Analysts didn’t really exist
Most queries took hours to run – but that was ok
what the hell happened?
data explosion
Source(s): CSC: http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode Gartner: http://www.gartner.com/technology/research/it-spending-forecast/
Growth between 2010 and 2020: Data: 500%, Budget: 16%
[Chart: Global Data Volume (Zettabytes) and Global IT Budget ($ Trillion), 2010 to 2020, with exponential trend lines for data growth and IT budget growth]
help is at hand…
Hadoop & Big Data tools
First incarnation in 2005
Highly-scalable data processing, based on a distributed file system (HDFS)
Ability to handle petabyte-scale workloads
Becoming more mature – including:
ANSI-SQL compliant data warehousing tools (Hive & Stinger.next)
Batch processing (MapReduce/Tez, Pig)
Operations management (Ambari)
Security (Knox)
Governance (HCatalog)
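To make the batch processing model concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, using word count as the example. It is plain Python rather than Hadoop code, and the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative, not part of any Hadoop API. On a real cluster each phase would run in parallel across nodes reading from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across machines without coordination, which is where the scalability comes from.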
…or is it?
too small?
Small cluster: 5 – 50 nodes
Assuming a single node:
24GB RAM
Single socket quad core
4-6 × 2 TB SATA drives
Total storage ≈ 10 TB per node
How many of us need to process 10TB of data?
Source: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_cluster-planning-guide/content/conclusion.html
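The sizing above can be turned into a quick back-of-the-envelope calculation. This is a rough sketch: the node and drive figures come from the slide, while the replication factor of 3 is the HDFS default and is an assumption here, not something the slide states. Space reserved for the OS and temporary data is ignored.

```python
def raw_tb(nodes, drives_per_node, tb_per_drive):
    """Raw disk capacity of the cluster in TB."""
    return nodes * drives_per_node * tb_per_drive


def usable_tb(raw, replication=3):
    """Approximate capacity left once every block is stored `replication` times.

    replication=3 is the HDFS default (an assumption for this sketch).
    """
    return raw / replication


# Smallest "small cluster" from the slide: 5 nodes, ~5 drives of 2 TB each.
smallest_raw = raw_tb(nodes=5, drives_per_node=5, tb_per_drive=2)
smallest_usable = usable_tb(smallest_raw)
```

Even the smallest cluster in this range holds more than 10 TB of replicated data, which is the point of the question on the slide.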
if not Hadoop, then what?
Big Data has driven innovation in both technology and tools.
If you can’t adopt the tools, you can still adopt some of the principles:
Design for scale-out, rather than up.
ELT vs ETL
Lambda data architecture
… amongst other things!
prepare to scale out
[Diagram 1: three parallel stacks, each with its own ETL tools and small data marts]
Data Storage: CRM, ERP, Transactional System
Data Integration: ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator), shown three times
Data Marts & Cubes: Specific, small data marts, shown three times
Business Intelligence Apps:
Executives: Dashboards
Managers & Stakeholders: Reports
Business/Data Analysts: Cubes & Direct Access
[Diagram 2: a single stack built on an EDW]
Data Storage: CRM, ERP, Transactional System
Data Integration: ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator)
Data Marts & Cubes: Data marts constructed on top of an EDW. Cubes present views of this data to business users
Business Intelligence Apps:
Executives: Dashboards
Managers & Stakeholders: Reports
Business/Data Analysts: Cubes & Direct Access
ELT vs ETL
Key considerations:
Metadata & data lineage
How real-time is real-time?
How long does it take you to get data to analysts?
How powerful is your presentation server?
Can you use both?
Schema on read? Or schema on write?
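The difference between the two approaches can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not a recommended toolchain: the table names (sales_etl, sales_raw, sales_elt) and the sample rows are hypothetical. In the ETL path the data is cleaned in application code and written into a typed table (schema on write). In the ELT path the raw text is loaded untouched and the transformation is a view applied when the data is read (schema on read).

```python
import sqlite3

# Hypothetical raw extract: untrimmed names, amounts still held as text.
RAW_ROWS = [("  alice ", "10.50"), ("BOB", "3"), ("carol", "7.25")]


def etl(conn, rows):
    """ETL: transform outside the database, then load the typed result."""
    conn.execute(
        "CREATE TABLE sales_etl (customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    cleaned = [(name.strip().lower(), float(amount)) for name, amount in rows]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load the raw rows as-is, then transform inside the database."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        """
        CREATE VIEW sales_elt AS
        SELECT lower(trim(customer)) AS customer,
               CAST(amount AS REAL)  AS amount
        FROM sales_raw
        """
    )


def read_all(conn, table):
    """Return every row of `table`, ordered by customer."""
    query = f"SELECT customer, amount FROM {table} ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
etl(conn, RAW_ROWS)
elt(conn, RAW_ROWS)
```

Both paths return the same cleaned rows. The ELT path also keeps the raw data, so the view can be redefined later without re-extracting from the source system, at the cost of doing the transformation work on the presentation server.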
lambda data architecture
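As a rough illustration of the idea, a lambda architecture keeps an append-only master dataset, periodically recomputes a batch view from it, and covers the gap since the last batch run with an incrementally updated speed view. Queries merge the two. The sketch below is a single-process toy under those assumptions. The class name LambdaCounts and its methods are hypothetical, and a real deployment would use separate batch and streaming systems.

```python
from collections import Counter


class LambdaCounts:
    """Toy lambda architecture that counts events per key."""

    def __init__(self):
        self.master = []             # append-only log of every event
        self.batch_view = Counter()  # recomputed from scratch by run_batch()
        self.speed_view = Counter()  # events seen since the last batch run

    def ingest(self, key):
        """New events go to the master dataset and to the speed layer."""
        self.master.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Batch layer: rebuild the view from all data, then reset the speed layer."""
        self.batch_view = Counter(self.master)
        self.speed_view = Counter()

    def query(self, key):
        """Serving layer: merge the batch view with the recent speed view."""
        return self.batch_view[key] + self.speed_view[key]
```

A short usage example: after ingesting "a", "b", "a", a query for "a" returns 2 from the speed view alone. After run_batch() the same answer comes from the batch view, and any later event is again served from the speed view until the next batch run.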
…most importantly, think differently
You don’t need big data to use Big Data tools
Example: Prediction.io (http://prediction.io/)
Open-source machine learning server that uses Hadoop, HBase, Spark and ElasticSearch
any questions?
ed thewlis
tech director – the DataShed
@edthewlis
ed@thedatashed.co.uk
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 

Modern Data Architecture

  • 1. Bramhope: a modern data architecture for BI. © the DataShed Limited 2015
  • 2. a (not so) long time ago in a galaxy far, far away… The most complex system you had to handle was an old AS/400. Publishing data weekly was adequate for most users. Analysts didn’t really exist. Most queries took hours to run, but that was ok.
  • 3. what the hell happened?
  • 4. data explosion. Growth between 2010 and 2020: Data: 500%, Budget: 16%. [Chart: Global Data Volume (Zettabytes) and Global IT Budget ($ Trillion), 2010 to 2020, with exponential trend lines for data growth and IT budget growth.] Source(s): CSC: http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode Gartner: http://www.gartner.com/technology/research/it-spending-forecast/
  • 5. help is at hand…
  • 6. Hadoop & Big Data tools. First incarnation in 2005. Highly scalable data processing, based on a distributed file system (HDFS). Ability to handle PB-size workloads. Becoming more mature, including: ANSI-SQL compliant data warehousing tools (Hive & Stinger.next); batch processing (MapReduce/Tez, Pig); operations management (Ambari); security (Knox); governance (HCatalog).
  • 7. …or is it?
  • 8. too small? Small cluster: 5 to 50 nodes. Assuming a single node: 24 GB RAM; single-socket quad core; 4-6 × 2 TB SATA drives; total storage ≈ 10 TB. How many of us need to process 10 TB of data? Source: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_cluster-planning-guide/content/conclusion.html
  • 9. if not Hadoop, then what? Big Data has driven innovation in both technology and tools. If you can’t adopt the tools, you can still adopt some of the principles: design for scale-out, rather than up; ELT vs ETL; Lambda data architecture; … amongst other things!
  • 10. prepare to scale out. [Diagram: two BI architectures, each with the layers Data Storage, Data Integration, Data Marts & Cubes and Business Intelligence Apps. Data Storage: CRM, ERP, Transactional System. Data Integration: ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator). Data Marts & Cubes: in one layout, several specific, small data marts; in the other, data marts constructed on top of an EDW, with cubes presenting views of this data to business users. Business Intelligence Apps: Executives: Dashboards; Managers & Stakeholders: Reports; Business/Data Analysts: Cubes & Direct Access.]
  • 11. ELT vs ETL. Key considerations: Metadata & data lineage. How real-time is real-time? How long does it take you to get data to analysts? How powerful is your presentation server? Can you use both? Schema on read, or schema on write?
  • 12. lambda data architecture
  • 13. …most importantly, think differently
  • 14. You don’t need big data to use Big Data tools. Example: Prediction.io (http://prediction.io/), an open source machine learning server that utilizes Hadoop, HBase, Spark and ElasticSearch.
  • 15. any questions?
  • 16. ed thewlis, tech director, the DataShed. @edthewlis ed@thedatashed.co.uk
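
The ELT vs ETL contrast on slide 11 can be made concrete with a small sketch. This is illustrative only and is not taken from the deck: the sample rows, the table names and the functions etl, elt and totals are all hypothetical, and Python's built-in sqlite3 stands in for whatever warehouse or presentation server would actually be used. ETL cleans the data in the integration layer and loads typed rows (schema on write); ELT loads the raw text untouched and applies the schema in SQL when the data is read.

```python
import sqlite3

# Hypothetical raw extract from a transactional system (invented sample data).
RAW_ORDERS = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "7.50"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform outside the database, then load typed, cleaned rows."""
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, customer TEXT, amount REAL)"
    )
    cleaned = [
        (int(order_id), customer.strip().lower(), float(amount))
        for order_id, customer, amount in RAW_ORDERS
    ]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw text as-is, then transform inside the database."""
    conn.execute(
        "CREATE TABLE orders_raw (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    # The view applies types and cleaning at query time (schema on read).
    conn.execute(
        """
        CREATE VIEW orders_elt AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM orders_raw
        """
    )


def totals(conn: sqlite3.Connection, table: str) -> list:
    """Total spend per customer from either the ETL table or the ELT view."""
    query = (
        f"SELECT customer, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    print("ETL:", totals(conn, "orders_etl"))
    print("ELT:", totals(conn, "orders_elt"))
```

Both routes should give analysts the same totals. The difference is where the work happens: the ELT route keeps the raw rows available for lineage and for later re-transformation, but it leans on the power of the database that serves the queries, which ties back to the slide's questions about the presentation server and time to analysts.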