SlideShare ist ein Scribd-Unternehmen logo
1 von 32
5 Things That Make Hadoop a Game 
Changer
And about me 
• 15 years of data management 
• 10 years devoted to data engineering 
• I’m still a nerd, not an academic 
• Startups to the fortune 100 
Elliott Cordo 
Chief Architect, Caserta Concepts 
elliott@casertaconcepts.com
Caserta Concepts 
 Technology services company with focused expertise in: 
 Data Warehousing 
 Business Intelligence 
 Big Data Analytics 
Data is all we do. 
 Established in 2001: 
 Data Science & Analytics 
 Data on the Cloud 
 Data Interaction & Visualization 
 Industry recognized work force 
 Strategy, Assessments, Implementation 
 Writing, Education, Mentoring 
 Broad experience across industries: 
 Financial Services / Insurance / Services 
 eCommerce / Advertising / Higher Education / Healthcare
Client Portfolio 
Finance. Healthcare 
& Insurance 
Retail/eCommerce 
& Manufacturing 
Education 
& Services
Community
Caserta Innovation Lab (CIL) 
• Internal laboratory established to test & develop 
solution concepts and ideas 
• Used to accelerate client projects 
• Examples: 
• Search (SOLR) based BI 
• Big Data Governance Toolkit 
• Text Analytics on Social Network Data 
• Continuous Integration / End-to-end streaming 
• Recommendation Engine Optimization 
• Others (confidential) 
• CIL is hosted on
What is “a Hadoop”? 
• A distributed file system 
– Like any file system – any data type can be stored 
• A distributed processing framework 
– Processing is spread across a cluster of machines
Born in the server rooms of tech giants 
• They were the first guys to deal with this BIG DATA 
stuff 
– GB turned to TB turned to PB 
– Technologies weren’t scalable enough 
– New workloads 
– New data types 
• Now many of us have to deal with it too!
What is “a Big Data”? 
4V’s 
Lets keep it simple: 
• Data is big – measured in multiple TB or PB 
• Data isn’t very relational: 
– Json 
– Log files 
– Raw text 
– Various binary formats
Some misconceptions about Hadoop 
• Data Lake – just dump your data in there and 
everything will be fine. 
• You have to be a java programmer 
• You don’t have to be a programmer 
• SQL is no good, actually SQL Is really good! 
• Hadoop is the solution for everything
So what makes Hadoop a game 
changer?
#1 Low-Cost computing and data 
management 
• Before Hadoop we only had a few tools in our 
quiver for dealing with massive datasets. 
• Scaling up to ginormous servers and storage 
systems (via SAN’s and such) 
• MPP Databases (Massively Parallel Processing)
Now you know longer need $$$ to be 
competitive 
• Store anything, for as long as needed! 
• Run massive query and computing workloads 
• Leverage commodity servers or cloud 
• Enablement for startups and innovators
And about those MPP’s 
• I’m an MPP proponent and often use them in 
conjunction with large Hadoop implementations – 
storage for “hot data” 
• However they are EXPENSIVE! 
– Orders of magnitude more expensive than Hadoop 
– Often use proprietary hardware 
• ..and they are limited: 
– Not generally as scalable as Hadoop 
– Only good at structured data
#2 Promote agile data culture 
Enable your data analysts and data scientists to 
thrive 
• Onboard data fast 
• Blend with governed data 
• Produce business insights
The Traditional Way 
ENTERPRISE 
DATA WAREHOUSE 
Data Governance 
Data Integration 
Modeling 
Cool 
new 
data 
• Data warehouse is 
central/protected/highly 
governed 
• Rigorous SDLC and governance 
• A lot of work just to validate 
this data is valuable?
The Data Lake 
 Not all data in Hadoop will be fully governed 
 Only a subset of the data in Hadoop will be fully governed. 
 We refer to this as the Trusted layer of the Big Data Warehouse. 
 Limited barriers to ingest 
Big 
Data 
Warehouse 
Data Science Workspace 
Staging 
machine learning, blending with 
Metadata  Catalog 
ILM  who has access, how long do we “manage it” 
Data Quality and Monitoring  Monitoring of 
Landing Area – Source Data in “Full Fidelity” 
completeness of data 
Metadata  Catalog 
ILM  who has access, 
how long do we “manage it” 
Agile business insight through data-munging, 
external data, development of facts 
Data is ready to be turned 
into information: 
organized, well defined, 
complete. 
Raw machine data 
collection, collect 
everything 
Metadata  Catalog 
ILM  who has access, how long to “manage it” 
Data Quality and Monitoring  Monitoring 
of completeness of data 
User community arbitrary queries and reporting Fully Data Governed
The Data Science Factory 
• Data scientists have access to ALL layers of the data lake 
• New data is easy to onboard  only requirement basic metadata 
and ILM 
• The products of data science may evolve into new data products, 
including new data warehouse events (facts) and reference data 
(dimensions) 
BDW 
Data Science 
Workspace 
Staging 
Landing Area 
Cool 
new 
data 
New 
Insights
#3 Hadoop + Cloud = elasticity 
• Pay for the computing you need 
• Scale elastically and quickly
Unique opportunities 
• Cloud storage is cheap and dependable 
– Archiving services are even cheaper 
• Pay only for the data processing you need 
• Test new processes quickly and inexpensively 
• Optimize operations and scale rapidly
Ephemeral data processing 
Why have severs sitting around all day doing nothing! 
• Build cluster in one command! 
• Bootstrap  install anything special you need: 
– Spark, Impala, Mahout 
• Run the jobs you need to 
• Tear it down 
aws emr create-cluster --applications Name=Pig --ami-version 3.2.1 -- 
instance-groups 
InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge 
InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge 
--steps Type=PIG,Name="Pig Program",ActionOnFailure=CONTINUE,Args=[- 
f,s3://elasticmapreduce/samples/pig-apache/do-reports2.pig,- 
p,INPUT=s3://elasticmapreduce/samples/pig-apache/input,- 
p,OUTPUT=s3://mybucket/pig-apache/output]
#4 No rigid data structures 
• As a distributed file system Hadoop can store 
anything. 
• Structure isn’t required, unless you want it… 
• Remove barriers for new data ingest 
• Keep data in it’s full fidelity
Embrace semi-structured data 
• Logs are ubiquitous 
• Json is pretty darn awesome 
• VERY difficult to model in the relational world 
– Extreme denomination/brittle data structures 
– Alternative: fidelity loss 
• Plenty of tools for dealing with these formats “Late Bind” 
– Hive Serdes 
– PIG
Embrace unstructured data 
• Raw text 
• PDF 
• Various binary data formats 
• Custom engineered processes, machine 
learning
Why is it important that Hadoop let’s 
you store as-is? 
• Data which is not important today could be tomorrow 
• What if your processes are flawed, and you need to 
recast? 
• Versioning as the data evolves 
• This is a traditional data management best practice – it 
just hasn’t always been practical
#5 Hadoop: The Data OS 
• Hadoop has evolved 
• It’s not just about map reduce anymore 
• YARN has enabled a new generation of 
applications to run within Hadoop
YARN 
• Yet Another Resource Negotiator 
• YARN + HDFS 
– Shareable cluster resources 
– A shared distributed file system 
• What we get: 
– Real time engines 
– Distributed databases 
– ETL applications
And why does this matter? 
• Hadoop isn’t just batch anymore 
• Real time workloads 
• Interactive queries 
• Machine Learning 
• Machine to Machine communication
Some challenges 
• Data governance 
• Sophistication of tools 
– Ease of use 
– Resources 
• Culture
Some tips 
• Don’t throw away your data warehouse, 
unless it’s sucks.. 
• Don’t forget governance 
• Establish a Data Science Factory  produce 
business insights
The most important tip 
Polygot Persistence – “where any decent sized 
enterprise will have a variety of different data 
storage technologies for different kinds of data. 
There will still be large amounts of it managed in 
relational stores, but increasingly we'll be first asking 
how we want to manipulate the data and only then 
figuring out what technology is the best bet for it.” 
-- Martin Fowler
Thank You 
Elliott Cordo 
Chief Architect, Caserta Concepts 
elliott@casertaconcepts.com

Weitere ähnliche Inhalte

Was ist angesagt?

How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfRob Winters
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveGeekNightHyderabad
 
From Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseFrom Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseOsama Hussein
 
Scalable data pipeline
Scalable data pipelineScalable data pipeline
Scalable data pipelineGreenM
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Lviv Startup Club
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Databricks
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsInformatica
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksAmazon Web Services
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseRob Winters
 
Creating a Modern Data Architecture
Creating a Modern Data ArchitectureCreating a Modern Data Architecture
Creating a Modern Data ArchitectureZaloni
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?David P. Moore
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data LakeRobert Chong
 

Was ist angesagt? (20)

How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the Bijenkorf
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's Perspective
 
From Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseFrom Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data Warehouse
 
Scalable data pipeline
Scalable data pipelineScalable data pipeline
Scalable data pipeline
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
Creating a Modern Data Architecture
Creating a Modern Data ArchitectureCreating a Modern Data Architecture
Creating a Modern Data Architecture
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data Lake
 

Ähnlich wie 5 Things that Make Hadoop a Game Changer

Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data WarehouseCaserta
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMichael Hiskey
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
 

Ähnlich wie 5 Things that Make Hadoop a Game Changer (20)

Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Big Data Boom
Big Data BoomBig Data Boom
Big Data Boom
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Big data rmoug
Big data rmougBig data rmoug
Big data rmoug
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
Retail & CPG
Retail & CPGRetail & CPG
Retail & CPG
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 

Mehr von Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

Mehr von Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Kürzlich hochgeladen

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

5 Things that Make Hadoop a Game Changer

  • 1. 5 Things That Make Hadoop a Game Changer
  • 2. And about me • 15 years of data management • 10 years devoted to data engineering • I’m still a nerd, not an academic • Startups to the fortune 100 Elliott Cordo Chief Architect, Caserta Concepts elliott@casertaconcepts.com
  • 3. Caserta Concepts  Technology services company with focused expertise in:  Data Warehousing  Business Intelligence  Big Data Analytics Data is all we do.  Established in 2001:  Data Science & Analytics  Data on the Cloud  Data Interaction & Visualization  Industry recognized work force  Strategy, Assessments, Implementation  Writing, Education, Mentoring  Broad experience across industries:  Financial Services / Insurance / Services  eCommerce / Advertising / Higher Education / Healthcare
  • 4. Client Portfolio Finance. Healthcare & Insurance Retail/eCommerce & Manufacturing Education & Services
  • 6. Caserta Innovation Lab (CIL) • Internal laboratory established to test & develop solution concepts and ideas • Used to accelerate client projects • Examples: • Search (SOLR) based BI • Big Data Governance Toolkit • Text Analytics on Social Network Data • Continuous Integration / End-to-end streaming • Recommendation Engine Optimization • Others (confidential) • CIL is hosted on
  • 7. What is “a Hadoop”? • A distributed file system – Like any file system – any data type can be stored • A distributed processing framework – Processing is spread across a cluster of machines
  • 8. Born in the server rooms of tech giants • They were the first guys to deal with this BIG DATA stuff – GB turned to TB turned to PB – Technologies weren’t scalable enough – New workloads – New data types • Now many of us have to deal with it too!
  • 9. What is “a Big Data”? 4V’s Lets keep it simple: • Data is big – measured in multiple TB or PB • Data isn’t very relational: – Json – Log files – Raw text – Various binary formats
  • 10. Some misconceptions about Hadoop • Data Lake – just dump your data in there and everything will be fine. • You have to be a java programmer • You don’t have to be a programmer • SQL is no good, actually SQL Is really good! • Hadoop is the solution for everything
  • 11. So what makes Hadoop a game changer?
  • 12. #1 Low-Cost computing and data management • Before Hadoop we only had a few tools in our quiver for dealing with massive datasets. • Scaling up to ginormous servers and storage systems (via SAN’s and such) • MPP Databases (Massively Parallel Processing)
  • 13. Now you know longer need $$$ to be competitive • Store anything, for as long as needed! • Run massive query and computing workloads • Leverage commodity servers or cloud • Enablement for startups and innovators
  • 14. And about those MPP’s • I’m an MPP proponent and often use them in conjunction with large Hadoop implementations – storage for “hot data” • However they are EXPENSIVE! – Orders of magnitude more expensive than Hadoop – Often use proprietary hardware • ..and they are limited: – Not generally as scalable as Hadoop – Only good at structured data
  • 15. #2 Promote agile data culture Enable your data analysts and data scientists to thrive • Onboard data fast • Blend with governed data • Produce business insights
  • 16. The Traditional Way ENTERPRISE DATA WAREHOUSE Data Governance Data Integration Modeling Cool new data • Data warehouse is central/protected/highly governed • Rigorous SDLC and governance • A lot of work just to validate this data is valuable?
  • 17. The Data Lake  Not all data in Hadoop will be fully governed  Only a subset of the data in Hadoop will be fully governed.  We refer to this as the Trusted layer of the Big Data Warehouse.  Limited barriers to ingest Big Data Warehouse Data Science Workspace Staging machine learning, blending with Metadata  Catalog ILM  who has access, how long do we “manage it” Data Quality and Monitoring  Monitoring of Landing Area – Source Data in “Full Fidelity” completeness of data Metadata  Catalog ILM  who has access, how long do we “manage it” Agile business insight through data-munging, external data, development of facts Data is ready to be turned into information: organized, well defined, complete. Raw machine data collection, collect everything Metadata  Catalog ILM  who has access, how long to “manage it” Data Quality and Monitoring  Monitoring of completeness of data User community arbitrary queries and reporting Fully Data Governed
  • 18. The Data Science Factory • Data scientists have access to ALL layers of the data lake • New data is easy to onboard  only requirement basic metadata and ILM • The products of data science may evolve into new data products, including new data warehouse events (facts) and reference data (dimensions) BDW Data Science Workspace Staging Landing Area Cool new data New Insights
  • 19. #3 Hadoop + Cloud = elasticity • Pay for the computing you need • Scale elastically and quickly
  • 20. Unique opportunities • Cloud storage is cheap and dependable – Archiving services are even cheaper • Pay only for the data processing you need • Test new processes quickly and inexpensively • Optimize operations and scale rapidly
  • 21. Ephemeral data processing Why have severs sitting around all day doing nothing! • Build cluster in one command! • Bootstrap  install anything special you need: – Spark, Impala, Mahout • Run the jobs you need to • Tear it down aws emr create-cluster --applications Name=Pig --ami-version 3.2.1 -- instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --steps Type=PIG,Name="Pig Program",ActionOnFailure=CONTINUE,Args=[- f,s3://elasticmapreduce/samples/pig-apache/do-reports2.pig,- p,INPUT=s3://elasticmapreduce/samples/pig-apache/input,- p,OUTPUT=s3://mybucket/pig-apache/output]
  • 22. #4 No rigid data structures • As a distributed file system Hadoop can store anything. • Structure isn’t required, unless you want it… • Remove barriers for new data ingest • Keep data in it’s full fidelity
  • 23. Embrace semi-structured data • Logs are ubiquitous • Json is pretty darn awesome • VERY difficult to model in the relational world – Extreme denomination/brittle data structures – Alternative: fidelity loss • Plenty of tools for dealing with these formats “Late Bind” – Hive Serdes – PIG
  • 24. Embrace unstructured data • Raw text • PDF • Various binary data formats • Custom engineered processes, machine learning
  • 25. Why is it important that Hadoop let’s you store as-is? • Data which is not important today could be tomorrow • What if your processes are flawed, and you need to recast? • Versioning as the data evolves • This is a traditional data management best practice – it just hasn’t always been practical
  • 26. #5 Hadoop: The Data OS • Hadoop has evolved • It’s not just about map reduce anymore • YARN has enabled a new generation of applications to run within Hadoop
  • 27. YARN • Yet Another Resource Negotiator • YARN + HDFS – Shareable cluster resources – A shared distributed file system • What we get: – Real time engines – Distributed databases – ETL applications
  • 28. And why does this matter? • Hadoop isn’t just batch anymore • Real time workloads • Interactive queries • Machine Learning • Machine to Machine communication
  • 29. Some challenges • Data governance • Sophistication of tools – Ease of use – Resources • Culture
  • 30. Some tips • Don’t throw away your data warehouse, unless it’s sucks.. • Don’t forget governance • Establish a Data Science Factory  produce business insights
  • 31. The most important tip Polygot Persistence – “where any decent sized enterprise will have a variety of different data storage technologies for different kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we'll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it.” -- Martin Fowler
  • 32. Thank You Elliott Cordo Chief Architect, Caserta Concepts elliott@casertaconcepts.com