SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Modern Data Warehouse
Stephen Alex
BI & Big Data Architect
AGENDA
 History and Milestones
 Traditional Data Warehouse
 Key trends breaking the traditional data warehouse
 Modern Data Warehouse
 Multiple parallel processing (MPP) architecture
 Hadoop Ecosystem
 Technical Innovation on Hadoop
 Big Data Value Assessment
2Rolta AdvizeX Confidential & Proprietary 9/11/2016
History and Milestones
 1970’s: Relational Model Invented
 1984: DB2 released, RDBMS declared mainstream
 1990: RDBMS takes over
3Rolta AdvizeX Confidential & Proprietary 9/11/2016
The Traditional Data Warehouse
 Central repository for all internal data in a
company.
 Overall relational schema.
 The predictable data structure and quality
optimized processing and reporting.
 Data is in disk block formatting
 Fundamental operation is read a row
 Indexing via B-trees
 Dynamic row-level locking
 Data transfer usually EOD
4
Key Trends Breaking The Traditional Data Warehouse
5
Key Related Business and IT Trends
 Emerging Technologies are disruptive by nature and play a
key role in driving digital business and the related business
trends.
 Business Ecosystems enable each of the business trends,
and organizations are aggressively searching for ways to
leverage the role they play in the business ecosystem
 Business Moments provide opportunities to capture value
by setting in motion a series of events and actions involving a
network of people, businesses and things that spans or
crosses multiple industries and business ecosystems.
 Digital Economics seeks to harvest value from across the
business ecosystem by identifying business moments of
opportunity and exploiting the economics of connections.
This early-stage trend will have increasing importance as
business models evolve to leverage algorithmic business.
 Algorithmic Business propels organizations to leverage
business algorithms to drive value in the business
ecosystem. In this early-stage trend, we are starting to see
organizations transforming data with algorithms to drive
intelligent actions, particularly with the IoT.
6
The Risks of Bottlenecks in Data Movement
7
Hadoop Changes the Game
 Storage and Compute on One Platform
8
Modern Data Warehouse
9
 Incorporates Hadoop, traditional data
warehouses, and other data stores.
 Includes multiple repositories may
reside in different locations.
 Includes Data from cloud, mobile
devices, sensors, and the Internet of
Things
 Includes structured/semi-
structured/unstructured, raw data
 Inexpensive commodity hardware in
cluster mode
Multiple parallel processing (MPP) architecture
 Multiple parallel processing (MPP)
architecture enables extremely powerful
distributed computing and scale
 Resources can be added for a near linear
scale-out to the largest data warehousing
projects.
 MPP architecture uses a “shared-nothing”
There are multiple physical nodes, each
running its own instance. This results in
performance many times faster than
traditional architectures.
10
Apache Hadoop Ecosystem
 Hadoop ecosystem
components as part of
Apache Software
Foundation projects.
 The components are
categorized into file
system and data store,
serialization, job
execution, and others as
shown on the image.
11
Hadoop / BDD Ecosystem
Technology Purpose
Hadoop Distributed
File System
Distributed file system that provides high-throughput access to application data. Data is
split into blocks and distributed across multiple nodes in the cluster
Hadoop YARN Framework for job scheduling/monitoring and cluster resource management
Hive Facilitates ad hoc queries over data stored in HDFS. Uses HiveQL which is a SQL-like
language. Provides a relational view of data stored in HDFS.
HCatalog Hcatalog (aka Hive Metastore) provides a table and storage management layer for Hadoop
Spark Spark Powers a stack of high-level tools including Spark SQL, MLlib for machine learning,
GraphX, and Spark Streaming
Pig Pig is a high level platform for creating MapReduce programs. BDD uses Pig to manipulate
data prior to ingesting via data processing.
Technology Purpose
Oozie Oozie is the workflow scheduler system to manage Apache Hadoop jobs. BDD
uses Oozie for workflow management (sampling, profiling, enrichment).
Sqoop Tool for efficiently transferring bulk data between Hadoop and structured
datastores such a relational database
Flume Tool for efficiently collecting, aggregating and moving large amounts of streaming
data into the HDFS
ZooKeeper Zookeeper is a centralized service for maintaining configuration information,
naming, providing distributed synchronization, and providing group services
Hue Hue is a set of web applications that enable you to interact with CDH cluster.
Hadoop / BDD Ecosystem
Top Three Hadoop Vendors
14
Oracle BDD Technical Innovation on Hadoop
15
Key Features and Functionality:
Find
• Access a rich, interactive catalog of all data in Hadoop
• Use familiar search and guided navigation to find information quickly
• See data set summaries, user annotation and recommendations
• Provision personal and enterprise data to Hadoop via self-service
Explore
• Visualize all attributes by type
• Sort attributes by information potential
• Assess attribute statistics, data quality and outliers
• Use a scratch pad to uncover correlations between attributes
Transform
• Get the data ready for analytics via Intuitive, user driven data wrangling
• Leverage an extensive library of data transformations and enrichments
• Preview results, undo, commit and replay transforms
• Test on sample data in memory then apply to full data set in Hadoop
Discover
• Join and blend data for deeper perspectives
• Compose project pages via drag and drop
• Use powerful search and guided navigation to ask questions
• See new patterns in rich, interactive data visualizations
Share
• Share projects, bookmarks and snapshots with others
• Build galleries and tell Big Data stories
• Collaborate and iterate as a team
• Publish blended data to HDFS for leverage in other tools
Components of Big Data Discovery
16
Big Data Value Assessment
17
Descriptive analytics looks at past performance and understands that
performance by mining historical data to look for the reasons behind past
success or failure and that is the traditional BI work.
Predictive analytics answers the question what will happen. This is when
historical performance data is combined with rules, algorithms, and external
data to determine the probable future outcome of an event or the likelihood
of a situation occurring.
Prescriptive analytics not only anticipates what will happen and when it will
happen, but also why it will happen.
Basic Analytics
Advanced Analytics
Prescriptive
Predictive
Descriptive
Thank You!!!
Stephen Alex
BI & Big Data Architect
(732) 485-0011(m)
9/11/201618
Rolta AdvizeX Proprietary and Confidential

Weitere ähnliche Inhalte

Was ist angesagt?

5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data WarehousingAmdocs
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsDavid Portnoy
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data SolutionJames Serra
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameCloudera, Inc.
 
White paper making an-operational_data_store_(ods)_the_center_of_your_data_...
White paper   making an-operational_data_store_(ods)_the_center_of_your_data_...White paper   making an-operational_data_store_(ods)_the_center_of_your_data_...
White paper making an-operational_data_store_(ods)_the_center_of_your_data_...Eric Javier Espino Man
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data LakeCalum Miller
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseBui Ha
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overviewjdijcks
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveGeekNightHyderabad
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
Scalable data pipeline
Scalable data pipelineScalable data pipeline
Scalable data pipelineGreenM
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher Tamir Dresher
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Lviv Startup Club
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesData Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesDenodo
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBData Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBDenodo
 

Was ist angesagt? (20)

Data Lake
Data LakeData Lake
Data Lake
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data Warehousing
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop Implementations
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the Same
 
White paper making an-operational_data_store_(ods)_the_center_of_your_data_...
White paper   making an-operational_data_store_(ods)_the_center_of_your_data_...White paper   making an-operational_data_store_(ods)_the_center_of_your_data_...
White paper making an-operational_data_store_(ods)_the_center_of_your_data_...
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data Warehouse
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's Perspective
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Scalable data pipeline
Scalable data pipelineScalable data pipeline
Scalable data pipeline
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesData Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBData Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
 

Andere mochten auch

A First Look at San Francisco’s New ETL Job Platform
A First Look at San Francisco’s New ETL Job PlatformA First Look at San Francisco’s New ETL Job Platform
A First Look at San Francisco’s New ETL Job PlatformSafe Software
 
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
Data warehouse design
Data warehouse designData warehouse design
Data warehouse designines beltaief
 
Supporting Data Services Marketplace using Data Virtualization
Supporting Data Services Marketplace using Data VirtualizationSupporting Data Services Marketplace using Data Virtualization
Supporting Data Services Marketplace using Data VirtualizationDenodo
 
Introduction to ETL process
Introduction to ETL process Introduction to ETL process
Introduction to ETL process Omid Vahdaty
 
Automate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayAutomate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayTorana, Inc.
 
Data Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional ModelingData Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional ModelingDunn Solutions Group
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsDataWorks Summit/Hadoop Summit
 
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Denodo
 
Designing and implementing_an_etl_framework
Designing and implementing_an_etl_frameworkDesigning and implementing_an_etl_framework
Designing and implementing_an_etl_frameworkBharat Vadlamudi
 
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieCDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieAndreas Buckenhofer
 
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep LearningPoo Kuan Hoong
 
Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OnePanchaleswar Nayak
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 

Andere mochten auch (19)

A First Look at San Francisco’s New ETL Job Platform
A First Look at San Francisco’s New ETL Job PlatformA First Look at San Francisco’s New ETL Job Platform
A First Look at San Francisco’s New ETL Job Platform
 
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Data warehouse design
Data warehouse designData warehouse design
Data warehouse design
 
Supporting Data Services Marketplace using Data Virtualization
Supporting Data Services Marketplace using Data VirtualizationSupporting Data Services Marketplace using Data Virtualization
Supporting Data Services Marketplace using Data Virtualization
 
Introduction to ETL process
Introduction to ETL process Introduction to ETL process
Introduction to ETL process
 
Automate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayAutomate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile way
 
Data Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional ModelingData Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional Modeling
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and Improvements
 
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
 
Designing and implementing_an_etl_framework
Designing and implementing_an_etl_frameworkDesigning and implementing_an_etl_framework
Designing and implementing_an_etl_framework
 
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieCDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
 
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_One
 
Data Warehouse-Final
Data Warehouse-FinalData Warehouse-Final
Data Warehouse-Final
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 

Ähnlich wie Modern data warehouse

Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsCognizant
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 
Big data and apache hadoop adoption
Big data and apache hadoop adoptionBig data and apache hadoop adoption
Big data and apache hadoop adoptionfaizrashid1995
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkAgnihotriGhosh2
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsJane Roberts
 
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)GeeksLab Odessa
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introductionsaisreealekhya
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeSysfore Technologies
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelEditor IJCATR
 

Ähnlich wie Modern data warehouse (20)

Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Big data and apache hadoop adoption
Big data and apache hadoop adoptionBig data and apache hadoop adoption
Big data and apache hadoop adoption
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and spark
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
paper
paperpaper
paper
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
View on big data technologies
View on big data technologiesView on big data technologies
View on big data technologies
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big Data , Big Problem?
Big Data , Big Problem?Big Data , Big Problem?
Big Data , Big Problem?
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
 

Kürzlich hochgeladen

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Kürzlich hochgeladen (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Modern data warehouse

  • 1. Modern Data Warehouse Stephen Alex BI & Big Data Architect
  • 2. AGENDA  History and Milestones  Traditional Data Warehouse  Key trends breaking the traditional data warehouse  Modern Data Warehouse  Multiple parallel processing (MPP) architecture  Hadoop Ecosystem  Technical Innovation on Hadoop  Big Data Value Assessment 2Rolta AdvizeX Confidential & Proprietary 9/11/2016
  • 3. History and Milestones  1970’s: Relational Model Invented  1984: DB2 released, RDBMS declared mainstream  1990: RDBMS takes over 3Rolta AdvizeX Confidential & Proprietary 9/11/2016
  • 4. The Traditional Data Warehouse  Central repository for all internal data in a company.  Overall relational schema.  The predictable data structure and quality optimized processing and reporting.  Data is in disk block formatting  Fundamental operation is read a row  Indexing via B-trees  Dynamic row-level locking  Data transfer usually EOD 4
  • 5. Key Trends Breaking The Traditional Data Warehouse 5
  • 6. Key Related Business and IT Trends  Emerging Technologies are disruptive by nature and play a key role in driving digital business and the related business trends.  Business Ecosystems enable each of the business trends, and organizations are aggressively searching for ways to leverage the role they play in the business ecosystem  Business Moments provide opportunities to capture value by setting in motion a series of events and actions involving a network of people, businesses and things that spans or crosses multiple industries and business ecosystems.  Digital Economics seeks to harvest value from across the business ecosystem by identifying business moments of opportunity and exploiting the economics of connections. This early-stage trend will have increasing importance as business models evolve to leverage algorithmic business.  Algorithmic Business propels organizations to leverage business algorithms to drive value in the business ecosystem. In this early-stage trend, we are starting to see organizations transforming data with algorithms to drive intelligent actions, particularly with the IoT. 6
  • 7. The Risks of Bottlenecks in Data Movement 7
  • 8. Hadoop Changes the Game  Storage and Compute on One Platform 8
  • 9. Modern Data Warehouse 9  Incorporates Hadoop, traditional data warehouses, and other data stores.  Includes multiple repositories may reside in different locations.  Includes Data from cloud, mobile devices, sensors, and the Internet of Things  Includes structured/semi- structured/unstructured, raw data  Inexpensive commodity hardware in cluster mode
  • 10. Multiple parallel processing (MPP) architecture  Multiple parallel processing (MPP) architecture enables extremely powerful distributed computing and scale  Resources can be added for a near linear scale-out to the largest data warehousing projects.  MPP architecture uses a “shared-nothing” There are multiple physical nodes, each running its own instance. This results in performance many times faster than traditional architectures. 10
  • 11. Apache Hadoop Ecosystem  Hadoop ecosystem components as part of Apache Software Foundation projects.  The components are categorized into file system and data store, serialization, job execution, and others as shown on the image. 11
  • 12. Hadoop / BDD Ecosystem Technology Purpose Hadoop Distributed File System Distributed file system that provides high-throughput access to application data. Data is split into blocks and distributed across multiple nodes in the cluster Hadoop YARN Framework for job scheduling/monitoring and cluster resource management Hive Facilitates ad hoc queries over data stored in HDFS. Uses HiveQL which is a SQL-like language. Provides a relational view of data stored in HDFS. HCatalog Hcatalog (aka Hive Metastore) provides a table and storage management layer for Hadoop Spark Spark Powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming Pig Pig is a high level platform for creating MapReduce programs. BDD uses Pig to manipulate data prior to ingesting via data processing.
  • 13. Technology Purpose Oozie Oozie is the workflow scheduler system to manage Apache Hadoop jobs. BDD uses Oozie for workflow management (sampling, profiling, enrichment). Sqoop Tool for efficiently transferring bulk data between Hadoop and structured datastores such a relational database Flume Tool for efficiently collecting, aggregating and moving large amounts of streaming data into the HDFS ZooKeeper Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services Hue Hue is a set of web applications that enable you to interact with CDH cluster. Hadoop / BDD Ecosystem
  • 14. Top Three Hadoop Vendors 14
  • 15. Oracle BDD Technical Innovation on Hadoop 15 Key Features and Functionality: Find • Access a rich, interactive catalog of all data in Hadoop • Use familiar search and guided navigation to find information quickly • See data set summaries, user annotation and recommendations • Provision personal and enterprise data to Hadoop via self-service Explore • Visualize all attributes by type • Sort attributes by information potential • Assess attribute statistics, data quality and outliers • Use a scratch pad to uncover correlations between attributes Transform • Get the data ready for analytics via Intuitive, user driven data wrangling • Leverage an extensive library of data transformations and enrichments • Preview results, undo, commit and replay transforms • Test on sample data in memory then apply to full data set in Hadoop Discover • Join and blend data for deeper perspectives • Compose project pages via drag and drop • Use powerful search and guided navigation to ask questions • See new patterns in rich, interactive data visualizations Share • Share projects, bookmarks and snapshots with others • Build galleries and tell Big Data stories • Collaborate and iterate as a team • Publish blended data to HDFS for leverage in other tools
  • 16. Components of Big Data Discovery 16
  • 17. Big Data Value Assessment 17 Descriptive analytics looks at past performance and understands that performance by mining historical data to look for the reasons behind past success or failure and that is the traditional BI work. Predictive analytics answers the question what will happen. This is when historical performance data is combined with rules, algorithms, and external data to determine the probable future outcome of an event or the likelihood of a situation occurring. Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen. Basic Analytics Advanced Analytics Prescriptive Predictive Descriptive
  • 18. Thank You!!! Stephen Alex BI & Big Data Architect (732) 485-0011(m) 9/11/201618 Rolta AdvizeX Proprietary and Confidential