AGILE DATA WAREHOUSING SOLUTION
Data warehousing + Reporting Solution
Presented by: Sneha Challa
Date: 7/14/2016
Location: San Ramon, CA
Standard approach to DW
Source: Agile Data Warehousing for the Enterprise, Ralph Hughes
Traditional EDW Model
• Brittle to changing requirements
Source: Agile Data Warehousing for the Enterprise
Two traditional approaches
• Traditional integration layer: model the data in 3NF or higher. ETL loads the integration layer (IL) first, then transforms the data to populate the star schemas of the presentation layer.
• Conformed dimensional data warehouse: skips the integration layer and loads the company's data directly into star schemas.
Cons:
• Both approaches produce a DW that is very difficult to modify once the data is loaded.
• Brittle in the face of changing requirements.
• Redesign and data conversion are costly.
Agile Data Engineering
• The entire data model does not need to be designed up front.
• Development adapts to changing business requirements.
• The existing schema does not need to be re-engineered when new entities and relationships arise.
• Simple, reusable ETL modules.
Agile Data Engineering
Source: https://www.youtube.com/watch?v=3QOSOeN8vcY
Example
7 tables
Source: https://www.youtube.com/watch?v=3QOSOeN8vcY
HNF (Hyper Normalized Form)
Source: Agile Data Warehousing for the Enterprise
HNF model
18 tables. The model has been hyper normalized.
Source: https://www.youtube.com/watch?v=3QOSOeN8vcY
HNF
• Parameter-driven data transformation using ETL scripts: one ETL module for all business-key tables and one ETL module (yellow in the diagram) for the linking tables. A further module takes all other attributes from the source and sends them to the target tables.
• Easily adapts to changing business requirements (BR), even after billions of records have been loaded.
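As a rough illustration of the parameter-driven idea, the sketch below uses one generic loader for every business-key table, with the table name and key column passed in as parameters. The table and column names (hub_customer, customer_no, hub_product, product_code) are hypothetical, and SQLite only stands in for the warehouse; this is not the ETL tooling shown in the slides.

```python
import sqlite3


def load_business_keys(conn, target_table, key_column, source_rows):
    """Generic loader: insert each new business key once into target_table.

    target_table and key_column are the parameters that let one module
    serve every business-key table (all names here are hypothetical).
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {target_table} ("
        f"id INTEGER PRIMARY KEY, {key_column} TEXT UNIQUE)"
    )
    for row in source_rows:
        # INSERT OR IGNORE skips keys that were already loaded.
        conn.execute(
            f"INSERT OR IGNORE INTO {target_table} ({key_column}) VALUES (?)",
            (row[key_column],),
        )
    conn.commit()


# Made-up source records carrying two different business keys.
source = [
    {"customer_no": "C1", "product_code": "P9"},
    {"customer_no": "C2", "product_code": "P9"},
    {"customer_no": "C1", "product_code": "P7"},
]

conn = sqlite3.connect(":memory:")
# The same module is reused; only the parameters change.
load_business_keys(conn, "hub_customer", "customer_no", source)
load_business_keys(conn, "hub_product", "product_code", source)

customer_count = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
product_count = conn.execute("SELECT COUNT(*) FROM hub_product").fetchone()[0]
print(customer_count, product_count)
```

Adding a new business entity in this sketch means one more call with different parameters, not a new ETL program.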
Data model
Source: https://www.youtube.com/watch?v=3QOSOeN8vcY
HNF
• Caveat: data retrieval gets complex. The SQL needed to read the data can become very complex, with outer joins and correlated subqueries. Does that matter much? HNF is used mainly for the integration layer, much less for the presentation and semantic layers.
• Data from the source systems is stored in the integration layer using only 3 reusable ETL modules.
• Build the DW one slice at a time and adapt to new business requirements.
• http://www.anchormodeling.com/
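To make the retrieval caveat concrete, here is a small sketch, assuming a customer entity split into one key table and one table per attribute. The table layout and sample rows are invented for illustration and use SQLite in memory; reassembling a single customer record already needs one outer join per attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hyper-normalized layout: one key table, one table per attribute.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_no TEXT);
CREATE TABLE customer_name (customer_id INTEGER, name TEXT);
CREATE TABLE customer_email (customer_id INTEGER, email TEXT);
INSERT INTO customer VALUES (1, 'C1'), (2, 'C2');
INSERT INTO customer_name VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO customer_email VALUES (1, 'ada@example.com');
""")

# Outer joins keep customers whose attribute rows are missing.
rows = conn.execute("""
SELECT c.customer_no, n.name, e.email
FROM customer c
LEFT OUTER JOIN customer_name n ON n.customer_id = c.customer_id
LEFT OUTER JOIN customer_email e ON e.customer_id = c.customer_id
ORDER BY c.customer_no
""").fetchall()
print(rows)
```

Each additional attribute adds another join, which is why this form suits the integration layer better than user-facing queries.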
Hyper Generalized Form
• The warehouse presentation and semantic layers are computer-generated, a labor-saving approach.
• Logical and physical data modeling are eliminated.
• Can operate at the business level.
• Builds on the notion of special-purpose tables.
• Requires acquiring an automated data warehouse tool that can generate the entire data warehouse infrastructure.
• The entire dataset is represented as 6 tables.
• Generates the EDW and ETL schemas for all layers.
Step 1
Source: https://www.youtube.com/watch?v=aNtUoVkeq_Q
Step 2
Source: https://www.youtube.com/watch?v=aNtUoVkeq_Q
Step 3
Source: https://www.youtube.com/watch?v=aNtUoVkeq_Q
Step 4 (Add temporality)
Source: https://www.youtube.com/watch?v=aNtUoVkeq_Q
Step 5
Source: https://www.youtube.com/watch?v=aNtUoVkeq_Q
Step 6
Source: https://www.youtube.com/watch?v=aNtUoVkeq_Q
Big Data Technologies
• Power an iterative discovery and engineering process.
• Read and transform massive amounts of data on cheap commodity hardware using massively parallel processing.
• Schema on read: there is no need to impose structure on every piece of information gathered.
• Hadoop with more SQL-like features, or a traditional EDW with big data packages: which is more useful?
• Complex event analysis
o Real-time analytics of high-volume data streams
o Complex event processing
o Data mining software
o Text analytics
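A minimal sketch of schema on read, assuming raw events arrive as JSON lines with inconsistent fields. The event records and the read_with_schema helper are hypothetical; the point is that the structure is chosen by the reader at query time, not enforced when the data is stored.

```python
import json

# Raw records are stored as-is; fields vary from line to line.
raw_lines = [
    '{"user": "u1", "event": "click", "ts": 1}',
    '{"user": "u2", "event": "view"}',
    '{"user": "u1", "event": "click", "ts": 5, "page": "/home"}',
]


def read_with_schema(lines, fields):
    """Project each raw record onto the fields this reader asks for."""
    for line in lines:
        record = json.loads(line)
        # Fields missing from a record come back as None.
        yield {name: record.get(name) for name in fields}


# Two readers can apply different schemas to the same stored data.
user_times = list(read_with_schema(raw_lines, ["user", "ts"]))
pages = list(read_with_schema(raw_lines, ["page"]))
print(user_times)
```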
Big Data Technologies
Products:
• Hadoop and HDFS
• NoSQL databases
• Big Data extensions to RDBMS
Apache Hadoop software components
Source: Agile Data Warehousing for the Enterprise by Ralph Hughes
Hadoop
Reasons to use Hadoop:
• Builds a data warehouse for the future: the team's Hadoop and Big Data skills grow as the data grows. Major distributions such as Hortonworks, Cloudera, and MapR offer enterprise hub editions that can be deployed.
• A common complaint is that Hadoop is not suitable for quick interactive querying. Cloudera's Impala and the Hortonworks Stinger initiative have since made interactive querying much faster.
• The Hortonworks platform provides indexing and search through Apache Solr, which can speed up search and querying.
• The Hortonworks platform includes Apache Zeppelin, which brings data visualization and collaboration features to Hadoop and Spark.
• Apache Sqoop loads data from RDBMSs.
• Pig and MapReduce for ETL.
• Hourly, weekly, and monthly workflow schedules.
• Apache Flume loads web log data.
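The MapReduce style of ETL mentioned above can be sketched in plain Python as map, shuffle, and reduce steps that count page hits in web log lines. This is an in-memory toy with a made-up log format, not Hadoop's API; on a cluster the framework distributes the same three phases across machines.

```python
from collections import defaultdict

# Hypothetical web log lines: client IP, HTTP method, page.
log_lines = [
    "10.0.0.1 GET /home",
    "10.0.0.2 GET /cart",
    "10.0.0.1 GET /home",
]


def map_phase(line):
    """Emit a (page, 1) pair for each log line."""
    _ip, _method, page = line.split()
    yield page, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one page."""
    return key, sum(values)


mapped = [pair for line in log_lines for pair in map_phase(line)]
hits = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(hits)
```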
Data Virtualization
NoSQL Databases
Advantages:
• Schemaless reads
• Auto-sharding
• Cloud computing (AWS)
• Replication
o No separate application or expensive add-ons
• Integrated caching
o In-memory caching for high throughput and low latency
• Open source
o Cassandra, HBase
• Document-based
o Graph stores: Neo4j and Giraph
o Key-value stores: Riak, Berkeley DB, Redis. Complex information is stored as BLOBs in the value columns.
o Wide-column stores: Cassandra and HBase
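Auto-sharding in a key-value store can be pictured with the sketch below, which routes each key to one of three in-memory shards using a stable hash. The shard count, the put and get helpers, and the sample keys are assumptions for illustration; production systems typically use more elaborate schemes such as consistent hashing so that shards can be added without moving most keys.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a hash that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Three dictionaries stand in for three storage nodes.
shards = [dict() for _ in range(3)]


def put(key, value):
    """Store the value on whichever shard the key hashes to."""
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    """Look the key up on the same shard that put() chose."""
    return shards[shard_for(key, len(shards))].get(key)


put("user:1", {"name": "Ada"})
put("user:2", {"name": "Grace"})
print(get("user:1"))
```

The caller never names a shard; routing is derived from the key, which is what lets the store spread data without application changes.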
Why Implement NoSQL?
• Big Data keeps getting bigger, and new data sources continue to emerge.
• More users are going online.
• Open source: can be downloaded, implemented, and scaled at little cost.
• A viable alternative to expensive proprietary software.
• Increases the speed and agility of development.
• When requirements change, the data model changes too.
5 considerations to evaluate NoSQL
• Data model
o Document model: MongoDB, CouchDB
   o Natural mapping of the document model to objects in OOP
   o Query on any field
o Graph databases: traversing relationships is the key
   o Social networks and supply chains
o Columnar and wide-column databases
• Query model
• Consistency model
• APIs
• Commercial support and community strength
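The document model's "query on any field" property can be sketched with plain Python dictionaries standing in for documents. The find helper and the sample documents are hypothetical and do not reproduce the MongoDB or CouchDB APIs; they only show that documents with different shapes can sit in one collection and be filtered on whichever field is present.

```python
# Documents in one collection do not need identical fields.
documents = [
    {"_id": 1, "name": "Ada", "city": "London", "tags": ["math"]},
    {"_id": 2, "name": "Grace", "city": "New York"},
    {"_id": 3, "name": "Alan", "city": "London"},
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


londoners = find(documents, city="London")
tagged = find(documents, tags=["math"])
print([doc["name"] for doc in londoners])
```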
DWaaS (Data Warehouse as a Service)
Amazon Redshift
• Cost-effective: about $1,000 per terabyte per year
• Columnar storage: fast access and parallelized queries
• MPP data warehouse architecture
• Cheap, simple, secure, and compatible with a SQL interface
• Automates provisioning, configuring, and monitoring of a cloud data warehouse.
• Integrates with Amazon S3, Amazon DynamoDB, Amazon Elastic MapReduce, and Amazon Kinesis.
• Security is built in.
• Managed through the Amazon Web Services management console.
• Network isolation using a virtual private cloud.
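Why columnar storage helps analytic queries can be shown with a small sketch that holds the same made-up orders in a row layout and a column layout. It says nothing about Redshift's actual storage engine; it only illustrates that an aggregate over one column needs to touch just that column's values.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "west", "amount": 120},
    {"order_id": 2, "region": "east", "amount": 80},
    {"order_id": 3, "region": "west", "amount": 50},
]

# An aggregate over the row layout walks every whole record.
row_total = sum(row["amount"] for row in rows)

# Column layout: one list per column, so an aggregate reads a single list.
columns = {name: [row[name] for row in rows] for name in rows[0]}
column_total = sum(columns["amount"])

# A filtered aggregate touches only the two columns it needs.
west_total = sum(
    amount
    for region, amount in zip(columns["region"], columns["amount"])
    if region == "west"
)
print(row_total, column_total, west_total)
```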
Presentation/Visualization
• Tableau
o Easy-to-use drag-and-drop interface
o No code
o Connects to Hadoop, cloud, and SQL databases
o Offers free training
o Trend analysis, regression, and correlation analysis
o In-memory data analysis
o Data blending
o Clutter-free GUI
• QlikView
o Faster in-memory computation
Analytics and forecasting
• R, Python, and Apache Spark for predictive modeling and forecasting
• Connected the data warehouse with R, Python, and Spark.
• R libraries: rpart, randomForest, ROCR, mboost
• Python: scikit-learn, NumPy, pandas, SciPy
• Spark: ML and MLlib
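As a minimal sketch of forecasting on warehouse data, the example below fits a straight line to four months of made-up sales figures with ordinary least squares and projects the fifth month. It deliberately uses only the Python standard library to stay self-contained; in practice one of the libraries listed above would replace the hand-written fit_line helper, which is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up monthly sales, as they might be pulled from a reporting table.
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # projected sales for month 5
print(slope, intercept, forecast)
```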
Agile data warehousing

  • 1. AGILE DATA WAREHOUSING SOLUTION: Data warehousing + Reporting Solution. Presented by: Sneha Challa. Date: 7/14/2016. Location: San Ramon, CA
  • 2. Standard approach to DW. Source: Agile Data Warehousing for the Enterprise, Ralph Hughes
  • 3. Traditional EDW Model • Brittle to changing requirements. Source: Agile Data Warehousing for the Enterprise
  • 4. Two traditional approaches • Traditional integration layer (IL): model it in 3NF and upwards. ETL loads data into the IL before transforming it to populate the star schema of the presentation layer. • Conformed dimensional data warehouse: skips the integration layer and loads the company’s data directly into star schemas. Cons: • Both approaches lead to data warehouses that are very difficult to modify once the data is loaded. • Brittle in the face of changing requirements. • Costly redesign and data conversion.
  • 5. Agile Data Engineering • No need to have the entire data design model up front. • Development adapts to changing business requirements. • No need to re-engineer the existing schema when new entities and relationships arise. • Simple, reusable ETL modules.
  • 9. HNF (Hyper Normalized Form) model • 18 tables. The example's tables have been hyper normalized. Source: https://www.youtube.com/watch?v=3QOSOeN8vcY
  • 10. HNF • Parameter-driven data transforms using ETL scripts. One ETL module serves all business key tables. A second ETL module (yellow in the diagram) serves the linking tables. A third takes all other attributes from the source and sends them to the target tables. • Easily adapts to changing business requirements (BR), even after billions of records have been loaded.
  • 12. HNF • Caveat: data retrieval gets complex. The SQL needed to get data out can become very complex, with outer joins and correlated subqueries. But does it matter so much? Remember that HNF is used for the integration layer, much less for the presentation and semantic layers. • Data from the source systems is stored in the integration layer using only 3 reusable ETL modules. • Build the DW a slice at a time and adapt to new business requirements. • http://www.anchormodeling.com/
  • 13. Hyper Generalized Form • Computer-generated warehouse presentation and semantic layers: a labor-saving approach. • Logical and physical data models are eliminated. • Can operate at the business level. • Builds on the notion of special-purpose tables. • Requires acquiring an automated data warehouse tool that can generate the entire data warehouse infrastructure. • Entire dataset represented as 6 tables. • Generates the EDW and ETL schema for all layers.
  • 20. Big Data Technologies • Power an iterative discovery and engineering process. • Read and transform massive amounts of data on cheap commodity hardware using massively parallel processing. • Schema on read: no need to impose structure on every piece of information gathered. • Hadoop with more SQL-like features, or a traditional EDW with big data packages: which is more useful? • Complex event analysis. • Real-time analytics of high-volume data streams. • Complex event processing. • Data mining software. • Text analytics.
  • 21. Big Data Technologies. Products: • Hadoop and HDFS • NoSQL databases • Big data extensions to RDBMS
  • 22. Apache Hadoop software components. Source: Agile Data Warehousing for the Enterprise by Ralph Hughes
  • 23. Hadoop. Reasons to use Hadoop: • Building a data warehouse for the future: gear up your skills for Hadoop and big data as data sizes grow. Major distributions such as Hortonworks, Cloudera, and MapR have enterprise hub editions that can be deployed. • A common complaint is that Hadoop is not suitable for quick interactive querying, but Cloudera’s Impala and the Hortonworks Stinger initiative have made interactive querying much faster. • The Hortonworks platform provides indexing and search features using Apache Solr, which can make search and querying faster. • The Hortonworks platform includes Apache Zeppelin, which brings data visualization and collaboration features to Hadoop and Spark. • Provides Apache Sqoop to load data from RDBMS. • Pig and MapReduce for ETL. • Weekly, hourly, and monthly workflow schedules. • Apache Flume to load web log data.
  • 25. NoSQL Databases. Advantages: • Schema-less reads • Auto-sharding • Cloud computing (AWS) • Replication • No separate application or expensive add-ons • Integrated caching • In-memory caching for high throughput and low latency • Open source • Cassandra, Redshift, HBase • Document based • Graph stores: Neo4j and Giraph • Key-value stores: Riak, Berkeley DB, and Redis. Complex information stored as BLOBs in value columns • Wide-column stores: Cassandra and HBase
  • 26. Why Implement NoSQL? • Big data is getting bigger, and new sources of data keep emerging. • More users are going online. • Open source: downloaded, implemented, and scaled at little cost. • Viable alternative to expensive proprietary software. • Increases speed and agility of development. • When requirements change, the data model changes too.
  • 27. 5 considerations to evaluate NoSQL • Data model: document model (MongoDB, CouchDB), with a natural mapping of the document object model to OOP and queries on any field; graph databases, where traversing relationships is the key (social networks and supply chains); columnar and wide-column databases • Query model • Consistency model • APIs • Commercial support and community strength
  • 28. DWaaS: Amazon Redshift • Cost-effective: $1000 per terabyte per year • Columnar storage: fast access, parallelized queries • MPP DW architecture • Cheap, simple, secure, and compatible with a SQL interface • Automates provisioning, configuring, and monitoring of a cloud data warehouse. • Integrations with Amazon S3, Amazon DynamoDB, Amazon Elastic MapReduce, and Amazon Kinesis. • Security is built in. • Amazon Web Services Management Console. • Network isolation using Virtual Private Cloud.
  • 29. Presentation/Visualization • Tableau: easy-to-use drag-and-drop interface; no code; connects to Hadoop, cloud, and SQL databases; offers free training; trend analysis, regression, and correlation analysis; in-memory data analysis; data blending; clutter-free GUI • QlikView: faster in-memory computation
  • 30. Analytics and forecasting • R, Python, and Apache Spark for predictive modeling and forecasting • Connected the data warehouse with R, Python, and Spark. • R libraries: rpart, randomForest, ROCR, mboost • Python: scikit-learn, NumPy, pandas, SciPy • Spark: ML and MLlib
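
The HNF slides (10 and 12) describe loading the integration layer with three reusable, parameter-driven ETL modules and note that retrieval then needs outer joins. The following is a minimal sketch of that idea, not the deck's actual implementation: the table layout, the function names, the `PLAN` list, and the sample rows are all assumptions made for illustration, and SQLite stands in for the warehouse platform.

```python
import sqlite3

# Hypothetical sample source rows; the column names are assumptions.
SOURCE_ROWS = [
    {"customer_id": "C1", "customer_name": "Acme", "product_id": "P1", "product_name": "Widget"},
    {"customer_id": "C1", "customer_name": "Acme", "product_id": "P2", "product_name": "Gadget"},
    {"customer_id": "C2", "customer_name": None, "product_id": "P1", "product_name": "Widget"},
]

def load_business_keys(conn, rows, table, key_col):
    """Module 1: store each distinct business key once."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (bk TEXT PRIMARY KEY)")
    conn.executemany(
        f"INSERT OR IGNORE INTO {table} (bk) VALUES (?)",
        [(str(r[key_col]),) for r in rows],
    )

def load_links(conn, rows, table, left_col, right_col):
    """Module 2: record which business keys occur together."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(left_bk TEXT, right_bk TEXT, PRIMARY KEY (left_bk, right_bk))"
    )
    conn.executemany(
        f"INSERT OR IGNORE INTO {table} VALUES (?, ?)",
        [(str(r[left_col]), str(r[right_col])) for r in rows],
    )

def load_attribute(conn, rows, table, key_col, attr_col):
    """Module 3: one narrow table per descriptive attribute."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (bk TEXT PRIMARY KEY, value TEXT)")
    conn.executemany(
        f"INSERT OR REPLACE INTO {table} VALUES (?, ?)",
        [(str(r[key_col]), r[attr_col]) for r in rows if r.get(attr_col) is not None],
    )

# The load is driven entirely by parameters: a new entity, relationship, or
# attribute means a new entry here, not a change to tables already loaded.
PLAN = [
    (load_business_keys, ("customer", "customer_id")),
    (load_business_keys, ("product", "product_id")),
    (load_links, ("customer_product", "customer_id", "product_id")),
    (load_attribute, ("customer_name", "customer_id", "customer_name")),
    (load_attribute, ("product_name", "product_id", "product_name")),
]

conn = sqlite3.connect(":memory:")
for module, params in PLAN:
    module(conn, SOURCE_ROWS, *params)

# Retrieval needs an outer join so that keys without an attribute still appear.
customers = conn.execute(
    "SELECT c.bk, n.value FROM customer c "
    "LEFT JOIN customer_name n ON n.bk = c.bk ORDER BY c.bk"
).fetchall()
print(customers)
```

Even in this toy case, reassembling a single entity takes one join per attribute, which is the retrieval cost the caveat on slide 12 points at.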

Editor's notes

  1. Interesting questions: Which flavor of HNF is best for each use case? What does a physical HNF model look like? What are the best platforms for modeling an HNF schema for performance? How do we fold in data governance? Where should columns that hold applied business rules (derived columns) be placed? How do we merge an HNF warehouse with a 3NF EDW? Can an HNF warehouse support self-service BI? How do HNF advantages compare to hyper generalization? Ceregenics
  2. Points of comparison between HNF and HGF: what the physical models look like; how derived values are calculated and stored; performance and platform considerations; merging a new model style into existing EDWs.
  3. The source data is converted into an integration layer of 6 tables that contain all the information. This can be conveniently projected into data marts and presentation layers. Convert a drawing to things type, link types,
  4. The latest productivity tools for data analytics, such as data virtualization, data warehouse automation, and big data management systems, offer the team a new type of application development cycle that dramatically reduces the labor required to design, build, and deploy each incremental version of the EDW.
  5. Data from multiple databases is made accessible through a single virtualization layer.
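
The idea in the last note, reaching several databases through one access layer, can be illustrated with a small sketch. This is only a stand-in under stated assumptions: it uses SQLite's `ATTACH DATABASE` to expose two separate in-memory databases through one connection, whereas a real data virtualization product federates heterogeneous systems. The database names `sales` and `crm`, the tables, and the sample rows are hypothetical.

```python
import sqlite3

# One connection acts as the single access point (assumption: SQLite stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

# Two independent databases, each with its own table.
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 7.5)])

def customer_totals(connection):
    """Query both databases as if they were one, joining across them."""
    return connection.execute(
        "SELECT c.name, SUM(o.amount) FROM crm.customers c "
        "JOIN sales.orders o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()

totals = customer_totals(conn)
print(totals)
```

The caller writes one query and never needs to know which database holds which table, which is the property the note describes.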