Innovation and New Technologies
Professor: Carlo Vaccari
2/4/2014

Student: Rando Veizi
Contents
Data Warehouses
  History
  Introduction
  Why DW?
  DW environment
  Bottom-up Design
  Top-down Design
  Top-down vs bottom-up
  The hybrid design
  DW vs OS
Pentaho Suite
  Introduction
  Installing Pentaho Suite
    Starting the BI Platform
    How to Log Into the Pentaho User Console
  Trying some tools…
    Community Dashboard Editor (CDE)
    Saiku
  Data warehouses and Pentaho Suite
Data Warehouses
History
The notion of the data warehouse (DW) dates to the late 1980s, when IBM researchers developed the
“business data warehouse”. The idea was originally intended to provide an architectural model for the
flow of data from operational systems to decision support environments.

The concept tried to address the various problems associated with this flow, mainly its high costs.
Without a DW, a large amount of redundancy was needed to support multiple decision support
environments. In larger companies it was normal for multiple decision support environments to
operate independently.

Even though each environment served different users, they usually needed much of the same stored
data. The process of gathering, cleaning and integrating data from different sources, in most cases
long-standing operational systems (legacy systems), was partially replicated for each environment.
Moreover, the operational systems were frequently re-examined as new decision support requirements
emerged. New requirements often necessitated gathering, cleaning and integrating new data from data
marts (DM) that were tailored for ready access by users.
Introduction

Figure 1: All data warehouses processes in one picture

A data warehouse (DW, DWH), or enterprise data warehouse (EDW), is a database that is used for:

- Reporting
- Data analysis

A DW is a central repository of data created by integrating data from multiple disparate sources. It
stores historical data and can be used to create trending reports for senior management, such as
annual and quarterly comparisons. The data stored in the DW is uploaded from operational systems
such as sales or marketing. The data may (but does not always) pass through an operational data
store for additional operations before it is used in the DW for reporting. An ETL-based DW uses
staging, data integration and access layers to house its key functions:



- The staging database stores the raw data extracted from each of the source data systems.
- The integration layer integrates the data sets by transforming the data from the staging layer and
  storing it in an operational data store (ODS) database.
- The integrated data is then moved to another database, the data warehouse database, where it is
  arranged into hierarchical groups called dimensions and into facts and aggregate facts. The
  combination of facts and dimensions is also called a star schema.
- The access layer retrieves the data for users.
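
The layers above can be illustrated with a minimal sketch. It is not taken from any particular product: the sample rows, the table names (dim_customer, dim_product, fact_sales) and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the warehouse database.

```python
import sqlite3

# Staging layer: hypothetical raw extracts, kept exactly as the sources delivered them.
STAGING_SALES = [
    {"source": "shop", "customer": " alice ", "product": "Laptop", "amount": "1200.50", "day": "2014-01-15"},
    {"source": "web", "customer": "BOB", "product": "laptop", "amount": "999.00", "day": "2014-01-16"},
    {"source": "web", "customer": "Alice", "product": "Mouse", "amount": "25.00", "day": "2014-02-01"},
]


def integrate(rows):
    """Integration layer: clean and harmonize the staged rows."""
    return [
        {
            "customer": r["customer"].strip().title(),
            "product": r["product"].strip().title(),
            "amount": float(r["amount"]),
            "day": r["day"],
        }
        for r in rows
    ]


def load_star_schema(conn, rows):
    """Warehouse database: two dimension tables and one fact table (a small star schema)."""
    conn.executescript(
        """
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            day TEXT,
            amount REAL
        );
        """
    )
    for r in rows:
        # Each distinct customer and product is stored once in its dimension table.
        conn.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (r["customer"],))
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (r["product"],))
        # The fact row stores only the dimension keys plus the measure.
        conn.execute(
            """INSERT INTO fact_sales
               SELECT c.customer_id, p.product_id, ?, ?
               FROM dim_customer c, dim_product p
               WHERE c.name = ? AND p.name = ?""",
            (r["day"], r["amount"], r["customer"], r["product"]),
        )


def sales_by_product(conn):
    """Access layer: retrieve aggregated facts for reporting."""
    return conn.execute(
        """SELECT p.name, SUM(f.amount)
           FROM fact_sales f JOIN dim_product p USING (product_id)
           GROUP BY p.name ORDER BY p.name"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_star_schema(conn, integrate(STAGING_SALES))
report = sales_by_product(conn)
print(report)
```

In this sketch the differently spelled customer and product names from the two sources end up as single dimension rows, which is the kind of harmonization the integration layer is meant to perform.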

If a DW is built from integrated data source systems, it requires neither ETL nor staging databases
nor ODS databases.
These source systems can be considered part of a distributed operational data store layer. The
integrated data source systems and the DW are integrated with each other because no transformation
of dimensional or reference data is needed, and this is the difference from the ETL approach.
This integrated DW architecture supports drill-down from the aggregate data of the DW to the
transactional data of the integrated source data systems.
A data mart (DM) is a DW in “miniature”, focused on a specific area of interest. A DW can be
subdivided into data marts for better performance and ease of use within that area.
An organization can therefore create one or more data marts as first steps towards a larger and more
complex enterprise DW.
This definition of the DW focuses on data storage. The following happens to the data from the main
sources:





- It is cleaned
- It is transformed
- It is catalogued
- It is made available for use by managers and other business professionals for data mining or
  analytical processing
Why DW?
A DW keeps a copy of the data from the source transaction systems. This architecture makes it
possible to:

1. Group data from different sources into a single database, so that only one query engine is
needed to present the data.
2. Mitigate the lock contention caused by database isolation levels in the transaction processing
systems, which arises when large analysis queries are run in transaction processing databases.
3. Keep the data history, even if the source transaction systems do not.
4. Integrate the data from many source systems, creating a central view across the enterprise.
5. Improve data quality by providing consistent codes and descriptions.
6. Restructure the data so that it makes more sense to business users.
7. Restructure the data so that it delivers very good query performance without impacting the
operational systems (OS).
8. Make decision-support queries easier to write.
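
Point 3 can be sketched in a few lines. This is only an illustration under simple assumptions: the customer record, the load_snapshot helper and the monthly load dates are hypothetical, and a plain list stands in for a warehouse history table.

```python
from datetime import date

# Hypothetical operational record: the source system keeps only the current value.
source_customer = {"customer_id": 1, "city": "Rome"}

# Stand-in for a warehouse history table: one row per loaded snapshot.
warehouse_history = []


def load_snapshot(record, load_date):
    """Append a dated copy of the source record instead of overwriting the previous one."""
    warehouse_history.append({**record, "load_date": load_date})


load_snapshot(source_customer, date(2014, 1, 1))
source_customer["city"] = "Milan"  # the source system overwrites the old value
load_snapshot(source_customer, date(2014, 2, 1))

# The warehouse can still answer "where was this customer in January?".
cities_over_time = [(row["load_date"].isoformat(), row["city"]) for row in warehouse_history]
print(cities_over_time)
```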

DW environment
The environment for DW and DM comprises the following:

- Source systems that provide data to the DW or DM
- Technologies and processes that prepare the data for use
- Different architectures for storing data in an organization’s DW or DM
- Different tools and applications for a wide range of users
- Metadata, data quality and governance processes, which must be in place to ensure that the
  DW or DM meets its purposes

These days the most successful companies are those that can respond quickly and flexibly to market
changes and new opportunities. A key to this response is the effective and efficient use of data and
information by analysts and managers.
Bottom-up Design

Figure 2: Bottom-Up
By building a series of data marts to an agreed architecture, the enterprise data warehouse can be
assembled slice by slice, until it is complete enough to regard the data marts as subsets of the now
much greater whole. Architecture is key to success, as the data marts must not be built in isolation.
Users need therefore to design data marts in the knowledge that each will eventually form part of a
larger enterprise data warehouse.
Such an approach can prove attractive to businesses. Each data mart can be implemented within six
to nine months. Each can tackle an identifiable business problem making it possible to calculate
returns on investment (ROI). The approach also offers a valuable learning curve for the build team,
who can test out products and processes until they get it right.
An approach to data warehouse design known as bottom-up was proposed by Ralph Kimball.
In this approach DM are created first to provide reporting and analytical capabilities for specific
business processes. A DM primarily contains dimensions and facts. Facts can contain atomic data
and, if necessary, summarized data. A single DM often models a specific business area such as
sales or production. These DM can eventually be integrated to create a comprehensive DW. The
DW bus architecture is primarily an implementation of “the bus”, a collection of conformed
dimensions and conformed facts, which are dimensions that are shared between facts in two or more DM.
The integration of the DM into the DW is centered on the conformed dimensions, which define the
possible integration points between DM. The process that integrates two or more DM is
called drill-across (DA). A DA summarizes the data along the keys of the shared conformed dimensions
of each fact that participates in the DA, followed by a join on the keys of these summarized facts.
The most important management task is to make sure that the dimensions are consistent across data
marts.
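
The drill-across steps can be sketched as follows. The two fact tables, the conformed key "product" and the helper names are hypothetical and serve only to show the summarize-then-join order.

```python
from collections import defaultdict

# Hypothetical fact rows from two data marts that share the conformed dimension key "product".
sales_facts = [
    {"product": "Laptop", "units_sold": 3},
    {"product": "Laptop", "units_sold": 2},
    {"product": "Mouse", "units_sold": 10},
]
production_facts = [
    {"product": "Laptop", "units_made": 8},
    {"product": "Mouse", "units_made": 6},
    {"product": "Mouse", "units_made": 6},
]


def summarize(facts, key, measure):
    """Step 1: group one fact table along the conformed dimension key."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[key]] += row[measure]
    return dict(totals)


def drill_across(left, right):
    """Step 2: join the summarized facts on the shared key (keys present in both)."""
    return {key: (left[key], right[key]) for key in sorted(left.keys() & right.keys())}


sold = summarize(sales_facts, "product", "units_sold")
made = summarize(production_facts, "product", "units_made")
result = drill_across(sold, made)
print(result)
```

Summarizing each fact table before joining avoids multiplying rows, which is what would happen if the detailed facts were joined directly on the product key.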
Business value can be returned as quickly as the first data marts can be created, and the method
lends itself well to an exploratory and iterative approach to building data warehouses.
Example: the DW effort can start in the sales department by building a sales DM. Once this DM is
complete, the effort can be extended with another DM, for example a production one. For the DM to
be integrable with each other, they must share the same bus.
If the DM integration succeeds, then the DW can deliver integrated information about sales and
production through these two DM, which is usually very valuable for the business.

Top-down Design

Figure 3: Top-down
The opposite of starting with individual business issues and expanding up the organisation hierarchy
is to start at the top. A top down enterprise data warehouse and a subset data marts strategy is "the
most elegant design approach", says Doug Hackney of business intelligence systems specialist, the
Enterprise Group. He says that such an approach would vastly ease maintenance, summarisation,
metadata management and extraction, transformation and loading (ETL) of data.

An approach to data warehouse design known as top-down was designed by Bill Inmon.

In this approach the DW is designed using a normalized enterprise data model. “Atomic” data, that
is, data at the lowest level of detail, is stored in the DW. Dimensional DM containing the data
needed for specific business processes or departments are created from the DW.
According to Inmon the DW is at the center of the Corporate Information Factory (CIF), which
provides a logical framework for delivering business intelligence and business management capabilities.
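
As a rough illustration of the top-down flow, the sketch below derives a denormalized sales data mart from normalized atomic tables. The tables, their contents and the build_sales_mart helper are hypothetical.

```python
# Hypothetical normalized (atomic) warehouse tables, keyed by id.
regions = {10: {"name": "North"}, 20: {"name": "South"}}
customers = {
    1: {"name": "Alice", "region_id": 10},
    2: {"name": "Bob", "region_id": 20},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 50.0},
    {"order_id": 101, "customer_id": 2, "amount": 30.0},
    {"order_id": 102, "customer_id": 1, "amount": 20.0},
]


def build_sales_mart(orders, customers, regions):
    """Derive a denormalized data mart for the sales department from the atomic data."""
    mart = []
    for order in orders:
        customer = customers[order["customer_id"]]
        mart.append(
            {
                "order_id": order["order_id"],
                "customer": customer["name"],
                "region": regions[customer["region_id"]]["name"],
                "amount": order["amount"],
            }
        )
    return mart


sales_mart = build_sales_mart(orders, customers, regions)

# A departmental report can now be computed without following any foreign keys.
revenue_by_region = {}
for row in sales_mart:
    revenue_by_region[row["region"]] = revenue_by_region.get(row["region"], 0.0) + row["amount"]
print(revenue_by_region)
```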
Top-down vs bottom-up
All in one picture:

Figure 4: T-D vs B-U
The hybrid design
The Hybrid Data Warehouse (Hybrid) is uniquely suited to support both EDW and datamart
applications in one database. It can accommodate large volumes of historical data typically found in
the EDW, while also performing well for OLAP queries typically done in datamarts. The Hybrid
database structure contains both normalized snowflakes and de-normalized star schemas. The
controlled redundancy inherent in this design provides good response time for a variety of queries.
The Hybrid architecture can also be used to implement the ODS in the same database, as long as
sub-second response times are not a requirement. Because the ODS can be used by operational
systems, the response time of the database can become an issue.
Because there is only one database schema, the Hybrid model significantly reduces the cost of
developing the ETL processes. Real-time (or near real-time) updates can be supported by pushing
data updates out immediately to the Hybrid Warehouse directly from the operational system, or by
connecting the ETL engine to an Enterprise Service Bus (ESB).
The Hybrid model was used to develop one of the largest databases in Canada. It includes 34
dimensional roles with multiple hierarchies, has over 1500 attributes, and handles 40 million
transactions per day in near real time, which translates into one billion rows per month.
The Hybrid model may not be able to fully replace an ODS requirement for sub-second response
time. But it can offer a one-stop solution for organizations that have very large data volumes and are
looking for a cost-effective way to support a variety of BI requirements across the organization.
DW vs OS
The fundamental difference between operational systems (OS) and DW systems is that operational
systems are designed to support transaction processing, whereas data warehousing systems are
designed to support online analytical processing (OLAP).
Based on this fundamental difference, data usage patterns associated with operational systems are
significantly different from usage patterns associated with data warehousing systems. As a result,
data warehousing systems are designed and optimized using methodologies that differ drastically
from those of operational systems.
The table below summarizes many of the differences between operational systems and data
warehousing systems.

| Operational systems | Data warehousing systems |
|---|---|
| Operational systems are generally designed to support high-volume transaction processing with minimal back-end reporting. | Data warehousing systems are generally designed to support high-volume analytical processing (i.e. OLAP) and subsequent, often elaborate report generation. |
| Operational systems are generally process-oriented or process-driven, meaning that they are focused on specific business processes or tasks. Example tasks include billing, registration, etc. | Data warehousing systems are generally subject-oriented, organized around business areas that the organization needs information about. Such subject areas are usually populated with data from one or more operational systems. As an example, revenue may be a subject area of a data warehouse that incorporates data from operational systems that contain student tuition data, alumni gift data, financial aid data, etc. |
| Operational systems are generally concerned with current data. | Data warehousing systems are generally concerned with historical data. |
| Data within operational systems are generally updated regularly according to need. | Data within a data warehouse is generally non-volatile, meaning that new data may be added regularly, but once loaded, the data is rarely changed, thus preserving an ever-growing history of information. In short, data within a data warehouse is generally read-only. |
| Operational systems are generally optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are generally optimized to perform fast retrievals of relatively large volumes of data. |
| Operational systems are generally application-specific, resulting in a multitude of partially or non-integrated systems and redundant data (e.g. billing data is not integrated with payroll data). | Data warehousing systems are generally integrated at a layer above the application layer, avoiding data redundancy problems. |
| Operational systems generally require a non-trivial level of computing skills amongst the end-user community. | Data warehousing systems generally appeal to an end-user community with a wide range of computing skills, from novice to expert users. |

Table 1: DW vs OS
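
The two usage patterns can be contrasted in a small sketch. The invoices table and its rows are hypothetical, and a single in-memory SQLite database is used for both styles only to keep the example short; in practice the two workloads would run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, year INTEGER, amount REAL, paid INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices (year, amount, paid) VALUES (?, ?, ?)",
    [(2012, 100.0, 1), (2012, 250.0, 1), (2013, 300.0, 1), (2013, 50.0, 0)],
)

# Operational style: a short transaction that updates one current row.
with conn:
    conn.execute("UPDATE invoices SET paid = 1 WHERE invoice_id = ?", (4,))

# Warehouse style: a read-only query that scans the history and summarizes it.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM invoices GROUP BY year ORDER BY year"
).fetchall()
unpaid = conn.execute("SELECT COUNT(*) FROM invoices WHERE paid = 0").fetchone()[0]
print(yearly_totals, unpaid)
```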
Pentaho Suite

Introduction
Pentaho was founded in 2004 and is headquartered in Orlando, FL, USA. One of its most important
advantages is that it offers a suite of open source business intelligence (BI) products.
These products, called Pentaho Business Analytics, provide data integration, OLAP (online analytical
processing) services, reporting, dashboarding, data mining and ETL capabilities.
Pentaho is an open source business intelligence development platform with different
components integrated into it. Both open source and commercial versions are available to
support your BI needs. This article is scoped to help open source business intelligence developers
integrate CTools on CDF to fulfil their dashboard development needs.

Figure 5: Pentaho community edition vs Pentaho enterprise edition
Installing Pentaho Suite
Now I will show you how to install the Pentaho Suite community edition (CE), along with some tools,
and explain their purpose.
a) Download the Pentaho server from http://community.pentaho.com/. Choose the zip or
tar.gz archive according to your preferences
b) Install Tomcat
c) Set up MySQL
d) Configure the BI Server
Starting the BI Platform:
In order to use and configure the Pentaho BI Platform, you must start the BI Server, then the
Pentaho Administration Console.
1. To start the BI Server, run the start-pentaho script in the /biserver-ce/ directory.
2. To start the Pentaho Administration Console, run the start script (on Windows) or startup script
(on Linux) in the /biserver-ce/administration-console/ directory.
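
As a small convenience, step 1 can be wrapped in a helper that picks the start script for the current operating system. This is a sketch under assumptions: the installation directory /opt/pentaho is hypothetical, and the script is assumed to ship as start-pentaho.sh on Linux and start-pentaho.bat on Windows.

```python
import platform
from pathlib import PurePosixPath, PureWindowsPath


def start_script(install_dir, system=None):
    """Return the assumed path of the BI Server start script for the given OS."""
    system = system or platform.system()
    if system == "Windows":
        # Assumption: the Windows variant of the script is start-pentaho.bat.
        return str(PureWindowsPath(install_dir) / "biserver-ce" / "start-pentaho.bat")
    # Assumption: the Linux variant of the script is start-pentaho.sh.
    return str(PurePosixPath(install_dir) / "biserver-ce" / "start-pentaho.sh")


# Hypothetical installation directory; adjust it to where the archive was unpacked.
print(start_script("/opt/pentaho", "Linux"))
```

The returned path could then be passed to subprocess.run to launch the server from a script.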

How to Log Into the Pentaho User Console

1. Open a Web browser and type in the Web or IP address of the Pentaho server, which is
http://localhost:8080/pentaho/ by default.
You'll see an introductory screen with some Pentaho-related information and a Login button in the
center of the screen.
2. Click Login.
The login dialog will appear.
3. For the locally installed version of the BI Suite, select Joe from the user drop-down box, type
password into the password field, then click Login. For hosted demo users, select
Guest and type guest as the password instead. You are now logged into the Pentaho User Console
and ready to start creating and running reports.
Figure 6: Pentaho’s Login interface

Trying some tools…
Community Dashboard Editor (CDE)
Community Dashboard Editor (CDE) is one of the plugins designed for the Pentaho BI Server,
contributed and maintained by the Pentaho partner Webdetails.
- The purpose of this tool is to create dashboards.
- CDE was born to simplify the creation, editing and rendering of CTools dashboards.
- CDE is a very powerful and complete tool, combining the front end with data sources and custom
components in a seamless way.
To create a dashboard I followed some online examples.
After CDE is installed, the Pentaho interface changes and this icon is added:
By experimenting and following guides I was able to create something (screenshots below):

This is a dashboard showing how many exams I took each year of my bachelor's degree.

Saiku
Another tool that I studied is Saiku. Saiku is a modular open-source analysis suite offering
lightweight OLAP which remains easily embeddable, extendable and configurable. It is similar in
form and function to the Pentaho Analyzer plugin. It allows a user to create queries visually by
dragging parts of a previously defined OLAP schema onto a canvas, where other activities can take
place such as filtering, sorting, creating calculated members from other measures, exporting the
result table to PDF or MS Excel, and optionally graphing the data. A RESTful server connects to
existing OLAP systems, which then power user-friendly, intuitive analytics via a lightweight
jQuery-based front end.
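
To make the idea of such a query more concrete, here is a minimal sketch of what choosing a dimension, a measure, a filter and a sort order amounts to. It does not use Saiku or its API; the cube cells, the olap_query function and its parameters are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells: two dimensions (year, course_type) and one measure (exams).
cells = [
    {"year": 2011, "course_type": "Math", "exams": 3},
    {"year": 2011, "course_type": "CS", "exams": 2},
    {"year": 2012, "course_type": "Math", "exams": 2},
    {"year": 2012, "course_type": "CS", "exams": 5},
    {"year": 2013, "course_type": "CS", "exams": 6},
]


def olap_query(cells, rows, measure, where=None, descending=False):
    """Aggregate a measure along one dimension, with an optional filter and sort by total."""
    totals = defaultdict(int)
    for cell in cells:
        if where is None or where(cell):
            totals[cell[rows]] += cell[measure]
    return sorted(totals.items(), key=lambda item: item[1], reverse=descending)


# Exams per year, largest total first.
exams_per_year = olap_query(cells, rows="year", measure="exams", descending=True)

# The same query restricted to one member of the course_type dimension.
cs_exams_per_year = olap_query(
    cells, rows="year", measure="exams", where=lambda c: c["course_type"] == "CS"
)
print(exams_per_year, cs_exams_per_year)
```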
Turning data into information shouldn't be hard; it should be easy and fun. The Saiku project is all
about creating tools that are easy to use for anyone who wants to crunch numbers, visualize
information, gain insight from data and act on it.
Follow this link and you will understand more easily how Saiku works.
If you want to learn more, you can go to these web addresses: http://pedroalvesbi.blogspot.it/2011/06 or http://codeissue.com/articles/a04e87158bb8552/pentaho-bi-ctools-cdf-cdacde-saiku-analytics-etc-using-cygwin

Data warehouses and Pentaho Suite:
Open-source Pentaho provides business intelligence (BI) and data warehousing solutions at a
fraction of the cost of proprietary solutions. To learn more about combining data warehouses with
the Pentaho suite, you might like to buy (or download) and take a look at Pentaho Solutions:
Business Intelligence and Data Warehousing with Pentaho and MySQL.

Weitere ähnliche Inhalte

Was ist angesagt?

Implementation of Data Marts in Data ware house
Implementation of Data Marts in Data ware houseImplementation of Data Marts in Data ware house
Implementation of Data Marts in Data ware houseIJARIIT
 
Data Warehousing & Basic Architectural Framework
Data Warehousing & Basic Architectural FrameworkData Warehousing & Basic Architectural Framework
Data Warehousing & Basic Architectural FrameworkDr. Sunil Kr. Pandey
 
Data Warehouse Best Practices
Data Warehouse Best PracticesData Warehouse Best Practices
Data Warehouse Best PracticesEduardo Castro
 
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCE
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCESALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCE
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCEcscpconf
 
Building a Big Data Analytics Platform- Impetus White Paper
Building a Big Data Analytics Platform- Impetus White PaperBuilding a Big Data Analytics Platform- Impetus White Paper
Building a Big Data Analytics Platform- Impetus White PaperImpetus Technologies
 
Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design phanleson
 
Data modeling star schema
Data modeling star schemaData modeling star schema
Data modeling star schemaSayed Ahmed
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6Prithwis Mukerjee
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
Gartner magic quadrant for data warehouse database management systems
Gartner magic quadrant for data warehouse database management systemsGartner magic quadrant for data warehouse database management systems
Gartner magic quadrant for data warehouse database management systemsparamitap
 

Was ist angesagt? (20)

Implementation of Data Marts in Data ware house
Implementation of Data Marts in Data ware houseImplementation of Data Marts in Data ware house
Implementation of Data Marts in Data ware house
 
Presentation
PresentationPresentation
Presentation
 
Data Warehousing & Basic Architectural Framework
Data Warehousing & Basic Architectural FrameworkData Warehousing & Basic Architectural Framework
Data Warehousing & Basic Architectural Framework
 
Data Warehouse Best Practices
Data Warehouse Best PracticesData Warehouse Best Practices
Data Warehouse Best Practices
 
Teradata
TeradataTeradata
Teradata
 
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCE
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCESALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCE
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCE
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Data-ware Housing
Data-ware HousingData-ware Housing
Data-ware Housing
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Building a Big Data Analytics Platform- Impetus White Paper
Building a Big Data Analytics Platform- Impetus White PaperBuilding a Big Data Analytics Platform- Impetus White Paper
Building a Big Data Analytics Platform- Impetus White Paper
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Data wirehouse
Data wirehouseData wirehouse
Data wirehouse
 
Unit 1
Unit 1Unit 1
Unit 1
 
Data modeling star schema
Data modeling star schemaData modeling star schema
Data modeling star schema
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Gartner magic quadrant for data warehouse database management systems
Gartner magic quadrant for data warehouse database management systemsGartner magic quadrant for data warehouse database management systems
Gartner magic quadrant for data warehouse database management systems
 

Ähnlich wie Rando Veizi: Data warehouse and Pentaho suite

DataWarehousingandAbInitioConcepts.ppt
DataWarehousingandAbInitioConcepts.pptDataWarehousingandAbInitioConcepts.ppt
DataWarehousingandAbInitioConcepts.pptPurnenduMaity2
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaionsridhark1981
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfDatacademy.ai
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxDURGADEVIL
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
Introduction to data warehousing
Introduction to data warehousingIntroduction to data warehousing
Introduction to data warehousinguncleRhyme
 
Unit III Introduction to DWH.pdf
Unit III Introduction to DWH.pdfUnit III Introduction to DWH.pdf
Unit III Introduction to DWH.pdfShivarkarSandip
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designSarita Kataria
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Materialobieefans
 
Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsJane Roberts
 
TDWI checklist 2018 - Data Warehouse Infrastructure
TDWI checklist 2018 - Data Warehouse InfrastructureTDWI checklist 2018 - Data Warehouse Infrastructure
TDWI checklist 2018 - Data Warehouse InfrastructureJeannette Browning
 

Ähnlich wie Rando Veizi: Data warehouse and Pentaho suite (20)

DataWarehousingandAbInitioConcepts.ppt
DataWarehousingandAbInitioConcepts.pptDataWarehousingandAbInitioConcepts.ppt
DataWarehousingandAbInitioConcepts.ppt
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaion
 
Course Outline Ch 2
Course Outline Ch 2Course Outline Ch 2
Course Outline Ch 2
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdf
 
DW 101
DW 101DW 101
DW 101
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docx
 
DATA WAREHOUSE.pptx
DATA WAREHOUSE.pptxDATA WAREHOUSE.pptx
DATA WAREHOUSE.pptx
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Introduction to data warehousing
Introduction to data warehousingIntroduction to data warehousing
Introduction to data warehousing
 
Unit III Introduction to DWH.pdf
Unit III Introduction to DWH.pdfUnit III Introduction to DWH.pdf
Unit III Introduction to DWH.pdf
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-design
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
DMDW 1st module.pdf
DMDW 1st module.pdfDMDW 1st module.pdf
DMDW 1st module.pdf
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
Data Warehouse 101
Data Warehouse 101Data Warehouse 101
Data Warehouse 101
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
TDWI checklist 2018 - Data Warehouse Infrastructure
TDWI checklist 2018 - Data Warehouse InfrastructureTDWI checklist 2018 - Data Warehouse Infrastructure
TDWI checklist 2018 - Data Warehouse Infrastructure
 

Rando Veizi: Data warehouse and Pentaho suite

  • 1. Innovation and New Technologies. Professor: Carlo Vaccari. Student: Rando Veizi. 2/4/2014
  • 2. Contents

Data Warehouses
    History
    Introduction
    Why DW?
    DW environment
    Bottom-up Design
    Top-down Design
    Top-down vs bottom-up
    The hybrid design
    DW vs OS
Pentaho Suite
    Introduction
    Installing Pentaho Suite
        Starting the BI Platform
        How to Log Into the Pentaho User Console
    Trying some tools…
        Community Dashboard Editor (CDE)
        Saiku
    Data warehouses and Pentaho Suite
  • 3. Data Warehouses

History

The data warehouse (DW) notion dates to the late 1980s, when IBM researchers developed the "business data warehouse". The idea was originally intended to provide an architectural model for the flow of data from operational systems (OS) to decision support environments. The concept tried to address the problems associated with this flow, mainly its high costs.

Without a DW, a large amount of redundancy was needed to support multiple decision support environments. In larger companies it was normal for multiple decision support environments to operate independently. Even though each environment served different users, they often needed much of the same stored data. The process of gathering, cleaning and integrating data from different sources, in most cases long-standing operational systems (legacy systems), was partially replicated for each environment.

Moreover, the operational systems were frequently re-examined as new decision support requirements emerged. New requirements often necessitated gathering, cleaning and integrating new data from data marts (DM) that were tailored for ready access by users.
  • 4. Introduction

Figure 1: All data warehouse processes in one picture

A data warehouse (DW, DWH), or enterprise data warehouse (EDW), is a database used for:

- Reporting
- Data analysis

It is a central repository of data created by integrating data from disparate sources. A DW stores historical data and can be used to create trending reports for senior management, such as annual and quarterly comparisons.

The data stored in the DW is uploaded from operational systems such as sales or marketing. The data may (but does not always) pass through an operational data store (ODS) for additional operations before it is used in the DW for reporting.

An ETL-based DW uses staging, data integration, and access layers to house its key functions:

- The staging database stores the data extracted from each of the source data systems.
- The integration layer integrates the data sets by transforming the data from the staging layer, often storing the result in an ODS database.
  • 5.
- The integrated data is then moved to another database, the data warehouse database, where it is arranged into hierarchical groups called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is called a star schema.
- The access layer retrieves the data for users.

If a DW is built on integrated source data systems, it requires neither ETL nor staging databases nor ODS databases. The integrated source systems can be considered part of a distributed operational data store layer. Unlike in an ETL-based DW, the source systems and the DW are already integrated, since no transformation of dimensional or reference data is needed. This integrated DW architecture supports drilling down from the aggregate data of the DW to the transactional data of the integrated source data systems.

A data mart is a DW in "miniature", focused on a specific area of interest. A DW can be subdivided into data marts for better performance and ease of use within each area. An organization can therefore create one or more data marts as first steps towards a larger and more complex enterprise DW.

This definition of the DW focuses on data storage. The source data is:

- cleaned
- transformed
- catalogued
- made available for use by managers and other business professionals for data mining and analytical processing
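The layers described above can be sketched in a few lines of Python with the standard `sqlite3` module. This is only a minimal illustration, not part of the original slides: the table names (`dim_date`, `dim_product`, `fact_sales`), the staged rows and the helper functions are all hypothetical. It loads staged sales rows into a small star schema and then runs a typical reporting query that joins the fact table to its dimensions.

```python
import sqlite3

# Hypothetical rows as they might sit in a staging area after extraction
# from a sales system: (sale date, product, category, quantity, amount).
STAGED_SALES = [
    ("2014-01-15", "Laptop", "Hardware", 2, 1800.0),
    ("2014-01-20", "Mouse", "Hardware", 5, 100.0),
    ("2014-02-03", "Laptop", "Hardware", 1, 900.0),
    ("2014-02-10", "Antivirus", "Software", 3, 150.0),
]


def build_star_schema(conn):
    """Create two dimension tables and one fact table (illustrative names)."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year INTEGER,
            month INTEGER,
            UNIQUE (year, month)
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name TEXT UNIQUE,
            category TEXT
        );
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date (date_key),
            product_key INTEGER REFERENCES dim_product (product_key),
            quantity INTEGER,
            amount REAL
        );
    """)


def load(conn, staged_rows):
    """Transform staged rows and load them into dimensions and facts."""
    for day, product, category, quantity, amount in staged_rows:
        year, month = int(day[:4]), int(day[5:7])
        # Add the dimension members only if they are not there yet.
        conn.execute(
            "INSERT OR IGNORE INTO dim_date (year, month) VALUES (?, ?)",
            (year, month),
        )
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (name, category) VALUES (?, ?)",
            (product, category),
        )
        date_key = conn.execute(
            "SELECT date_key FROM dim_date WHERE year = ? AND month = ?",
            (year, month),
        ).fetchone()[0]
        product_key = conn.execute(
            "SELECT product_key FROM dim_product WHERE name = ?",
            (product,),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (date_key, product_key, quantity, amount),
        )


def sales_by_month_and_category(conn):
    """Access-layer style query: aggregate facts along two dimensions."""
    return conn.execute("""
        SELECT d.year, d.month, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load(conn, STAGED_SALES)
report = sales_by_month_and_category(conn)
```

A real warehouse would normally use a dedicated ETL tool and a full date dimension; the point here is only the shape of the schema, with one fact table surrounded by dimension tables.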
  • 6. Why DW?

A DW always keeps a copy of the data from the source transaction systems. This architecture makes it possible to:

1. Group data from different sources into a single database, so that only one query engine is needed to present the data.
2. Reduce the isolation-level lock contention in transaction processing systems that is caused by running large analysis queries in transaction processing databases.
3. Keep the data history, even if the source transaction systems do not.
4. Integrate data from many source systems, creating a central view across the enterprise.
5. Provide consistent codes and descriptions, which improves data quality.
6. Restructure the data so that it is more user-friendly for business users.
7. Structure the data so that it delivers very good query performance without impacting the operational systems (OS).
8. Make decision-support queries easier to write.

DW environment

The environment for DW and DM comprises the following:

- Source systems that provide data to the DW or DM.
- Technologies and processes that prepare the data for use.
- Architectures that store the data in an organization's DW or DM.
- Tools and applications for a wide range of users.
- Metadata, data quality and governance processes, which must be in place to ensure that the DW or DM meets its purposes.

These days the most successful companies are those that can respond quickly and flexibly to market changes and new opportunities. A key to this response is the effective and efficient use of data and information by analysts and managers.
  • 7. Bottom-up Design

Figure 2: Bottom-Up

By building a series of data marts to an agreed architecture, the enterprise data warehouse can be assembled slice by slice, until it is complete enough to regard the data marts as subsets of the now much greater whole. Architecture is key to success, as the data marts must not be built in isolation. Users therefore need to design data marts in the knowledge that each will eventually form part of a larger enterprise data warehouse.

Such an approach can prove attractive to businesses. Each data mart can be implemented within six to nine months. Each can tackle an identifiable business problem, making it possible to calculate the return on investment (ROI). The approach also offers a valuable learning curve for the build team, who can test out products and processes until they get it right.

The bottom-up approach to data warehouse design was devised by Ralph Kimball. In this approach, data marts (DM) are created first to provide reporting and analytical capabilities for specific business processes. A DM primarily contains dimensions and facts. Facts contain atomic data and, if necessary, summarized data. A data mart often models a specific business area, such as sales or production. All these DM can then be integrated to create a comprehensive DW.

The DW bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts. Conformed dimensions are dimensions that are shared between facts in at least two DM.

The integration of the DM into the DW is centered on the conformed dimensions, which define the possible integration points between DM. The process that integrates two or more DM is called drill-across (DA). A DA summarizes the data of each participating fact along the keys of the conformed dimensions, and then joins the grouped facts on those keys.

The most important management task is to make sure that the dimensions are consistent across data marts. Business value can be returned as soon as the first data marts are created, and the method lends itself well to an exploratory and iterative approach to building data warehouses.
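The drill-across step described above can be illustrated with a short Python sketch. The two fact tables, their values and the function names are invented for illustration; the only thing taken from the text is the technique itself: summarize each fact along the key of the conformed dimension (here, the month), then join the grouped results on that key.

```python
from collections import defaultdict

# Hypothetical fact rows from two data marts. Both share the conformed
# dimension "month"; the second column is a dimension private to each mart.
SALES_FACTS = [
    ("2014-01", "north", 120),
    ("2014-01", "south", 80),
    ("2014-02", "north", 150),
]
PRODUCTION_FACTS = [
    ("2014-01", "plant_a", 300),
    ("2014-02", "plant_a", 200),
    ("2014-02", "plant_b", 100),
]


def summarize(facts):
    """Group one fact table by the conformed dimension key and sum the measure."""
    totals = defaultdict(int)
    for month, _private_dimension, measure in facts:
        totals[month] += measure
    return dict(totals)


def drill_across(sales_facts, production_facts):
    """Summarize each fact separately, then join the results on the shared key."""
    sold = summarize(sales_facts)
    produced = summarize(production_facts)
    shared_months = sorted(sold.keys() & produced.keys())
    return [(month, sold[month], produced[month]) for month in shared_months]


result = drill_across(SALES_FACTS, PRODUCTION_FACTS)
```

Summarizing before joining matters: joining the raw fact rows first would pair every sales row with every production row of the same month and inflate the totals.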
  • 8. Example: a DW effort can start in the sales department by building a sales DM. Once this DM is complete, the effort can be extended with another DM, for example a production one. For the DM to be integrable with each other, they need to share the same bus. If the integration succeeds, the DW can deliver integrated information about sales and production through these two DM, which is usually of great value to the business.

Top-down Design

Figure 3: Top-down

The opposite of starting with individual business issues and expanding up the organisation hierarchy is to start at the top. A top-down enterprise data warehouse with subset data marts is "the most elegant design approach", says Doug Hackney of business intelligence systems specialist the Enterprise Group. He says that such an approach would vastly ease maintenance, summarisation, metadata management and extraction, transformation and loading (ETL) of data.

The top-down approach to data warehouse design was devised by Bill Inmon. In this approach the DW is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, is stored in the DW. Dimensional DM containing the data needed for specific business processes or departments are then created from the DW. According to Inmon, the DW is the center of the Corporate Information Factory (CIF), which provides a logical framework for delivering business intelligence and business management capabilities.
  • 9. Top-down vs bottom-up

All in one picture:

Figure 4: T-D vs B-U
  • 10. The hybrid design

The Hybrid Data Warehouse (Hybrid) is suited to supporting both EDW and data mart applications in one database. It can accommodate the large volumes of historical data typically found in the EDW, while also performing well for the OLAP queries typically run against data marts. The Hybrid database structure contains both normalized snowflake schemas and de-normalized star schemas. The controlled redundancy inherent in this design provides good response times for a variety of queries.

The Hybrid architecture can also be used to implement the ODS in the same database, as long as sub-second response times are not a requirement. Because the ODS can be used by operational systems, the response time of the database can become an issue.

Because there is only one database schema, the Hybrid model significantly reduces the cost of developing the ETL processes. Real-time (or near real-time) updates can be supported by pushing data updates immediately to the Hybrid warehouse directly from the operational system, or by connecting the ETL engine to an Enterprise Service Bus (ESB).

The Hybrid model was used to develop one of the largest databases in Canada. It includes 34 dimensional roles with multiple hierarchies, has over 1500 attributes, and handles 40 million transactions per day in near real time, which translates into about one billion rows per month.

The Hybrid model may not be able to fully replace an ODS where sub-second response time is required. But it can offer a one-stop solution for organizations that have very large data volumes and are looking for a cost-effective way to support a variety of BI requirements across the organization.
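The monthly volume quoted above can be checked with simple arithmetic. The sketch below assumes one warehouse row per transaction, which the text does not state; the two day counts (30 calendar days, 25 working days) are likewise assumptions, used only to show the range in which "one billion rows per month" falls.

```python
TRANSACTIONS_PER_DAY = 40_000_000  # figure quoted in the text


def rows_per_month(transactions_per_day, days_in_month):
    """Monthly row count, assuming one warehouse row per transaction."""
    return transactions_per_day * days_in_month


# Assumed day counts, not taken from the source.
calendar_month = rows_per_month(TRANSACTIONS_PER_DAY, 30)  # 1,200,000,000
working_month = rows_per_month(TRANSACTIONS_PER_DAY, 25)   # 1,000,000,000
```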
  • 11. DW vs OS

The fundamental difference between operational systems (OS) and DW systems is that operational systems are designed to support transaction processing, whereas data warehousing systems are designed to support online analytical processing (OLAP). Based on this fundamental difference, the data usage patterns associated with operational systems differ significantly from those associated with data warehousing systems. As a result, data warehousing systems are designed and optimized using methodologies that differ drastically from those of operational systems. The table below summarizes many of the differences between operational systems and data warehousing systems.

| Operational systems | Data warehousing systems |
| --- | --- |
| Generally designed to support high-volume transaction processing with minimal backend reporting. | Generally designed to support high-volume analytical processing (i.e. OLAP) and subsequent, often elaborate report generation. |
| Generally process-oriented or process-driven, meaning that they are focused on specific business processes or tasks. Example tasks include billing, registration, etc. | Generally subject-oriented, organized around business areas that the organization needs information about. Such subject areas are usually populated with data from one or more operational systems. As an example, revenue may be a subject area of a data warehouse that incorporates data from operational systems that contain student tuition data, alumni gift data, financial aid data, etc. |
| Generally concerned with current data. | Generally concerned with historical data. |
| Data is generally updated regularly according to need. | Data is generally non-volatile, meaning that new data may be added regularly, but once loaded, the data is rarely changed, thus preserving an ever-growing history of information. In short, data within a data warehouse is generally read-only. |
| Generally optimized to perform fast inserts and updates of relatively small volumes of data. | Generally optimized to perform fast retrievals of relatively large volumes of data. |
| Generally application-specific, resulting in a multitude of partially or non-integrated systems and redundant data (e.g. billing data is not integrated with payroll data). | Generally integrated at a layer above the application layer, avoiding data redundancy problems. |
| Generally require a non-trivial level of computing skills amongst the end-user community. | Generally appeal to an end-user community with a wide range of computing skills, from novice to expert users. |

Table 1: DW vs OS
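The contrast between current, updatable operational data and non-volatile warehouse history can be sketched with Python's standard `sqlite3` module. The `account` and `dw_balance_history` tables, the dates and the `nightly_load` helper are hypothetical and serve only to illustrate the idea: the operational row is updated in place, while the warehouse side only ever appends snapshots.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: holds only the current state, updated in place.
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL);
    -- Warehouse table: append-only snapshots, never updated after loading.
    CREATE TABLE dw_balance_history (
        account_id INTEGER,
        load_date TEXT,
        balance REAL
    );
    INSERT INTO account VALUES (1, 100.0);
""")


def nightly_load(conn, load_date):
    """Copy the current operational state into the warehouse as a new snapshot."""
    conn.execute(
        "INSERT INTO dw_balance_history "
        "SELECT account_id, ?, balance FROM account",
        (load_date,),
    )


nightly_load(conn, "2014-01-31")
# A small operational transaction overwrites the previous balance.
conn.execute("UPDATE account SET balance = balance + 50.0 WHERE account_id = 1")
nightly_load(conn, "2014-02-28")

current = conn.execute(
    "SELECT balance FROM account WHERE account_id = 1"
).fetchone()[0]
history = conn.execute(
    "SELECT load_date, balance FROM dw_balance_history "
    "WHERE account_id = 1 ORDER BY load_date"
).fetchall()
```

After the update the operational system can only report the current balance, whereas the warehouse table still holds both snapshots and can answer questions about the change over time.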
  • 12. Pentaho Suite

Introduction

Pentaho was founded in 2004 and is headquartered in Orlando, FL, USA. One of its most important advantages is that it offers a suite of open source business intelligence (BI) products. These products, called Pentaho Business Analytics, provide data integration, OLAP (online analytical processing) services, reporting, dashboarding, data mining and ETL capabilities.

Pentaho is an open source business intelligence development platform with several components integrated into it. Both open source and commercial versions are available to support your BI needs. This article is scoped to help open source business intelligence developers integrate CTools on CDF to fulfil their dashboard development needs.

Figure 5: Pentaho community edition vs Pentaho enterprise edition
  • 13. Installing Pentaho Suite
Now I will show you how to install the Pentaho Suite Community Edition (CE) along with some tools, and explain their purpose.
a) Download the Pentaho server from http://community.pentaho.com/. Choose zip or tar.gz according to your preference.
b) Install Tomcat
c) Set up MySQL
d) Configure the BI Server
Starting the BI Platform:
In order to use and configure the Pentaho BI Platform, you must start the BI Server, then the Pentaho Administration Console.
1. To start the BI Server, run the start-pentaho script in the /biserver-ce/ directory.
2. To start the Pentaho Administration Console, run the start script (on Windows) or startup script (on Linux) in the /biserver-ce/administration-console/ directory.
How to Log Into the Pentaho User Console
1. Open a Web browser and type in the Web or IP address of the Pentaho server, which is http://localhost:8080/pentaho/ by default. You'll see an introductory screen with some Pentaho-related information and a Login button in the center of the screen.
2. Click Login. The login dialog will appear.
3. For the locally installed version of the BI Suite, select Joe from the user drop-down box and type in password as the password, then click Login. For hosted demo users, select Guest and type in guest as the password instead.
You are now logged into the Pentaho User Console and ready to start creating and running reports.
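Before opening the browser, it can be handy to check whether the BI Server already answers at its address. The following is a minimal sketch using only the Python standard library. The helper names console_url and server_is_up are hypothetical; the only detail taken from the steps above is the default address http://localhost:8080/pentaho/.

```python
import urllib.error
import urllib.request


def console_url(host="localhost", port=8080):
    """Build the Pentaho User Console address (defaults as given in the text)."""
    return f"http://{host}:{port}/pentaho/"


def server_is_up(url, timeout=5.0):
    """Return True if the URL answers with a success status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status:
        # treat all of them as "not ready yet".
        return False
```

A call such as server_is_up(console_url()) can be repeated after running the start-pentaho script, since the server may need some time before it responds.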
  • 14. Figure 6: Pentaho's login interface
Trying some tools…
Community Dashboard Editor (CDE) is one of the plugins designed for the Pentaho BI Server, contributed and maintained by the Pentaho partner Webdetails.
- The purpose of this tool is to create dashboards.
- Community Dashboard Editor (CDE) was born to simplify the creation, editing and rendering of CTools dashboards.
- CDE is a very powerful and complete tool, combining the front end with data sources and custom components in a seamless way.
To create a dashboard, I followed some examples here and here. After CDE is installed, the Pentaho interface changes and a new icon is added.
  • 15. By experimenting and following guides I was able to create a dashboard (screenshots below) showing how many exams I took in each year of my bachelor's degree.
Saiku
Another tool that I studied is Saiku. Saiku is a modular open-source analysis suite offering lightweight OLAP which remains easily embeddable, extendable and configurable. It is similar in form and function to the Pentaho Analyzer plugin. It allows a user to create queries visually by dragging parts of a previously defined OLAP schema onto a canvas, where other activities can take place, such as filtering, sorting, creating calculated members from other measures, exporting the result table to PDF or MS Excel, and optionally graphing the data. A RESTful server connects to existing OLAP systems and powers user-friendly, intuitive analytics via a lightweight jQuery-based frontend.
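To make the idea of such a query more concrete, here is a small Python sketch of what filtering, grouping by a dimension and deriving a calculated value amount to. It does not use Saiku or its API. The exam rows, the field names and the rollup function are all hypothetical, invented only for illustration in the spirit of the exams-per-year dashboard.

```python
from collections import defaultdict

# Hypothetical fact rows: one row per exam (illustrative data only).
EXAMS = [
    {"year": 2011, "subject": "Databases", "credits": 6},
    {"year": 2011, "subject": "Algorithms", "credits": 9},
    {"year": 2012, "subject": "Networks", "credits": 6},
    {"year": 2012, "subject": "Statistics", "credits": 3},
    {"year": 2012, "subject": "Operating Systems", "credits": 9},
]


def rollup(rows, dimension, measure):
    """Group rows by one dimension; return count, sum and average per group."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in rows:
        cell = totals[row[dimension]]
        cell["count"] += 1
        cell["sum"] += row[measure]
    result = []
    for key in sorted(totals):  # sort the groups by the dimension value
        cell = totals[key]
        result.append({
            dimension: key,
            "count": cell["count"],
            "sum": cell["sum"],
            # Calculated value derived from the two base aggregates.
            "avg": cell["sum"] / cell["count"],
        })
    return result


# Filtering first, then rolling up: only exams worth at least 6 credits.
big_exams = [row for row in EXAMS if row["credits"] >= 6]
per_year = rollup(big_exams, "year", "credits")
```

In an OLAP tool the same steps are expressed by dragging a dimension (here the year) and a measure (here the credits) onto the canvas, instead of writing code.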
  • 16. Turning data into information shouldn't be hard; it should be easy and fun. The Saiku project is all about creating tools that are easy to use for anyone who wants to crunch numbers, visualize information, gain insight from data and act on it. Follow this link and you will understand much more easily how Saiku works. If you want to understand more, you can go to these web addresses: http://pedroalvesbi.blogspot.it/2011/06 or http://codeissue.com/articles/a04e87158bb8552/pentaho-bi-ctools-cdf-cdacde-saiku-analytics-etc-using-cygwin
Data warehouses and Pentaho Suite:
Open-source Pentaho provides business intelligence (BI) and data warehousing solutions at a fraction of the cost of proprietary solutions. To learn more about integrating data warehouses with the Pentaho Suite, you might like to buy (or download) and take a look at Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL.