Data Warehousing and Ab Initio Concepts
1. Accenture Ab Initio Training 1
Introduction to
Ab Initio
Prepared By : Ashok Chanda
2. Accenture Ab Initio Training 2
Ab initio Session 1
Introduction to DWH
Explanation of DW Architecture
Operating System / Hardware Support
Introduction to ETL Process
Introduction to Ab Initio
Explanation of Ab Initio Architecture
3. Accenture Ab Initio Training 3
What is a Data Warehouse?
A data warehouse is a copy of transaction data
specifically structured for querying and
reporting.
A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of management's
decision making process.
A data warehouse is a central repository for all
or significant parts of the data that an
enterprise's various business systems collect.
4. Accenture Ab Initio Training 4
Data Warehouse-Definitions
A data warehouse is a database geared towards
the business intelligence requirements of an
organization. The data warehouse integrates
data from the various operational systems and is
typically loaded from these systems at regular
intervals. Data warehouses contain historical
information that enables analysis of business
performance over time. A collection of databases
combined with a flexible data extraction system.
5. Accenture Ab Initio Training 5
Data Warehouse
A data warehouse can be normalized or
denormalized. It can be a relational
database, a multidimensional database, a flat
file, a hierarchical database, an object
database, etc. Once loaded, data warehouse
data rarely changes, and data warehouses often
focus on a specific activity or entity.
6. Accenture Ab Initio Training 6
Why Use a Data Warehouse?
Data Exploration and Discovery
Integrated and Consistent data
Quality assured data
Easily accessible data
Production and performance awareness
Access to data in a timely manner
8. Accenture Ab Initio Training 8
Data warehouse Architecture
Data Warehouses can be architected in many different
ways, depending on the specific needs of a
business. The model shown below is the "hub-and-spoke"
Data Warehousing architecture that is popular in
many organizations.
In short, data is moved from databases used in
operational systems into a data warehouse staging area,
then into a data warehouse and finally into a set of
conformed data marts. Data is copied from one
database to another using a technology called ETL
(Extract, Transform, Load).
10. Accenture Ab Initio Training 10
The ETL Process
Capture
Scrub or Data cleansing
Transform
Load and Index
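The four steps above can be sketched as a minimal pipeline. The record layout and business rules below are illustrative only, not part of any real ETL tool's API:

```python
# Minimal ETL sketch: capture -> cleanse -> transform -> load.
# The source records and rules are made up for illustration.

raw_records = [
    {"id": "1", "amount": "100.50", "region": "NE"},
    {"id": "2", "amount": "bad",    "region": "ne"},   # dirty row
    {"id": "3", "amount": "75.00",  "region": "SW"},
]

def capture(records):
    # Extract: in practice this reads from an operational database or file.
    return list(records)

def cleanse(records):
    # Scrub: drop rows whose amount is not numeric; normalize region case.
    clean = []
    for r in records:
        try:
            clean.append(dict(r, amount=float(r["amount"]),
                              region=r["region"].upper()))
        except ValueError:
            pass  # a real job would route this row to a reject file
    return clean

def transform(records):
    # Apply a business rule: express the amount in cents.
    return [dict(r, amount_cents=int(r["amount"] * 100)) for r in records]

def load(records, target):
    # Load: append to the target store (here, an in-memory list).
    target.extend(records)
    return target

warehouse = load(transform(cleanse(capture(raw_records))), [])
print(len(warehouse))  # prints 2: the dirty row was rejected during cleansing
```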
11. Accenture Ab Initio Training 11
ETL Technology
ETL Technology is an important component of the Data
Warehousing Architecture. It is used to copy data from
Operational Applications to the Data Warehouse Staging
Area, from the DW Staging Area into the Data
Warehouse and finally from the Data Warehouse into a
set of conformed Data Marts that are accessible by
decision makers.
The ETL software extracts data, transforms values of
inconsistent data, cleanses "bad" data, filters data and
loads data into a target database. The scheduling of
ETL jobs is critical. Should there be a failure in one ETL
job, the remaining ETL jobs must respond appropriately.
12. Accenture Ab Initio Training 12
Data Warehouse Staging Area
The Data Warehouse Staging Area is a temporary location
where data from source systems is copied. A staging
area is mainly required in a Data Warehousing
Architecture for timing reasons. In short, all required
data must be available before data can be integrated
into the Data Warehouse.
Due to varying business cycles, data processing cycles,
hardware and network resource limitations and
geographical factors, it is not feasible to extract all the
data from all Operational databases at exactly the same
time.
13. Accenture Ab Initio Training 13
Examples- Staging Area
For example, it might be reasonable to extract sales data on a daily
basis, however, daily extracts might not be suitable for financial
data that requires a month-end reconciliation process. Similarly, it
might be feasible to extract "customer" data from a database in
Singapore at noon Eastern Standard Time, but this would not be
feasible for "customer" data in a Chicago database.
Data in the Data Warehouse Staging Area can be either persistent
(i.e., remains around for a long period) or transient (i.e., only
remains around temporarily).
Not all businesses require a Data Warehouse Staging Area. For many
businesses it is feasible to use ETL to copy data directly from
operational databases into the Data Warehouse.
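The timing constraint described above, that all required data must land before integration starts, can be sketched as a simple readiness check over a staging directory. The extract filenames are hypothetical:

```python
# Sketch of a staging-area gate: integration into the warehouse proceeds
# only once every required source extract has landed.
import os
import tempfile

REQUIRED_EXTRACTS = ["sales.csv", "customers.csv", "finance.csv"]

def staging_ready(staging_dir):
    """True only when every required extract file is present."""
    present = set(os.listdir(staging_dir))
    return all(name in present for name in REQUIRED_EXTRACTS)

staging = tempfile.mkdtemp()

# Two of the three extracts have arrived: not ready yet.
for name in ["sales.csv", "customers.csv"]:
    open(os.path.join(staging, name), "w").close()
ready_before = staging_ready(staging)

# The late month-end finance extract lands: integration can now start.
open(os.path.join(staging, "finance.csv"), "w").close()
ready_after = staging_ready(staging)
```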
14. Accenture Ab Initio Training 14
Data warehouse
The purpose of the Data Warehouse in the overall Data
Warehousing Architecture is to integrate corporate
data. It contains the "single version of truth" for the
organization that has been carefully constructed from
data stored in disparate internal and external operational
databases.
The amount of data in the Data Warehouse is
massive. Data is stored at a very granular level of
detail. For example, every "sale" that has ever occurred
in the organization is recorded and related to dimensions
of interest. This allows data to be sliced and diced,
summed and grouped in unimaginable ways.
15. Accenture Ab Initio Training 15
Data Warehouse
Contrary to popular opinion, the Data Warehouse does
not contain all the data in the organization. Its purpose
is to provide key business metrics that are needed by
the organization for strategic and tactical decision
making.
Decision makers don't access the Data Warehouse
directly. This is done through various front-end Data
Warehouse Tools that read data from subject specific
Data Marts.
The Data Warehouse can be either "relational" or
"dimensional". This depends on how the business
intends to use the information.
16. Accenture Ab Initio Training 16
Data Warehouse Environment
In addition to a
relational/multidimensional database, a
data warehouse environment often
consists of an ETL solution, an OLAP
engine, client analysis tools, and other
applications that manage the process of
gathering data and delivering it to
business users.
17. Accenture Ab Initio Training 17
Data Mart
A subset of a data warehouse, for use by a
single department or function.
A repository of data gathered from operational
data and other sources that is designed to serve
a particular community of knowledge workers.
A subset of the information contained in a data
warehouse.
Data marts have the same definition as the data
warehouse, but data marts have a more limited
audience and/or data content.
18. Accenture Ab Initio Training 18
Data Mart
ETL (Extract Transform Load) jobs extract data from the Data
Warehouse and populate one or more Data Marts for use by groups
of decision makers in the organizations. The Data Marts can be
Dimensional (Star Schemas) or relational, depending on how the
information is to be used and what "front end" Data Warehousing
Tools will be used to present the information.
Each Data Mart can contain different combinations of tables,
columns and rows from the Enterprise Data Warehouse. For
example, a business unit or user group that doesn't require a lot of
historical data might only need transactions from the current
calendar year in the database. The Personnel Department might
need to see all details about employees, whereas data such as
"salary" or "home address" might not be appropriate for a Data Mart
that focuses on Sales.
19. Accenture Ab Initio Training 19
Star Schema
The star schema is perhaps the simplest data
warehouse schema.
It is called a star schema because the
entity-relationship diagram of this schema resembles a
star, with points radiating from a central table.
The center of the star consists of a large fact
table and the points of the star are the
dimension tables.
20. Accenture Ab Initio Training 20
Star Schema – continued
A star schema is characterized by one or
more very large fact tables that contain
the primary information in the data
warehouse, and a number of much
smaller dimension tables (or lookup
tables), each of which contains
information about the entries for a
particular attribute in the fact table.
21. Accenture Ab Initio Training 21
Advantages of Star Schemas
Provide a direct and intuitive mapping between
the business entities being analyzed by end
users and the schema design.
Provide highly optimized performance for typical
star queries.
Are widely supported by a large number of
business intelligence tools, which may anticipate
or even require that the data warehouse schema
contain dimension tables.
Star schemas are used for both simple data
marts and very large data warehouses.
22. Accenture Ab Initio Training 22
Star schema
Diagrammatic representation of star
schema
23. Accenture Ab Initio Training 23
Snowflake Schema
The snowflake schema is a more complex
data warehouse model than a star
schema, and is a type of star schema.
It is called a snowflake schema because
the diagram of the schema resembles a
snowflake.
Snowflake schemas normalize dimensions
to eliminate redundancy.
24. Accenture Ab Initio Training 24
Snowflake Schema - Example
That is, the dimension data has been grouped
into multiple tables instead of one large table.
For example, a product dimension table in a star
schema might be normalized into a products
table, a product_category table, and a
product_manufacturer table in a snowflake
schema. While this saves space, it increases the
number of dimension tables and requires more
foreign key joins. The result is more complex
queries and reduced query performance.
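The normalization described above can be shown with a toy schema in SQLite. The table names follow the example in the text (products, product_category), but the data is made up:

```python
# Sketch of a snowflaked product dimension: the category attribute is
# moved out of the products table, so resolving it needs an extra
# foreign-key join. Schema and data are illustrative only.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES product_category(category_id)
);
INSERT INTO product_category VALUES (1, 'Beverages');
INSERT INTO products VALUES (10, 'Coffee', 1), (11, 'Tea', 1);
""")

# The additional join the snowflake requires to recover the category:
rows = con.execute("""
    SELECT p.name, c.name
    FROM products p
    JOIN product_category c ON c.category_id = p.category_id
    ORDER BY p.product_id
""").fetchall()
```

In a star schema the category name would simply be a column of one wide product dimension table, trading the space saving for a simpler, faster query.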
25. Accenture Ab Initio Training 25
Diagrammatic representation
for Snowflake Schema
26. Accenture Ab Initio Training 26
Fact Table
The central table in a star schema is
called the fact table. A fact table typically
has two types of columns: those that
contain facts and those that are foreign
keys to dimension tables. The primary key
of a fact table is usually a composite key
that is made up of all of its foreign keys.
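A minimal sketch of such a fact table, using SQLite for illustration; the table names and data are hypothetical:

```python
# Minimal star schema: a sales fact table whose primary key is the
# composite of its foreign keys into two dimension tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL,                        -- the fact column
    PRIMARY KEY (date_id, product_id)      -- composite key of foreign keys
);
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO dim_product VALUES (10, 'Coffee');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 10, 150.0);
""")

# A typical star query: join the fact table to a dimension and aggregate.
total_2024 = con.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_id = f.date_id
    WHERE d.year = 2024
""").fetchone()[0]
```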
27. Accenture Ab Initio Training 27
What happens during the ETL
process?
During extraction, the desired data is identified and
extracted from many different sources, including
database systems and applications. Depending on the
source system's capabilities (for example, operating
system resources), some transformations may take place
during this extraction process. The size of the extracted
data varies from hundreds of kilobytes up to gigabytes,
depending on the source system and the business
situation. After extracting data, it has to be physically
transported to the target system or an intermediate
system for further processing.
28. Accenture Ab Initio Training 28
Examples of Second-Generation ETL Tools
Powermart 4.5 – Informatica Corporation
Pioneer due to market share
Ardent DataStage – Ardent Software, Inc.
General-purpose tool oriented to data marts
Sagent Data Mart Solution 3.0 – Sagent
Technology
Progressively integrated with Microsoft
Ab Initio 2.2 – Ab Initio Software
A kit of tools that can be used to build applications
Tapestry 2.1 – D2K, Inc
End-to-end data warehousing solution from a single vendor
29. Accenture Ab Initio Training 29
What to look for in ETL tools
Use an optional data cleansing tool to clean up source data
Use an extraction/transformation/load tool to retrieve,
cleanse, transform, summarize, aggregate, and load data
Use modern, engine-driven technology for fast, parallel
operation
Goal: define 100% of the transform rule with point and
click interface
Support development of logical and physical data models
Generate and manage central metadata repository
Open metadata exchange architecture to integrate central
metadata with local metadata.
Support metadata standards
Provide end users access to metadata in business terms
30. Accenture Ab Initio Training 30
Operating System / Hardware
Support
This section discusses how a DBMS utilizes
OS/hardware features such as parallel
functionality, SMP/MPP support, and
clustering. These OS/hardware features
greatly extend the scalability and improve
performance. However, managing an
environment with these features is difficult
and expensive.
31. Accenture Ab Initio Training 31
Parallel Functionality
The introduction and maturation of parallel
processing environments are key enablers of
increasing database sizes, as well as providing
acceptable response times for storing, retrieving,
and administrating data. DBMS vendors are
continually bringing products to market that take
advantage of multi-processor hardware
platforms. These products can perform table
scans, backups, loads, and queries in parallel.
32. Accenture Ab Initio Training 32
Parallel Features
An overview of typical parallel functionality is given below:
Queries — Parallel queries can enhance scalability for many query
operations
Data load — Performance is always a serious issue when loading
large databases. Meeting response time requirements is the
overriding factor for determining the best load method and should
be a key part of a performance benchmark
Create table as select — This feature makes it possible to create
aggregated tables in parallel
Index creation — Parallel index creation exploits the benefits of
parallel hardware by distributing the workload of building a large
index across a number of processors.
33. Accenture Ab Initio Training 33
Which parallel processor
configuration, SMP or MPP ?
SMP and clustered SMP environments have the
flexibility and the ability to scale in small increments.
SMP environments are often useful for the large,
but static data warehouse, where the data
cannot be easily partitioned, due to the
unpredictable nature of how the data is joined
over multiple tables for complex searches and
ad-hoc queries.
34. Accenture Ab Initio Training 34
Which parallel processor
configuration, SMP or MPP ?
MPP works well in environments where growth is potentially
unlimited, access patterns to the database are predictable, and the
data can be easily partitioned across different MPP nodes with
minimal data accesses crossing between them. This often occurs in
large OLTP environments, where transactions are generally small
and predictable, as opposed to decision support and data
warehouse environments, where multiple tables can be joined in
unpredictable ways.
In fact, data warehousing and decision support are the areas most
vendors of parallel hardware platforms and DBMSs are targeting.
MPP does not scale well if heavy data warehouse database accesses
must cross MPP nodes, causing I/O bottlenecks over the MPP
interconnect, or if multiple MPP nodes are continually locked for
concurrent record updates.
38. Accenture Ab Initio Training 38
Parallel Computer Architecture
Computers come in many “shapes and sizes”:
Single-CPU, Multi-CPU
Network of single-CPU computers
Network of multi-CPU computers
Multi-CPU machines are often called SMP’s (for
Symmetric Multi Processors).
Specially-built networks of machines are often called
MPP’s (for Massively Parallel Processors).
40. Accenture Ab Initio Training 40
History of Ab Initio
Ab Initio Software Corporation was founded
in the mid-1990s by Sheryl Handler, the former
CEO at Thinking Machines Corporation, after
TMC filed for bankruptcy. In addition to Handler,
other former TMC people involved in the
founding of Ab Initio included Cliff Lasser,
Angela Lordi, and Craig Stanfill.
Ab Initio is known for being very secretive in the
way that they run their business, but their
software is widely regarded as top notch.
41. Accenture Ab Initio Training 41
History of Ab Initio
The Ab Initio software is a fourth-generation,
graphical user interface (GUI)-based parallel
processing tool for data analysis, batch processing,
and data manipulation, used mainly to extract,
transform and load data.
The Ab Initio software is a suite of products that
together provide a platform for robust data
processing applications. The core Ab Initio
products are:
The Co>Operating System
The Component Library
The Graphical Development Environment
42. Accenture Ab Initio Training 42
What Does “Ab Initio” Mean?
Ab Initio is Latin for “From the Beginning.”
From the beginning our software was designed to
support a complete range of business applications, from
simple to the most complex. Crucial capabilities like
parallelism and checkpointing can’t be added after the
fact.
The Graphical Development Environment and a powerful
set of components allow our customers to get valuable
results from the beginning.
43. Accenture Ab Initio Training 43
Ab Initio’s focus
“Moving Data”
move small and large volumes of data in an
efficient manner
deal with the complexity associated with business
data
High Performance
scalable solutions
Better productivity
44. Accenture Ab Initio Training 44
Ab Initio’s Software
Ab Initio software is a general-purpose
data processing platform for
mission-critical applications such as:
Data warehousing
Batch processing
Click-stream analysis
Data movement
Data transformation
45. Accenture Ab Initio Training 45
Applications of Ab Initio
Software
Processing just about any form and volume of data.
Parallel sort/merge processing.
Data transformation.
Rehosting of corporate data.
Parallel execution of existing applications.
46. Accenture Ab Initio Training 46
Ab Initio Provides For:
Distribution - a platform for applications to
execute across a collection of processors within
the confines of a single machine or across
multiple machines.
Reduced Run Time Complexity - the ability for
applications to run in parallel on any
combination of computers where the Ab Initio
Co>Operating System is installed from a single
point of control.
47. Accenture Ab Initio Training 47
Applications of Ab Initio
Software in terms of Data
Warehouse
Front end of Data Warehouse:
Transformation of disparate sources
Aggregation and other preprocessing
Referential integrity checking
Database loading
Back end of Data Warehouse:
Extraction for external processing
Aggregation and loading of Data Marts
48. Accenture Ab Initio Training 48
Ab Initio or Informatica-
Powerful ETL
Informatica and Ab Initio both support parallelism, but Informatica
supports only one type while Ab Initio supports three: component
parallelism, data parallelism, and pipeline parallelism. In
Informatica the developer needs to define partitions in the Server
Manager to achieve parallelism; in Ab Initio the tool itself takes
care of it.
Ab Initio has no built-in scheduler like Informatica's; jobs must be
scheduled through scripts or run manually.
Ab Initio supports different record layouts for text files, meaning
the same file can be read with different structures, which is not
possible in Informatica. Ab Initio is also more user-friendly than
Informatica.
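The three parallelism styles named above can be illustrated in ordinary Python. This is a conceptual sketch only, not Ab Initio code; threads stand in for Ab Initio's partitioned processes:

```python
# Illustration of data parallelism and pipeline parallelism.
from concurrent.futures import ThreadPoolExecutor

records = list(range(100))

# Data parallelism: partition the data, run the same transform on
# every partition at once.
def transform_partition(part):
    return [x * 2 for x in part]

partitions = [records[i::4] for i in range(4)]   # 4-way round-robin split
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(transform_partition, partitions))
data_parallel_out = sorted(x for part in results for x in part)

# Pipeline parallelism: a downstream component consumes records as the
# upstream component emits them, modeled here with chained generators.
def reader():
    yield from records

def doubler(stream):
    for x in stream:
        yield x * 2

pipeline_out = list(doubler(reader()))
```

Component parallelism, the third style, is simply different components of a graph (say, two independent branches) running at the same time.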
49. Accenture Ab Initio Training 49
Ab Initio or Informatica-
Powerful ETL-continued
Error Handling - In Ab Initio you can attach error and reject files to
each transformation and capture and analyze the message and data
separately. Informatica has one huge log! Very inefficient when
working on a large process, with numerous points of failure.
Robust transformation language - Informatica is very basic as far as
transformations go. While I will not go into a function by function
comparison, it seems that Ab Initio was much more robust.
Instant feedback - On execution, Ab Initio tells you how many
records have been processed/rejected/etc. and detailed
performance metrics for each component. Informatica has a debug
mode, but it is slow and difficult to adapt to.
50. Accenture Ab Initio Training 50
Both tools are fundamentally
different
Which one to use depends on the work at hand and
existing infrastructure and resources available.
Informatica is an engine-based ETL tool; the power of
this tool is in its transformation engine, and the code that it
generates after development cannot be seen or
modified. Ab Initio is a code-based ETL tool; it generates
ksh or bat scripts, which can be modified to achieve
goals that cannot be handled through the
ETL tool itself.
Ab Initio doesn't need a dedicated administrator, UNIX
or NT Admin will suffice, where as other ETL tools do
have administrative work.
51. Accenture Ab Initio Training 51
Ab Initio Product Architecture
[Layered diagram: User Applications at the top; Development
Environments (GDE, Shell) alongside 3rd-party components and
user-defined components; the Component Library; the Ab Initio EME;
the Ab Initio Co>Operating® System; all running on the Native
Operating System (Unix, Windows, OS/390).]
52. Accenture Ab Initio Training 52
Ab Initio Architecture-
Explanation
The Ab Initio Co>Operating System unites a network of
computing resources (CPUs, storage disks, programs,
datasets) into a production-quality data processing
system with scalable performance and mainframe-class
reliability.
The Co>Operating System is layered on top of the
native operating systems of the collection of servers. It
provides a distributed model for process execution, file
management, debugging, process monitoring, and
checkpointing. A user may perform all these functions
from a single point of control.
53. Accenture Ab Initio Training 53
Co>Operating System Services
Parallel and distributed application execution
Control
Data Transport
Transactional semantics at the application level.
Checkpointing.
Monitoring and debugging.
Parallel file management.
Metadata-driven components.
54. Accenture Ab Initio Training 54
Ab Initio: What We Do
Ab Initio software helps you build large-scale data
processing applications and run them in parallel
environments. Ab Initio software consists of two main
programs:
Co>Operating System:
which your system administrator installs on a host Unix
or Windows NT server, as well as on processing
computers.
The Graphical Development Environment (GDE):
which you install on your PC (GDE Computer) and
configure to communicate with the host.
55. Accenture Ab Initio Training 55
The Ab Initio Co>Operating®
System
The Co>Operating System runs across
a variety of operating systems and
hardware platforms, including OS/390 on
the mainframe, Unix, and Windows. It supports
distributed and parallel execution, can
provide scalability proportional to the
hardware resources provided, and supports
platform-independent data transport.
56. Accenture Ab Initio Training 56
The Ab Initio Co>Operating®
System-Continued
The Ab Initio Co>Operating System depends on
parallelism to connect (i.e., cooperate with) diverse
databases. It extracts, transforms and loads data
to and from Teradata and other data sources.
57. Accenture Ab Initio Training 57
[Diagram: multiple GDE clients sit on top of the Co>Operating
System layer, which runs on any OS (Solaris, AIX, NT, Linux,
NCR). The same Co-Op commands work on any OS, and graphs can be
moved from one OS to another without any changes.]
58. Accenture Ab Initio Training 58
The Ab Initio Co>Operating System
Runs on:
Sun Solaris
IBM AIX
Hewlett-Packard HP-UX
Siemens Pyramid Reliant UNIX
IBM DYNIX/ptx
Silicon Graphics IRIX
Red Hat Linux
Windows NT 4.0 (x86)
Windows 2000 (x86)
Compaq Tru64 UNIX
IBM OS/390
NCR MP-RAS
59. Accenture Ab Initio Training 59
Connectivity to Other Software
Common, high performance database
interfaces:
IBM DB2, DB2/PE, DB2 EEE, UDB, IMS
Oracle, Informix XPS, Sybase, Teradata, MS SQL Server 7
OLE-DB
ODBC
Other software packages:
Connectors to many other third party products
Trillium, ErWin, Siebel, etc.
60. Accenture Ab Initio Training 60
Ab Initio Cooperating System
Ab Initio Software Corporation, headquartered in Lexington, MA, develops
software solutions that process vast amounts of data (well into the terabyte
range) in a timely fashion by employing many (often hundreds) of server
processors in parallel. Major corporations worldwide use Ab Initio software
in mission-critical, enterprise-wide data processing systems.
Together, Teradata and Ab Initio deliver:
• End-to-end solutions for integrating and processing data throughout
the enterprise
• Software that is flexible, efficient, and robust, with unlimited scalability
• Professional and highly responsive support
The Co>Operating System executes your application by creating and managing
the processes and data flows that the components and arrows represent.
62. Accenture Ab Initio Training 62
The GDE
The Graphical Development Environment (GDE) provides
a graphical user interface to the services of the
Co>Operating System. It enables you to create
applications by dragging and dropping components, and
supports point-and-click operations on executable
flowcharts, which the Co>Operating System can execute
directly. Graphical monitoring of running applications
allows you to quantify data volumes and execution
times, helping you spot opportunities for improving
performance.
64. Accenture Ab Initio Training 64
The Component Library:
The Component Library is a set of reusable software
modules for sorting, data transformation,
database loading, etc. The components adapt at
runtime to the record formats and business rules
controlling their behavior.
Ab Initio products have helped reduce a
project's development and research time
significantly.
65. Accenture Ab Initio Training 65
Components
Components may run on any computer running
the Co>Operating System.
Different components do different jobs.
The particular work a component accomplishes
depends upon its parameter settings.
Some parameters are data transformations, that
is, business rules to be applied to one or more
inputs to produce a required output.
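The idea that a component's work is fixed by its parameter settings, including a transform rule, can be sketched as follows. The `reformat` function here is a hypothetical stand-in for a Reformat-style component, not the actual Ab Initio API:

```python
# A generic component whose behavior is determined by its parameters:
# here, a transform (business rule) applied to each input record.

def reformat(records, transform):
    """Reformat-style component: apply the transform rule to every
    input record to produce the output records."""
    return [transform(r) for r in records]

inputs = [{"qty": 2, "price": 5.0}, {"qty": 3, "price": 4.0}]

# The same component does different work under a different rule.
revenue_rule = lambda r: {"revenue": r["qty"] * r["price"]}
out = reformat(inputs, revenue_rule)
```

Swapping in a different rule parameter changes what the component computes without changing the component itself, which is the reuse model the slide describes.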
67. Accenture Ab Initio Training 67
EME
The Enterprise Meta>Environment (EME) is a
high-performance, object-oriented storage system that
inventories and manages various kinds of information
associated with Ab Initio applications. It provides storage
for all aspects of your data processing system, from
design information to operations data.
The EME also provides a rich store for the applications
themselves, including data formats and business rules. It
acts as a hub for data and definitions. Integrated
metadata management provides a global, consolidated
view of the structure and meaning of applications and
data: information that is usually scattered throughout
your business.
68. Accenture Ab Initio Training 68
Benefits of EME
The Enterprise Meta>Environment provides a rich store
for applications and all of their associated information
including:
Technical metadata: application-related business rules,
record formats, and execution statistics.
Business metadata: user-defined documentation of job
functions, roles, and responsibilities.
Metadata is data about data, and it is critical to understanding
and driving your business processes and computational
resources. Storing and using metadata is as important to
your business as storing and using data.
69. Accenture Ab Initio Training 69
EME-Ab Initio Relevance
By integrating technical and business
metadata, you can grasp the entirety of
your data processing, from operational to
analytical systems.
The EME is a completely integrated
environment. The following figure shows
how it fits into the high-level architecture
of Ab Initio software.
71. Accenture Ab Initio Training 71
Stepwise explanation of Ab
Initio Architecture
You construct your application from building blocks
called components, manipulating them through the
Graphical Development Environment (GDE).
You check your applications in to the EME.
The EME and GDE use the underlying functionality of
the Co>Operating System to perform many of their
tasks. The Co>Operating System unites the distributed
resources into a single "virtual computer" to run
applications in parallel.
Ab Initio software runs on the Unix, Windows NT, and MVS
operating systems.
72. Accenture Ab Initio Training 72
Stepwise explanation of Ab
Initio Architecture - continued
Ab Initio connector applications extract
metadata from third-party metadata sources into
the EME, or extract it from the EME into a
third-party destination.
You view the results of project and application
dependency analysis through a Web user
interface. You also view and edit your business
metadata through a Web user interface.
73. Accenture Ab Initio Training 73
EME: The User Constituencies Served
The EME addresses the metadata needs of
three different constituencies:
Business Users
Developers
System Administrators
74. Accenture Ab Initio Training 74
EME: The User Constituencies Served
Business users are interested in exploiting data
for analysis, in particular with regard to
databases, tables, and columns.
Developers tend to be oriented towards
applications, needing to analyze the impact of
potential program changes.
System administrators and production personnel
want job status information and run statistics.
75. Accenture Ab Initio Training 75
EME Interfaces
We can create and manage the EME through
three interfaces:
GDE
Web User Interface
Air Utility