SAP DATA SERVICES
A PRESENTATION BY GEETIKA
CONTENTS
1. Data Warehousing Overview
2. OLTP vs Data Warehouse
3. Data Mart
4. Data Warehousing Objects
5. Data Warehousing Schemas
6. Business Intelligence Overview
7. Operational Data Store
8. Fact Types
9. Slowly Changing Dimensions
10. ETL Overview
11. Datastores
12. Types of Datastores
13. Metadata Import
14. Data Services Object Hierarchy
15. Project
16. Jobs
17. Workflows
18. Dataflows
19. Embedded Dataflows
20. ABAP Dataflows
21. Log Files
22. Variables
23. Parameters
24. What is ETL ?
CONTENTS
25. SAP BODS Transforms Overview
26. Platform Transform
27. Data Integrator Transform
28. Query Transform
29. Case Transform
30. Map Operation Transform
31. Merge Transform
32. SQL Transform
33. Validation Transform
34. Data Integrator Transform
35. Table Comparison Transform
36. History Preserving Transform
37. Key Generation Transform
38. Date Generation Transform
39. Pivot Transform
40. Reverse Pivot Transform
NEED FOR DATA WAREHOUSING
• Difficulty in obtaining integrated information
• Information structure not able to provide ‘full and dynamic’ analysis of information available
• Inconsistent results obtained from queries and reports arising from heterogeneous data sources
• Increased difficulty in delivering consistent comprehensive information in a timely fashion
WHY DATA WAREHOUSING?
• Who are the potential customers?
• Which products are sold the most?
• What are the region-wise preferences?
• What are the competitor products?
• What are the projected sales?
• What if you sell more quantity of a particular product? What will be the impact on revenue?
• What are the results of the promotion schemes introduced?
Need for Intelligent Information in a Competitive Market
DATA WAREHOUSING OVERVIEW
• A data warehouse is a relational database that is designed for query and analysis rather than for
transaction processing. It usually contains historical data derived from transaction data.
• A data warehouse environment includes an extraction, transportation, transformation, and loading
(ETL) solution, online analytical processing (OLAP) and data mining capabilities, client analysis
tools, and other applications that manage the process of gathering data and delivering it to
business users.
• A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of
data. This data helps analysts make informed decisions in an organization.
• It is a series of processes, procedures and tools (hardware & software) that help the enterprise
understand more about itself, its products, its customers and the market it serves.
SUBJECT ORIENTED
• A data warehouse is subject
oriented because it provides
information around a subject
rather than the organization's
ongoing operations.
• These subjects can be product,
customers, suppliers, sales,
revenue, etc.
• A data warehouse does not focus
on the ongoing operations,
rather it focuses on modelling
and analysis of data for decision
making.
[Diagram: operational systems are organized by processes or tasks, while the data warehouse is organized by subject (Customer, Supplier, Product)]
INTEGRATED
• A data warehouse is constructed
by integrating data from
heterogeneous sources such as
relational databases, flat files,
etc.
• This integration enhances the
effective analysis of data.
• Data is stored once in a single
integrated location
• It is closely related to subject orientation.
• Data from disparate sources needs to be put into a consistent format.
• Problems such as naming conflicts and inconsistencies must be resolved.
[Diagram: customer data stored in several databases – legacy mainframe, RDBMS, flat files – is integrated around the subject Customer]
TIME VARIANT
• The data collected in a data
warehouse is identified with a
particular time period.
• The data in a data warehouse
provides information from the
historical point of view.
• Data is stored as a series of
snapshots or views which record
how it is collected across time.
• It helps in Business trend analysis
• In contrast to an OLTP environment, a data warehouse focuses on change over time; that is
what is meant by time variant.
[Diagram: in the data warehouse, time is part of the key of each data record]
NON-VOLATILE
• Non-volatile means the previous
data is not erased when new data
is added to it.
• A data warehouse is kept separate from the operational database; therefore frequent changes
in the operational database are not reflected in the data warehouse.
• This is logical because the
purpose of a data warehouse is
to enable you to analyze what
has occurred.
OLTP VS DATA WAREHOUSE
• OLTP systems are tuned for known transactions and workloads while workload is not known in a
data warehouse
• Special data organization, access methods and implementation methods are needed to support
data warehouse queries (typically multidimensional queries)
• OLTP
• Application Oriented
• Used to run business
• Detailed data
• Current up to date
• Isolated Data
• Repetitive access
• Clerical User
► Data warehouse
► Subject Oriented
► Used to analyze business
► Summarized and refined
► Snapshot data
► Integrated Data
► Ad-hoc access
► Knowledge User (Manager)
OLTP VS DATA WAREHOUSE (TO SUMMARIZE)
• OLTP Systems are
used to “run” a business
► The Data Warehouse helps to
“optimize” the business
DATA MART
• The data mart is a subset of the data warehouse and is usually oriented to a specific business line
or team.
• A data mart is a repository of data that is designed to serve a particular community of knowledge
workers.
• The goal of a data mart is to meet the particular demands of a specific group of users within the
organization, such as human resource management, sales etc.
• Data marts improve end-user response time by allowing users to have access to the specific type
of data they need to view most often by providing the data in a way that supports the collective
view of a group of users.
DATA WAREHOUSE END TO END
[Diagram: data sources (operational data, legacy data, external data sources) are extracted, transformed, and loaded into the organizationally structured enterprise data warehouse, which in turn feeds departmentally structured data marts such as Sales, Inventory, and Purchase; metadata spans the data sources, data management, and access layers]
DATA WAREHOUSING
DATA WAREHOUSING SCHEMAS
• A schema is a collection of database objects, including tables, views,
indexes, and synonyms.
• There is a variety of ways of arranging schema objects in the schema
models designed for data warehousing. They are:
Star Schema
Snowflake Schema
Galaxy Schema
STAR SCHEMA
• It consists of a fact table connected to a set of dimension tables
• Data in the dimension tables is de-normalized
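As a language-neutral illustration (a minimal Python sketch with made-up table and column names, not tied to any particular database), a typical star-schema query joins the fact table to its de-normalized dimension tables through foreign keys:

```python
# Star-schema sketch: one fact table joined to de-normalized dimensions via
# foreign keys. Table contents and names are illustrative only.
dim_product = {1: {"product_name": "Laptop", "category": "Electronics"},
               2: {"product_name": "Desk",   "category": "Furniture"}}
dim_store   = {10: {"store_name": "Delhi"}, 20: {"store_name": "Mumbai"}}

fact_sales = [  # grain: one row per product, per store, per day
    {"date": "2019-11-01", "product_key": 1, "store_key": 10, "sales_amount": 1200.0},
    {"date": "2019-11-01", "product_key": 2, "store_key": 20, "sales_amount": 300.0},
    {"date": "2019-11-02", "product_key": 1, "store_key": 20, "sales_amount": 800.0},
]

# A typical star-schema query: total sales per product category.
totals = {}
for row in fact_sales:
    category = dim_product[row["product_key"]]["category"]   # join via foreign key
    totals[category] = totals.get(category, 0.0) + row["sales_amount"]

print(totals)   # {'Electronics': 2000.0, 'Furniture': 300.0}
```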
SNOWFLAKE SCHEMA
 It is a refinement of the star schema in which some dimensional hierarchies are normalized into a set of
smaller dimension tables
GALAXY SCHEMA
 Multiple fact tables share dimension tables; viewed as a collection of stars, it is therefore called a galaxy
schema
BUSINESS INTELLIGENCE
• How intelligent can you make your business processes?
• What insight can you gain into your business?
• How integrated can your business processes be?
• How much more interactive can your business be with customers, partners, employees and
managers?
WHAT IS BUSINESS INTELLIGENCE (BI)?
• Business Intelligence is a generalized term applied to a broad category of applications and
technologies for gathering, storing, analyzing and providing access to data to help enterprise
users make better business decisions
• Business Intelligence applications include the activities of decision support systems, query and
reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining
• An alternative way of describing BI is: the technology required to turn raw data into information
to support decision-making within corporations and business processes
OPERATIONAL DATA STORE (ODS)
An Operational Data Store (ODS) integrates data from multiple business operation sources to address
operational problems that span one or more business functions.
An ODS has the following features:
• Subject-oriented — Organized around major subjects of an organization (customer, product,
etc.), not specific applications (order entry, accounts receivable, etc.).
• Integrated — Presents an integrated image of subject-oriented data which is pulled from
fragmented operational source systems.
• Current — Contains a snapshot of the current content of legacy source systems. History is not
kept, and might be moved to the data warehouse for analysis.
• Volatile — Since ODS content is kept current, it changes frequently. Identical queries run at
different times may yield different results.
• Detailed — ODS data is generally more detailed than data warehouse data. Summary data is
usually not stored in an ODS; the exact granularity depends on the subject that is being
supported.
OPERATIONAL DATA STORE (ODS) CONTD..
The ODS provides an integrated view of data in operational systems.
As the figure below indicates, there is a clear separation between the ODS and the data warehouse.
BENEFITS OF ODS
• Supports operational reporting needs of the organization
• Operates as a store for detailed data, updated frequently and used for drill-downs from the data
warehouse which contains summary data.
• Reduces the burden placed on other operational or data warehouse platforms by providing an
additional data store for reporting.
• Provides data that is more current than the data warehouse and more integrated than an OLTP system
• Feeds other operational systems in addition to the data warehouse
DATA WAREHOUSING OBJECTS
Fact Tables:
• Represent a business process, i.e., models the business process as an artifact in the data model
• Contain the measurements or metrics or facts of business processes
• "monthly sales number" in the Sales business process
• most are additive (sales this month), some are semi-additive (balance as of), some are not
additive (unit price)
• The level of detail is called the “grain” of the table
• Contain foreign keys for the dimension tables
DATA WAREHOUSING OBJECTS (CONTD..)
Dimension Tables:
• Dimension tables
• Define business in terms already familiar to users
• Wide rows with lots of descriptive text
• Small tables (about a million rows)
• Joined to fact table by a foreign key
• heavily indexed
• typical dimensions
• time periods, geographic region (markets, cities), products, customers,
salesperson, etc.
FACT TYPES
• Additive facts:
Additive facts are facts that can be summed up through all of the dimensions in the fact table
• Semi-Additive facts:
Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table
• Non-additive facts:
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the
fact table
EXAMPLE OF ADDITIVE FACT
Fact Table (columns: Date, Store, Product, Sales_Amount):
• The purpose of this table is to record the Sales_Amount for each product in each store
on a daily basis. Sales_Amount is the fact.
• In this case, Sales_Amount is an additive fact, because we can sum up this fact along
any of the 3 dimensions present in the fact table – date, store, and product.
EXAMPLE OF SEMI ADDITIVE & NON-ADDITIVE FACTS
Fact Table (columns: Date, Account, Current_Balance, Profit_Margin):
 The purpose of this table is to record the current balance for each account at the end of
each day, as well as the profit margin for each account for each day
 Current_Balance & Profit_Margin are the facts
 Current_Balance is a semi-additive fact, as it makes sense to add balances up across all
accounts (what’s the total current balance for all accounts in the bank?), but it does not
make sense to add them up through time
 Profit_Margin is a non-additive fact, for it does not make sense to add margins either
across accounts or through time
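The distinction can be made concrete with a small Python sketch over a made-up account snapshot table (illustrative values only):

```python
# Semi-additive vs. non-additive facts, using an invented account snapshot table.
fact_balance = [
    {"date": "2019-11-01", "account": "A1", "current_balance": 1000, "profit_margin": 0.10},
    {"date": "2019-11-01", "account": "A2", "current_balance": 2000, "profit_margin": 0.20},
    {"date": "2019-11-02", "account": "A1", "current_balance": 1100, "profit_margin": 0.12},
    {"date": "2019-11-02", "account": "A2", "current_balance": 1900, "profit_margin": 0.18},
]

# Meaningful: sum the semi-additive fact across accounts for a single day.
total_2019_11_02 = sum(r["current_balance"] for r in fact_balance
                       if r["date"] == "2019-11-02")
print(total_2019_11_02)   # 3000 -- total bank balance on that day

# Not meaningful: summing balances through time double-counts the same money,
# and summing profit margins (a ratio) is never meaningful; average them instead.
```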
SLOWLY CHANGING DIMENSIONS
• Various data elements in the dimension undergo changes (e.g. changes in attributes, hierarchical
structures) which need to be captured for analysis.
• In a nutshell, this applies to cases where the attribute for a record varies over time.
• Example :
• Christina is a customer who first lived in Chicago, Illinois.
At a later date, she moved to Los Angeles, California.
Now how to modify the table to reflect this change?
This is a “Slowly Changing Dimension problem”
Customer key Name State
1001 Christina Illinois
TYPES OF SCD
• There are 3 types of SCDs :-
• Type 1
• Type 2
• Type 3
Type 1: The new record replaces the original record. No trace of the old record exists.
Type 2: A new record is added to the dimension table; the original record is also kept.
Type 3: The original record is modified to reflect the change (extra columns hold the old and new values).
SCD TYPE 1
The new record replaces the original record. No trace of the old record exists.
Eg:
Before:
Customer key Name State
1001 Christina Illinois
After:
Customer key Name State
1001 Christina California
Advantages:
This is the easiest way to handle a Slowly Changing Dimension, since there is no need to
keep track of the old information.
Disadvantages:
All history is lost. By applying this methodology, it is not possible to track back in history. For
example, in the above case, the company would not be able to know that Christina lived in Illinois before.
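A minimal Python sketch of the Type 1 overwrite using the Christina example (illustrative only; in Data Services this is typically achieved with update or auto-correct load options on the target rather than hand-written code):

```python
# SCD Type 1 sketch: the incoming value simply overwrites the stored value,
# so no history is kept. Table and key values are illustrative only.
dim_customer = {1001: {"name": "Christina", "state": "Illinois"}}

def scd_type1_update(customer_key, new_state):
    # Overwrite in place; the old state ("Illinois") is lost.
    dim_customer[customer_key]["state"] = new_state

scd_type1_update(1001, "California")
print(dim_customer)   # {1001: {'name': 'Christina', 'state': 'California'}}
```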
SCD TYPE 2
In Type 2 SCD a new record is added to the table to represent the new
information. Therefore both the original and the new record will be present.
Eg:
Customer key Name State
1001 Christina Illinois
1005 Christina California
After Christina moved from Illinois to California, we add the new information as a new row into the
table.
Advantages:
This allows us to accurately keep all historical information.
Disadvantages:
This will cause the size of the table to grow fast. Where the number of rows for the
table is very high to start with, storage and performance can become a concern.
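A minimal Python sketch of the Type 2 pattern (illustrative only; in Data Services this pattern is usually built from the Table_Comparison, History_Preserving, and Key_Generation transforms described later):

```python
# SCD Type 2 sketch: a new row with a new surrogate key is inserted and the
# old row is kept, so the full history survives. Names and keys are illustrative.
dim_customer = [
    {"customer_key": 1001, "name": "Christina", "state": "Illinois"},
]
next_key = 1005   # next surrogate key (often produced by a Key_Generation step)

def scd_type2_insert(name, new_state):
    global next_key
    dim_customer.append({"customer_key": next_key, "name": name, "state": new_state})
    next_key += 1

scd_type2_insert("Christina", "California")
for row in dim_customer:
    print(row)
# {'customer_key': 1001, 'name': 'Christina', 'state': 'Illinois'}
# {'customer_key': 1005, 'name': 'Christina', 'state': 'California'}
```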
SCD TYPE 3
In Type 3 SCD there will be two columns to indicate the particular attribute of
interest, one indicating the original value, and one indicating the current value.
There will also be a column that indicates when the current value becomes active.
Eg:
Customer key Name Original State Current State Effective Date
1001 Christina Illinois California 15-Jan-03
After Christina moved from Illinois to California, the original information gets updated,
and we have the above table (assuming the effective date of change is January 15, 2003).
Advantages:
 This does not increase the size of the table, since the new information is updated in place.
 This allows us to keep some part of history.
Disadvantages:
Type 3 will not be able to keep all history where an attribute is changed more than
once. For example, if Christina later moves to Texas on December 15, 2003, the
California information is lost.
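A minimal Python sketch of the Type 3 pattern, following the example values above (column names are illustrative):

```python
# SCD Type 3 sketch: the row is widened with "original" and "current" columns,
# so only the most recent change is retained.
dim_customer = {
    1001: {"name": "Christina", "original_state": "Illinois",
           "current_state": "Illinois", "effective_date": None},
}

def scd_type3_update(customer_key, new_state, effective_date):
    row = dim_customer[customer_key]
    row["current_state"] = new_state          # overwrite the current value
    row["effective_date"] = effective_date    # when the current value became active
    # Note: original_state keeps only the first value, so a later move to Texas
    # would overwrite "California" in current_state and that period would be lost.

scd_type3_update(1001, "California", "2003-01-15")
print(dim_customer[1001])
```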
WHAT IS ETL ?
• ETL stands for extract, transform, and load.
• ETL is software that enables businesses to consolidate their disparate data while moving it from place to
place; it does not matter that the data is in different forms or formats
• There are many ETL tools, e.g. BODS, Informatica, IBM InfoSphere DataStage, Ab Initio, Oracle Warehouse
Builder (OWB)
• It can be used for the following purposes:
• As middleware
• In a data warehouse
• SAP data conversion/migration
ETL PROCESS
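Conceptually, the process shown in the original diagram reduces to three functions. The Python sketch below uses an in-memory CSV string and a plain list as stand-ins for a real source system and warehouse table (all names and values are invented for the illustration):

```python
import csv, io

# Minimal extract-transform-load sketch; the CSV text stands in for a real
# source system and the list for a warehouse target table.
SOURCE = io.StringIO("customer,amount\n alice ,100.5\nBob,99.999\n")

def extract(src):
    return list(csv.DictReader(src))                        # Extract: read raw rows

def transform(rows):
    return [{"customer": r["customer"].strip().upper(),     # Transform: cleanse
             "amount": round(float(r["amount"]), 2)}        # and standardize types
            for r in rows]

def load(rows, target):
    target.extend(rows)                                      # Load: write to the target

warehouse_sales = []
load(transform(extract(SOURCE)), warehouse_sales)
print(warehouse_sales)
# [{'customer': 'ALICE', 'amount': 100.5}, {'customer': 'BOB', 'amount': 100.0}]
```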
ETL TERMS
• Source System
A database, application, file, or other storage facility from which the data in a data warehouse is
derived.
• Mapping
The definition of the relationship and data flow between source and target objects.
• Metadata
Data that describes data and other structures, such as objects, business rules, and processes. For
example, the schema design of a data warehouse is typically stored in a repository as metadata,
which is used to generate scripts used to build and populate the data warehouse. A repository
contains metadata.
• Staging Area
A place where data is processed before entering the warehouse.
• Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of
the ETL process.
• Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples
include cleansing, aggregating, and integrating data from multiple sources.
DATASTORES
• Datastores
• Are used to set up connections between an application and the database.
• Must be specified for every source and target database.
• Are used to import metadata for source and target databases and tables into the repository.
• Are used by Data Services to read data from source tables or load data to target tables.
• In Business Objects Data Services, you can connect to the following systems using Datastores:
• Mainframe systems and Database
• Applications and software with user written adapters
• SAP Applications, SAP BW, Oracle Apps, Siebel, etc.
TYPES OF DATASTORES
• Custom Datastores provide a simple way to import metadata directly from a broad variety of
relational database management systems (RDBMS)
• Application Datastores let users easily import metadata from most Enterprise Resource Planning
(ERP) systems
• Adapter Datastores allow users to import metadata from any source. Specific adapters may be
purchased from Business Objects, or can be developed by customers or third parties as
documented in Business Objects Adapter Development Kit (ADK)
DEFINING A DATASTORE
• To define a Datastore, you must have an
account with access privileges to the
database or application hosting the data
you need to access (user name and
password).
• Datastores are defined in the Datastores tab
of the object library using the Datastore
Editor.
• The Datastore options available depend on
which RDBMS or application is used for the
Datastore.
DEFINING A DATASTORE (CONT.)
• Datastore Editor
• Used to define/edit a Datastore
• Give the Datastore a meaningful name
• Choose the application type of your
Datastore
• You must enter the parameters of the
database to which you are connecting.
DATASTORE ADVANCED CONFIGURATION
• You can toggle the Advanced button to hide
and show the grid of additional Datastore
editor options.
• The grid displays Datastore configurations as
column headings and lists Datastore options
in the left column. Each row represents a
configuration option.
• Different options appear depending upon
Datastore type and (if applicable) database
type and version. Specific options appear
under group headings such as Connection,
General, and Locale
METADATA IMPORT
• Data Services stores the following table information :
• Table name, attributes, indexes
• Column names, descriptions, data types, primary keys
• Data Services updates imported table information only when you re-import it manually.
• Changes made to underlying table schemas or functions are not automatically imported into
Business Objects Data Services.
SELECTIVE IMPORT
• Import metadata by Browsing :
• 1. In the object library, Datastores tab,
right-click on Datastore you want to
import to and select Open
• 2. From the workspace, right-click the
required table and select Import
Note: Only metadata is imported
SELECTIVE IMPORT (CONT.)
• Import metadata by Name-
1. In the object library, Datastores tab,
right-click the Datastore you want to
import to and select Import By Name
2. Complete the information in
Import By Name dialog box
SELECTIVE IMPORT (CONT.)
• Import metadata by searching for data:
• Basic search of external or imported (internal) data
• Advanced search of imported (internal) data only
DATA SERVICES OBJECT HIERARCHY
PROJECT
• A project is the highest level of object offered by Data Services.
• Projects are listed in the object library under the Project tab.
• Are used to group and organize related objects
• May contain any number of: Jobs, Workflows, Data flows etc.
• Only one project can be open at a time.
• Can be shared among multiple users using ATL files or a Central Repository
• Steps to create a new project :
• Choose Project > New > Project.
• Enter the name of your new project. The name can include alphanumeric
characters and underscores (_). It cannot contain blank spaces.
JOBS
• Jobs are the only executable objects in SAP BODS.
• Are reusable objects and next level of organization below a project.
• Contain Workflows (optional) and/or Dataflows.
• Can call many Workflows.
• Can be assigned in any projects available in local repository by dragging it from local object library.
• Are the highest level at which logging happens.
JOBS (CONT.)
• Batch Job
A batch job extracts, transforms, and loads data. It is something that you start, it
does the processing like reading tables and loading the data warehouse, and then it
stops until it is started again, e.g. every night, twice a day, every 4 hours, or
manually started.
• Real Time Job
Like a batch job, a real-time job also extracts, transforms, and loads data. A Real
Time job is started once at the beginning and keeps running as long as the server is
active. Whenever a new message is sent through a SOAP request, it will get processed
and then the Real Time job sends a SOAP response and waits for the next request.
JOBS (CONT.)
Jobs are created in the Project area or in the Object Library.
• Create Job in Project area:
1. In the Project Area, select the Project Name.
2. Right-click and choose New Batch Job or New Real Time Job and then edit the
name.
3. SAP BODS opens a new workspace which is ready to define the job.
• Create Job in Object Library:
1. In Object Library, select Job tab.
2. Right-click Batch Jobs or Real Time Jobs and choose New.
3. A new job with a default name appears.
4. Right-click and select Properties to change the object's name and add a
description.
WORKFLOWS
• A Workflow defines the decision-making process for executing Dataflows.
• It is a reusable component used to group Dataflows and/or Workflows together.
• The Workflow helps to define the execution order of the Dataflows and supporting operations.
• Defined System Parameters can be used to pass values into the workflow.
• Variables can also be defined for use inside the workflow.
• Workflows may contain the following objects : Workflow, Dataflow, Script, Conditional, Try, Catch,
While
DATAFLOWS
• Data flows extract, transform, and load data; reading sources, transforming data, and loading
targets, occurs inside a data flow.
• A data flow can be added to a job or a work flow.
• Dataflows are reusable objects.
• Workflows and Jobs call Dataflows to perform data movement operations.
EMBEDDED DATAFLOWS
• An embedded Dataflow is a Dataflow that is called from inside another Dataflow. Data passes into
or out of the embedded Dataflow from the parent flow through a single source or target.
• The embedded Dataflow can contain any number of sources or targets, but only one input or one
output can pass data to or from the parent Dataflow.
• An embedded Dataflow is a design aid that has no effect on job execution.
• When SAP BODS executes the parent Dataflow, it expands any embedded Dataflows, optimizes the
parent Dataflow, then executes it.
EMBEDDED DATAFLOWS (EXAMPLE)
• The Example of when to use embedded Dataflows:
• In this example, a Dataflow uses a single source to load three different target systems. The Case
transform sends each row from the source to different transforms that process it to get a unique
target output.
• You can simplify the parent Dataflow by using embedded Dataflows for the three different cases.
EMBEDDED DATAFLOWS (CONT.)
There are two ways to create embedded Dataflows:
• Select objects within a Dataflow, right-click, and select Make Embedded
Dataflow.
• Drag a complete and fully validated Dataflow from the object library into an
open Dataflow in the workspace. Then open the Dataflow you just added,
right-click the object you want to use as an input or output port, and
select Make Port for that object.
ABAP DATAFLOWS
• An ABAP Dataflow extracts and transforms data from SAP application tables, files, and hierarchies.
• The ABAP Dataflow produces a data set that you can use as input to other transforms, save to a file
that resides on an SAP application server, or save to an SAP table.
• When SAP BODS executes ABAP Dataflows, it translates the extraction requirements into ABAP
programs and passes them to SAP to execute.
• ABAP Dataflows generate ABAP code. ABAP Dataflows and data transport objects also appear in the
tool palette.
• In the ABAP Dataflow, after specifying the source and transformations, specify the target, which is
the data transport object.
• The data transport object in ABAP Dataflows makes the data set available to the calling Dataflow; it
will be a file on the SAP application server.
• Once the ABAP Dataflow is defined, in the normal Dataflow, connect the ABAP Dataflow to the
downstream transforms and the Target object where the data has to be loaded.
LOG FILES
• As a Job executes, Data Integrator produces the three types of log files that can be viewed in the
Designer Project Area :
• Monitor Log
• Statistics Log
• Error Log
• The log files are, by default, also set to display automatically in the workspace when you execute a
Job.
LOG FILES (CONT.)
Monitor Log : Displays each step of each data flow in the job, the number of rows streamed through
each step, and the duration of each step.
LOG FILES (CONT.)
Statistics log: Itemizes the steps executed in the job and the time execution begins and ends.
LOG FILES (CONT.)
Error log : Displays the name of the object being executed when a Data Integrator error occurred. Also
displays the text of the resulting error message.
VARIABLES
• Variables are symbolic placeholders for values.
• Local variables are restricted to the object in which they are created (job or work flow). You must use
parameters to pass local variables to the work flows and data flows called by that object.
• Global variables are restricted to the job in which they are created. However, they do not require
parameters to be passed to work flows and data flows.
• The data type of a variable can be any supported by the software such as an integer, decimal, date,
or text string.
• You can increase the flexibility and reusability of work flows and data flows by using local and
global variables when you design your jobs.
VARIABLES (CONT.)
• If you define variables in a job or work flow, the software typically uses them in a script, catch, or
conditional process.
• You can use variables inside data flows. For example, use them in a custom function or in the
WHERE clause of a query transform.
PARAMETERS
• Parameters can be defined to:
• Pass their values into and out of work flows
• Pass their values into data flows
• Each parameter is assigned a type: input, output, or input/output. The value
passed by the parameter can be used by any object called by the work flow
or data flow.
VARIABLES & PARAMETERS
• Variables and parameters are used differently based on the object type and whether the variable is
local or global.
• The following table lists the types of variables and parameters you can create based on the object
type and how you use them.
DATA SERVICES TRANSFORMS
• Data Services offers a number of pre-defined transformations and functional objects for both Data
Integration and Data Quality, that allow modelling of the ETL flows.
• Each transform is a step in a Dataflow that acts on a data set.
• A transform enables you to control how data sets change in a Dataflow.
• Transforms operate on data sets by manipulating input sets and producing one or more output
sets.
• The software includes many built-in transforms. These transforms are available from the object
library on the Transforms tab.
• Transforms are divided into 3 categories : Platform, Data Integrator & Data Quality
QUERY TRANSFORM
• The Query transform retrieves a data set that satisfies conditions that you specify. A Query
transform is similar to a SQL SELECT statement.
• A query has data outputs, which are data sets based on the conditions that you specify using the
schema specified in the output schema area.
• Query transform has the following tabs : SELECT, FROM, WHERE, GROUP BY, ORDER BY, ADVANCED
Platform Transforms
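In spirit, the Query transform behaves like the following Python sketch (rows and column names are invented for the illustration), which projects, filters, and aggregates an input data set much as SELECT/WHERE/GROUP BY would:

```python
# A Query transform in spirit: filter rows (WHERE), group and aggregate (GROUP BY).
input_rows = [
    {"region": "North", "product": "A", "qty": 10},
    {"region": "North", "product": "B", "qty": 4},
    {"region": "South", "product": "A", "qty": 7},
]

# WHERE qty > 5, GROUP BY region, SUM(qty)
output = {}
for r in input_rows:
    if r["qty"] > 5:                                                  # WHERE clause
        output[r["region"]] = output.get(r["region"], 0) + r["qty"]  # GROUP BY + SUM

print(output)   # {'North': 10, 'South': 7}
```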
CASE TRANSFORM
• Specifies multiple paths in a single transform (different rows are processed in different ways).
• The Case transform simplifies branch logic in data flows by consolidating case or decision making
logic in one transform.
• Paths are defined in an expression table.
Platform Transforms
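A rough Python sketch of the routing behaviour (case labels, expressions, and rows are invented for the illustration; the real transform defines these in its expression table):

```python
# Case-transform sketch: each input row is routed to exactly one output branch
# according to an expression.
rows = [{"country": "IN", "amount": 10},
        {"country": "US", "amount": 20},
        {"country": "DE", "amount": 30}]

outputs = {"CASE_IN": [], "CASE_US": [], "DEFAULT": []}

for row in rows:
    if row["country"] == "IN":      # expression for the first case label
        outputs["CASE_IN"].append(row)
    elif row["country"] == "US":    # expression for the second case label
        outputs["CASE_US"].append(row)
    else:                           # default label catches everything else
        outputs["DEFAULT"].append(row)

print({k: len(v) for k, v in outputs.items()})  # {'CASE_IN': 1, 'CASE_US': 1, 'DEFAULT': 1}
```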
MAP_OPERATION TRANSFORM
• Modifies data based on mapping expressions and current operation codes. The operation codes can be
converted between data manipulation operations.
• This transform can also change operation codes on data sets to produce the desired output. For example,
if a row in the input data set has been updated in some previous operation in the data flow, you can use
this transform to map the UPDATE operation to an INSERT. The result of converting UPDATE rows into
INSERT rows is the preservation of the existing rows in the target.
• Data Services can push Map_Operation transforms to the source database.
• The Map Operation tab would have the following settings:
Platform Transforms
MERGE TRANSFORM
• Combines incoming data sets, producing a single output data set with the same schema as the
input data sets.
• All sources must have the same schema, including:
• The same number of columns
• The same column names
• The same data type of columns
• The output is a data set consisting of rows from all sources, with any operation codes. The output data has the
same schema as the source data, including nested schemas.
Platform Transforms
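Functionally this is a UNION ALL. A minimal Python sketch with invented rows:

```python
# Merge-transform sketch: concatenate rows from sources that share one schema
# (like SQL UNION ALL; duplicates are not removed). Rows are illustrative only.
source_a = [{"id": 1, "city": "Delhi"}, {"id": 2, "city": "Pune"}]
source_b = [{"id": 3, "city": "Mumbai"}, {"id": 1, "city": "Delhi"}]

assert set(source_a[0]) == set(source_b[0])   # same column names required

merged = source_a + source_b                  # single output data set
print(len(merged))                            # 4 rows, duplicates preserved
```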
SQL TRANSFORM
• Performs the indicated SQL query operation.
• Use this transform to perform standard SQL operations when other built-in transforms cannot
perform them.
• The options for the SQL transform include specifying a Datastore, join rank, cache, array fetch size,
and entering SQL text.
• There are two ways of defining the output schema for a SQL transform:
• Automatic: After you type the SQL statement, click Update schema to execute the SELECT statement against
the database; this obtains the column information returned by the SELECT statement and populates the
output schema.
• Manual: Output columns must be defined in the output portion of the SQL transform. The number of
columns defined in the output of the SQL transform must equal the number of columns returned by the
SQL query.
Platform Transforms
VALIDATION TRANSFORM
• The Validation transform qualifies a data set based on
rules for input schema columns.
• You can apply multiple rules per column or bind a
single reusable rule (in the form of a validation
function) to multiple columns.
• The Validation transform can identify the row, column,
or columns for each validation failure.
• You can also use the Validation transform to filter or
replace (substitute) data that fails your criteria.
• When you enable a validation rule for a column, a
check mark appears next to it in the input schema.
Platform Transforms
TABLE COMPARISON TRANSFORM
• Compares two data sets and produces the difference between them as a data set with rows flagged
as INSERT or UPDATE.
• The Table_Comparison transform allows you to detect and forward changes that have occurred
since the last time a target was updated.
• Allows you to identify changes to a target table for incremental updates
• Three possible outcomes from this transform:
• New row can be added
• Existing record can be updated
• Row can be ignored
Data Integrator Transforms
TABLE COMPARISON TRANSFORM (CONT.)
[Screenshot notes: the comparison table (usually the target) is specified in the transform editor; the presence of output columns indicates proper configuration]
Data Integrator Transforms
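A simplified Python sketch of the comparison logic (the primary key and column names are invented; the real transform also offers options such as detecting deleted rows and choosing the compare columns):

```python
# Table_Comparison sketch: compare incoming rows with the comparison (target)
# table on a primary key and flag each row as INSERT or UPDATE, or ignore it.
target = {1001: {"name": "Christina", "state": "Illinois"},
          1002: {"name": "Sid",       "state": "Texas"}}

incoming = [{"customer_id": 1001, "name": "Christina", "state": "California"},  # changed
            {"customer_id": 1002, "name": "Sid",       "state": "Texas"},       # unchanged
            {"customer_id": 1003, "name": "Dolly",     "state": "Ohio"}]        # new

flagged = []
for row in incoming:
    key = row["customer_id"]
    if key not in target:
        flagged.append(("INSERT", row))                          # new row
    elif {"name": row["name"], "state": row["state"]} != target[key]:
        flagged.append(("UPDATE", row))                          # compare columns changed
    # identical rows produce no output (ignored)

for opcode, row in flagged:
    print(opcode, row)   # UPDATE for 1001, INSERT for 1003
```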
HISTORY PRESERVING TRANSFORM
• The History_Preserving transform allows you to produce a new row in your target rather than
updating an existing row. You can indicate in which columns the transform identifies changes to be
preserved.
• The History_Preserving transform requires input rows flagged as inserts and updates.
• The History_Preserving transform is usually preceded by a Table_Comparison, which provides the
required input row types.
Data Integrator Transforms
KEY GENERATION TRANSFORM
• Generates sequential key values for new rows, starting
from the maximum existing key value in a specified table
• Allows you to build a new physical primary key, e.g. for
preserving history
• When it is necessary to generate artificial keys in a table,
the Key_Generation transform looks up the maximum
existing key value from a table and uses it as the starting
value to generate new keys.
• The transform expects the generated key column to be
part of the input schema.
Data Integrator Transforms
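A minimal Python sketch of the idea (table name, key column, and rows are invented for the illustration):

```python
# Key_Generation sketch: look up the maximum existing key in the target table
# and assign sequential keys to new rows.
target_keys = [1001, 1002, 1005]            # keys already present in the table

def generate_keys(new_rows, existing_keys, key_column="customer_key"):
    next_key = max(existing_keys, default=0) + 1
    for row in new_rows:
        row[key_column] = next_key          # fill the (empty) key column
        next_key += 1
    return new_rows

new_rows = [{"customer_key": None, "name": "Dolly"},
            {"customer_key": None, "name": "Joe"}]
print(generate_keys(new_rows, target_keys))
# keys 1006 and 1007 are assigned, continuing from the existing maximum 1005
```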
DATE GENERATION TRANSFORM
• Produces a series of dates incremented as you specify.
• Use this transform to produce the key values for a time dimension target.
• From this generated sequence you can populate other fields in the time dimension (such as
day_of_week) using functions in a query.
• To create a time dimension target with dates from the beginning of the year 1997 to the end of
the year 2000, place a Date_Generation transform, a query, and a target in a data flow. Inside
the Date_Generation transform, specify the following Options :
• Start date: 1997.01.01
• End date: 2000.12.31 (A variable can also be used.)
• Increment: Daily (A variable can also be used.)
Data Integrator Transforms
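A minimal Python sketch of the same sequence (standard library only), generating one row per day for 1997–2000 and deriving day_of_week as a downstream Query transform would:

```python
from datetime import date, timedelta

# Date_Generation sketch: emit one row per day between a start and end date,
# then derive extra time-dimension attributes.
def generate_dates(start, end, step_days=1):
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)

time_dim = [{"calendar_date": d,
             "day_of_week": d.strftime("%A"),   # derived column
             "year": d.year}
            for d in generate_dates(date(1997, 1, 1), date(2000, 12, 31))]

print(len(time_dim))    # 1461 rows (1997-2000, including the leap year 2000)
print(time_dim[0])
```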
PIVOT TRANSFORM (COLUMNS TO ROWS)
• Creates a new row for each value in a column that you identify as a pivot column.
• The Pivot transform allows you to change how the relationship between rows is displayed.
• For each value in each pivot column, Data Services produces a row in the output data set.
• You can create pivot sets to specify more than one pivot column.
Input (Non-pivot column: Name; Pivot columns: Jan, Feb, Mar):
Name   Jan   Feb   Mar
Joe    1100  500   900
Sid    500   1200  300
Dolly  900   1300  200

Output (Sequence name: Sequence; Pivot header name: Month; Pivot data field: Q1_Expenses):
Name   Sequence  Month  Q1_Expenses
Joe    1         Jan    1100
Joe    2         Feb    500
Joe    3         Mar    900
Sid    1         Jan    500
Sid    2         Feb    1200
Sid    3         Mar    300
Dolly  1         Jan    900
Dolly  2         Feb    1300
Dolly  3         Mar    200
Data Integrator Transforms
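The transformation above can be sketched in a few lines of Python, reproducing the sample input and output (column names follow the slide):

```python
# Pivot sketch (columns to rows): for each pivot column, one output row is produced.
input_rows = [{"Name": "Joe",   "Jan": 1100, "Feb": 500,  "Mar": 900},
              {"Name": "Sid",   "Jan": 500,  "Feb": 1200, "Mar": 300},
              {"Name": "Dolly", "Jan": 900,  "Feb": 1300, "Mar": 200}]

pivot_columns = ["Jan", "Feb", "Mar"]                     # pivot columns
output_rows = []
for row in input_rows:
    for seq, month in enumerate(pivot_columns, start=1):
        output_rows.append({"Name": row["Name"],          # non-pivot column
                            "Sequence": seq,              # sequence name
                            "Month": month,               # pivot header column
                            "Q1_Expenses": row[month]})   # pivot data field

for r in output_rows[:3]:
    print(r)   # Joe's three rows: Jan 1100, Feb 500, Mar 900
```

The Reverse Pivot transform described next performs the inverse operation, collapsing such rows back into one row per Name.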
REVERSE PIVOT TRANSFORM (ROWS TO COLUMNS)
• Creates one row of data from several existing rows.
• The Reverse Pivot transform allows you to combine data from several rows into one row by
creating new columns.
• For each unique value in a pivot axis column and each selected pivot column, Data Services
produces a column in the output data set.
Data Integrator Transforms
REVERSE PIVOT TRANSFORM (ROWS TO COLUMNS) (CONT.)
Data Integrator Transforms

Weitere ähnliche Inhalte

Was ist angesagt?

SAP HANA Migration Deck.pptx
SAP HANA Migration Deck.pptxSAP HANA Migration Deck.pptx
SAP HANA Migration Deck.pptx
SingbBablu
 
Take the Next Step to S/4HANA with "RISE with SAP"
Take the Next Step to S/4HANA with "RISE with SAP"Take the Next Step to S/4HANA with "RISE with SAP"
Take the Next Step to S/4HANA with "RISE with SAP"
panayaofficial
 

Was ist angesagt? (20)

S4HANA Migration Overview
S4HANA Migration OverviewS4HANA Migration Overview
S4HANA Migration Overview
 
1668146695188.pdf
1668146695188.pdf1668146695188.pdf
1668146695188.pdf
 
SAP BI/BW
SAP BI/BWSAP BI/BW
SAP BI/BW
 
SAP S_4HANA Migration Cockpit - Migrate your Data to SAP S_4HANA.pdf
SAP S_4HANA Migration Cockpit - Migrate your Data to SAP S_4HANA.pdfSAP S_4HANA Migration Cockpit - Migrate your Data to SAP S_4HANA.pdf
SAP S_4HANA Migration Cockpit - Migrate your Data to SAP S_4HANA.pdf
 
Sap S4 HANA Everything You Need To Know
Sap S4 HANA Everything You Need To Know Sap S4 HANA Everything You Need To Know
Sap S4 HANA Everything You Need To Know
 
Migration Cockpit (LTMC)
Migration Cockpit (LTMC)Migration Cockpit (LTMC)
Migration Cockpit (LTMC)
 
SAP CPI - DS
SAP CPI - DSSAP CPI - DS
SAP CPI - DS
 
SAP BW Introduction.
SAP BW Introduction.SAP BW Introduction.
SAP BW Introduction.
 
SAP S4HANA Migration Cockpit.pdf
SAP S4HANA Migration Cockpit.pdfSAP S4HANA Migration Cockpit.pdf
SAP S4HANA Migration Cockpit.pdf
 
Sap Business Objects solutioning Framework architecture
Sap Business Objects solutioning Framework architectureSap Business Objects solutioning Framework architecture
Sap Business Objects solutioning Framework architecture
 
Transition to SAP S/4HANA System Conversion: A step-by-step guide
Transition to SAP S/4HANA System Conversion: A step-by-step guide Transition to SAP S/4HANA System Conversion: A step-by-step guide
Transition to SAP S/4HANA System Conversion: A step-by-step guide
 
SAP HANA Migration Deck.pptx
SAP HANA Migration Deck.pptxSAP HANA Migration Deck.pptx
SAP HANA Migration Deck.pptx
 
Moving to SAP S/4HANA
Moving to SAP S/4HANAMoving to SAP S/4HANA
Moving to SAP S/4HANA
 
Sap Purchase Order Workflow
Sap Purchase Order WorkflowSap Purchase Order Workflow
Sap Purchase Order Workflow
 
SAP Archiving
SAP ArchivingSAP Archiving
SAP Archiving
 
10 Golden Rules for S/4 HANA Migrations
10 Golden Rules for S/4 HANA Migrations10 Golden Rules for S/4 HANA Migrations
10 Golden Rules for S/4 HANA Migrations
 
Take the Next Step to S/4HANA with "RISE with SAP"
Take the Next Step to S/4HANA with "RISE with SAP"Take the Next Step to S/4HANA with "RISE with SAP"
Take the Next Step to S/4HANA with "RISE with SAP"
 
Sap bw4 hana
Sap bw4 hanaSap bw4 hana
Sap bw4 hana
 
SAP Overview and Architecture
SAP Overview and ArchitectureSAP Overview and Architecture
SAP Overview and Architecture
 
S4 HANA presentation.pptx
S4 HANA presentation.pptxS4 HANA presentation.pptx
S4 HANA presentation.pptx
 

Ähnlich wie SAP Data Services

An Overview On Data Warehousing An Overview On Data Warehousing
An Overview On Data Warehousing An Overview On Data WarehousingAn Overview On Data Warehousing An Overview On Data Warehousing
An Overview On Data Warehousing An Overview On Data Warehousing
BRNSSPublicationHubI
 
Data Mining Concept & Technique-ch04.ppt
Data Mining Concept & Technique-ch04.pptData Mining Concept & Technique-ch04.ppt
Data Mining Concept & Technique-ch04.ppt
MutiaSari53
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
A P
 

Ähnlich wie SAP Data Services (20)

Data warehousing
Data warehousingData warehousing
Data warehousing
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
An Overview On Data Warehousing An Overview On Data Warehousing
An Overview On Data Warehousing An Overview On Data WarehousingAn Overview On Data Warehousing An Overview On Data Warehousing
An Overview On Data Warehousing An Overview On Data Warehousing
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
Data Mining Concept & Technique-ch04.ppt
Data Mining Concept & Technique-ch04.pptData Mining Concept & Technique-ch04.ppt
Data Mining Concept & Technique-ch04.ppt
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
 
Chpt2.ppt
Chpt2.pptChpt2.ppt
Chpt2.ppt
 
Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Issue in Data warehousing and OLAP in E-business
Issue in Data warehousing and OLAP in E-businessIssue in Data warehousing and OLAP in E-business
Issue in Data warehousing and OLAP in E-business
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processing
 
Introduction to Data warehouse
Introduction to Data warehouseIntroduction to Data warehouse
Introduction to Data warehouse
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
TOPIC 9 data warehousing and data mining.pdf
TOPIC 9 data warehousing and data mining.pdfTOPIC 9 data warehousing and data mining.pdf
TOPIC 9 data warehousing and data mining.pdf
 
data warehousing and data mining (1).pdf
data warehousing and data mining (1).pdfdata warehousing and data mining (1).pdf
data warehousing and data mining (1).pdf
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Module 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptxModule 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptx
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

SAP Data Services

  • 1. SAP DATA SERVICES A PRESENTATION BY GEETIKA
  • 2. 13 November 2019 Presentation titlePage 2 CONTENTS 1. Data Warehousing Overview 2. OLTP vs Data Warehouse 3. Data Mart 4. Data Warehousing Objects 5. Data Warehousing Schemas 6. Business Intelligence Overview 7. Operational Data Store 8. Fact Types 9. Slowly Changing Dimensions 10. ETL Overview 11. Datastores 12. Types of Datastores 13. Metadata Import 14. Data Services Object Hierarchy 15. Project 16. Jobs 17. Workflows 18. Dataflows 19. Embedded Dataflows 20. ABAP Dataflows 21. Log Files 22. Variables 23. Parameters 24. What is ETL ?
  • 3. 13 November 2019 Presentation titlePage 3 CONTENTS 25. SAP BODS Transforms Overview 26. Platform Transform 27. Data Integrator Transform 28. Query Transform 29. Case Transform 30. Map Operation Transform 31. Merge Transform 32. SQL Transform 33. Validation Transform 34. Data Integrator Transform Geetika SAP BI Consultant 35. Table Comparison Transform 36. History Preserving Transform 37. Key Generation Transform 38. Date Generation Transform 39. Pivot Transform 40. Reserve Pivot Transform
  • 4. 13 November 2019 Presentation titlePage 4 NEED FOR DATA WAREHOUSING • Difficulty in obtaining integrated information • Information structure not able to provide ‘full and dynamic’ analysis of information available • Inconsistent results obtained from queries and reports arising from heterogeneous data sources • Increased difficulty in delivering consistent comprehensive information in a timely fashion
  • 5. 13 November 2019 Presentation titlePage 5 WHY DATA WAREHOUSING? Who are the potential Customers ? Which Products are sold the most ? What are the region-wise preferences ? What are the competitor products ? What are the projected sales ? What if you sale more quantity of a particular product ? What will be the impact on revenue ? Results of promotion schemes introduced ? Need of Intelligent Information in Competitive Market
  • 6. 13 November 2019 Presentation titlePage 6 DATA WAREHOUSING OVERVIEW • A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data. • A data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, online analytical processing (OLAP) and data mining capabilities, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users. • A data warehouse is a subject oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts to take informed decisions in an organization. • It is a series of processes, procedures and tools (h/w & s/w) that help the enterprise understand more about itself, its products, its customers and the market it services
  • 7. 13 November 2019 Presentation titlePage 7 SUBJECT ORIENTED • A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. • These subjects can be product, customers, suppliers, sales, revenue, etc. • A data warehouse does not focus on the ongoing operations, rather it focuses on modelling and analysis of data for decision making. Operational Systems Data Warehouse Customer Supplier Product Organized by processes or tasks Organized by subject
  • 8. 13 November 2019 Presentation titlePage 8 INTEGRATED • A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. • This integration enhances the effective analysis of data. • Data is stored once in a single integrated location • It is closely related with subject orientation. • Data from disparate sources need to be put in a consistent format. • Resolving of problems such as naming conflicts and inconsistencies Subject = Customer Legacy Mainframe Customer data stored in several databases RDBMS Flat Files
  • 9. 13 November 2019 Presentation titlePage 9 TIME VARIANT • The data collected in a data warehouse is identified with a particular time period. • The data in a data warehouse provides information from the historical point of view. • Data is stored as a series of snapshots or views which record how it is collected across time. • It helps in Business trend analysis • In contrast to OLTP environment, data warehouse’s focus on change over time that is what we mean by time variant. Data Warehouse Time Data { Key
  • 10. 13 November 2019 Presentation titlePage 10 NON-VOLATILE • Non-volatile means the previous data is not erased when new data is added to it. • A data warehouse is kept separate from the operational database and therefore frequent changes in operational database is not reflected in the data warehouse. • This is logical because the purpose of a data warehouse is to enable you to analyze what has occurred.
  • 11. 13 November 2019 Presentation titlePage 11 OLTP VS DATA WAREHOUSE • OLTP systems are tuned for known transactions and workloads while workload is not known in a data warehouse • Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries) • OLTP • Application Oriented • Used to run business • Detailed data • Current up to date • Isolated Data • Repetitive access • Clerical User ► Data warehouse ► Subject Oriented ► Used to analyze business ► Summarized and refined ► Snapshot data ► Integrated Data ► Ad-hoc access ► Knowledge User (Manager)
  • 12. 13 November 2019 Presentation titlePage 12 OLTP VS DATA WAREHOUSE (TO SUMMARIZE) • OLTP Systems are used to “run” a business ► The Data Warehouse helps to “optimize” the business
  • 13. 13 November 2019 Presentation titlePage 13 DATA MART • The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. • A data mart is a repository of data that is designed to serve a particular community of knowledge workers. • The goal of a data mart is to meet the particular demands of a specific group of users within the organization, such as human resource management, sales etc. • Data marts improve end-user response time by allowing users to have access to the specific type of data they need to view most often by providing the data in a way that supports the collective view of a group of users.
  • 14. 13 November 2019 Presentation titlePage 14 DATA WAREHOUSE END TO END Metadata Data Sources Data Management Access Operational Data Legacy Data The Post External Data Sources Enterprise Data Warehouse Organizationally structured Extract Transform Load Data Mart Data Mart Departmentally structured Data Mart Sales Inventory Purchase
  • 15. 13 November 2019 Presentation titlePage 15 DATA WAREHOUSING
  • 16. 13 November 2019 Presentation titlePage 16 DATA WAREHOUSING SCHEMAS • A schema is a collection of database objects, including tables, views, indexes, and synonyms. • There is a variety of ways of arranging schema objects in the schema models designed for data warehousing. The are: Star Schema Snowflake Schema Galaxy Schema
• 17. 13 November 2019 Presentation titlePage 17 STAR SCHEMA • It consists of a fact table connected to a set of dimension tables • Data in the dimension tables is de-normalized
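A star-schema query typically joins the central fact table to its de-normalized dimension tables and then aggregates the measures. The following is a minimal pandas sketch of that pattern; the table and column names (sales_fact, product_dim, store_dim) are invented for illustration and are not taken from the slides.

```python
import pandas as pd

# Hypothetical de-normalized dimension tables and a fact table
product_dim = pd.DataFrame({"product_key": [1, 2],
                            "product_name": ["Pen", "Notebook"],
                            "category": ["Stationery", "Stationery"]})
store_dim = pd.DataFrame({"store_key": [10, 20],
                          "city": ["Mumbai", "Delhi"],
                          "region": ["West", "North"]})
sales_fact = pd.DataFrame({"product_key": [1, 2, 1],
                           "store_key": [10, 10, 20],
                           "sales_amount": [100, 250, 80]})

# A typical star-schema query: join the fact table to its dimensions, then aggregate
report = (sales_fact
          .merge(product_dim, on="product_key")
          .merge(store_dim, on="store_key")
          .groupby(["region", "category"], as_index=False)["sales_amount"].sum())
print(report)
```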
• 18. 13 November 2019 Presentation titlePage 18 SNOWFLAKE SCHEMA  It is a refinement of the star schema in which some dimensional hierarchies are normalized into a set of smaller dimension tables
• 19. 13 November 2019 Presentation titlePage 19 GALAXY SCHEMA  Multiple fact tables share dimension tables; because it can be viewed as a collection of stars, it is called a galaxy schema
  • 20. 13 November 2019 Presentation titlePage 20 BUSINESS INTELLIGENCE • How intelligent can you make your business processes? • What insight can you gain into your business? • How integrated can your business processes be? • How much more interactive can your business be with customers, partners, employees and managers?
  • 21. 13 November 2019 Presentation titlePage 21 WHAT IS BUSINESS INTELLIGENCE (BI)? • Business Intelligence is a generalized term applied to a broad category of applications and technologies for gathering, storing, analyzing and providing access to data to help enterprise users make better business decisions • Business Intelligence applications include the activities of decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining • An alternative way of describing BI is: the technology required to turn raw data into information to support decision-making within corporations and business processes
• 22. 13 November 2019 Presentation titlePage 22 OPERATIONAL DATA STORE (ODS) An Operational Data Store (ODS) integrates data from multiple business operation sources to address operational problems that span one or more business functions. An ODS has the following features: • Subject-oriented — Organized around major subjects of an organization (customer, product, etc.), not specific applications (order entry, accounts receivable, etc.). • Integrated — Presents an integrated image of subject-oriented data which is pulled from fragmented operational source systems. • Current — Contains a snapshot of the current content of legacy source systems. History is not kept in the ODS; historical data may be moved to the data warehouse for analysis. • Volatile — Since ODS content is kept current, it changes frequently. Identical queries run at different times may yield different results. • Detailed — ODS data is generally more detailed than data warehouse data. Summary data is usually not stored in an ODS; the exact granularity depends on the subject that is being supported.
  • 23. 13 November 2019 Presentation titlePage 23 OPERATIONAL DATA STORE (ODS) CONTD.. The ODS provides an integrated view of data in operational systems. As the figure below indicates, there is a clear separation between the ODS and the data warehouse.
• 24. 13 November 2019 Presentation titlePage 24 BENEFITS OF ODS • Supports operational reporting needs of the organization • Operates as a store for detailed data, updated frequently and used for drill-downs from the data warehouse, which contains summary data. • Reduces the burden placed on other operational or data warehouse platforms by providing an additional data store for reporting. • Provides data that is more current than a data warehouse and more integrated than an OLTP system • Feeds other operational systems in addition to the data warehouse
  • 25. 13 November 2019 Presentation titlePage 25 DATA WAREHOUSING OBJECTS Fact Tables: • Represent a business process, i.e., models the business process as an artifact in the data model • Contain the measurements or metrics or facts of business processes • "monthly sales number" in the Sales business process • most are additive (sales this month), some are semi-additive (balance as of), some are not additive (unit price) • The level of detail is called the “grain” of the table • Contain foreign keys for the dimension tables
• 26. 13 November 2019 Presentation titlePage 26 DATA WAREHOUSING OBJECTS (CONTD..) Dimension Tables: • Define the business in terms already familiar to users • Wide rows with lots of descriptive text • Small tables (up to about a million rows) • Joined to the fact table by a foreign key • Heavily indexed • Typical dimensions: time periods, geographic regions (markets, cities), products, customers, salespersons, etc.
• 27. 13 November 2019 Presentation titlePage 27 FACT TYPES • Additive facts: Additive facts are facts that can be summed up through all of the dimensions in the fact table • Semi-additive facts: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table • Non-additive facts: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table
• 28. 13 November 2019 Presentation titlePage 28 EXAMPLE OF ADDITIVE FACT Fact Table: • The purpose of this table is to record the Sales_Amount for each product in each store on a daily basis. Sales_Amount is the fact. • In this case, Sales_Amount is an additive fact, because we can sum up this fact along any of the 3 dimensions present in the fact table: date, store, and product. (Fact table columns: Date, Store, Product, Sales_Amount)
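To make the additivity concrete, here is a minimal pandas sketch (the sample rows are invented, not taken from the slide) showing that Sales_Amount can be summed along any of the three dimensions:

```python
import pandas as pd

# A few hypothetical rows of the fact table described above
fact = pd.DataFrame({
    "Date": ["2019-11-01", "2019-11-01", "2019-11-02"],
    "Store": ["S1", "S2", "S1"],
    "Product": ["P1", "P1", "P2"],
    "Sales_Amount": [100, 150, 200],
})

# Because Sales_Amount is additive, it can be summed along any dimension
print(fact.groupby("Store")["Sales_Amount"].sum())    # total per store
print(fact.groupby("Product")["Sales_Amount"].sum())  # total per product
print(fact["Sales_Amount"].sum())                     # grand total across all dimensions
```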
• 29. 13 November 2019 Presentation titlePage 29 EXAMPLE OF SEMI ADDITIVE & NON-ADDITIVE FACTS Fact Table:  The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day  Current_Balance & Profit_Margin are the facts  Current_Balance is a semi-additive fact, as it makes sense to add balances across all accounts (what is the total current balance for all accounts in the bank?), but it does not make sense to add them up through time  Profit_Margin is a non-additive fact, because it does not make sense to add margins along any of the dimensions. (Fact table columns: Date, Account, Current_Balance, Profit_Margin)
• 30. 13 November 2019 Presentation titlePage 30 SLOWLY CHANGING DIMENSIONS • Various data elements in the dimension undergo changes (e.g. changes in attributes, hierarchical structures) which need to be captured for analysis. • In a nutshell, this applies to cases where an attribute of a record varies over time. • Example: • Christina is a customer who first lived in Chicago, Illinois. At a later date, she moved to Los Angeles, California. How should the table be modified to reflect this change? This is the “Slowly Changing Dimension” problem. (Current dimension row: Customer key 1001, Name Christina, State Illinois)
• 31. 13 November 2019 Presentation titlePage 31 TYPES OF SCD • There are three types of SCDs: • Type 1 • Type 2 • Type 3 Type 1: The new record replaces the original record; no trace of the old record exists. Type 2: A new record is added to the dimension table. Type 3: The original record is modified to reflect the change.
• 32. 13 November 2019 Presentation titlePage 32 SCD TYPE 1 The new record replaces the original record; no trace of the old record exists. E.g.: Before: Customer key 1001, Name Christina, State Illinois. After: Customer key 1001, Name Christina, State California. Advantages: This is the easiest way to handle a slowly changing dimension, since there is no need to keep track of the old information. Disadvantages: All history is lost. By applying this methodology, it is not possible to track back in history. For example, in the above case the company would not be able to know that Christina lived in Illinois before.
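The overwrite behaviour of Type 1 can be sketched in a few lines of pandas; this is only an illustration of the concept, not how Data Services implements it:

```python
import pandas as pd

customer_dim = pd.DataFrame({"customer_key": [1001],
                             "name": ["Christina"],
                             "state": ["Illinois"]})

# Type 1: overwrite the attribute in place; no history is kept
customer_dim.loc[customer_dim["customer_key"] == 1001, "state"] = "California"
print(customer_dim)  # 1001, Christina, California -- the Illinois value is gone
```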
• 33. 13 November 2019 Presentation titlePage 33 SCD TYPE 2 In a Type 2 SCD a new record is added to the table to represent the new information, so both the original and the new record are present. E.g.: Customer key 1001, Name Christina, State Illinois and Customer key 1005, Name Christina, State California. After Christina moved from Illinois to California, we add the new information as a new row in the table. Advantages: This allows us to keep all historical information accurately. Disadvantages: It causes the size of the table to grow quickly; where the number of rows is very high to start with, storage and performance can become a concern.
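A minimal pandas sketch of the Type 2 behaviour (illustrative only; surrogate key 1005 is taken from the example above):

```python
import pandas as pd

customer_dim = pd.DataFrame({"customer_key": [1001],
                             "name": ["Christina"],
                             "state": ["Illinois"]})

# Type 2: keep the old row and add a new row with a new surrogate key
new_row = pd.DataFrame({"customer_key": [1005],
                        "name": ["Christina"],
                        "state": ["California"]})
customer_dim = pd.concat([customer_dim, new_row], ignore_index=True)
print(customer_dim)  # both the Illinois row and the California row are preserved
```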
• 34. 13 November 2019 Presentation titlePage 34 SCD TYPE 3 In a Type 3 SCD there are two columns for the attribute of interest, one indicating the original value and one indicating the current value, plus a column that indicates when the current value became active. E.g.: Customer key 1001, Name Christina, Original State Illinois, Current State California, Effective Date 15-Jan-03. After Christina moved from Illinois to California, the original row is updated in place, and we have the table above (assuming the effective date of the change is January 15, 2003). Advantages:  This does not increase the size of the table, since the new information is stored by updating the existing row.  This allows us to keep some part of history. Disadvantages: Type 3 cannot keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information is lost.
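A minimal pandas sketch of the Type 3 behaviour (illustrative only; the column names original_state, current_state and effective_date mirror the example above):

```python
import pandas as pd

customer_dim = pd.DataFrame({
    "customer_key": [1001], "name": ["Christina"],
    "original_state": ["Illinois"], "current_state": ["Illinois"], "effective_date": [None],
})

# Type 3: keep the original value in its own column and overwrite only the current value
mask = customer_dim["customer_key"] == 1001
customer_dim.loc[mask, ["current_state", "effective_date"]] = ["California", "2003-01-15"]
print(customer_dim)  # original_state still holds Illinois; only one previous value is kept
```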
• 35. 13 November 2019 Presentation titlePage 35 WHAT IS ETL ? • ETL stands for extract, transform, and load. • ETL is software that enables businesses to consolidate their disparate data while moving it from place to place, regardless of the form or format that data is in. • There are many ETL tools, e.g. BODS, Informatica, IBM InfoSphere DataStage, Ab Initio, Oracle Warehouse Builder (OWB). • It can be used for the following purposes: • As middleware • In a data warehouse • For SAP data conversion/migration
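For orientation, the extract / transform / load steps can be sketched with nothing more than the Python standard library. This is a conceptual illustration, not BODS code; the file name customers_source.csv, the table dim_customer, and the cleansing rules are all invented.

```python
import csv
import sqlite3

# Extract: read rows from a hypothetical CSV source file
with open("customers_source.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: cleanse and standardize (trim names, upper-case country codes, drop rows without an id)
clean = [
    {"id": int(r["id"]), "name": r["name"].strip().title(), "country": r["country"].strip().upper()}
    for r in rows if r["id"]
]

# Load: write the cleansed rows into a hypothetical warehouse table
con = sqlite3.connect("warehouse.db")
con.execute("CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER, name TEXT, country TEXT)")
con.executemany("INSERT INTO dim_customer VALUES (:id, :name, :country)", clean)
con.commit()
con.close()
```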
  • 36. 13 November 2019 Presentation titlePage 36 ETL PROCESS
  • 37. 13 November 2019 Presentation titlePage 37 ETL TERMS • Source System A database, application, file, or other storage facility from which the data in a data warehouse is derived. • Mapping The definition of the relationship and data flow between source and target objects. • Metadata Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata. • Staging Area A place where data is processed before entering the warehouse. • Cleansing The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process. • Transformation The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
• 38. 13 November 2019 Presentation titlePage 38 DATASTORES • Datastores • Are used to set up connections between an application and the database. • Must be specified for every source and target database. • Are used to import metadata for source and target databases and tables into the repository. • Are used by Data Services to read data from source tables or load data to target tables. • In Business Objects Data Services, you can connect to the following systems using Datastores: • Mainframe systems and databases • Applications and software with user-written adapters • SAP Applications, SAP BW, Oracle Apps, Siebel, etc.
• 39. 13 November 2019 Presentation titlePage 39 TYPES OF DATASTORES • Database Datastores provide a simple way to import metadata directly from a broad variety of relational database management systems (RDBMS) • Application Datastores let users easily import metadata from most Enterprise Resource Planning (ERP) systems • Adapter Datastores allow users to import metadata from any source. Specific adapters may be purchased from Business Objects, or can be developed by customers or third parties as documented in the Business Objects Adapter Development Kit (ADK)
  • 40. 13 November 2019 Presentation titlePage 40 DEFINING A DATASTORE • To define a Datastore, you must have an account with access privileges to the database or application hosting the data you need to access (user name and password). • Datastores are defined in the Datastores tab of the object library using the Datastore Editor. • The Datastore options available depend on which RDBMS or application is used for the Datastore.
  • 41. 13 November 2019 Presentation titlePage 41 DEFINING A DATASTORE (CONT.) • Datastore Editor • Used to define/edit a Datastore • Give the Datastore a meaningful name • Choose the application type of your Datastore • You must enter the parameters of the database to which you are connecting.
  • 42. 13 November 2019 Presentation titlePage 42 DATASTORE ADVANCED CONFIGURATION • You can toggle the Advanced button to hide and show the grid of additional Datastore editor options. • The grid displays Datastore configurations as column headings and lists Datastore options in the left column. Each row represents a configuration option. • Different options appear depending upon Datastore type and (if applicable) database type and version. Specific options appear under group headings such as Connection, General, and Locale
• 43. 13 November 2019 Presentation titlePage 43 DATASTORE ADVANCED CONFIGURATION (CONT.)
• 44. 13 November 2019 Presentation titlePage 44 METADATA IMPORT • Data Services stores the following table information: • Table name, attributes, indexes • Column names, descriptions, data types, primary keys • Data Services updates imported table metadata only when you re-import it manually. • Changes made to underlying table schemas or functions are not automatically imported into Business Objects Data Services.
  • 45. 13 November 2019 Presentation titlePage 45 SELECTIVE IMPORT • Import metadata by Browsing : • 1. In the object library, Datastores tab, right-click on Datastore you want to import to and select Open • 2. From the workspace, right-click the required table and select Import Note: Only metadata is imported
  • 46. 13 November 2019 Presentation titlePage 46 SELECTIVE IMPORT (CONT.) • Import metadata by Name- 1. In the object library, Datastores tab, right-click the Datastore you want to import to and select Import By Name 2. Complete the information in Import By Name dialog box
  • 47. 13 November 2019 Presentation titlePage 47 SELECTIVE IMPORT (CONT.) • Import Search for data • Basic search of external or imported (internal) data • Advanced search of imported (internal) data only
  • 48. 13 November 2019 Presentation titlePage 48 DATA SERVICES OBJECT HIERARCHY
  • 49. 13 November 2019 Presentation titlePage 49 PROJECT • Project is the highest level of object offered by Data Services. • They are listed in the object library under Project tab. • Are used to group and organize related objects • May contain any number of: Jobs, Workflows, Data flows etc. • Only one project can be open at a time. • Can be shared among multiple users using ATL files or a Central Repository • Steps to create a new project : • Choose Project > New > Project. • Enter the name of your new project. The name can include alphanumeric characters and underscores (_). It cannot contain blank spaces.
• 50. 13 November 2019 Presentation titlePage 50 JOBS • Jobs are the only executable objects in SAP BODS. • Are reusable objects and the next level of organization below a project. • Contain Workflows (optional) and/or Dataflows. • Can call many Workflows. • Can be assigned to any project available in the local repository by dragging them from the local object library. • Are the highest level at which logging happens.
  • 51. 13 November 2019 Presentation titlePage 51 JOBS (CONT.) • Batch Job A batch job extracts, transforms, and loads data. It is something that you start, it does the processing like reading tables and loading the data warehouse, and then it stops until it is started again, e.g. every night, twice a day, every 4 hours, or manually started. • Real Time Job Like a batch job, a real-time job also extracts, transforms, and loads data. A Real Time job is started once at the beginning and keeps running as long as the server is active. Whenever a new message is sent through a SOAP request, it will get processed and then the Real Time job sends a SOAP response and waits for the next request.
  • 52. 13 November 2019 Presentation titlePage 52 JOBS (CONT.) Jobs are created in the Project area or in the Object Library. • Create Job in Project area: 1. In the Project Area, select the Project Name. 2. Right-click and choose New Batch Job or New Real Time Job and then edit the name. 3. SAP BODS opens a new workspace which is ready to define the job. • Create Job in Object Library: 1. In Object Library, select Job tab. 2. Right-click Batch Jobs or Real Time Jobs and choose New. 3. A new job with a default name appears. 4. Right-click and select Properties to change the object's name and add a description.
• 53. 13 November 2019 Presentation titlePage 53 WORKFLOWS • A Workflow defines the decision-making process for executing Dataflows. • It is a reusable component used to group Dataflows and/or Workflows together. • The Workflow helps to define the execution order of the Dataflows and supporting operations. • Defined system parameters can be used to pass values into the workflow. • Variables can also be defined for use inside the workflow. • Workflows may contain the following objects: Workflow, Dataflow, Script, Conditional, Try, Catch, While
  • 54. 13 November 2019 Presentation titlePage 54 DATAFLOWS • Data flows extract, transform, and load data; reading sources, transforming data, and loading targets, occurs inside a data flow. • A data flow can be added to a job or a work flow. • Dataflows are reusable objects. • Workflows and Jobs call Dataflows to perform data movement operations.
  • 55. 13 November 2019 Presentation titlePage 55 EMBEDDED DATAFLOWS • An embedded Dataflow is a Dataflow that is called from inside another Dataflow. Data passes into or out of the embedded Dataflow from the parent flow through a single source or target. • The embedded Dataflow can contain any number of sources or targets, but only one input or one output can pass data to or from the parent Dataflow. • An embedded Dataflow is a design aid that has no effect on job execution. • When SAP BODS executes the parent Dataflow, it expands any embedded Dataflows, optimizes the parent Dataflow, then executes it.
  • 56. 13 November 2019 Presentation titlePage 56 EMBEDDED DATAFLOWS (EXAMPLE) • The Example of when to use embedded Dataflows: • In this example, a Dataflow uses a single source to load three different target systems. The Case transform sends each row from the source to different transforms that process it to get a unique target output. • You can simplify the parent Dataflow by using embedded Dataflows for the three different cases.
• 57. 13 November 2019 Presentation titlePage 57 EMBEDDED DATAFLOWS (CONT.) There are two ways to create embedded Dataflows: • Select objects within a Dataflow, right-click, and select Make Embedded Dataflow. • Drag a complete and fully validated Dataflow from the object library into an open Dataflow in the workspace. Then open the Dataflow you just added, right-click the object you want to use as an input or output port, and select Make Port for that object.
• 58. 13 November 2019 Presentation titlePage 58 ABAP DATAFLOWS • An ABAP Dataflow extracts and transforms data from SAP application tables, files, and hierarchies. • The ABAP Dataflow produces a data set that you can use as input to other transforms, save to a file that resides on an SAP application server, or save to an SAP table. • When SAP BODS executes ABAP Dataflows, it translates the extraction requirements into ABAP programs and passes them to SAP to execute. • ABAP Dataflows generate ABAP code. ABAP Dataflows and data transport objects also appear in the tool palette. • In the ABAP Dataflow, after specifying the source and transformations, specify the target, which is the data transport object. • The data transport object in an ABAP Dataflow makes the data set available to the calling Dataflow; it is a file on the SAP application server. • Once the ABAP Dataflow is defined, in the normal Dataflow, connect the ABAP Dataflow to the downstream transforms and the target object where the data has to be loaded.
  • 59. 13 November 2019 Presentation titlePage 59 LOG FILES • As a Job executes, Data Integrator produces the three types of log files that can be viewed in the Designer Project Area : • Monitor Log • Statistics Log • Error Log • The log files are, by default, also set to display automatically in the workspace when you execute a Job.
  • 60. 13 November 2019 Presentation titlePage 60 LOG FILES (CONT.) Monitor Log : Displays each step of each data flow in the job, the number of rows streamed through each step, and the duration of each step.
  • 61. 13 November 2019 Presentation titlePage 61 LOG FILES (CONT.) Statistics log: Itemizes the steps executed in the job and the time execution begins and ends.
  • 62. 13 November 2019 Presentation titlePage 62 LOG FILES (CONT.) Error log : Displays the name of the object being executed when a Data Integrator error occurred. Also displays the text of the resulting error message.
• 63. 13 November 2019 Presentation titlePage 63 VARIABLES • Variables are symbolic placeholders for values. • Local variables are restricted to the object in which they are created (job or work flow). You must use parameters to pass local variables to other objects. • Global variables are restricted to the job in which they are created. However, they do not require parameters to be passed to work flows and data flows. • The data type of a variable can be any supported by the software, such as an integer, decimal, date, or text string. • You can increase the flexibility and reusability of work flows and data flows by using local and global variables when you design your jobs.
  • 64. 13 November 2019 Presentation titlePage 64 VARIABLES (CONT.) • If you define variables in a job or work flow, the software typically uses them in a script, catch, or conditional process. • You can use variables inside data flows. For example, use them in a custom function or in the WHERE clause of a query transform.
  • 65. 13 November 2019 Presentation titlePage 65 PARAMETERS • Parameters can be defined to: • Pass their values into and out of work flows • Pass their values into data flows • Each parameter is assigned a type: input, output, or input/output. The value passed by the parameter can be used by any object called by the work flow or data flow.
  • 66. 13 November 2019 Presentation titlePage 66 VARIABLES & PARAMETERS • Variables and parameters are used differently based on the object type and whether the variable is local or global. • The following table lists the types of variables and parameters you can create based on the object type and how you use them.
  • 67. 13 November 2019 Presentation titlePage 67 DATA SERVICES TRANSFORMS • Data Services offers a number of pre-defined transformations and functional objects for both Data Integration and Data Quality, that allow modelling of the ETL flows. • Each transform is a step in a Dataflow that acts on a data set. • A transform enables you to control how data sets change in a Dataflow. • Transforms operate on data sets by manipulating input sets and producing one or more output sets. • The software includes many built-in transforms. These transforms are available from the object library on the Transforms tab. • Transforms are divided into 3 categories : Platform, Data Integrator & Data Quality
  • 68. 13 November 2019 Presentation titlePage 68 QUERY TRANSFORM • The Query transform retrieves a data set that satisfies conditions that you specify. A Query transform is similar to a SQL SELECT statement. • A query has data outputs, which are data sets based on the conditions that you specify using the schema specified in the output schema area. • Query transform has the following tabs : SELECT, FROM, WHERE, GROUP BY, ORDER BY, ADVANCED Platform Transforms
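The tabs of the Query transform map naturally onto the clauses of a SELECT statement. The pandas sketch below shows a rough equivalent of WHERE, GROUP BY, and ORDER BY on an invented orders data set; it only illustrates the idea, not the transform itself.

```python
import pandas as pd

orders = pd.DataFrame({
    "region": ["West", "West", "North"],
    "status": ["OPEN", "CLOSED", "OPEN"],
    "amount": [100, 250, 80],
})

# Rough equivalent of the Query transform's WHERE, SELECT/GROUP BY, and ORDER BY tabs
result = (orders[orders["status"] == "OPEN"]                          # WHERE
          .groupby("region", as_index=False)["amount"].sum()          # SELECT + GROUP BY
          .sort_values("amount", ascending=False))                    # ORDER BY
print(result)
```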
  • 69. 13 November 2019 Presentation titlePage 69 CASE TRANSFORM • Specifies multiple paths in a single transform (different rows are processed in different ways). • The Case transform simplifies branch logic in data flows by consolidating case or decision making logic in one transform. • Paths are defined in an expression table. Platform Transforms
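Conceptually, the Case transform routes each input row down exactly one output path based on the expressions you define. A plain-Python sketch of that routing (the regions used as case expressions are invented):

```python
# Route each input row down exactly one branch, the way a Case transform does
rows = [{"id": 1, "region": "West"}, {"id": 2, "region": "North"}, {"id": 3, "region": "South"}]

target_west, target_north, target_default = [], [], []
for row in rows:
    if row["region"] == "West":       # case expression 1
        target_west.append(row)
    elif row["region"] == "North":    # case expression 2
        target_north.append(row)
    else:                             # default path for rows matching no expression
        target_default.append(row)

print(len(target_west), len(target_north), len(target_default))
```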
• 70. 13 November 2019 Presentation titlePage 70 MAP_OPERATION TRANSFORM • Modifies data based on mapping expressions and current operation codes. The operation codes can be converted between data manipulation operations. • This transform can also change operation codes on data sets to produce the desired output. For example, if a row in the input data set has been updated in some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT. The result of converting UPDATE rows into INSERT rows is the preservation of the existing rows in the target. • Data Services can push Map_Operation transforms to the source database. • The Map Operation tab would have the following settings: Platform Transforms
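The idea of rewriting operation codes can be sketched as a simple lookup over flagged rows. The mapping below (UPDATE to INSERT, DELETE to DISCARD) is just one hypothetical configuration, not a default of the transform:

```python
# Each row carries an operation code; the Map_Operation transform rewrites those codes
rows = [
    {"op": "UPDATE", "id": 1, "name": "Christina"},
    {"op": "INSERT", "id": 2, "name": "Sid"},
]

# Hypothetical mapping: convert UPDATE rows to INSERT so existing target rows are preserved
op_map = {"UPDATE": "INSERT", "INSERT": "INSERT", "DELETE": "DISCARD", "NORMAL": "NORMAL"}
mapped = [{**row, "op": op_map[row["op"]]} for row in rows]
print(mapped)
```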
  • 71. 13 November 2019 Presentation titlePage 71 MERGE TRANSFORM • Combines incoming data sets, producing a single output data set with the same schema as the input data sets. • All sources must have the same schema, including: • The same number of columns • The same column names • The same data type of columns • A data set consisting of rows from all sources, with any operation codes. The output data has the same schema as the source data, including nested schemas. Platform Transforms
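Because all inputs share the same schema, the Merge transform behaves like a UNION ALL of its sources. A minimal pandas sketch of that union (invented sample data):

```python
import pandas as pd

# Two sources with the same schema (same columns, names, and data types)
source_a = pd.DataFrame({"id": [1, 2], "name": ["Joe", "Sid"]})
source_b = pd.DataFrame({"id": [3], "name": ["Dolly"]})

# Merge behaviour: all rows from all sources flow into one output with the same schema
merged = pd.concat([source_a, source_b], ignore_index=True)
print(merged)
```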
  • 72. 13 November 2019 Presentation titlePage 72 SQL TRANSFORM • Performs the indicated SQL query operation. • Use this transform to perform standard SQL operations when other built-in transforms cannot perform them. • The options for the SQL transform include specifying a Datastore, join rank, cache, array fetch size, and entering SQL text. • There are two ways of defining the output schema for a SQL transform: • Automatic: After you type the SQL statement, click Update schema to execute a described select statement against the database which obtains column information returned by the select statement and populates the output schema.● • Manual: Output columns must be defined in the output portion of the SQL transform. The number of columns defined in the output of the SQL transform must equal the number of columns returned by the SQL query. Platform Transforms
  • 73. 13 November 2019 Presentation titlePage 73 VALIDATION TRANSFORM • The Validation transform qualifies a data set based on rules for input schema columns. • You can apply multiple rules per column or bind a single reusable rule (in the form of a validation function) to multiple columns. • The Validation transform can identify the row, column, or columns for each validation failure. • You can also use the Validation transform to filter or replace (substitute) data that fails your criteria. • When you enable a validation rule for a column, a check mark appears next to it in the input schema. Platform Transforms
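The pass/fail routing of validation rules can be sketched with a boolean rule over the input columns; the rules below (non-empty email, plausible age) are invented examples:

```python
import pandas as pd

customers = pd.DataFrame({"id": [1, 2, 3],
                          "email": ["a@x.com", "", "c@y.com"],
                          "age": [34, 200, 28]})

# Hypothetical column rules: email must not be empty and age must be plausible
rule = (customers["email"] != "") & customers["age"].between(0, 120)

passed = customers[rule]    # rows routed to the Pass output
failed = customers[~rule]   # rows routed to the Fail output (they could also be substituted)
print(passed)
print(failed)
```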
• 74. 13 November 2019 Presentation titlePage 74 TABLE COMPARISON TRANSFORM • Compares two data sets and produces the difference between them as a data set with rows flagged as INSERT or UPDATE. • The Table_Comparison transform allows you to detect and forward changes that have occurred since the last time a target was updated. • Allows you to identify changes to a target table for incremental updates • Three possible outcomes from this transform: • A new row can be added • An existing record can be updated • A row can be ignored Data Integrator Transforms
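The compare-and-flag logic can be sketched by looking up each incoming row in the comparison (target) table and tagging it with an operation code; the data and the compare column (state) are invented:

```python
import pandas as pd

target = pd.DataFrame({"customer_key": [1, 2], "state": ["Illinois", "Texas"]})
incoming = pd.DataFrame({"customer_key": [1, 2, 3], "state": ["California", "Texas", "Ohio"]})

def compare(row):
    match = target[target["customer_key"] == row["customer_key"]]
    if match.empty:
        return "INSERT"                            # new row
    if match.iloc[0]["state"] != row["state"]:
        return "UPDATE"                            # existing row has changed
    return "IGNORE"                                # unchanged row

incoming["op"] = incoming.apply(compare, axis=1)
print(incoming[incoming["op"] != "IGNORE"])        # only INSERT/UPDATE rows flow downstream
```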
  • 75. 13 November 2019 Presentation titlePage 75 TABLE COMPARISON TRANSFORM (CONT.) Presence of output columns indicates proper configuration Comparison table (usually the target) Data Integrator Transforms
  • 76. 13 November 2019 Presentation titlePage 76 HISTORY PRESERVING TRANSFORM • The History_Preserving transform allows you to produce a new row in your target rather than updating an existing row. You can indicate in which columns the transform identifies changes to be preserved. • The History_Preserving transform requires input rows flagged as inserts and updates. • The History_Preserving transform is usually preceded by a Table_Comparison, which provides the required input row types. Data Integrator Transforms
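Building on the flagged output of a Table_Comparison, history preservation means closing off the old dimension row and inserting a new one instead of overwriting. A minimal sketch (the current_flag column is an invented illustration of how the old row might be marked):

```python
import pandas as pd

dim = pd.DataFrame({"customer_key": [1], "state": ["Illinois"], "current_flag": ["Y"]})

# Row flagged by an upstream Table_Comparison as an UPDATE on the "state" column
change = {"customer_key": 1, "state": "California", "op": "UPDATE"}

if change["op"] == "UPDATE":
    # Close off the old row instead of overwriting it ...
    dim.loc[dim["customer_key"] == change["customer_key"], "current_flag"] = "N"
    # ... and insert a new row so that history is preserved
    new_row = pd.DataFrame({"customer_key": [change["customer_key"]],
                            "state": [change["state"]], "current_flag": ["Y"]})
    dim = pd.concat([dim, new_row], ignore_index=True)
print(dim)
```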
  • 77. 13 November 2019 Presentation titlePage 77 KEY GENERATION TRANSFORM • Generates sequential key values for new rows, starting from the maximum existing key value in a specified table • Allows you to build a new physical primary key, e.g. for preserving history • When it is necessary to generate artificial keys in a table, the Key_Generation transform looks up the maximum existing key value from a table and uses it as the starting value to generate new keys. • The transform expects the generated key column to be part of the input schema. Data Integrator Transforms
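The key-generation idea is simply "read the current maximum key, then hand out sequential values to new rows". A minimal pandas sketch with invented data:

```python
import pandas as pd

dim = pd.DataFrame({"customer_key": [1001, 1002], "name": ["Christina", "Sid"]})
new_rows = pd.DataFrame({"customer_key": [None, None], "name": ["Dolly", "Joe"]})

# Start from the maximum existing key and assign sequential values to the new rows
start = int(dim["customer_key"].max())
new_rows["customer_key"] = range(start + 1, start + 1 + len(new_rows))
print(new_rows)  # keys 1003 and 1004
```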
  • 78. 13 November 2019 Presentation titlePage 78 DATE GENERATION TRANSFORM • Produces a series of dates incremented as you specify. • Use this transform to produce the key values for a time dimension target. • From this generated sequence you can populate other fields in the time dimension (such as day_of_week) using functions in a query. • To create a time dimension target with dates from the beginning of the year 1997 to the end of the year 2000, place a Date_Generation transform, a query, and a target in a data flow. Inside the Date_Generation transform, specify the following Options : • Start date: 1997.01.01 • End date: 2000.12.31 (A variable can also be used.) • Increment: Daily (A variable can also be used.) Data Integrator Transforms
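Generating the date series itself is straightforward; the sketch below produces one row per day between the start and end dates from the example and derives a day-of-week attribute the way a downstream query would:

```python
from datetime import date, timedelta

# One row per day between the start and end dates, incremented daily
start, end = date(1997, 1, 1), date(2000, 12, 31)
dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# A downstream query would derive further time-dimension attributes from each generated date
time_dim = [{"date_key": d.isoformat(), "day_of_week": d.strftime("%A")} for d in dates]
print(len(time_dim), time_dim[0])
```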
• 79. 13 November 2019 Presentation titlePage 79 PIVOT TRANSFORM (COLUMNS TO ROWS) • Creates a new row for each value in a column that you identify as a pivot column. • The Pivot transform allows you to change how the relationship between rows is displayed. • For each value in each pivot column, Data Services produces a row in the output data set. • You can create pivot sets to specify more than one pivot column. Input (Name, Jan, Feb, Mar): Joe 1100 500 900; Sid 500 1200 300; Dolly 900 1300 200. Output (Name, Sequence, Month, Q1_Expenses): Joe 1 Jan 1100; Joe 2 Feb 500; Joe 3 Mar 900; Sid 1 Jan 500; Sid 2 Feb 1200; Sid 3 Mar 300; Dolly 1 Jan 900; Dolly 2 Feb 1300; Dolly 3 Mar 200. Settings: Non-pivot columns: Name; Pivot columns: Jan, Feb, Mar; Sequence name: Sequence; Pivot data field: Q1_Expenses; Pivot header name: Month. Data Integrator Transforms
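The column-to-row pivot shown above corresponds to a melt operation; the pandas sketch below reproduces the slide's input and output (only as an illustration of the data reshaping, not of the transform itself):

```python
import pandas as pd

expenses = pd.DataFrame({
    "Name": ["Joe", "Sid", "Dolly"],
    "Jan": [1100, 500, 900],
    "Feb": [500, 1200, 1300],
    "Mar": [900, 300, 200],
})

# One output row per value of each pivot column (Jan, Feb, Mar)
pivoted = expenses.melt(id_vars=["Name"], value_vars=["Jan", "Feb", "Mar"],
                        var_name="Month", value_name="Q1_Expenses")
print(pivoted.sort_values("Name"))
```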
  • 80. 13 November 2019 Presentation titlePage 80 REVERSE PIVOT TRANSFORM (ROWS TO COLUMNS) • Creates one row of data from several existing rows. • The Reverse Pivot transform allows you to combine data from several rows into one row by creating new columns. • For each unique value in a pivot axis column and each selected pivot column, Data Services produces a column in the output data set. Data Integrator Transforms
  • 81. 13 November 2019 Presentation titlePage 81 REVERSE PIVOT TRANSFORM (ROWS TO COLUMNS) (CONT.) Input: Output: Data Integrator Transforms
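The reverse pivot corresponds to the opposite reshaping, turning the unique values of the pivot axis column (Month) back into columns. A pandas sketch with the same invented data:

```python
import pandas as pd

pivoted = pd.DataFrame({
    "Name": ["Joe", "Joe", "Joe", "Sid", "Sid", "Sid"],
    "Month": ["Jan", "Feb", "Mar"] * 2,
    "Q1_Expenses": [1100, 500, 900, 500, 1200, 300],
})

# One output column per unique value of the pivot axis column (Month)
wide = pivoted.pivot(index="Name", columns="Month", values="Q1_Expenses").reset_index()
print(wide[["Name", "Jan", "Feb", "Mar"]])
```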