3. Antonios Chatzipavlis
Solution Architect – SQL Server Evangelist
• I have been working with computers since 1982.
• In 1988 I started my professional career in the computer industry.
• In 1996 I started working with SQL Server, version 6.0.
• In 1998 I earned my first certification at Microsoft as
Microsoft Certified Solution Developer (3rd in Greece).
• Since then I have earned more than 30 certifications (and counting!)
• MCT, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA
• MCSE: Data Platform
• In 1998 I started my career as a Microsoft Certified Trainer (MCT).
• Since then I have delivered more than 15,000 hours of training.
• In 2010 I became a Microsoft MVP for SQL Server for the first time.
• In 2010 I created SQL School Greece (www.sqlschool.gr).
• I am a board member of the IAMCT organization and the leader of its Greek chapter.
• In 2012 I became an MCT Regional Lead in the Microsoft Learning program.
• I am a moderator of autoexec.gr and a member of dotnetzone.gr.
8. WHAT IS SSIS?
• A platform for ETL operations
• Installed as a feature of SQL Server
• Useful for DBAs, developers, and data analysts
• The evolution of DTS (Data Transformation Services)
9. SSIS ARCHITECTURE (1)
• SSIS Project
• A versioned container for parameters and packages
• A unit of deployment to an SSIS Catalog
• SSIS Package
• A unit of task flow execution
• A unit of deployment (package deployment model)
10. SSIS ARCHITECTURE (2)
Control Flow
• It is the brain of the package
• Orchestrates the order of execution for all of its components
• Tasks
• Individual units of work
• Precedence constraints
• Direct tasks to execute in a given order
• Define the workflow of the SSIS package
• Containers
• Core units for grouping tasks together logically into units of work
• Give us the ability to define variables and events within the container scope
11. SSIS ARCHITECTURE (3)
Data Flow
• It is the heart of the package
• Provides the capability to implement ETL
• Sources
• A component that specifies the location of the source data
• Transformations
• Allow changes to the data within the data pipeline
• Destinations
• A component that specifies the destination of the transformed data
12. SSIS ARCHITECTURE (4)
• Variables
• SSIS variables can be set to evaluate to an expression at runtime
• Parameters
• Parameters behave much like variables, but with a few main exceptions:
• Parameters can make a package dynamic.
• Parameters can easily be set from outside the package.
• They can be designated as values that must be passed in for the package to start.
• Parameters are new to SSIS in SQL Server 2012 and replace the capabilities
of Configurations in previous releases of SQL Server.
• Error Handling and Logging
19. SSIS DATA TYPES
• Named differently than similar types in .NET
or T-SQL
• C:\Program Files\Microsoft SQL Server\110\DTS\MappingFiles
• The .NET managed types are important only
if you are using:
• Script component
• CLR
• .NET-based coding to manipulate your Data Flows.
20. SSIS NUMERIC DATA TYPES TABLE MAPPINGS

SSIS DATA TYPE    SQL SERVER DATA TYPE    .NET DATA TYPE
DT_NUMERIC        numeric                 System.Decimal
DT_DECIMAL        decimal                 System.Decimal
DT_CY             money, smallmoney       System.Decimal
DT_I1             (none)                  System.SByte
DT_I2             smallint                System.Int16
DT_I4             int                     System.Int32
DT_BOOL           bit                     System.Boolean
DT_I8             bigint                  System.Int64
DT_R4             real                    System.Single
DT_R8             float                   System.Double
DT_U1             tinyint                 System.Byte
DT_U2             (none)                  System.UInt16
DT_U4             (none)                  System.UInt32
DT_U8             (none)                  System.UInt64
DT_GUID           uniqueidentifier        System.Guid
21. SSIS STRING AND DATE-TIME DATA TYPES TABLE MAPPINGS

SSIS DATA TYPE         SQL SERVER DATA TYPE       .NET DATA TYPE
DT_WSTR                nvarchar, nchar            System.String
DT_STR                 varchar, char              System.String
DT_TEXT                text                       System.String
DT_NTEXT               ntext, sql_variant, xml    System.String
DT_BYTES               binary, varbinary          System.Byte[]
DT_IMAGE               timestamp, image           System.Byte[]
DT_DBTIMESTAMP         smalldatetime, datetime    System.DateTime
DT_DBTIMESTAMP2        datetime2                  System.DateTime
DT_DBDATE              date                       System.DateTime
DT_DATE                (none)                     System.DateTime
DT_FILETIME            (none)                     System.DateTime
DT_DBTIMESTAMPOFFSET   datetimeoffset             System.DateTimeOffset
DT_DBTIME2             time                       System.TimeSpan
DT_DBTIME              (none)                     System.TimeSpan
22. PERFORMANCE AND DATA TYPES
• Convert only when necessary.
• Don’t convert columns from a data source that will be dropped from the data
stream.
• Each conversion costs something.
• Convert to the closest type for your destination, using the mapping files.
• If a value is converted to a non-supported data type, you’ll incur an additional
conversion internal to SSIS to the mapped data type.
• Convert using the closest size and precision.
• Don’t import all columns as 50-character data columns if you are working with
a fixed or reliable file format with columns that don’t require as much space.
• Evaluate the option to convert after the fact.
• Remember that SSIS is still an ETL tool and sometimes it is more efficient to
stage the data and convert it using set-based methods.
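As a hedged illustration of the stage-then-convert pattern (table and column names are hypothetical):

-- Stage all columns as text first, then convert them in one set-based pass
-- instead of converting row by row inside the Data Flow.
INSERT INTO dbo.Orders (OrderID, OrderDate, Amount)
SELECT CAST(OrderID   AS int),
       CAST(OrderDate AS datetime),
       CAST(Amount    AS numeric(10, 2))
FROM   staging.OrdersRaw;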
23. UNICODE AND NON-UNICODE
• Unicode data types are the default
• This is the default import behavior
• All SSIS string functions expect Unicode strings as input
• Use the Data Conversion Transformation to convert non-Unicode data types to the appropriate Unicode data type according to the mapping table
24. CONVERSION IN SSIS EXPRESSIONS
• Use explicit casting to avoid trouble
• Casting is easy
• It looks like .NET
• (DT_I4) 32
• Casting operator parameters
• Length – final string length
• Code_Page – code page for non-Unicode strings
• Precision
• Scale
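For illustration, a few casts that use these parameters (the column name AnsiColumn is hypothetical):
• (DT_STR, 10, 1252) 123 casts an integer to a 10-character string in code page 1252
• (DT_WSTR, 50) [AnsiColumn] casts a non-Unicode column to a 50-character Unicode string
• (DT_NUMERIC, 10, 2) "123.45" casts a string to a numeric with precision 10 and scale 2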
25. VARIABLES
• Key feature in SSIS
• User Variables
• Variables created by an SSIS developer to hold dynamic values
• Defined in the User namespace by default
• Defined at a specified scope
• Syntax: User::MyVar
• System Variables
• Built-in variables with dynamic system values
• Defined in the System namespace
• Syntax: System::MyVar
26. PARAMETERS
• New feature in SQL Server 2012.
• Similar to a variable
• Can store information
• It has a few different properties and uses
• Parameters are set externally
• Project parameters
• Package parameters
• Replace package configurations
• when using the project deployment model
• Required property of the parameter
• Requires the caller of the package to pass in a value for the parameter
27. PROJECT PARAMETERS
• Created at the project level
• Can be used in all packages that are included in that project
• Best used for values that should be shared among packages, such as e-mail addresses for error messages
28. PACKAGE PARAMETERS
• Created at the package level
• Can be used only in that package
• Best used for values specific to that package, such as directory locations
29. PARAMETERS VS VARIABLES
Parameters: use them when you want to set the value of something from outside the package
Variables: use them when you want to create or store values only within a package
30. VARIABLE & PARAMETER DATA TYPES

VARIABLE DATA TYPE   SSIS DATA TYPE   REMARKS
Boolean              DT_BOOL          Be careful setting this in code; the expression language and .NET languages define Booleans differently
Byte                 DT_UI1           A 1-byte unsigned int (note: this is not a byte array)
Char                 DT_UI2
DateTime             DT_DBTIMESTAMP
DBNull               N/A
Double               DT_R8
Int16                DT_I2
Int32                DT_I4
Int64                DT_I8
Object               N/A              An object reference; typically used to store data sets or large object structures
SByte                DT_I1
Single               DT_R4
String               DT_WSTR
UInt32               DT_UI4
UInt64               DT_UI8
31. EXPRESSIONS
• Used to set values dynamically
• Properties
• Conditional split criteria
• Derived column values
• Precedence constraints
• Based on Integration Services expression syntax
• Can include variables and parameters
• The language is heavily C#-like, but it has a Visual Basic flavor and some T-SQL-style functions
• Can be created graphically by using Expression Builder
• Expression adorners are a new feature in SQL Server 2012
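As a hedged example, an expression that builds a file path from a project parameter and a user variable (the names InputFolder and FileName are hypothetical):

@[$Project::InputFolder] + "\\" + @[User::FileName] + ".csv"

Note that "\\" is the expression-language escape for a single backslash.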
35. DEALING WITH NULL IN SSIS
• In SSIS a variable cannot be set to NULL
• The SSIS ISNULL function is not the same as T-SQL ISNULL: it tests whether a value is NULL rather than replacing it
• Each data type maintains a default value

VARIABLE DATA TYPE   DEFAULT VALUE
Boolean              False
Byte                 0
Char                 0
DateTime             30/12/1899
Double               0
Int16                0
Int32                0
Int64                0
DBNull               N/A in an expression
Object               N/A in an expression
SByte                0
Single               0
String               "" (empty string)
UInt32               0
UInt64               0
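For example, to substitute a default for a NULL column in an expression (the column name is hypothetical):

ISNULL([MiddleName]) ? "N/A" : [MiddleName]

or, using the REPLACENULL function added in SQL Server 2012:

REPLACENULL([MiddleName], "N/A")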
36. STRING FUNCTIONS
• SSIS string functions are Unicode
• They differ from the SQL Server string functions
• Comparisons are case and padding sensitive
• Comparing strings requires two strings of the same padding length and case
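For example, to make a comparison insensitive to case and padding, normalize both sides first (column names are hypothetical):

UPPER(TRIM([ColumnA])) == UPPER(TRIM([ColumnB]))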
37. USING EXPRESSIONS IN SSIS PACKAGE
• Variable and Parameters as Expressions
• Expressions in Connection Manager
Properties
• Expressions in Control Flow Tasks
• Expressions in Control Flow Precedence
• Expression Task (new in SQL Server 2012)
• Expression in Data Flow
39. WHAT IS A TASK
• A foundation of the Control Flow in SSIS
• A discrete unit of work that can perform
typical actions
• Each task has a set of setup parameters
• Setup parameters are visible in the Task Editor
• Most of these parameters can be made dynamic by using expressions on the Task Editor's Expressions tab
40. CONTROL FLOW TASKS
Data Flow Tasks
• Data Flow
Database Tasks
• Data Profiling
• Bulk Insert
• Execute SQL
• Execute T-SQL
• CDC Control
File & Internet Tasks
• File System
• FTP
• XML
• Web Service
• Send Mail
Process Execution Tasks
• Execute Package
• Execute Process
WMI Tasks
• WMI Data Reader
• WMI Event Watcher
Custom Logic Tasks
• Script
• Custom Tasks
Database Transfer Tasks
• Transfer Database
• Transfer Error Messages
• Transfer Jobs
• Transfer Logins
• Transfer Master Stored Procedures
• Transfer SQL Server Objects
Analysis Services Tasks
• Analysis Services Execute DDL
• Analysis Services Processing
• Data Mining Query
SQL Server Maintenance Tasks
• Back Up Database
• Check Database Integrity
• History Cleanup
• Maintenance Cleanup
• Notify Operator
• Rebuild Index
• Reorganize Index
• Shrink Database
• Update Statistics
45. COMMON PROPERTIES (1)
• If set to true, SSIS will not validate any of the properties set in the task until runtime.
• This is useful if you are operating in a disconnected mode, want to enter a value for production that cannot be validated until the package is deployed, or are dynamically setting the properties using expressions.
• The default value for this property is false.
DelayValidation
46. COMMON PROPERTIES (2)
• The description of what the instance of the task does.
• The default value is the task name; if you have multiple tasks of the same type, it reads <task name 1> (where the number increments).
• This property does not have to be unique and is optional.
• If you provide details here, they display in the tooltip when hovering over the task object.
• For consistency, the property should accurately describe what the task does for people who may be monitoring the package in your operations group.
Description
47. COMMON PROPERTIES (3)
• If set to true, the task is disabled and will not execute.
• Helpful if you are testing a package and want to disable the execution of a task temporarily.
• The equivalent of temporarily commenting out a task.
• Set to false by default.
Disable
48. COMMON PROPERTIES (4)
• Contains the name of the custom variable that will store the output of the task's execution.
• The default value of this property is <none>, meaning the execution output is not stored.
• Enables the task to expose information related to the results of the internal actions within the task.
ExecValueVariable
49. COMMON PROPERTIES (5)
• If set to true
• The entire package will fail if the individual task fails.
• Typically, you want to control what happens to a package if a task fails with
a custom error handler or Control Flow.
• By default, this property is set to false.
FailPackageOnFailure
50. COMMON PROPERTIES (6)
• If set to true
• The task’s parent will fail if the individual task reports an error.
• The task’s parent can be a package or container
• By default, this property is set to false.
FailParentOnFailure
51. COMMON PROPERTIES (7)
• Read-only
• Automatically generated unique ID
• Associated with an instance of a task.
• The ID is in GUID format
• {BK4FH3I-RDN3-I8RF-KU3F-JF83AFJRLS}.
ID
52. COMMON PROPERTIES (8)
• Specifies the isolation level of the transaction
• The values are
• Chaos
• ReadCommitted
• ReadUncommitted
• RepeatableRead
• Serializable
• Unspecified
• Snapshot.
• The default value of this property is Serializable
IsolationLevel
53. COMMON PROPERTIES (9)
• Specifies the type of logging that will be
performed for this task.
• The values are
• UseParentSetting
• Enabled
• Disabled.
• The default value is UseParentSetting
• Basic logging is turned on at the package level by
default in SQL Server 2012.
LoggingMode
54. COMMON PROPERTIES (10)
• The name associated with the task.
• The default name is the task name; if you have multiple tasks of the same type, <task name 1> (where the number increments).
• Change this name to make it more readable to an operator at runtime.
• It must be unique inside your package.
• Used to identify the task programmatically.
Name
55. COMMON PROPERTIES (11)
• Specifies the transaction attribute for the task.
• Possible values are
• NotSupported
• Supported
• Required.
• The default value is Supported
• Supported enables the option for you to use transactions in your task.
TransactionOption
56. DATA FLOW TASK
• The heart of SSIS
• Has its own design surface
• Encapsulates all the data transformation
aspects of ETL
• Each Data Flow Task corresponds to a separate Data Flow surface
• Splits and handles data in pipelines based on data elements
57. Database Tasks
• Data Profiling
• Bulk Insert
• Execute SQL
• Execute T-SQL
• CDC Control
58. DATA PROFILING TASK
• Examines data and collects metadata about the quality of the data
• About frequency of statistical patterns,
interdependencies, uniqueness, and redundancy.
• Important for the overall quality and health of an
operational data store (ODS) or data warehouse.
• Doesn’t have built-in conditional
workflow logic
• but technically you can use XPath queries on the
results
• Creates a report on data statistics
• You still need to make judgments about these
statistics.
• For example, a column may contain an overwhelming
amount of NULL values, but the profiler doesn’t know
whether this reflects a valid business scenario.
EXAMINING AREAS
o Candidate Key Profile Request
o Column Length Distribution Profile Request
o Column Null Ratio Profile Request
o Column Pattern Profile Request
o Column Statistics Profile Request
o Functional Dependency Profile Request
o Value Inclusion Profile Request
60. BULK INSERT TASK
• Inserts data from a text or flat file into a SQL
Server database
• Similar to BULK INSERT statement or BCP.exe
• A very fast operation, especially with large amounts of data
• In fact, it is a wizard that stores the information needed to create and execute a bulk copy command at runtime
• Has no ability to transform data
• Because of this, it gives us the fastest way to load data
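For comparison, a minimal sketch of the equivalent T-SQL statement (the path, table name, and format options are illustrative):

BULK INSERT dbo.Orders
FROM 'C:\Data\orders.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);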
61. EXECUTE SQL TASK
• One of the most widely used tasks in SSIS
• Used for
• Truncating a staging data table prior to importing
• Retrieving row counts to determine the next step in a workflow
• Calling stored procedures to perform business logic against sets of staged data
• Retrieving information from a database repository
• Executing a parameterized SQL statement
• ? indicates a parameter when using the ADO, ODBC, OLE DB, and EXCEL connection types
• ADO, OLE DB, and EXCEL parameters are zero-based
• ODBC parameters start from 1
• @<parameter name> when using ADO.NET
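For illustration, the same hypothetical statement written for the two parameter styles:

-- ADO, OLE DB, EXCEL (zero-based) and ODBC (starting from 1): positional markers
SELECT OrderID FROM dbo.Orders WHERE CustomerID = ? AND OrderDate >= ?

-- ADO.NET: named parameters
SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID AND OrderDate >= @OrderDate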
63. EXECUTE T-SQL STATEMENT TASK
• Similar to Execute SQL Task
• Supports only T-SQL for SQL Server
64. CDC CONTROL TASK
• Used to control the life cycle of change data
capture (CDC)
• Handles CDC package synchronization
• Maintains the state of the CDC package
• Supports two groups of operations.
• One group handles the synchronization of initial load and change
processing
• The other manages the change-processing range of LSNs for a run of a
CDC package and keeps track of what was processed successfully.
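As background, a minimal T-SQL sketch of enabling CDC on a table that such a package would read (schema and table names are hypothetical):

EXEC sys.sp_cdc_enable_db;  -- enable CDC in the current database

EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',
     @role_name     = NULL;   -- no gating role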
65. File & Internet Tasks
• File System
• FTP
• XML
• Web Service
• Send Mail
66. FILE SYSTEM TASK
• Performs file operations available in the
System.IO.File .NET class.
• The creation of directory structures does not
have to be made recursively as we did in the
DTS legacy product.
• It is written for a single operation
• If you need to iterate over a series of files or directories, the File System Task can simply be placed within a looping container
68. FTP TASK
• Enables the use of the File Transfer Protocol
(FTP) in your package development tasks.
• Exposes more FTP command capability
• Enabling you to create or remove local and remote directories and files.
• Another change from the legacy DTS FTP Task is the capability to use FTP in
passive mode.
• This solves the problem that DTS had in communicating with FTP servers when the
firewalls filtered the incoming data port connection to the server.
71. WEB SERVICE TASK
• Used to retrieve XML-based result sets by
executing a method on a web service
• Only retrieves the data
• It doesn’t yet address the need to navigate through the data, or extract
sections of the resulting documents.
• Can be used in SSIS to provide real-time validation of data in your ETL
processes or to maintain lookup or dimensional data.
• Requires creation of an HTTP Connection
Manager
• To a specific HTTP endpoint on a website or to a specific Web Services
Description Language (WSDL) file on a website
72. SEND MAIL TASK
• Sends e-mail messages via SMTP
• Only supports Windows and anonymous authentication
• Google mail needs basic authentication
• So you cannot configure the SMTP Connection Manager in SSIS with an external SMTP server like Gmail, Yahoo, etc.
75. EXECUTE PACKAGE TASK
• Enables you to build SSIS solutions called
parent packages that execute other packages
called child packages
• Several improvements have simplified the
task:
• The child packages can be run as either in-process or out-of-process
executables.
• A big difference in this release of the task, compared to its 2005 and 2008 predecessors, is that you execute packages within a project, which makes migrating the code from development to QA much easier.
• The task now also enables you to easily map parameters in the parent package to the child packages.
76. EXECUTE PROCESS TASK
• Executes a Windows or console application
inside of the Control Flow.
• The most common example would have to be unzipping packed or
encrypted data files with a command-line tool
• The configuration items for this task are:
• RequireFullFileName
• Executable
• WorkingDirectory
• StandardInputVariable
• StandardOutputVariable
• StandardErrorVariable
• FailTaskIfReturnCodeIsNotSuccessValue
• Timeout
• TerminateProcessAfterTimeOut
• WindowStyle
79. WMI DATA READER TASK
• WMI is one of the best-kept secrets in Windows
• It enables you to manage Windows servers and workstations through a scripting interface, similar to running a T-SQL query
• This task enables you to interface with this
environment by writing WQL queries
• The output of this query can be written to a file or variable for later
consumption
• You could use the WMI Data Reader Task to:
• Read the event log looking for a given error.
• Query the list of applications that are running.
• Query to see how much RAM is available at package execution for debugging.
• Determine the amount of free space on a hard drive.
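For instance, a hedged WQL sketch for the free-space example (the drive letter is illustrative):

SELECT FreeSpace, Size FROM Win32_LogicalDisk WHERE DeviceID = 'C:'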
80. WMI EVENT WATCHER TASK
• The WMI Event Watcher Task empowers SSIS
to wait for and respond to certain WMI
events that occur in the operating system.
• The following are some of the useful things
you can do with this task:
• Watch a directory for a certain file to be written.
• Wait for a given service to start.
• Wait for the memory of a server to reach a certain level before executing the rest of the package or before transferring files to the server.
• Watch for the CPU to be free.
83. SCRIPT TASK
• Enables you to access the VSTA environment
to write and execute scripts using the VB and
C# languages
• In the latest SSIS edition
• Solidifies the connection to the full .NET 4.0 libraries for both VB and C#.
• A coding environment with the advantage of IntelliSense
• An integrated Visual Studio design environment within SSIS
• An easy-to-use methodology for passing parameters into the script
• The capability to add breakpoints to your code for testing and debugging
purposes
• The automatic compiling of your script into binary format for increased
speed
85. CUSTOM TASK
• In a real-world integration solution, you may have
requirements that the built-in functionality in SSIS
does not meet
• Use Visual Studio and Class Library project
template
• Reference the following assemblies
• Microsoft.SqlServer.DTSPipelineWrap
• Microsoft.SqlServer.DTSRuntimeWrap
• Microsoft.SqlServer.ManagedDTS
• Microsoft.SqlServer.PipelineHost
• In addition, the component needs to
• Provide a strong name key for signing the assembly.
• Set the build output location to the PipelineComponents folder.
• Use a post-build event to install the assembly into the GAC.
• Set assembly-level attributes in the AssemblyInfo.cs file.
86. Database Transfer Tasks
• Transfer Database
• Transfer Error Messages
• Transfer Jobs
• Transfer Logins
• Transfer Master Stored Procedures
• Transfer SQL Server Objects
108. SEQUENCE CONTAINER
• Handles the flow of a subset of a package
• Can help you divide a package into smaller and more manageable pieces
• Usages of the Sequence Container include:
• Grouping tasks so that you can disable a part of the package that’s no longer
needed
• Narrowing the scope of the variable to a container
• Managing the properties of multiple tasks in one step by setting the properties
of the container
• Using one method to ensure that multiple tasks have to execute successfully
before the next task executes
• Creating a transaction across a series of data-related tasks, but not on the entire
package
• Creating event handlers on a single container, wherein you could send an email
if anything inside one container fails and perhaps page if anything else fails
109. GROUPS
• Are not actually containers but simply a way to
group components together
• A key difference between groups and containers is that, unlike containers, groups cannot delegate properties to the components inside them
• Groups don't have precedence constraints originating from them (only from the tasks inside them).
• You cannot disable the entire group, as you can with a Sequence Container.
• Groups are good for quick compartmentalization of tasks for aesthetics.
• Their only usefulness is to quickly group
components in either a Control Flow or a Data
Flow together.
110. FOR LOOP CONTAINER
• Enables you to create looping in your
package similar to how you would loop in
nearly any programming language.
• In this looping style, SSIS optionally initializes
an expression and continues to evaluate it
until the expression evaluates to false.
111. FOREACH LOOP CONTAINER
• Enables you to loop through a collection of objects.
• Foreach File Enumerator: Performs an action for each file in a directory with a given file extension
• Foreach Item Enumerator: Loops through a list of items that are set manually in the container
• Foreach ADO Enumerator: Loops through a list of tables or rows in a table from an ADO recordset
• Foreach ADO.NET Schema Rowset Enumerator: Loops through an ADO.NET schema
• Foreach From Variable Enumerator: Loops through an SSIS variable
• Foreach Nodelist Enumerator: Loops through a node list in an XML document
• Foreach SMO Enumerator: Enumerates a list of SQL Management Objects (SMO)
• As you loop through the collection
• the container assigns the value from the collection to a variable
• which can later be used by tasks or connections inside or outside the container
• you can also map the value to a variable
114. DATA FLOW TASK
• The heart of SSIS
• Has its own design surface
• Encapsulates all the data transformation
aspects of ETL
• Each Data Flow Task corresponds to a separate Data Flow surface
• Splits and handles data in pipelines based on data elements
115. DATA SOURCES
• Databases
• ADO.NET
• OLE DB
• CDC Source
• Files
• Excel
• Flat files
• XML
• Raw files
• Others
• Custom
117. CONNECTION MANAGERS
• A connection to a data source or destination:
• Provider (for example, ADO.NET, OLE DB, or flat file)
• Connection string
• Credentials
• Project or package level:
• Project-level connection managers:
• Can be shared across packages
• Are listed in Solution Explorer and the Connection Managers pane for packages in
which they are used
• Package-level connection managers:
• Can be shared across objects in the package
• Are listed only in the Connection Managers pane for packages in which they are
used
118. DATA FLOW TRANSFORMATIONS
Row Transformations
• Character Map
• Copy Column
• Data Conversion
• Derived Column
• Export Column
• Import Column
• OLE DB Command
Rowset Transformations
• Aggregate
• Sort
• Percentage Sampling
• Row Sampling
• Pivot
• Unpivot
Split & Join Transformations
• Conditional Split
• Multicast
• Union All
• Merge
• Merge Join
• Lookup
• Cache
• CDC Splitter
Audit Transformations
• Audit
• RowCount
BI Transformations
• Slowly Changing Dimension
• Fuzzy Grouping
• Fuzzy Lookup
• Term Extraction
• Term Lookup
• Data Mining Query
• Data Cleansing
Custom Transformations
• Script Component
• Custom Component
119. SYNCHRONOUS VS ASYNCHRONOUS TRANSFORMATIONS
• Synchronous
• Synchronous transformations, such as Derived Column and Data Conversion, are those where rows flow into memory buffers in the transformation and the same buffers come out.
• No rows are held, and typically these transformations perform very quickly, with minimal impact on your Data Flow.
• Asynchronous
• Asynchronous transformations can cause a block in your Data Flow and slow down your runtime.
• There are two types of asynchronous transformations:
• Partially blocking transformations
• Create new memory buffers for the output of the transformation.
• e.g. Union All transformation
• Fully blocking transformations
• e.g. Sort and Aggregate Transformations
• Create new memory buffers for the output of the transformation but cause a full block of the
data.
• These fully blocking transformations represent the single largest slowdown in SSIS and should be
considered carefully in terms of any architecture decisions you must make.
120. Row Transformations
• Character Map
• Copy Column
• Data Conversion
• Derived Column
• Export Column
• Import Column
• OLE DB Command
121. CHARACTER MAP
• Performs common character translations
• The modified column can be added as a new column or can update the original column in place
• The available operation types are:
• Byte Reversal: Reverses the order of the bytes.
• For example, for the data 0x1234 0x9876, the result is 0x3412 0x7698
• Full Width: Converts the half-width character type to full width.
• Half Width: Converts the full-width character type to half width.
• Hiragana: Converts the Katakana style of Japanese characters to Hiragana.
• Katakana: Converts the Hiragana style of Japanese characters to Katakana.
• Linguistic Casing: Applies regional linguistic rules for casing.
• Lowercase: Changes all letters in the input to lowercase.
• Traditional Chinese: Converts simplified Chinese characters to traditional Chinese.
• Simplified Chinese: Converts traditional Chinese characters to simplified Chinese.
• Uppercase: Changes all letters in the input to uppercase.
122. COPY COLUMN
• A very simple transformation that copies the
output of a column to a clone of itself.
• This is useful if you wish to create a copy of a
column before you perform some elaborate
transformations.
• You could then keep the original value as your control subject and the copy
as the modified column.
123. DATA CONVERSION
• Performs a similar function to the CONVERT
or CAST functions in T-SQL.
• The Output Alias is the column name you
want to assign to the column after it is
transformed.
• If you don’t assign it a new name, it will later be displayed as Data
Conversion: ColumnName in the Data Flow.
124. DERIVED COLUMN
• Creates a new column that is calculated
(derived) from the output of another column
or set of columns.
• One of the most important transformations
in Data Flow.
• Examples
• To multiply the quantity of orders by the cost of the order to derive the
total cost of the order
• You can also use it to find out the current date or to fill in the blanks in the
data by using the ISNULL function.
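For illustration, the total-cost example above as a Derived Column expression (column names are hypothetical):

[OrderQuantity] * [UnitCost]

Similarly, GETDATE() returns the current date, and an ISNULL test can fill in the blanks.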
125. EXPORT COLUMN
• Exports data to a file from the Data Flow
• Unlike the other transformations, the Export Column Transformation
doesn’t need a destination to create the file
• A common example is to extract blob-type
data from fields in a database and create files
in their original formats to be stored in a file
system or viewed by a format viewer, such as
Microsoft Word or Microsoft Paint.
126. IMPORT COLUMN
• The Import Column Transformation is a
partner to the Export Column transformation.
• These transformations do the work of
translating physical files from system file
storage paths into database blob-type fields,
and vice versa
127. OLE DB COMMAND
• Designed to execute a SQL statement for
each row in an input stream.
• This task is analogous to an ADO Command object being created,
prepared, and executed for each row of a result set.
• This transformation should be avoided
whenever possible.
• It’s a better practice to land the data into a staging table using an OLE DB
Destination and perform an update with a set-based process in the Control
Flow with an Execute SQL Task.
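A minimal sketch of that set-based alternative (staging and target names are hypothetical):

-- Land the rows in a staging table with an OLE DB Destination,
-- then run this from an Execute SQL Task in the Control Flow.
UPDATE o
SET    o.Status = s.Status
FROM   dbo.Orders AS o
JOIN   staging.OrderUpdates AS s ON s.OrderID = o.OrderID;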
129. AGGREGATE
• Aggregates data from the Data Flow by applying T-SQL-style functions, like those used in a GROUP BY query
• The most important option is Operation.
• Group By: Breaks the data set into groups by the column you specify
• Average: Averages the selected column’s numeric data
• Count: Counts the records in a group
• Count Distinct: Counts the distinct non-NULL values in a group
• Minimum: Returns the minimum numeric value in the group
• Maximum: Returns the maximum numeric value in the group
• Sum: Returns sum of the selected column’s numeric data in the group
• It is a Fully Blocking transformation
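For orientation, a Group By operation with Count and Sum corresponds to a T-SQL query like this (names are hypothetical):

SELECT   CustomerID,
         COUNT(*)    AS OrderCount,
         SUM(Amount) AS TotalAmount
FROM     dbo.Orders
GROUP BY CustomerID;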
130. SORT
• It is a fully blocking asynchronous transformation
• Enables you to sort data based on any column in the path
• Avoid using it when possible, because it is slow
• However, some transformations, like Merge Join and Merge, require the data to be sorted
• If you place an ORDER BY statement in the OLE DB Source, SSIS is not aware of the ORDER BY statement
• If you have an ORDER BY clause in your T-SQL, you can notify SSIS in the Advanced Editor that the data is already sorted, obviating the need for the Sort Transformation
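A sketch of the presorted-source pattern (query and column names are hypothetical):

-- Sort in the source query instead of using the Sort Transformation.
SELECT CustomerID, OrderID, Amount
FROM   dbo.Orders
ORDER BY CustomerID;

Then, in the OLE DB Source Advanced Editor, set IsSorted = True on the output and SortKeyPosition = 1 on CustomerID so SSIS knows the data is already sorted.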
131. PERCENTAGE & ROW SAMPLING
• Enable you to take the data from the source and randomly select a
subset of data.
• The transformation produces two outputs that you can select.
• One output is the data that was randomly selected,
• and the other is the data that was not selected.
• You can use this to send a subset of data to a development or test server.
• The most useful application of this transformation is to train a data-mining model.
• You can use one output path to train your data-mining model,
• and the other to validate the model.
• The Percentage Sampling enables you to select the percentage of
rows
• The Row Sampling Transformation enables you to specify how
many rows you wish to be outputted randomly.
• You can specify the seed that will randomize the data.
• If you select a seed and run the transformation multiple times, the same data will be outputted to the
destination.
• If you uncheck this option, which is the default, the seed will be automatically incremented by one at runtime,
and you will see random data each time.
132. PIVOT & UNPIVOT
• A pivot table is a result of cross-tabulated
columns generated by summarizing data
from a row format.
• Unpivot is the opposite of Pivot
134. CONDITIONAL SPLIT
• Adds complex logic to your Data Flow.
• This transformation enables you to send the data from a single data path to
various outputs or paths based on conditions that use the SSIS expression
language.
• Is similar to a CASE decision structure in a
programming language
• Also provides a default output
• If a row matches no expression, it is directed to the default output.
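For illustration, two conditions that route rows to separate outputs, written in the SSIS expression language (column names are hypothetical):

[OrderAmount] >= 1000
[OrderAmount] < 1000 && [Region] == "EU"

Rows matching neither condition are directed to the default output.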
135. MULTICAST
• Send a single data input to multiple output
paths
• Is similar to the Conditional Split
• Both transformations send data to multiple outputs.
• The Multicast will send all the rows down every output path, whereas the
Conditional Split will conditionally send each row down exactly one output
path.
136. MERGE
• Merge data from two paths into a single output
• Is useful when you
• wish to break out your Data Flow into a path that handles certain errors and
then merge it back into the main Data Flow downstream after the errors have
been handled
• wish to merge data from two Data Sources.
• But it has some restrictions:
• The data must be sorted beforehand
• You can do this by using the Sort Transformation prior to the merge or by specifying an ORDER BY clause in the source connection
• The metadata must be the same between both paths
• For example, the CustomerID column can't be a numeric column in one path and a character column in another path
• If you have more than two paths, you should choose the Union All Transformation
137. UNION ALL
• Similar to Merge Transformation
• You can merge data from two or more paths into
a single output
• Does not require sorted data.
• This transformation fixes minor metadata issues.
• For example, if you have one input that is a 20-character string and another that
is 50 characters, the output of this from the Union All Transformation will be the
longer 50-character column.
• You need to open the Union All Transformation Editor only if the inputs that feed the Union All Transformation have different column names
138. MERGE JOIN
• Merge the output of two inputs and perform
an INNER or OUTER join on the data
• If both inputs are in the same database, it would be faster to perform the join at the OLE DB Source level through T-SQL, rather than using this transformation
• Useful when you have two different Data
Sources you wish to merge
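For the same-database case, a hedged sketch of joining at the OLE DB Source instead (table and column names are hypothetical):

SELECT o.OrderID, o.Amount, c.CustomerName
FROM   dbo.Orders    AS o
JOIN   dbo.Customers AS c ON c.CustomerID = o.CustomerID;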
139. LOOKUP
• Performs lookups by joining data in input
columns with columns in a reference dataset
• You use the lookup to access additional
information in a related table that is based on
values in common columns.
• Lookup Caching mechanism
• Full-Cache Mode: stores all the rows resulting from a specified query in
memory
• No-Cache Mode: nothing is cached; for each input row, the component sends a request to the reference table in the database server to ask for a match
• Partial-Cache Mode: caches only the most recently used data in memory; as soon as the cache grows too big, the least-used cache data is thrown away
140. CACHE
• Generates a reference dataset for the Lookup
Transformation
• by writing data from a connected data source in the data flow to a Cache
connection manager.
• You can use the Cache connection manager
• When you want to configure the Lookup Transformation to run in the full
cache mode.
• In this mode, the reference dataset is loaded into cache before the Lookup
Transformation runs.
141. CDC SPLITTER
• Splits a single flow of change rows from a
CDC source data flow into different data
flows for Insert, Update and Delete
operations.
• The data flow is split based on the required column __$operation and its
standard values in SQL Server 2012 change tables.
143. AUDIT
• Allows you to add auditing data to your Data
Flow
• Because of acts such as HIPAA and Sarbanes-Oxley (SOX) governing audits, you often must be able to track who inserted data into a table and when.
• The task is easy to configure
• Simply select the type of data you want to audit in the Audit Type column and
then name the column that will be outputted to the flow.
• Following are some of the available options:
• Execution instance GUID: GUID that identifies the execution instance of the package
• Package ID: Unique ID for the package
• Package name: Name of the package
• Version ID: Version GUID of the package
• Execution start time: Time the package began
• Machine name: Machine on which the package ran
• User name: User who started the package
• Task name: Name of the Data Flow Task that holds the Audit Transformation
• Task ID: Unique identifier for the Data Flow Task that holds the Audit Transformation
144. ROWCOUNT
• Provides the capability to count rows in a
stream that is directed to its input source.
• This transformation places that count into a variable that can then be used in the Control Flow
145. BI Transformations
• Slowly Changing Dimension
• Fuzzy Grouping
• Fuzzy Lookup
• Term Extraction
• Term Lookup
• Data Mining Query
• Data Cleansing
146. SLOWLY CHANGING DIMENSION
• Provides a great head start in helping to solve a common, classic
changing-dimension problem that occurs in the outer edge of your
data model, the dimension or lookup tables.
• A dimension table contains a set of discrete values with a
description and often other measurable attributes such as price,
weight, or sales territory.
• The classic problem is what to do in your dimension data when an attribute in a row changes, particularly when you are loading data automatically through an ETL process
• This transformation can shave days off of your development time in
relation to creating the load manually through T-SQL, but it can
add time because of how it queries your destination and how it
updates with the OLE DB Command Transform (row by row)
147. FUZZY LOOKUP
• Performs data cleaning tasks
• standardizing data, correcting data, and providing missing values
• Unlike the Lookup Transformation, which uses an equi-join to locate matching records in the reference table and returns records only when there is at least one exact match, it uses fuzzy matching to return one or more close matches from the reference table
• Usually follows a Lookup transformation in a
package data flow
• First, the Lookup transformation tries to find an exact match.
• If it fails, the Fuzzy Lookup transformation provides close matches from the
reference table.
148. FUZZY GROUPING
• Performs data cleaning tasks
• by identifying rows of data that are likely to be duplicates and
• selecting a canonical row of data to use in standardizing the data.
• Requires a connection to an instance of SQL Server
• to create the temporary SQL Server tables that the transformation algorithm requires to
do its work
• The connection must resolve to a user who has permission to create tables in the database.
• Produces one output row for each input row
• Each row has the following additional columns:
• _key_in: a column that uniquely identifies each row
• _key_out: a column that identifies a group of duplicate rows
• The _key_out column has the value of the _key_in column in the canonical data row
• Rows with the same value in _key_out are part of the same group
• _score: a value between 0 and 1 that indicates the similarity of the input row to the canonical row
149. TERM EXTRACTION
• A tool to mine free-flowing text for English
word and phrase frequency
• If you have ever done some word and phrase analysis on websites for
better search engine placement, you are familiar with the job that this
transformation performs
• Based on the Term Frequency and Inverse Document Frequency formula
• TFIDF = (frequency of term) * log((# rows in sample) / (# rows with term or phrase))
• Outputs two columns:
• a text phrase
• a statistical value for the phrase relative to the total input stream
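As a worked example (assuming a base-10 logarithm): a phrase that occurs 5 times in a 1,000-row sample and appears in 10 of those rows scores TFIDF = 5 * log(1000 / 10) = 5 * 2 = 10.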
150. TERM LOOKUP
• Uses the same algorithms and statistical models
as the Term Extraction Transformation to break
up an incoming stream into noun or noun phrase
tokens
• It is designed to compare those tokens to a stored
word list and output a matching list of terms and
phrases with simple frequency counts.
• Good for generating statistics on known phrases of importance
• A real-world application would be to pull out all the customer service notes that contain a given set of terms or that mention a competitor's name
151. DATA MINING QUERY
• Typically is used to fill in gaps in your data or
predict a new column for your Data Flow
• Optionally you can add columns
• such as the probability of a certain condition being true.
• Usage Examples
• You could take columns such as number of children, household income, and marital status to predict a new column that states whether the person owns a house or not.
• You could predict what customers would want to buy based on their
shopping cart items.
• You could fill the gaps in your data where customers didn’t enter all the
fields in a questionnaire.
152. DATA CLEANSING
• Performs advanced data cleansing on data
• A business analyst creates a series of business rules that declare what good data looks like
• Create domains that define data in your
company
• such as what a Company Name column should always look like.
155. SCRIPT COMPONENT
• Enables you to write custom .NET scripts as
• Transformations
• Sources
• Destinations
• Some of the things you can do with this
transformation
• Create a custom transformation that would use a .NET assembly to validate
credit card numbers or mailing addresses.
• Validate data and skip records that don’t seem reasonable.
• Read from a proprietary system for which no standard provider exists.
• Write a custom component to integrate with a third-party vendor.
• Scripts used as sources can support multiple
outputs
• You have the option of precompiling the scripts for runtime efficiency.
156. CUSTOM COMPONENT
• In a real-world integration solution, you may have
requirements that the built-in functionality in SSIS
does not meet
• Use Visual Studio and Class Library project
template
• Reference the following assemblies
• Microsoft.SqlServer.DTSPipelineWrap
• Microsoft.SqlServer.DTSRuntimeWrap
• Microsoft.SqlServer.ManagedDTS
• Microsoft.SqlServer.PipelineHost
• In addition, the component needs to
• Provide a strong name key for signing the assembly.
• Set the build output location to the PipelineComponents folder.
• Use a post-build event to install the assembly into the GAC.
• Set assembly-level attributes in the AssemblyInfo.cs file.
158. OPTIMIZING DATA FLOW PERFORMANCE
• Optimize queries:
• Select only the rows and columns that you need
• Avoid unnecessary sorting:
• Use presorted data where possible
• Set the IsSorted property where applicable
• Configure Data Flow task properties:
• Buffer size (DefaultBufferSize, DefaultBufferMaxRows)
• Temporary storage location (BufferTempStoragePath, BLOBTempStoragePath)
• Parallelism (EngineThreads)
• Optimized mode (RunInOptimizedMode)