20th SQL Saturday Night
SQL Server 2012
Integration Services
for Beginners
June 01, 2013
Antonios Chatzipavlis
Solution Architect – SQL Server Evangelist
• I started working with computers in 1982.
• In 1988 I began my professional career in the computer industry.
• In 1996 I started working with SQL Server, version 6.0.
• In 1998 I earned my first Microsoft certification as a
Microsoft Certified Solution Developer (3rd in Greece).
• Since then I have earned more than 30 certifications (and counting!)
• MCT, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA,
• MCSE : Data Platform
• In 1998 I started my career as a Microsoft Certified Trainer (MCT).
• Since then I have delivered more than 15,000 hours of training.
• In 2010 I became a Microsoft MVP on SQL Server for the first time.
• In 2010 I created SQL School Greece (www.sqlschool.gr)
• I am a board member of the IAMCT organization and leader of its
Greek chapter.
• In 2012 I became an MCT Regional Lead through the Microsoft Learning Program.
• I am a moderator of autoexec.gr and a member of dotnetzone.gr
SP_WHO
@antoniosch
@sqlschool SQL School Greece
www.sqlschool.gr help@sqlschool.gr
COMMUNICATION
Agenda
SQL Server 2012 Integration Services for Beginners
• Introduction to SSIS
• SSIS Tools
• Variables, Parameters, Expressions
• SSIS Tasks
• Containers
• Data Flows
Introduction to SSIS
WHAT IS SSIS?
• A platform for ETL operations
• Installed as a feature of SQL Server
• Useful for DBAs, Developers, Data Analysts
• DTS Evolution
SSIS ARCHITECTURE (1)
• SSIS Project
• A versioned container for parameters and packages
• A unit of deployment to an SSIS Catalog
• SSIS Package
• A unit of task flow execution
• A unit of deployment (package deployment model)
SSIS ARCHITECTURE (2)
Control Flow
• It is the brain of the package
• Orchestrates the order of execution for all its
components
• Tasks
• Individual units of work
• Precedence constraints
• Direct tasks to execute in a given order
• Define the workflow of the SSIS package
• Containers
• Core units for grouping tasks together logically into units of work
• Give us the ability to define variables and events within container scope
SSIS ARCHITECTURE (3)
Data Flow
• It is the heart of the package
• Provides the capability to implement ETL
• Sources
• A component that specifies the location of the source data
• Transformations
• Allow changes to the data within the data pipeline
• Destinations
• A component that specifies the destination of the transformed data
SSIS ARCHITECTURE (4)
• Variables
• SSIS variables can be set to evaluate to an expression at runtime
• Parameters
• Parameters behave much like variables but with a few main exceptions.
• Parameters can make a package dynamic.
• Parameters can be set outside the package easily
• Can be designated as values that must be passed in for the package to
start
• Parameters are new to SSIS in SQL Server 2012 and replace the capabilities
of Configurations in previous releases of SQL Server.
• Error Handling and Logging
SSIS Tools
IMPORT – EXPORT WIZARD
The easiest method to
move data
Uses SSIS as a framework
Optionally we can save a
package
Import – Export
Wizard
SQL SERVER DATA TOOLS
Using
SQL Server Data Tools
to create
SSIS Project & Package
Variables,
Parameters,
Expressions
SSIS DATA TYPES
• Named differently than similar types in .NET
or T-SQL
• C:\Program Files\Microsoft SQL Server\110\DTS\MappingFiles
• The .NET managed types are important only
if you are using:
• Script component
• CLR
• .NET-based coding to manipulate your Data Flows.
SSIS DATA TYPE SQL SERVER DATA TYPE .NET DATA TYPE
DT_NUMERIC numeric System.Decimal
DT_DECIMAL decimal System.Decimal
DT_CY money, smallmoney System.Decimal
DT_I1 — System.SByte
DT_I2 smallint System.Int16
DT_I4 int System.Int32
DT_BOOL bit System.Boolean
DT_I8 bigint System.Int64
DT_R4 real System.Single
DT_R8 float System.Double
DT_UI1 tinyint System.Byte
DT_UI2 — System.UInt16
DT_UI4 — System.UInt32
DT_UI8 — System.UInt64
DT_GUID uniqueidentifier System.Guid
SSIS NUMERIC DATA TYPES TABLE MAPPINGS
SSIS DATA TYPE SQL SERVER DATA TYPE .NET DATA TYPE
DT_WSTR nvarchar, nchar System.String
DT_STR varchar, char
DT_TEXT text
DT_NTEXT ntext, sql_variant, xml
DT_BYTES binary, varbinary System.Byte()
DT_IMAGE timestamp, image
DT_DBTIMESTAMP smalldatetime, datetime System.DateTime
DT_DBTIMESTAMP2 datetime2
DT_DBDATE date
DT_DATE
DT_FILETIME
DT_DBDATETIMESTAMPOFFSET datetimeoffset
DT_DBTIME2 time System.TimeSpan
DT_DBTIME
SSIS STRING AND DATE-TIME DATA TYPES TABLE MAPPINGS
PERFORMANCE AND DATA TYPES
• Convert only when necessary.
• Don’t convert columns from a data source that will be dropped from the data
stream.
• Each conversion costs something.
• Convert to the closest type for your destination, using the mapping files.
• If a value is converted to a non-supported data type, you’ll incur an additional
conversion internal to SSIS to the mapped data type.
• Convert using the closest size and precision.
• Don’t import all columns as 50-character data columns if you are working with
a fixed or reliable file format with columns that don’t require as much space.
• Evaluate the option to convert after the fact.
• Remember that SSIS is still an ETL tool and sometimes it is more efficient to
stage the data and convert it using set-based methods.
UNICODE AND NON-UNICODE
• Unicode data type is the default
• Default Import behavior
• All the string SSIS functions expect Unicode strings as input
• Use Data Conversion Transformation to
convert non-Unicode data types to
appropriate Unicode data type according to
the mapping table
CONVERSION IN SSIS EXPRESSIONS
• Use explicit casting to avoid troubles
• Casting is easy
• It looks like .NET
• (DT_I4) 32
• Casting operator parameters
• Length – final string length
• Code_Page – code page for non-Unicode strings
• Precision
• Scale
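A sketch of how those parameters appear in practice — `SomeColumn` is a placeholder name:

```
(DT_I4) "32"
(DT_WSTR, 20) SomeColumn
(DT_STR, 8, 1252) SomeColumn
(DT_NUMERIC, 10, 2) 3.14159
```

The first casts a string to a 4-byte signed integer; the second produces a 20-character Unicode string; the third an 8-character non-Unicode string in code page 1252; the last a numeric with precision 10 and scale 2.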
VARIABLES
• Key feature in SSIS
• User Variables
• Variables created by an SSIS developer to hold dynamic values
• Defined in the User namespace by default
• Defined at a specified scope
• Syntax: User::MyVar
• System Variables
• Built-in variables with dynamic system values
• Defined in the System namespace
• Syntax: System::MyVar
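Inside an expression, variables (and, in SQL Server 2012, parameters) are referenced with an @ prefix and the bracketed namespace syntax; the names below are illustrative:

```
@[User::FileName]
@[System::PackageName]
@[$Project::ServerName]
@[$Package::ServerName]
```

The $Project and $Package prefixes distinguish project-level from package-level parameters.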
PARAMETERS
• New feature in SQL Server 2012.
• Similar to a variable
• Can store information
• It has a few different properties and uses
• Parameters are set externally
• Project parameters
• Package parameters
• Replace package configurations
• when using the project deployment model
• Required property of the parameter
• Requires the caller of the package to pass in a value for the parameter
PROJECT PARAMETERS
• Created at the project level
• Can be used in all packages
that are included in that
project.
• Best used for values that
should be shared among
packages such as e-mail
addresses for error
messages.
PACKAGE PARAMETERS
• Created at the package
level
• Can be used only in that
package
• Best used for values specific
to that package, such as
directory locations.
PARAMETERS VS VARIABLES
Parameters
If you want to set the
value of something
from outside the
package
Variables
If you want to create
or store values only
within a package
VARIABLE & PARAMETERS DATA TYPES
VARIABLE DATA TYPE SSIS DATA TYPE REMARKS
Boolean DT_BOOL Be careful setting these data types in code because the expression language and
.NET languages define these differently
Byte DT_UI1 A 1-byte unsigned int. (Note this is not a byte array.)
Char DT_UI2
DateTime DT_DBTIMESTAMP
DBNull N/A
Double DT_R8
Int16 DT_I2
Int32 DT_I4
Int64 DT_I8
Object N/A An object reference. Typically used to store data sets or large object structures
Sbyte DT_I1
Single DT_R4
String DT_WSTR
UInt32 DT_UI4
UInt64 DT_UI8
EXPRESSIONS
• Used to set values dynamically
• Properties
• Conditional split criteria
• Derived column values
• Precedence constraints
• Based on Integration Services expression syntax
• Can include variables and parameters
• The language is heavily C#-like but contains a Visual Basic flavor and sometimes
T-SQL for functions
• Can be created graphically by using Expression
Builder
• Expression adorners, a new feature in SQL Server
2012
EXPRESSION OPERATORS
• || Logical OR
• && Logical AND
• == Determine equivalency
• != Determine inequality
• ? : Conditional operator
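Combined, these operators allow compact conditional logic in one expression; a sketch assuming a hypothetical User::RowCount variable:

```
ISNULL(@[User::RowCount]) || @[User::RowCount] == 0
    ? "No rows loaded"
    : "Loaded " + (DT_WSTR, 12) @[User::RowCount] + " rows"
```

Note the explicit cast to DT_WSTR: string concatenation requires string operands.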
EXPRESSION BUILDER
• Friendly UI
• Provides
• Syntax check
• Expression evaluation
EXPRESSION SYNTAX BASICS
• Equivalence Operator (==)
• String Concatenation (+)
• Line Continuation (\n)
• Literals
• Numeric Literals
• 30L -> DT_I8
• 30U -> DT_UI4
• 30.0F -> DT_R4
• 30.3E -> DT_R8
• String Literals
• \n CRLF
• \t TAB
• \" Quotation mark
• \\ Backslash
• Boolean Literals
• @[User::Var] == TRUE
DEALING WITH NULL IN SSIS
• In SSIS, a variable cannot be set to NULL
• The SSIS ISNULL function is not the same as T-SQL ISNULL: it only tests whether an expression is NULL
• Each data type maintains a default value
VARIABLE DATA TYPE DEFAULT VALUE
Boolean False
Byte 0
Char 0
DateTime 30/12/1899
Double 0
Int16 0
Int32 0
Int64 0
Sbyte 0
Single 0
String “”
UInt32 0
UInt64 0
DBNull N/A in an expression
Object N/A in an expression
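Because variables cannot hold NULL, NULLs are usually handled inside the expression itself; SQL Server 2012 adds REPLACENULL for this. A sketch, with an illustrative column name:

```
REPLACENULL(MiddleName, "N/A")
ISNULL(MiddleName) ? "N/A" : MiddleName
```

Both lines produce the same result; the second is the pre-2012 pattern using the conditional operator.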
STRING FUNCTIONS
• SSIS string functions operate on Unicode strings
• They are different from the SQL Server string functions
• Comparisons are case and padding
sensitive
• Comparing strings requires that you have two strings of the same padding
length and case.
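To make such comparisons robust, a common pattern is to normalize case and padding on both sides first (column names are illustrative):

```
TRIM(UPPER(ColumnA)) == TRIM(UPPER(ColumnB))
```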
USING EXPRESSIONS IN SSIS PACKAGE
• Variable and Parameters as Expressions
• Expressions in Connection Manager
Properties
• Expressions in Control Flow Tasks
• Expressions in Control Flow Precedence
• Expression Task new in SQL Server 2012
• Expression in Data Flow
SSIS Tasks
WHAT IS A TASK
• A foundation of the Control Flow in SSIS
• A discrete unit of work that can perform
typical actions
• Each task has a set of setup parameters
• Setup parameters are visible at Task Editor
• Most of these parameters can be dynamic by
using expression on Task Editor Expression
Tab
CONTROL FLOW TASKS
Data Flow Tasks Database Tasks File & Internet Tasks
• Data Flow • Data Profiling
• Bulk Insert
• Execute SQL
• Execute T-SQL
• CDC Control
• File System
• FTP
• XML
• Web Service
• Send Mail
Process Execution Task WMI Tasks Custom Logic Tasks
• Execute Package
• Execute Process
• WMI Data Reader
• WMI Event Watcher
• Script
• Custom Tasks
Database Transfer Tasks Analysis Services Tasks SQL Server Maintenance Tasks
• Transfer Database
• Transfer Error Messages
• Transfer Jobs
• Transfer Logins
• Transfer Master Stored
Procedures
• Transfer SQL Server Objects
• Analysis Services Execute DDL
• Analysis Services Processing
• Data Mining Query
• Back Up Database
• Check Database Integrity
• History Cleanup
• Maintenance Cleanup
• Notify Operator
• Rebuild Index
• Reorganize Index
• Shrink Database
• Update Statistics
PRECEDENCE CONSTRAINTS
Connect sequences
of tasks
Three control flow
conditions
Success
Failure
Completion
Multiple constraints
Logical AND
Logical OR
Task 1
Task 2
Task 3 Task 4
Task 5
Task 10
Task 6
Task 7
Success (AND)
Failure (AND)
Completion (AND)
Success (OR)
Failure (OR)
Completion (OR)
Task 9 Task 8
TASK EDITOR
• Easy to
access
• Consistent
design
• Name and
Description
generic
properties
TASK EDITOR
SSIS uses a concept
of setting the value
of most task
properties to a
dynamic expression
that is evaluated at
runtime
PROPERTIES WINDOW
COMMON PROPERTIES (1)
• If set to true, SSIS will not validate any of the
properties set in the task until runtime.
• This is useful if you are operating in a
disconnected mode
• Want to enter a value for production that cannot be validated
until the package is deployed, or you are dynamically setting
the properties using expressions
• The default value for this property is false.
DelayValidation
COMMON PROPERTIES (2)
• The description of what the instance of the task does.
• The default name for this is <task name>
• if you have multiple tasks of the same type, it would read <task name 1> (where the number 1
increments).
• This property does not have to be unique and is optional.
• If you do provide details here, it will display in the tooltip
when hovering over the task object.
• For consistency, the property should accurately describe
what the task does for people who may be monitoring
the package in your operations group.
Description
COMMON PROPERTIES (3)
• If set to true, the task is disabled and will not
execute.
• Helpful if you are testing a package and want
to disable the execution of a task temporarily.
• Is the equivalent of commenting a task out
temporarily.
• Is set to false by default.
Disable
COMMON PROPERTIES (4)
• Contains the name of the custom variable
that will store the output of the task’s
execution.
• The default value of this property is <none>, which
means the execution output is not stored
• Enables the task to expose information
related to the results of the internal actions
within the task.
ExecValueVariable
COMMON PROPERTIES (5)
• If set to true
• The entire package will fail if the individual task fails.
• Typically, you want to control what happens to a package if a task fails with
a custom error handler or Control Flow.
• By default, this property is set to false.
FailPackageOnFailure
COMMON PROPERTIES (6)
• If set to true
• The task’s parent will fail if the individual task reports an error.
• The task’s parent can be a package or container
• By default, this property is set to false.
FailParentOnFailure
COMMON PROPERTIES (7)
• Read-only
• Automatically generated unique ID
• Associated with an instance of a task.
• The ID is in GUID format
• {BK4FH3I-RDN3-I8RF-KU3F-JF83AFJRLS}.
ID
COMMON PROPERTIES (8)
• Specifies the isolation level of the transaction
• The values are
• Chaos
• ReadCommitted
• ReadUncommitted
• RepeatableRead
• Serializable
• Unspecified
• Snapshot.
• The default value of this property is Serializable
IsolationLevel
COMMON PROPERTIES (9)
• Specifies the type of logging that will be
performed for this task.
• The values are
• UseParentSetting
• Enabled
• Disabled.
• The default value is UseParentSetting
• Basic logging is turned on at the package level by
default in SQL Server 2012.
LoggingMode
COMMON PROPERTIES (10)
• The name associated with the task.
• The default name for this is <task name>
• If you have multiple tasks of the same type, <task name 1> (where the number
1 increments)
• Change this name to make it more readable to an
operator at runtime
• It must be unique inside your package.
• Used to identify the task programmatically
Name
COMMON PROPERTIES (11)
• Specifies the transaction attribute for the task.
• Possible values are
• NotSupported
• Supported
• Required.
• The default value is Supported
• Supported enables the option for you to use transactions in your task.
TransactionOption
DATA FLOW TASK
• The heart of SSIS
• Has its own design surface
• Encapsulates all the data transformation
aspects of ETL
• Each Data Flow task corresponds to a
separate Data Flow design surface
• Splits and handles data in pipelines based on
data elements
Database Tasks
• Data Profiling
• Bulk Insert
• Execute SQL
• Execute T-SQL
• CDC Control
DATA PROFILER TASK
• Examining data and collecting
metadata about the quality of the
data
• About frequency of statistical patterns,
interdependencies, uniqueness, and redundancy.
• Important for the overall quality and health of an
operational data store (ODS) or data warehouse.
• Doesn’t have built-in conditional
workflow logic
• but technically you can use XPath queries on the
results
• Creates a report on data statistics
• You still need to make judgments about these
statistics.
• For example, a column may contain an overwhelming
amount of NULL values, but the profiler doesn’t know
whether this reflects a valid business scenario.
EXAMINING AREAS
o Candidate Key Profile Request
o Column Length Distribution Profile Request
o Column Null Ratio Profile Request
o Column Pattern Profile Request
o Column Statistics Profile Request
o Functional Dependency Profile Request
o Value Inclusion Profile Request
Data Profiler Task
BULK INSERT TASK
• Inserts data from a text or flat file into a SQL
Server database
• Similar to the BULK INSERT statement or BCP.exe
• A very fast operation, especially with large amounts of data
• In fact, it is a wizard that stores the information needed to create and
execute a bulk copy command at runtime
• Has no ability to transform data
• Because of this, it gives us the fastest way to load data
EXECUTE SQL TASK
• Is one of the most widely used tasks in SSIS
• Used for
• Truncating a staging data table prior to importing
• Retrieving row counts to determine the next step in a workflow
• Calling stored proc to perform business logic against sets of stage data
• Retrieve information from a database repository
• Executing a Parameterized SQL Statement
• ? Indicates the parameter when we are using ADO, ODBC, OLEDB and
EXCEL
• ADO, OLE DB and Excel parameters are zero-based
• ODBC parameters start from 1
• @<Real Param Name> when we are using ADO.NET
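A sketch of the two parameter styles — the table and parameter names are made up:

```sql
-- OLE DB / Excel connection: ? placeholders, mapped by ordinal
DELETE FROM dbo.StagingOrders WHERE LoadDate < ?

-- ADO.NET connection: named parameters
DELETE FROM dbo.StagingOrders WHERE LoadDate < @LoadDate
```

In both cases the actual values are bound on the task's Parameter Mapping page.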
Bulk Insert Task
Execute SQL Task
EXECUTE T-SQL STATEMENT TASK
• Similar to Execute SQL Task
• Supports only T-SQL for SQL Server
CDC CONTROL TASK
• Used to control the life cycle of change data
capture (CDC)
• Handles CDC package synchronization
• Maintains the state of the CDC package
• Supports two groups of operations.
• One group handles the synchronization of initial load and change
processing
• The other manages the change-processing range of LSNs for a run of a
CDC package and keeps track of what was processed successfully.
File & Internet Tasks
• File System
• FTP
• XML
• Web Service
• Send Mail
FILE SYSTEM TASK
• Performs file operations available in the
System.IO.File .NET class.
• The creation of directory structures does not
have to be made recursively as we did in the
DTS legacy product.
• It is written for a single operation
• If you need to iterate over a series of files or directories, the File System
Task can simply be placed within a looping container
File System Task
FTP TASK
• Enables the use of the File Transfer Protocol
(FTP) in your package development tasks.
• Exposes more FTP command capability
• Enabling you to create or remove local and remote directories and files.
• Another change from the legacy DTS FTP Task is the capability to use FTP in
passive mode.
• This solves the problem that DTS had in communicating with FTP servers when the
firewalls filtered the incoming data port connection to the server.
XML TASK
Validate,
modify, extract,
create files in
an XML format.
XML Task
WEB SERVICE TASK
• Used to retrieve XML-based result sets by
executing a method on a web service
• Only retrieves the data
• It doesn’t yet address the need to navigate through the data, or extract
sections of the resulting documents.
• Can be used in SSIS to provide real-time validation of data in your ETL
processes or to maintain lookup or dimensional data.
• Requires creation of an HTTP Connection
Manager
• To a specific HTTP endpoint on a website or to a specific Web Services
Description Language (WSDL) file on a website
SEND MAIL TASK
• Sending e-mail
messages via SMTP
• Only supports
Windows and
Anonymous
authentication.
• Google mail needs basic
authentication.
• So you cannot configure
SMTP Connection Manager
in SSIS with an external SMTP
server like Gmail, Yahoo etc
Send Mail Task
Process Execution Task
• Execute Package
• Execute Process
EXECUTE PACKAGE TASK
• Enables you to build SSIS solutions called
parent packages that execute other packages
called child packages
• Several improvements have simplified the
task:
• The child packages can be run as either in-process or out-of-process
executables.
• A big difference in this release of the task compared to its 2008 or 2005
predecessor is that you execute packages within a project to make migrating the
code from development to QA much easier.
• The task enables you to easily map parameters in the parent package to
the child packages now too.
EXECUTE PROCESS TASK
• Executes a Windows or console application
inside of the Control Flow.
• The most common example would have to be unzipping packed or
encrypted data files with a command-line tool
• The configuration items for this task are:
• RequireFullFileName
• Executable
• WorkingDirectory
• StandardInputVariable
• StandardOutputVariable
• StandardErrorVariable
• FailTaskIfReturnCodeIsNotSuccessValue
• Timeout
• TerminateProcessAfterTimeOut
• WindowStyle
WMI Tasks
• WMI Data Reader
• WMI Event Watcher
WMI DATA READER TASK
• One of the best-kept secrets in Windows
• Enables you to manage Windows servers and workstations through a scripting
interface similar to running a T-SQL query
• This task enables you to interface with this
environment by writing WQL queries
• The output of this query can be written to a file or variable for later
consumption
• You could use the WMI Data Reader Task to:
• Read the event log looking for a given error.
• Query the list of applications that are running.
• Query to see how much RAM is available at package execution for debugging.
• Determine the amount of free space on a hard drive.
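For example, WQL queries along these lines could drive the scenarios above — Win32_LogicalDisk and Win32_NTLogEvent are standard WMI classes, while the filter values are illustrative. The first returns free space on drive C:, the second returns error events from the Application log:

```
SELECT FreeSpace, DeviceID FROM Win32_LogicalDisk WHERE DeviceID = 'C:'

SELECT * FROM Win32_NTLogEvent WHERE Logfile = 'Application' AND EventType = 1
```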
WMI EVENT WATCHER TASK
• The WMI Event Watcher Task empowers SSIS
to wait for and respond to certain WMI
events that occur in the operating system.
• The following are some of the useful things
you can do with this task:
• Watch a directory for a certain file to be written.
• Wait for a given service to start.
• Wait for the memory of a server to reach a certain level before executing
the rest of the package or before transferring files to the server.
• Watch for the CPU to be free.
WMI Data Reader
Custom Logic Tasks
• Script
• Custom Tasks
SCRIPT TASK
• Enables you to access the VSTA environment
to write and execute scripts using the VB and
C# languages
• In the latest SSIS edition
• Solidifies the connection to the full .NET 4.0 libraries for both VB and C#.
• A coding environment with the advantage of IntelliSense
• An integrated Visual Studio design environment within SSIS
• An easy-to-use methodology for passing parameters into the script
• The capability to add breakpoints to your code for testing and debugging
purposes
• The automatic compiling of your script into binary format for increased
speed
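The points above can be illustrated with a minimal C# sketch of the kind of code that goes into a Script Task's Main() method. The variable names are hypothetical; the Dts object is supplied by the SSIS scripting host:

```csharp
public void Main()
{
    // Read a package variable (listed in the task's ReadOnlyVariables)
    string path = Dts.Variables["User::FilePath"].Value.ToString();

    // Write a package variable (listed in ReadWriteVariables)
    Dts.Variables["User::FileExists"].Value = System.IO.File.Exists(path);

    // Report success back to the Control Flow
    Dts.TaskResult = (int)ScriptResults.Success;
}
```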
Script Task
CUSTOM TASK
• In a real-world integration solution, you may have
requirements that the built-in functionality in SSIS
does not meet
• Use Visual Studio and Class Library project
template
• Reference the following assemblies
• Microsoft.SqlServer.DTSPipelineWrap
• Microsoft.SqlServer.DTSRuntimeWrap
• Microsoft.SqlServer.ManagedDTS
• Microsoft.SqlServer.PipelineHost
• In addition the component needs
• Provide a strong name key for signing the assembly.
• Set the build output location to the PipelineComponents folder.
• Use a post-build event to install the assembly into the GAC.
• Set assembly-level attributes in the AssemblyInfo.cs file.
Database Transfer
Tasks
• Transfer Database
• Transfer Error Messages
• Transfer Jobs
• Transfer Logins
• Transfer Master Stored Procedures
• Transfer SQL Server Objects
TRANSFER DATABASE TASK
TRANSFER ERROR MESSAGE TASK
TRANSFER JOBS TASK
TRANSFER LOGINS TASK
TRANSFER MASTER STORED PROCEDURE TASK
TRANSFER SQL OBJECTS TASK
Analysis Services Tasks
• Analysis Services Execute DDL
• Analysis Services Processing
• Data Mining Query
ANALYSIS SERVICES EXECUTE DDL TASK
ANALYSIS SERVICES PROCESSING TASK
DATA MINING QUERY TASK
SQL Server
Maintenance Tasks
• Back Up Database
• Check Database Integrity
• History Cleanup
• Maintenance Cleanup
• Notify Operator
• Rebuild Index
• Reorganize Index
• Shrink Database
• Update Statistics
BACK UP DATABASE TASK
CHECK DATABASE INTEGRITY TASK
HISTORY CLEANUP TASK
MAINTENANCE CLEANUP TASK
NOTIFY OPERATOR TASK
REBUILD INDEX TASK
REORGANIZE INDEX TASK
SHRINK DATABASE TASK
UPDATE STATISTICS TASK
Containers
SEQUENCE CONTAINER
• Handle the flow of a subset of a package
• Can help you divide a package into smaller and
more manageable pieces
• Usage of Sequence Container include:
• Grouping tasks so that you can disable a part of the package that’s no longer
needed
• Narrowing the scope of the variable to a container
• Managing the properties of multiple tasks in one step by setting the properties
of the container
• Using one method to ensure that multiple tasks have to execute successfully
before the next task executes
• Creating a transaction across a series of data-related tasks, but not on the entire
package
• Creating event handlers on a single container, wherein you could send an email
if anything inside one container fails and perhaps page if anything else fails
GROUPS
• Are not actually containers but simply a way to
group components together
• A key difference between groups and containers
is that a group has no properties of its own to
delegate to the tasks inside it
• Don’t have precedence constraints originating from them (only from the tasks).
• You cannot disable the entire group, as you can with a Sequence Container.
• Groups are good for quick compartmentalization of tasks for aesthetics.
• Their only usefulness is to quickly group
components in either a Control Flow or a Data
Flow together.
FOR LOOP CONTAINER
• Enables you to create looping in your
package similar to how you would loop in
nearly any programming language.
• In this looping style, SSIS optionally initializes
an expression and continues to evaluate it
until the expression evaluates to false.
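The container's three expression properties map directly onto a C-style for loop; a sketch using a hypothetical User::Counter variable:

```
InitExpression:   @[User::Counter] = 0
EvalExpression:   @[User::Counter] < 10
AssignExpression: @[User::Counter] = @[User::Counter] + 1
```

The loop body (the tasks inside the container) runs while EvalExpression is true.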
FOREACH LOOP CONTAINER
• Enables you to loop through a collection of objects.
• Foreach File Enumerator: Performs an action for each file in a directory with a given file
extension
• Foreach Item Enumerator: Loops through a list of items that are set manually in the
container
• Foreach ADO Enumerator: Loops through a list of tables or rows in a table from an ADO
recordset
• Foreach ADO.NET Schema Rowset Enumerator: Loops through an ADO.NET schema
• Foreach From Variable Enumerator: Loops through an SSIS variable
• Foreach Nodelist Enumerator: Loops through a node list in an XML document
• Foreach SMO Enumerator: Enumerates a list of SQL Management Objects (SMO)
• As you loop through the collection
• the container assigns the value from the collection to a variable
• which can later be used by tasks or connections inside or outside the container
Containers
Data Flows
DATA FLOW TASK
• The heart of SSIS
• Has its own design surface
• Encapsulates all the data transformation
aspects of ETL
• Each Data Flow task corresponds to a
separate Data Flow design surface
• Splits and handles data in pipelines based on
data elements
DATA SOURCES
• Databases
• ADO.NET
• OLE DB
• CDC Source
• Files
• Excel
• Flat files
• XML
• Raw files
• Others
• Custom
DATA DESTINATIONS
• Database
• ADO.NET
• OLE DB
• SQL Server
• ORACLE
• File
• SSAS
• Rowset
• Other
CONNECTION MANAGERS
• A connection to a data source or destination:
• Provider (for example, ADO.NET, OLE DB, or flat file)
• Connection string
• Credentials
• Project or package level:
• Project-level connection managers:
• Can be shared across packages
• Are listed in Solution Explorer and the Connection Managers pane for packages in
which they are used
• Package-level connection managers:
• Can be shared across objects in the package
• Are listed only in the Connection Managers pane for packages in which they are
used
DATA FLOW TRANSFORMATIONS
Row Transformations Rowset Transformations Split & Join Transformations
• Character Map
• Copy Column
• Data Conversion
• Derived Column
• Export Column
• Import Column
• OLE DB Command
• Aggregate
• Sort
• Percentage Sampling
• Row Sampling
• Pivot
• Unpivot
• Conditional Split
• Multicast
• Union All
• Merge
• Merge Join
• Lookup
• Cache
• CDC Splitter
Audit Transformations BI Transformations Custom Transformations
• Audit
• RowCount
• Slowly Changing Dimension
• Fuzzy Grouping
• Fuzzy Lookup
• Term Extraction
• Term Lookup
• Data Mining Query
• Data Cleansing
• Script Component
• Custom Component
SYNCHRONOUS VS ASYNCHRONOUS TRANSFORMATIONS
• Synchronous
• Synchronous transformations such as the Derived Column and Data Conversion, where
rows flow into memory buffers in the transformation, and the same buffers come out.
• No rows are held, and typically these transformations perform very quickly, with minimal
impact to your Data Flow.
• Asynchronous
• Asynchronous transformations can cause a block in your Data Flow and slow down your
runtime.
• There are two types of asynchronous transformations:
• Partially blocking transformations
• Create new memory buffers for the output of the transformation.
• e.g. Union All transformation
• Fully blocking transformations
• e.g. Sort and Aggregate Transformations
• Create new memory buffers for the output of the transformation but cause a full block of the
data.
• These fully blocking transformations represent the single largest slowdown in SSIS and should be
considered carefully in terms of any architecture decisions you must make.
Row Transformations
• Character Map
• Copy Column
• Data Conversion
• Derived Column
• Export Column
• Import Column
• OLE DB Command
CHARACTER MAP
• Performs common character translations
• The modified column can be added
• as a new column or
• can update the original column
• The available operation types are:
• Byte Reversal: Reverses the order of the bytes.
• For example, for the data 0x1234 0x9876, the result is 0x3412 0x7698
• Full Width: Converts the half-width character type to full width.
• Half Width: Converts the full-width character type to half width.
• Hiragana: Converts the Katakana style of Japanese characters to Hiragana.
• Katakana: Converts the Hiragana style of Japanese characters to Katakana.
• Linguistic Casing: Applies the regional linguistic rules for casing.
• Lowercase: Changes all letters in the input to lowercase.
• Traditional Chinese: Converts the simplified Chinese characters to traditional Chinese.
• Simplified Chinese: Converts the traditional Chinese characters to simplified Chinese.
• Uppercase: Changes all letters in the input to uppercase.
COPY COLUMN
• A very simple transformation that copies the
output of a column to a clone of itself.
• This is useful if you wish to create a copy of a
column before you perform some elaborate
transformations.
• You could then keep the original value as your control subject and the copy
as the modified column.
DATA CONVERSION
• Performs a similar function to the CONVERT
or CAST functions in T-SQL.
• The Output Alias is the column name you
want to assign to the column after it is
transformed.
• If you don’t assign it a new name, it will later be displayed as Data
Conversion: ColumnName in the Data Flow.
DERIVED COLUMN
• Creates a new column that is calculated
(derived) from the output of another column
or set of columns.
• One of the most important transformations
in Data Flow.
• Examples
• To multiply the quantity of orders by the cost of the order to derive the
total cost of the order
• You can also use it to find out the current date or to fill in the blanks in the
data by using the ISNULL function.
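The examples above might look like this in the Derived Column editor's Expression column — the column names are illustrative:

```
Quantity * UnitCost
GETDATE()
ISNULL(MiddleName) ? "" : MiddleName
```

Each expression either replaces an existing column or feeds a new column added to the Data Flow.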
EXPORT COLUMN
• Exports data to a file from the Data Flow
• Unlike the other transformations, the Export Column Transformation
doesn’t need a destination to create the file
• A common example is to extract blob-type
data from fields in a database and create files
in their original formats to be stored in a file
system or viewed by a format viewer, such as
Microsoft Word or Microsoft Paint.
IMPORT COLUMN
• The Import Column Transformation is a
partner to the Export Column transformation.
• These transformations do the work of
translating physical files from system file
storage paths into database blob-type fields,
and vice versa
OLE DB COMMAND
• Designed to execute a SQL statement for
each row in an input stream.
• This task is analogous to an ADO Command object being created,
prepared, and executed for each row of a result set.
• This transformation should be avoided
whenever possible.
• It’s a better practice to land the data into a staging table using an OLE DB
Destination and perform an update with a set-based process in the Control
Flow with an Execute SQL Task.
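To illustrate why the set-based pattern is preferred, this Python sketch contrasts the two approaches by counting simulated statements sent to the server. The table shape, keys, and status values are invented for the example.

```python
# Simulated target table keyed by id; all names here are illustrative.
def fresh_table():
    return {1: {"status": "old"}, 2: {"status": "old"}, 3: {"status": "old"}}

incoming = [1, 3]  # keys arriving from the Data Flow

# OLE DB Command pattern: one statement per input row (avoid when possible).
table = fresh_table()
row_by_row_statements = 0
for key in incoming:
    row_by_row_statements += 1        # each row costs a server round-trip
    table[key]["status"] = "new"

# Recommended pattern: land the rows in a staging table, then run one
# set-based UPDATE from an Execute SQL Task in the Control Flow.
table = fresh_table()
staging = set(incoming)               # the staged keys
set_based_statements = 1              # a single UPDATE joined to staging
for key, row in table.items():
    if key in staging:
        row["status"] = "new"
```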
Rowset
Transformations
• Aggregate
• Sort
• Percentage Sampling
• Row Sampling
• Pivot
• Unpivot
AGGREGATE
• Aggregates data in the Data Flow, applying functions similar to those used with a T-SQL GROUP BY statement
• The most important option is Operation.
• Group By: Breaks the data set into groups by the column you specify
• Average: Averages the selected column’s numeric data
• Count: Counts the records in a group
• Count Distinct: Counts the distinct non-NULL values in a group
• Minimum: Returns the minimum numeric value in the group
• Maximum: Returns the maximum numeric value in the group
• Sum: Returns sum of the selected column’s numeric data in the group
• It is a Fully Blocking transformation
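The Operation options above map naturally onto a group-then-aggregate step. This Python sketch reproduces them on a toy row set; the Region and Amount column names are assumptions for illustration.

```python
from collections import defaultdict

# Illustrative order rows; "Region" and "Amount" are assumed column names.
rows = [
    {"Region": "East", "Amount": 100.0},
    {"Region": "East", "Amount": 50.0},
    {"Region": "West", "Amount": 75.0},
]

# Group By: break the data set into groups by the chosen column
groups = defaultdict(list)
for r in rows:
    groups[r["Region"]].append(r["Amount"])

# Apply the remaining operations per group
aggregated = {
    region: {
        "Count": len(vals),
        "Sum": sum(vals),
        "Average": sum(vals) / len(vals),
        "Minimum": min(vals),
        "Maximum": max(vals),
    }
    for region, vals in groups.items()
}
```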
SORT
• It is a fully blocking asynchronous transformation
• Enables you to sort data based on any column in
the path.
• Avoid using it when possible, because it is slow
• However, some transformations, like the Merge Join and Merge, require the
data to be sorted.
• If you place an ORDER BY statement in the OLE DB Source, SSIS is not aware of
the ORDER BY statement.
• If you have an ORDER BY clause in your T-SQL, you can notify SSIS that the data
is already sorted, obviating the need for the Sort Transformation in the
Advanced Editor.
PERCENTAGE & ROW SAMPLING
• Enable you to take the data from the source and randomly select a
subset of data.
• The transformation produces two outputs that you can select.
• One output is the data that was randomly selected,
• and the other is the data that was not selected.
• You can use this to send a subset of data to a development or test server.
• The most useful application of this transformation is to train a data-mining model.
• You can use one output path to train your data-mining model,
• and the sampling to validate the model.
• The Percentage Sampling enables you to select the percentage of
rows
• The Row Sampling Transformation enables you to specify how
many rows you wish to be outputted randomly.
• You can specify the seed that will randomize the data.
• If you select a seed and run the transformation multiple times, the same data will be outputted to the
destination.
• If you uncheck this option, which is the default, the seed will be automatically incremented by one at runtime,
and you will see random data each time.
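The two-output, seeded behavior described above can be sketched in Python: a fixed seed makes the "random" selection repeatable across runs, and every row lands in exactly one of the two outputs. The row values and sample size are illustrative.

```python
import random

rows = list(range(100))  # stand-in for the rows flowing through the path

def row_sampling(rows, n, seed):
    # A fixed seed makes the selection repeatable, like checking the
    # seed option on the Row Sampling Transformation.
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(rows)), n))
    selected = [r for i, r in enumerate(rows) if i in chosen]
    unselected = [r for i, r in enumerate(rows) if i not in chosen]
    return selected, unselected       # the transformation's two outputs

picked_a, rest_a = row_sampling(rows, 10, seed=42)
picked_b, rest_b = row_sampling(rows, 10, seed=42)
```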
PIVOT & UNPIVOT
• A pivot table is a result of cross-tabulated
columns generated by summarizing data
from a row format.
• Unpivot is the opposite of Pivot
Split & Join
Transformations
• Conditional Split
• Multicast
• Union All
• Merge
• Merge Join
• Lookup
• Cache
• CDC Splitter
CONDITIONAL SPLIT
• Add complex logic to your Data Flow.
• This transformation enables you to send the data from a single data path to
various outputs or paths based on conditions that use the SSIS expression
language.
• Is similar to a CASE decision structure in a
programming language
• Also provides a default output
• If a row matches no expression, it is directed to the default output.
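The CASE-like routing above can be sketched as a first-match-wins chain in Python, with a default output for rows matching no expression. The Amount column and the split conditions are invented for the example.

```python
# Route every row down exactly one output path, like a CASE structure.
# The "Amount" column and the conditions below are illustrative.
rows = [{"Amount": 5}, {"Amount": 500}, {"Amount": 50}]

outputs = {"Small": [], "Large": [], "Default": []}
for row in rows:
    if row["Amount"] < 10:              # first matching expression wins
        outputs["Small"].append(row)
    elif row["Amount"] >= 100:
        outputs["Large"].append(row)
    else:
        outputs["Default"].append(row)  # rows matching no expression
```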
MULTICAST
• Send a single data input to multiple output
paths
• Is similar to the Conditional Split
• Both transformations send data to multiple outputs.
• The Multicast will send all the rows down every output path, whereas the
Conditional Split will conditionally send each row down exactly one output
path.
MERGE
• Merge data from two paths into a single output
• Is useful when you
• wish to break out your Data Flow into a path that handles certain errors and
then merge it back into the main Data Flow downstream after the errors have
been handled
• wish to merge data from two Data Sources.
• But has some restrictions:
• The data must be sorted beforehand.
• You can do this by using the Sort Transformation prior to the merge or by specifying an ORDER BY clause in the source connection.
• The metadata must be the same between both paths.
• For example, the CustomerID column can’t be a numeric column in one path and a character column in another path.
• If you have more than two paths, you should choose the Union All
Transformation.
UNION ALL
• Similar to Merge Transformation
• You can merge data from two or more paths into
a single output
• Does not require sorted data.
• This transformation fixes minor metadata issues.
• For example, if you have one input that is a 20-character string and another that
is 50 characters, the output of this from the Union All Transformation will be the
longer 50-character column.
• You need to open the Union All Transformation Editor only if the inputs that feed it have different column names.
MERGE JOIN
• Merge the output of two inputs and perform
an INNER or OUTER join on the data
• If both inputs are in the same database, it is faster to perform the join in T-SQL at the OLE DB Source level than to use this transformation.
• Useful when you have two different Data
Sources you wish to merge
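Because the Merge Join requires sorted inputs, it can walk both streams once. This Python sketch shows an INNER merge join over two pre-sorted key lists; the CustomerID keys and column values are illustrative assumptions.

```python
# Sorted-input INNER merge join over two streams; both inputs must be
# pre-sorted on the join key, as the transformation requires.
left = [(1, "order-A"), (2, "order-B"), (4, "order-C")]   # (CustomerID, Order)
right = [(1, "Alice"), (2, "Bob"), (3, "Carol")]          # (CustomerID, Name)

joined, i, j = [], 0, 0
while i < len(left) and j < len(right):
    if left[i][0] == right[j][0]:
        joined.append((left[i][0], left[i][1], right[j][1]))
        i += 1                        # keep j: allows one-to-many matches
    elif left[i][0] < right[j][0]:
        i += 1                        # unmatched left row (dropped by INNER)
    else:
        j += 1                        # unmatched right row (dropped by INNER)
```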
LOOKUP
• Performs lookups by joining data in input
columns with columns in a reference dataset
• You use the lookup to access additional
information in a related table that is based on
values in common columns.
• Lookup Caching mechanism
• Full-Cache Mode: stores all the rows resulting from a specified query in
memory
• No-Cache Mode: nothing is cached; for each input row, the component sends a request to the reference table on the database server to ask for a match
• Partial-Cache Mode: caches only the most recently used data in memory; as soon as the cache grows too big, the least-used cache data is thrown away
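A minimal sketch of the caching trade-off, assuming a tiny in-memory reference table: full cache loads everything up front, while partial cache keeps only the most recently used entries and evicts the least-used one when it grows too big. All names, keys, and the cache capacity are invented for illustration.

```python
from collections import OrderedDict

# Illustrative reference "table": CustomerID -> Name.
reference = {1: "Alice", 2: "Bob", 3: "Carol"}
db_hits = {"count": 0}

def query_reference(key):
    db_hits["count"] += 1             # simulates a round-trip to the server
    return reference.get(key)

# Full-cache mode: load all rows into memory before processing begins.
full_cache = dict(reference)

# Partial-cache mode: LRU cache of the most recently used lookups.
def partial_lookup(cache, key, capacity=2):
    if key in cache:
        cache.move_to_end(key)        # mark as recently used
        return cache[key]
    value = query_reference(key)      # cache miss -> ask the database
    cache[key] = value
    if len(cache) > capacity:
        cache.popitem(last=False)     # throw away the least-used entry
    return value

cache = OrderedDict()
results = [partial_lookup(cache, k) for k in [1, 2, 1, 3, 1]]
```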
CACHE
• Generates a reference dataset for the Lookup
Transformation
• by writing data from a connected data source in the data flow to a Cache
connection manager.
• You can use the Cache connection manager
• When you want to configure the Lookup Transformation to run in the full
cache mode.
• In this mode, the reference dataset is loaded into cache before the Lookup
Transformation runs.
CDC SPLITTER
• Splits a single flow of change rows from a
CDC source data flow into different data
flows for Insert, Update and Delete
operations.
• The data flow is split based on the required column __$operation and its
standard values in SQL Server 2012 change tables.
Audit Transformations
• Audit
• RowCount
AUDIT
• Allows you to add auditing data to your Data
Flow
• Because of acts such as HIPAA and Sarbanes-Oxley (SOX) governing audits, you often must be able to track who inserted data into a table and when.
• The task is easy to configure
• Simply select the type of data you want to audit in the Audit Type column and
then name the column that will be outputted to the flow.
• Following are some of the available options:
• Execution instance GUID: GUID that identifies the execution instance of the package
• Package ID: Unique ID for the package
• Package name: Name of the package
• Version ID: Version GUID of the package
• Execution start time: Time the package began
• Machine name: Machine on which the package ran
• User name: User who started the package
• Task name: Data Flow Task name that holds the Audit Task
• Task ID: Unique identifier for the Data Flow Task that holds the Audit Task
ROWCOUNT
• Provides the capability to count rows in a
stream that is directed to its input source.
• This transformation must place that count
into a variable that could be used in the
Control Flow
BI Transformations
• Slowly Changing Dimension
• Fuzzy Grouping
• Fuzzy Lookup
• Term Extraction
• Term Lookup
• Data Mining Query
• Data Cleansing
SLOWLY CHANGING DIMENSION
• Provides a great head start in helping to solve a common, classic
changing-dimension problem that occurs in the outer edge of your
data model, the dimension or lookup tables.
• A dimension table contains a set of discrete values with a
description and often other measurable attributes such as price,
weight, or sales territory.
• The classic problem is what to do in your dimension data when an
attribute in a row changes particularly when you are loading data
automatically through an ETL process
• This transformation can shave days off of your development time in
relation to creating the load manually through T-SQL, but it can
add time because of how it queries your destination and how it
updates with the OLE DB Command Transform (row by row)
FUZZY LOOKUP
• Performs data cleaning tasks
• standardizing data, correcting data, and providing missing values
• Unlike the Lookup Transformation, which uses an equi-join to locate matching records in the reference table and returns exact matches only, it uses fuzzy matching to return one or more close matches from the reference table.
• Usually follows a Lookup transformation in a
package data flow
• First, the Lookup transformation tries to find an exact match.
• If it fails, the Fuzzy Lookup transformation provides close matches from the
reference table.
FUZZY GROUPING
• Performs data cleaning tasks
• by identifying rows of data that are likely to be duplicates and
• selecting a canonical row of data to use in standardizing the data.
• Requires a connection to an instance of SQL Server
• to create the temporary SQL Server tables that the transformation algorithm requires to
do its work
• The connection must resolve to a user who has permission to create tables in the database.
• Produces one output row for each input row
• Each row has the following additional columns:
• _key_in: a column that uniquely identifies each row.
• _key_out: a column that identifies a group of duplicate rows; rows with the same _key_out value are part of the same group, and the group’s _key_out value equals the _key_in of the canonical data row.
• _score: a value between 0 and 1 that indicates the similarity of the input row to the canonical row.
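To make the _key_in/_key_out/_score columns concrete, this sketch groups likely-duplicate names with the standard library's similarity ratio as a stand-in for the real fuzzy algorithm. The names, the 0.8 threshold, and the first-row-as-canonical rule are all assumptions for illustration.

```python
from difflib import SequenceMatcher

# Illustrative input rows containing a likely duplicate.
names = ["Microsoft Corp", "Microsoft Corporation", "Contoso Ltd"]

def similarity(a, b):
    # Stand-in similarity score in [0, 1]; not the real SSIS algorithm.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

threshold = 0.8
rows = []
for key_in, name in enumerate(names, start=1):
    key_out, score = key_in, 1.0          # default: its own canonical row
    for prior in rows:                    # compare against canonical rows only
        if prior["_key_in"] == prior["_key_out"]:
            s = similarity(name, names[prior["_key_in"] - 1])
            if s >= threshold:
                key_out, score = prior["_key_out"], s
                break
    rows.append({"_key_in": key_in, "_key_out": key_out, "_score": score})
```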
TERM EXTRACTION
• A tool to mine free-flowing text for English
word and phrase frequency
• If you have ever done some word and phrase analysis on websites for
better search engine placement, you are familiar with the job that this
transformation performs
• Based on the Term Frequency and Inverse Document Frequency (TFIDF) formula
• TFIDF = (frequency of term) * log((# rows in sample) / (# rows with term or phrase))
• Output two columns:
• a text phrase
• a statistical value for the phrase relative to the total input stream
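The TFIDF formula above can be computed directly. This Python sketch scores single words over a few invented text rows; the real transformation extracts English nouns and noun phrases, which this example skips.

```python
import math

# Illustrative text rows; contents are invented for the example.
docs = [
    "sql server integration services",
    "sql server analysis services",
    "windows server",
]

def tfidf(term, docs):
    # TFIDF = (frequency of term) * log((# rows in sample) / (# rows with term))
    freq = sum(doc.split().count(term) for doc in docs)
    rows_with_term = sum(term in doc.split() for doc in docs)
    return freq * math.log(len(docs) / rows_with_term)

score_sql = tfidf("sql", docs)          # appears in 2 of 3 rows
score_windows = tfidf("windows", docs)  # in only 1 row: rarer terms score higher
score_server = tfidf("server", docs)    # in every row: log(1) = 0
```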
TERM LOOKUP
• Uses the same algorithms and statistical models
as the Term Extraction Transformation to break
up an incoming stream into noun or noun phrase
tokens
• It is designed to compare those tokens to a stored
word list and output a matching list of terms and
phrases with simple frequency counts.
• Generating statistics on known phrases of
importance.
• A real world application of this would be to pull out all the customer service
notes that had a given set of terms or that mention a competitor’s name.
DATA MINING QUERY
• Typically is used to fill in gaps in your data or
predict a new column for your Data Flow
• Optionally you can add columns
• such as the probability of a certain condition being true.
• Usage Examples
• You could take columns, such as number of children, household income,
and marital income, to predict a new column that states whether the
person owns a house or not.
• You could predict what customers would want to buy based on their
shopping cart items.
• You could fill the gaps in your data where customers didn’t enter all the
fields in a questionnaire.
DATA CLEANSING
• Performs advanced data cleansing on data
• A business analyst creates a series of business rules that declare what good data looks like
• Create domains that define data in your
company
• such as what a Company Name column should always look like.
Data Flow
Custom
Transformations
• Script Component
• Custom Component
SCRIPT COMPONENT
• Enables you to write custom .NET scripts as
• Transformations
• Sources
• Destinations
• Some of the things you can do with this
transformation
• Create a custom transformation that would use a .NET assembly to validate
credit card numbers or mailing addresses.
• Validate data and skip records that don’t seem reasonable.
• Read from a proprietary system for which no standard provider exists.
• Write a custom component to integrate with a third-party vendor.
• Scripts used as sources can support multiple
outputs
• You have the option of precompiling the scripts for runtime efficiency.
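The real Script Component is written in C# or VB.NET inside the package; as a language-neutral sketch of the credit-card validation example mentioned above, here is the standard Luhn checksum applied as a row-level filter. The Card column name and sample numbers are illustrative.

```python
# Luhn checksum: the standard check-digit algorithm for card numbers.
def luhn_valid(number: str) -> bool:
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Double every second digit from the right; subtract 9 if it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Row-level validation: skip records that don't seem reasonable.
rows = [{"Card": "4539 1488 0343 6467"}, {"Card": "1234 5678 9012 3456"}]
valid_rows = [r for r in rows if luhn_valid(r["Card"])]
```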
CUSTOM COMPONENT
• In a real-world integration solution, you may have
requirements that the built-in functionality in SSIS
does not meet
• Use Visual Studio and Class Library project
template
• Reference the following assemblies
• Microsoft.SqlServer.DTSPipelineWrap
• Microsoft.SqlServer.DTSRuntimeWrap
• Microsoft.SqlServer.ManagedDTS
• Microsoft.SqlServer.PipelineHost
• In addition, the component needs to
• Provide a strong name key for signing the assembly.
• Set the build output location to the PipelineComponents folder.
• Use a post-build event to install the assembly into the GAC.
• Set assembly-level attributes in the AssemblyInfo.cs file.
Script Component
OPTIMIZING DATA FLOW PERFORMANCE
• Optimize queries:
• Select only the rows and columns that you need
• Avoid unnecessary sorting:
• Use presorted data where possible
• Set the IsSorted property where applicable
• Configure Data Flow task properties:
• Buffer size
• Temporary storage location
• Parallelism
• Optimized mode
Q&A
Questions And Answers
• Introduction to SSIS
• SSIS Tools
• Variables, Parameter, Expressions
• SSIS Tasks
• Containers
• Data Flows
SUMMARY
Thank you
SELECT KNOWLEDGE FROM SQL SERVER
 
Loading Data into Azure SQL DW (Synapse Analytics)
Loading Data into Azure SQL DW (Synapse Analytics)Loading Data into Azure SQL DW (Synapse Analytics)
Loading Data into Azure SQL DW (Synapse Analytics)
 
Introduction to DAX Language
Introduction to DAX LanguageIntroduction to DAX Language
Introduction to DAX Language
 
Building diagnostic queries using DMVs and DMFs
Building diagnostic queries using DMVs and DMFs Building diagnostic queries using DMVs and DMFs
Building diagnostic queries using DMVs and DMFs
 
Exploring T-SQL Anti-Patterns
Exploring T-SQL Anti-Patterns Exploring T-SQL Anti-Patterns
Exploring T-SQL Anti-Patterns
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
SQLServer Database Structures
SQLServer Database Structures SQLServer Database Structures
SQLServer Database Structures
 
Sqlschool 2017 recap - 2018 plans
Sqlschool 2017 recap - 2018 plansSqlschool 2017 recap - 2018 plans
Sqlschool 2017 recap - 2018 plans
 
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018 Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
 
Microsoft SQL Family and GDPR
Microsoft SQL Family and GDPRMicrosoft SQL Family and GDPR
Microsoft SQL Family and GDPR
 
Statistics and Indexes Internals
Statistics and Indexes InternalsStatistics and Indexes Internals
Statistics and Indexes Internals
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Azure SQL Data Warehouse
Azure SQL Data Warehouse Azure SQL Data Warehouse
Azure SQL Data Warehouse
 
Introduction to Machine Learning on Azure
Introduction to Machine Learning on AzureIntroduction to Machine Learning on Azure
Introduction to Machine Learning on Azure
 

Ssn0020 ssis 2012 for beginners

  • 1. SQL Saturday Night SQL Server 2012 Integration Services for Beginners June 01, 2013 20th
  • 2. SQL Server 2012 Integration Services for Beginners 20th SQL Saturday Night May 25, 2013
  • 3. Antonios Chatzipavlis Solution Architect – SQL Server Evangelist • I have been working with computers since 1982. • In 1988 I started my professional career in the computer industry. • In 1996 I started working with SQL Server version 6.0. • In 1998 I earned my first Microsoft certification as a Microsoft Certified Solution Developer (3rd in Greece). • Since then I have earned more than 30 certifications (and counting!) • MCT, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, • MCSE : Data Platform • In 1998 I started my career as a Microsoft Certified Trainer (MCT). • Since then I have delivered more than 15,000 hours of training. • In 2010 I became a Microsoft MVP on SQL Server for the first time. • In 2010 I created SQL School Greece (www.sqlschool.gr) • I am a board member of the IAMCT organization and Leader of its Greek Chapter. • In 2012 I became an MCT Regional Lead in the Microsoft Learning Program. • I am a moderator of autoexec.gr and a member of dotnetzone.gr SP_WHO
  • 4. @antoniosch @sqlschool SQL School Greece www.sqlschool.gr help@sqlschool.gr COMMUNICATION
  • 5. Agenda SQL Server 2012 Integration Services for Beginners
  • 6. • Introduction to SSIS • SSIS Tools • Variables, Parameter, Expressions • SSIS Tasks • Containers • Data Flows AGENDA
  • 7. Introduction to SSIS SQL Server 2012 Integration Services for Beginners
  • 8. WHAT IS SSIS? • A platform for ETL operations • Installed as a feature of SQL Server • Useful for DBAs, Developers, Data Analysts • DTS Evolution
  • 9. SSIS ARCHITECTURE (1) • SSIS Project • A versioned container for parameters and packages • A unit of deployment to an SSIS Catalog • SSIS Package • A unit of task flow execution • A unit of deployment (package deployment model)
  • 10. SSIS ARCHITECTURE (2) Control Flow • It is the brain of the package • Orchestrates the order of execution for all its components • Tasks • Individual units of work • Precedence constraints • Direct tasks to execute in a given order • Define the workflow of the SSIS package • Containers • Core units for grouping tasks together logically into units of work • Give us the ability to define variables and events within container scope
  • 11. SSIS ARCHITECTURE (3) Data Flow • It is the heart of the package • Provides the capability to implement ETL • Sources • A component which specifies the location of the source data • Transformations • Allow changes to the data within the data pipeline • Destinations • A component which specifies the destination of the transformed data
  • 12. SSIS ARCHITECTURE (4) • Variables • SSIS variables can be set to evaluate to an expression at runtime • Parameters • Parameters behave much like variables but with a few main exceptions. • Parameters can make a package dynamic. • Parameters can be set outside the package easily • Can be designated as values that must be passed in for the package to start • Parameters are new to SSIS in SQL Server 2012 and replace the capabilities of Configurations in previous releases of SQL Server. • Error Handling and Logging
  • 13. SSIS Tools SQL Server 2012 Integration Services for Beginners
  • 14. IMPORT – EXPORT WIZARD The easiest method to move data Uses SSIS as a framework Optionally we can save a package
  • 17. Using SQL Server Data Tools to create SSIS Project & Package
  • 18. Variables, Parameter, Expressions SQL Server 2012 Integration Services for Beginners
  • 19. SSIS DATA TYPES • Named differently than similar types in .NET or T-SQL • C:\Program Files\Microsoft SQL Server\110\DTS\MappingFiles • The .NET managed types are important only if you are using: • Script component • CLR • .NET-based coding to manipulate your Data Flows.
  • 20. SSIS DATA TYPE SQL SERVER DATA TYPE .NET DATA TYPE DT_NUMERIC numeric System.Decimal DT_DECIMAL Decimal DT_CY numeric, decimal DT_I1 System.Sbyte DT_I2 smallint System.Int16 DT_I4 int System.Int32 DT_BOOL Bit System.Boolean DT_I8 bigint System.Int64 DT_R4 real System.Single DT_R8 float System.Double DT_U1 tinyint System.Byte DT_U2 System.UInt16 DT_U4 System.UInt32 DT_U8 System.UInt64 DT_GUID Uniqueidentifier System.Guid SSIS NUMERIC DATA TYPES TABLE MAPPINGS
  • 21. SSIS DATA TYPE SQL SERVER DATA TYPE .NET DATA TYPE DT_WSTR nvarchar, nchar System.String DT_STR varchar, char DT_TEXT text DT_NTEXT ntext, sql_variant, xml DT_BYTES binary, varbinary System.Byte() DT_IMAGE timestamp, image DT_DBTIMESTAMP smalldatetime, datetime System.DateTime DT_DBTIMESTAMP2 datetime DT_DBDATE Date DT_DATE DT_FILETIME DT_DBDATETIMESTAMPOFFSET datetimeoffset DT_DBTIME2 time System.TimeSpan DT_DBTIME SSIS STRING AND DATE-TIME DATA TYPES TABLE MAPPINGS
  • 22. PERFORMANCE AND DATA TYPES • Convert only when necessary. • Don’t convert columns from a data source that will be dropped from the data stream. • Each conversion costs something. • Convert to the closest type for your destination source using the mapping files. • If a value is converted to a non-supported data type, you’ll incur an additional conversion internal to SSIS to the mapped data type. • Convert using the closest size and precision. • Don’t import all columns as 50-character data columns if you are working with a fixed or reliable file format with columns that don’t require as much space. • Evaluate the option to convert after the fact. • Remember that SSIS is still an ETL tool and sometimes it is more efficient to stage the data and convert it using set-based methods.
  • 23. UNICODE AND NON-UNICODE • Unicode data type is the default • Default Import behavior • All the string SSIS functions expect Unicode strings as input • Use Data Conversion Transformation to convert non-Unicode data types to appropriate Unicode data type according to the mapping table
  • 24. CONVERSION IN SSIS EXPRESSIONS • Use explicit casting to avoid troubles • Casting is easy • It looks like .NET • (DT_I4) 32 • Casting operators parameters • Length – Final string length • Code_Page – Unicode character set • Precision • Scale
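The casting rules above can be sketched with a few SSIS expression fragments. These are illustrative only: the variable name User::MyVar is hypothetical, and code page 1252 is just a common Windows-Latin example.

```
(DT_I4) "32"
(DT_STR, 20, 1252) 123
(DT_WSTR, 50) @[User::MyVar]
(DT_NUMERIC, 10, 2) "3.14"
```

DT_STR takes a length and a code page, DT_WSTR takes only a length, and DT_NUMERIC takes a precision and a scale, matching the casting-operator parameters listed on the slide.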
  • 25. VARIABLES • Key feature in SSIS • User Variables • Variables created by an SSIS developer to hold dynamic values • Defined in the User namespace by default • Defined at a specified scope • Syntax: User::MyVar • System Variables • Built-in variables with dynamic system values • Defined in the System namespace • Syntax: System::MyVar
  • 26. PARAMETERS • New feature in SQL Server 2012. • Similar to a variable • Can store information • It has a few different properties and uses • Parameters are set externally • Project parameters • Package parameters • Replace package configurations • when using the project deployment model • Required property of the parameter • Necessitate the caller of the package to pass in a value for the parameter
  • 27. PROJECT PARAMETERS • Created at the project level • Can be used in all packages that are included in that project. • Best used for values that should be shared among packages such as e-mail addresses for error messages.
  • 28. PACKAGE PARAMETERS • Created at the package level • Can be used only in that package • Best used for values specific to that package, such as directory locations.
  • 29. PARAMETERS VS VARIABLES Parameters If you want to set the value of something from outside the package Variables If you want to create or store values only within a package
  • 30. VARIABLE & PARAMETERS DATA TYPES VARIABLE DATA TYPE SSIS DATA TYPE REMARKS Boolean DT_BOOL Be careful setting these data types in code because the expression language and .NET languages define these differently Byte DT_UI1 A 1-byte unsigned int. (Note this is not a byte array.) Char DT_UI2 DateTime DT_DBTIMESTAMP DBNull N/A Double DT_R8 Int16 DT_I2 Int32 DT_I4 Int64 DT_I8 Object N/A An object reference. Typically used to store data sets or large object structures Sbyte DT_I1 Single DT_R4 String DT_WSTR UInt32 DT_UI4 UInt64 DT_UI8
  • 31. EXPRESSIONS • Used to set values dynamically • Properties • Conditional split criteria • Derived column values • Precedence constraints • Based on Integration Services expression syntax • Can include variables and parameters • The language is heavily C#-like but contains a Visual Basic flavor and sometimes T-SQL for functions • Can be created graphically by using Expression Builder • Expression adorners, a new feature in SQL Server 2012
  • 32. EXPRESSION OPERATORS • || Logical OR • && Logical AND • == Determine equivalency • != Determine inequality • ? : Conditional operator
  • 33. EXPRESSION BUILDER • Friendly UI • Provides • Syntax check • Expression evaluation
  • 34. EXPRESSION SYNTAX BASICS • Equivalence Operator (==) • String Concatenation (+) • Line Continuation (\n) • Literals • Numeric Literals • 30L -> DT_I8 • 30U -> DT_UI4 • 30.0F -> DT_R4 • 30.3E -> DT_R8 • String Literals • \n CRLF • \t TAB • \" Quotation mark • \\ Backslash • Boolean Literals • @User::Var == True
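A small expression combining these pieces — concatenation, a numeric cast, and escaped string literals; the variable User::BatchId is hypothetical:

```
"Batch\t" + (DT_WSTR, 10) @[User::BatchId] + "\n\"done\""
```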
  • 35. DEALING WITH NULL IN SSIS • In SSIS, variables cannot be set to NULL • The SSIS ISNULL function <> T-SQL ISNULL • Each data type maintains a default value VARIABLE DATA TYPE DEFAULT VALUE VARIABLE DATA TYPE DEFAULT VALUE Boolean False DBNull N/A in an expression Byte 0 Object N/A in an expression Char 0 String “” DateTime 30/12/1899 Sbyte 0 Double 0 Single 0 Int16 0 Int32 0 UInt32 0 Int64 0 UInt64 0
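A common guard pattern, sketched with a hypothetical User::FilePath variable and a hypothetical [MiddleName] Data Flow column; note that REPLACENULL is an expression function added in SQL Server 2012:

```
ISNULL(@[User::FilePath]) ? "C:\\Staging" : @[User::FilePath]
REPLACENULL([MiddleName], "N/A")
```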
  • 36. STRING FUNCTIONS • SSIS String Functions are UNICODE • Is different from SQL Server string functions • The comparison is case and padding sensitive • Comparing strings requires that you have two strings of the same padding length and case.
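Because comparisons are case- and padding-sensitive, a typical safe comparison normalizes both sides first (the column names here are hypothetical):

```
UPPER(TRIM([CustomerName])) == UPPER(TRIM([AccountName]))
```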
  • 37. USING EXPRESSIONS IN SSIS PACKAGE • Variable and Parameters as Expressions • Expressions in Connection Manager Properties • Expressions in Control Flow Tasks • Expressions in Control Flow Precedence • Expression Task new in SQL Server 2012 • Expression in Data Flow
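As one example of expressions in Control Flow precedence, a constraint set to "Expression and Constraint" could gate the next task on both the previous task succeeding and a row count captured earlier; User::RowCount is a hypothetical variable:

```
@[User::RowCount] > 0
```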
  • 38. SSIS Tasks SQL Server 2012 Integration Services for Beginners
  • 39. WHAT IS A TASK • A foundation of the Control Flow in SSIS • A discrete unit of work that can perform typical actions • Each task has a set of setup parameters • Setup parameters are visible in the Task Editor • Most of these parameters can be made dynamic by using expressions on the Task Editor Expressions tab
  • 40. CONTROL FLOW TASKS Data Flow Tasks Database Tasks File & Internet Tasks • Data Flow • Data Profiling • Bulk Insert • Execute SQL • Execute T-SQL • CDC Control • File System • FTP • XML • Web Service • Send Mail Process Execution Task WMI Tasks Custom Logic Tasks • Execute Package • Execute Process • WMI Data Reader • WMI Event Watcher • Script • Custom Tasks Database Transfer Tasks Analysis Services Tasks SQL Server Maintenance Tasks • Transfer Database • Transfer Error Messages • Transfer Jobs • Transfer Logins • Transfer Master Stored Procedures • Transfer SQL Server Objects • Analysis Services Execute DDL • Analysis Services Processing • Data Mining Query • Back Up Database • Check Database Integrity • History Cleanup • Maintenance Cleanup • Notify Operator • Rebuild Index • Reorganize Index • Shrink Database • Update Statistics
  • 41. PRECEDENCE CONSTRAINTS Connect sequences of tasks Three control flow conditions Success Failure Completion Multiple constraints Logical AND Logical OR [Diagram: tasks linked by Success, Failure, and Completion constraints in both Logical AND and Logical OR combinations]
  • 42. TASK EDITOR • Easy to access • Consistent design • Name and Description generic properties
  • 43. TASK EDITOR SSIS uses a concept of setting the value of most task properties to a dynamic expression that is evaluated at runtime
  • 45. COMMON PROPERTIES (1) • If set to true, SSIS will not validate any of the properties set in the task until runtime. • This is useful if you are operating in a disconnected mode • Want to enter a value for production that cannot be validated until the package is deployed, or you are dynamically setting the properties using expressions • The default value for this property is false. DelayValidation
  • 46. COMMON PROPERTIES (2) • The description of what the instance of the task does. • The default name for this is <task name> • if you have multiple tasks of the same type, it would read <task name 1> (where the number 1 increments). • This property does not have to be unique and is optional. • If you do provide details here, it will display in the tooltip when hovering over the task object. • For consistency, the property should accurately describe what the task does for people who may be monitoring the package in your operations group. Description
  • 47. COMMON PROPERTIES (3) • If set to true, the task is disabled and will not execute. • Helpful if you are testing a package and want to disable the execution of a task temporarily. • Is the equivalent of commenting a task out temporarily. • Is set to false by default. Disable
  • 48. COMMON PROPERTIES (4) • Contains the name of the custom variable that will store the output of the task’s execution. • The default value of this property is <none>, which means the execution output is not stored • Enables the task to expose information related to the results of the internal actions within the task. ExecValueVariable
  • 49. COMMON PROPERTIES (5) • If set to true • The entire package will fail if the individual task fails. • Typically, you want to control what happens to a package if a task fails with a custom error handler or Control Flow. • By default, this property is set to false. FailPackageOnFailure
  • 50. COMMON PROPERTIES (6) • If set to true • The task’s parent will fail if the individual task reports an error. • The task’s parent can be a package or container • By default, this property is set to false. FailParentOnFailure
  • 51. COMMON PROPERTIES (7) • Read-only • Automatically generated unique ID • Associated with an instance of a task. • The ID is in GUID format • {BK4FH3I-RDN3-I8RF-KU3F-JF83AFJRLS}. ID
  • 52. COMMON PROPERTIES (8) • Specifies the isolation level of the transaction • The values are • Chaos • ReadCommitted • ReadUncommitted • RepeatableRead • Serializable • Unspecified • Snapshot. • The default value of this property is Serializable IsolationLevel
  • 53. COMMON PROPERTIES (9) • Specifies the type of logging that will be performed for this task. • The values are • UseParentSetting • Enabled • Disabled. • The default value is UseParentSetting • Basic logging is turned on at the package level by default in SQL Server 2012. LoggingMode
  • 54. COMMON PROPERTIES (10) • The name associated with the task. • The default name for this is <task name> • If you have multiple tasks of the same type, <task name 1> (where the number 1 increments) • Change this name to make it more readable to an operator at runtime • It must be unique inside your package. • Used to identify the task programmatically Name
  • 55. COMMON PROPERTIES (11) • Specifies the transaction attribute for the task. • Possible values are • NotSupported • Supported • Required. • The default value is Supported • Supported enables the option for you to use transactions in your task. TransactionOption
  • 56. DATA FLOW TASK • The heart of SSIS • Has its own design surface • Encapsulates all the data transformation aspects of ETL • Each Data Flow tasks corresponding to separate Data Flow surface • Split and handle data in pipelines based on data element
  • 57. Database Tasks • Data Profiling • Bulk Insert • Execute SQL • Execute T-SQL • CDC Control
  • 58. DATA PROFILER TASK • Examining data and collecting metadata about the quality of the data • About frequency of statistical patterns, interdependencies, uniqueness, and redundancy. • Important for the overall quality and health of an operational data store (ODS) or data warehouse. • Doesn’t have built-in conditional workflow logic • but technically you can use XPath queries on the results • Creates a report on data statistics • You still need to make judgments about these statistics. • For example, a column may contain an overwhelming amount of NULL values, but the profiler doesn’t know whether this reflects a valid business scenario. EXAMINING AREAS o Candidate Key Profile Request o Column Length Distribution Profile Request o Column Null Ratio Profile Request o Column Pattern Profile Request o Column Statistics Profile Request o Functional Dependency Profile Request o Value Inclusion Profile Request
  • 60. BULK INSERT TASK • Inserts data from a text or flat file into a SQL Server database • Similar to BULK INSERT statement or BCP.exe • Very fast operation especially with large amount of data • In fact this is a wizard to store the information needed to create and execute a bulk copying command at runtime • Has no ability to transform data • Because of this give us the fastest way to load data
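Under the covers the task builds a command equivalent to the T-SQL BULK INSERT statement; a sketch with a hypothetical staging table and file path:

```sql
BULK INSERT dbo.StageCustomers
FROM 'C:\Import\customers.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);
```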
  • 61. EXECUTE SQL TASK • Is one of the most widely used tasks in SSIS • Used for • Truncating a staging data table prior to importing • Retrieving row counts to determine the next step in a workflow • Calling stored procedures to perform business logic against sets of staged data • Retrieving information from a database repository • Executing a Parameterized SQL Statement • ? indicates the parameter when we are using ADO, ODBC, OLEDB and EXCEL • ADO, OLEDB and EXCEL are zero-based • ODBC starts from 1 • @<Real Param Name> when we are using ADO.NET
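The placeholder styles differ by connection manager type; both statements below are hypothetical examples:

```sql
-- OLE DB / ODBC / Excel connections use ? markers,
-- mapped by ordinal (0-based for OLE DB, 1-based for ODBC)
SELECT OrderId FROM dbo.Orders WHERE CustomerId = ? AND OrderDate >= ?;

-- ADO.NET connections use the real parameter name
SELECT OrderId FROM dbo.Orders WHERE CustomerId = @CustomerId;
```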
  • 63. EXECUTE T-SQL STATEMENT TASK • Similar to Execute SQL Task • Supports only T-SQL for SQL Server
  • 64. CDC CONTROL TASK • Used to control the life cycle of change data capture (CDC) • Handles CDC package synchronization • Maintains the state of the CDC package • Supports two groups of operations. • One group handles the synchronization of initial load and change processing • The other manages the change-processing range of LSNs for a run of a CDC package and keeps track of what was processed successfully.
  • 65. File & Internet Tasks • File System • FTP • XML • Web Service • Send Mail
  • 66. FILE SYSTEM TASK • Performs file operations available in the System.IO.File .NET class. • The creation of directory structures does not have to be made recursively as we did in the DTS legacy product. • It is written for a single operation • If you need to iterate over a series of files or directories, the File System Task can simply be placed within a Looping Container
  • 68. FTP TASK • Enables the use of the File Transfer Protocol (FTP) in your package development tasks. • Exposes more FTP command capability • Enabling you to create or remove local and remote directories and files. • Another change from the legacy DTS FTP Task is the capability to use FTP in passive mode. • This solves the problem that DTS had in communicating with FTP servers when the firewalls filtered the incoming data port connection to the server.
  • 71. WEB SERVICE TASK • Used to retrieve XML-based result sets by executing a method on a web service • Only retrieves the data • It doesn’t yet address the need to navigate through the data, or extract sections of the resulting documents. • Can be used in SSIS to provide real-time validation of data in your ETL processes or to maintain lookup or dimensional data. • Requires creation of an HTTP Connection Manager • To a specific HTTP endpoint on a website or to a specific Web Services Description Language (WSDL) file on a website
  • 72. SEND MAIL TASK • Sending e-mail messages via SMTP • Only supports Windows and Anonymous authentication. • Google mail needs basic authentication. • So you cannot configure the SMTP Connection Manager in SSIS with an external SMTP server such as Gmail, Yahoo, etc.
  • 74. Process Execution Task • Execute Package • Execute Process
  • 75. EXECUTE PACKAGE TASK • Enables you to build SSIS solutions called parent packages that execute other packages called child packages • Several improvements have simplified the task: • The child packages can be run as either in-process or out-of-process executables. • A big difference in this release of the task compared to its 2008 or 2005 predecessor is that you execute packages within a project to make migrating the code from development to QA much easier. • The task enables you to easily map parameters in the parent package to the child packages now too.
  • 76. EXECUTE PROCESS TASK • Executes a Windows or console application inside of the Control Flow. • The most common example would have to be unzipping packed or encrypted data files with a command-line tool • The configuration items for this task are: • RequireFullFileName • Executable • WorkingDirectory • StandardInputVariable • StandardOutputVariable • StandardErrorVariable • FailTaskIfReturnCodeIsNotSuccessValue • Timeout • TerminateProcessAfterTimeOut • WindowStyle
  • 77. Process Execution Task • Execute Package • Execute Process
  • 78. WMI Tasks • WMI Data Reader • WMI Event Watcher
  • 79. WMI DATA READER TASK • One of the best-kept secrets in Windows • Enables you to manage Windows servers and workstations through a scripting interface similar to running a T-SQL query • This task enables you to interface with this environment by writing WQL queries • The output of this query can be written to a file or variable for later consumption • You could use the WMI Data Reader Task to: • Read the event log looking for a given error. • Query the list of applications that are running. • Query to see how much RAM is available at package execution for debugging. • Determine the amount of free space on a hard drive.
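A typical WQL query for this task — for example, checking free disk space before a load. WQL is SQL-like but queries WMI classes rather than tables:

```sql
SELECT FreeSpace, Size FROM Win32_LogicalDisk WHERE DeviceID = 'C:'
```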
  • 80. WMI EVENT WATCHER TASK • The WMI Event Watcher Task empowers SSIS to wait for and respond to certain WMI events that occur in the operating system. • The following are some of the useful things you can do with this task: • Watch a directory for a certain file to be written. • Wait for a given service to start. • Wait for the memory of a server to reach a certain level before executing the rest of the package or before transferring files to the server. • Watch for the CPU to be free.
  • 82. Custom Logic Tasks • Script • Custom Tasks
  • 83. SCRIPT TASK • Enables you to access the VSTA environment to write and execute scripts using the VB and C# languages • In the latest SSIS edition • Solidifies the connection to the full .NET 4.0 libraries for both VB and C#. • A coding environment with the advantage of IntelliSense • An integrated Visual Studio design environment within SSIS • An easy-to-use methodology for passing parameters into the script • The capability to add breakpoints to your code for testing and debugging purposes • The automatic compiling of your script into binary format for increased speed
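A minimal C# Main() sketch for the Script Task. The variable names are hypothetical, and each must be listed in the task's ReadOnlyVariables or ReadWriteVariables for the Dts.Variables collection to expose it:

```csharp
public void Main()
{
    // Read an input variable, compute a value, write it back
    string folder = Dts.Variables["User::ImportFolder"].Value.ToString();
    int fileCount = System.IO.Directory.GetFiles(folder).Length;
    Dts.Variables["User::FileCount"].Value = fileCount;

    // Report success back to the Control Flow
    Dts.TaskResult = (int)ScriptResults.Success;
}
```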
  • 85. CUSTOM TASK • In a real-world integration solution, you may have requirements that the built-in functionality in SSIS does not meet • Use Visual Studio and the Class Library project template • Reference the following assemblies • Microsoft.SqlServer.DTSPipelineWrap • Microsoft.SqlServer.DTSRuntimeWrap • Microsoft.SqlServer.ManagedDTS • Microsoft.SqlServer.PipelineHost • In addition the component needs to • Provide a strong name key for signing the assembly. • Set the build output location to the PipelineComponents folder. • Use a post-build event to install the assembly into the GAC. • Set assembly-level attributes in the AssemblyInfo.cs file.
  • 86. Database Transfer Tasks • Transfer Database • Transfer Error Messages • Transfer Jobs • Transfer Logins • Transfer Master Stored Procedures • Transfer SQL Server Objects
  • 91. TRANSFER MASTER STORED PROCEDURE TASK
  • 93. Analysis Services Tasks • Analysis Services Execute DDL • Analysis Services Processing • Data Mining Query
  • 97. SQL Server Maintenance Tasks • Back Up Database • Check Database Integrity • History Cleanup • Maintenance Cleanup • Notify Operator • Rebuild Index • Reorganize Index • Shrink Database • Update Statistics
  • 107. Containers SQL Server 2012 Integration Services for Beginners
  • 108. SEQUENCE CONTAINER • Handles the flow of a subset of a package • Can help you divide a package into smaller and more manageable pieces • Uses of the Sequence Container include: • Grouping tasks so that you can disable a part of the package that’s no longer needed • Narrowing the scope of a variable to a container • Managing the properties of multiple tasks in one step by setting the properties of the container • Using one method to ensure that multiple tasks have to execute successfully before the next task executes • Creating a transaction across a series of data-related tasks, but not on the entire package • Creating event handlers on a single container, wherein you could send an email if anything inside one container fails and perhaps page if anything else fails
  • 109. GROUPS • Are not actually containers but simply a way to group components together • A key difference between groups and containers is that properties cannot be delegated through a group • Groups don’t have precedence constraints originating from them (only from the tasks). • You cannot disable the entire group, as you can with a Sequence Container. • Groups are good for quick compartmentalization of tasks for aesthetics. • Their only usefulness is to quickly group components in either a Control Flow or a Data Flow together.
  • 110. FOR LOOP CONTAINER • Enables you to create looping in your package similar to how you would loop in nearly any programming language. • In this looping style, SSIS optionally initializes an expression and continues to evaluate it until the expression evaluates to false.
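The three loop properties take SSIS expressions; a ten-iteration sketch against a hypothetical Int32 variable User::Counter:

```
InitExpression:    @[User::Counter] = 0
EvalExpression:    @[User::Counter] < 10
AssignExpression:  @[User::Counter] = @[User::Counter] + 1
```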
  • 111. FOREACH LOOP CONTAINER • Enables you to loop through a collection of objects. • Foreach File Enumerator: Performs an action for each file in a directory with a given file extension • Foreach Item Enumerator: Loops through a list of items that are set manually in the container • Foreach ADO Enumerator: Loops through a list of tables or rows in a table from an ADO recordset • Foreach ADO.NET Schema Rowset Enumerator: Loops through an ADO.NET schema • Foreach From Variable Enumerator: Loops through an SSIS variable • Foreach Nodelist Enumerator: Loops through a node list in an XML document • Foreach SMO Enumerator: Enumerates a list of SQL Management Objects (SMO) • As you loop through the collection • the container assigns the value from the collection to a variable • which can later be used by tasks or connections inside or outside the container • you can also map the value to a variable
  • 113. Data Flows SQL Server 2012 Integration Services for Beginners
  • 114. DATA FLOW TASK • The heart of SSIS • Has its own design surface • Encapsulates all the data transformation aspects of ETL • Each Data Flow tasks corresponding to separate Data Flow surface • Split and handle data in pipelines based on data element
  • 115. DATA SOURCES • Databases • ADO.NET • OLE DB • CDC Source • Files • Excel • Flat files • XML • Raw files • Others • Custom
  • 116. DATA DESTINATIONS • Database • ADO.NET • OLE DB • SQL Server • ORACLE • File • SSAS • Rowset • Other
  • 117. CONNECTION MANAGERS • A connection to a data source or destination: • Provider (for example, ADO.NET, OLE DB, or flat file) • Connection string • Credentials • Project or package level: • Project-level connection managers: • Can be shared across packages • Are listed in Solution Explorer and the Connection Managers pane for packages in which they are used • Package-level connection managers: • Can be shared across objects in the package • Are listed only in the Connection Managers pane for packages in which they are used
  • 118. DATA FLOW TRANSFORMATIONS • Row Transformations: Character Map, Copy Column, Data Conversion, Derived Column, Export Column, Import Column, OLE DB Command • Rowset Transformations: Aggregate, Sort, Percentage Sampling, Row Sampling, Pivot, Unpivot • Split & Join Transformations: Conditional Split, Multicast, Union All, Merge, Merge Join, Lookup, Cache, CDC Splitter • Audit Transformations: Audit, RowCount • BI Transformations: Slowly Changing Dimension, Fuzzy Grouping, Fuzzy Lookup, Term Extraction, Term Lookup, Data Mining Query, Data Cleansing • Custom Transformations: Script Component, Custom Component
  • 119. SYNCHRONOUS VS ASYNCHRONOUS TRANSFORMATIONS • Synchronous • Synchronous transformations such as the Derived Column and Data Conversion, where rows flow into memory buffers in the transformation, and the same buffers come out. • No rows are held, and typically these transformations perform very quickly, with minimal impact to your Data Flow. • Asynchronous • Asynchronous transformations can cause a block in your Data Flow and slow down your runtime. • There are two types of asynchronous transformations: • Partially blocking transformations • Create new memory buffers for the output of the transformation. • e.g. Union All transformation • Fully blocking transformations • e.g. Sort and Aggregate Transformations • Create new memory buffers for the output of the transformation but cause a full block of the data. • These fully blocking transformations represent the single largest slowdown in SSIS and should be considered carefully in terms of any architecture decisions you must make.
  • 120. Row Transformations • Character Map • Copy Column • Data Conversion • Derived Column • Export Column • Import Column • OLE DB Command
  • 121. CHARACTER MAP • Performs common character translations • Modified column to be added • as a new column or • to update the original column • The available operation types are: • Byte Reversal: Reverses the order of the bytes. • For example, for the data 0x1234 0x9876, the result is 0x4321 0x6789 • Full Width: Converts the half-width character type to full width. • Half Width: Converts the full-width character type to half width. • Hiragana: Converts the Katakana style of Japanese characters to Hiragana. • Katakana: Converts the Hiragana style of Japanese characters to Katakana. • Linguistic Casing: Applies the regional linguistic rules for casing. • Lowercase: Changes all letters in the input to lowercase. • Traditional Chinese: Converts the simplified Chinese characters to traditional Chinese. • Simplified Chinese: Converts the traditional Chinese characters to simplified Chinese. • Uppercase: Changes all letters in the input to uppercase.
  • 122. COPY COLUMN • A very simple transformation that copies the output of a column to a clone of itself. • This is useful if you wish to create a copy of a column before you perform some elaborate transformations. • You could then keep the original value as your control subject and the copy as the modified column.
  • 123. DATA CONVERSION • Performs a similar function to the CONVERT or CAST functions in T-SQL. • The Output Alias is the column name you want to assign to the column after it is transformed. • If you don’t assign it a new name, it will later be displayed as Data Conversion: ColumnName in the Data Flow.
  • 124. DERIVED COLUMN • Creates a new column that is calculated (derived) from the output of another column or set of columns. • One of the most important transformations in Data Flow. • Examples • To multiply the quantity of orders by the cost of the order to derive the total cost of the order • You can also use it to find out the current date or to fill in the blanks in the data by using the ISNULL function.
  • 125. EXPORT COLUMN • Exports data to a file from the Data Flow • Unlike the other transformations, the Export Column Transformation doesn’t need a destination to create the file • A common example is to extract blob-type data from fields in a database and create files in their original formats to be stored in a file system or viewed by a format viewer, such as Microsoft Word or Microsoft Paint.
  • 126. IMPORT COLUMN • The Import Column Transformation is a partner to the Export Column transformation. • These transformations do the work of translating physical files from system file storage paths into database blob-type fields, and vice versa
  • 127. OLE DB COMMAND • Designed to execute a SQL statement for each row in an input stream. • This task is analogous to an ADO Command object being created, prepared, and executed for each row of a result set. • This transformation should be avoided whenever possible. • It’s a better practice to land the data into a staging table using an OLE DB Destination and perform an update with a set-based process in the Control Flow with an Execute SQL Task.
  • 128. Rowset Transformations • Aggregate • Sort • Percentage Sampling • Row Sampling • Pivot • Unpivot
  • 129. AGGREGATE • Aggregates data from the Data Flow, applying functions that T-SQL performs with a GROUP BY statement • The most important option is Operation: • Group By: Breaks the data set into groups by the column you specify • Average: Averages the selected column’s numeric data • Count: Counts the records in a group • Count Distinct: Counts the distinct non-NULL values in a group • Minimum: Returns the minimum numeric value in the group • Maximum: Returns the maximum numeric value in the group • Sum: Returns the sum of the selected column’s numeric data in the group • It is a fully blocking transformation
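The Group By operations listed above can be sketched in plain Python. This is only an illustration of the logic, not SSIS itself; the column names and sample rows are invented:

```python
# Sketch of what the Aggregate transformation computes per group.
from collections import defaultdict

rows = [
    {"territory": "North", "amount": 100.0},
    {"territory": "North", "amount": 50.0},
    {"territory": "South", "amount": 75.0},
]

groups = defaultdict(list)
for row in rows:                                  # Group By: territory
    groups[row["territory"]].append(row["amount"])

summary = {
    key: {
        "Count": len(values),                     # Count
        "Sum": sum(values),                       # Sum
        "Average": sum(values) / len(values),     # Average
        "Minimum": min(values),                   # Minimum
        "Maximum": max(values),                   # Maximum
    }
    for key, values in groups.items()
}

print(summary["North"]["Sum"])    # 150.0
print(summary["South"]["Count"])  # 1
```

Because every row must be seen before any group total is known, the real transformation has to buffer the full input first, which is exactly why it is fully blocking.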
  • 130. SORT • It is a fully blocking asynchronous transformation • Enables you to sort data based on any column in the path • Avoid it when possible, because it is slow • However, some transformations, like the Merge Join and Merge, require sorted data • If you place an ORDER BY statement in the OLE DB Source, SSIS is not aware of it • If your T-SQL already has an ORDER BY clause, you can notify SSIS that the data is sorted via the Advanced Editor, obviating the need for the Sort Transformation
  • 131. PERCENTAGE & ROW SAMPLING • Enable you to take the data from the source and randomly select a subset of data • The transformation produces two outputs that you can select • One output is the data that was randomly selected • and the other is the data that was not selected • You can use this to send a subset of data to a development or test server • The most useful application of this transformation is to train a data-mining model • You can use one output path to train your data-mining model • and the other to validate the model • The Percentage Sampling Transformation enables you to select the percentage of rows • The Row Sampling Transformation enables you to specify how many rows you wish to be output randomly • You can specify the seed that will randomize the data • If you select a seed and run the transformation multiple times, the same data will be output to the destination • If you uncheck this option, which is the default, the seed will be automatically incremented by one at runtime, and you will see random data each time
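The seed behavior can be illustrated with a short Python sketch; `random.Random` stands in for the transformation's random selection, and the data is invented:

```python
# Seeded row sampling: the same seed always yields the same sample,
# which is what makes a fixed-seed sampling run repeatable.
import random

rows = list(range(100))

def sample_rows(rows, n, seed):
    rng = random.Random(seed)                            # fixed seed
    selected = rng.sample(rows, n)                       # "selected" output
    unselected = [r for r in rows if r not in selected]  # "unselected" output
    return selected, unselected

first, rest = sample_rows(rows, 10, seed=42)
second, _ = sample_rows(rows, 10, seed=42)
print(first == second)  # True: same seed, same sample
```

With a changing seed (the default behavior described above), each run would produce a different `selected` list.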
  • 132. PIVOT & UNPIVOT • A pivot table is a result of cross-tabulated columns generated by summarizing data from a row format. • Unpivot is the opposite of Pivot
  • 133. Split & Join Transformations • Conditional Split • Multicast • Union All • Merge • Merge Join • Lookup • Cache • CDC Splitter
  • 134. CONDITIONAL SPLIT • Adds complex logic to your Data Flow • This transformation enables you to send the data from a single data path to various outputs or paths based on conditions that use the SSIS expression language • Similar to a CASE decision structure in a programming language • Also provides a default output • If a row matches no expression, it is directed to the default output
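The routing logic can be sketched in a few lines of Python. The output names and predicates here are invented for illustration; SSIS evaluates conditions in its own expression language:

```python
# Conditional Split sketch: each row goes down exactly one output --
# the first condition that matches -- or the default output.
def conditional_split(rows, conditions, default="Default Output"):
    outputs = {name: [] for name, _ in conditions}
    outputs[default] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break                          # first match wins
        else:
            outputs[default].append(row)       # no expression matched
    return outputs

rows = [{"amount": 5}, {"amount": 50}, {"amount": 500}]
routed = conditional_split(rows, [
    ("Small", lambda r: r["amount"] < 10),
    ("Medium", lambda r: r["amount"] < 100),
])
print(len(routed["Small"]), len(routed["Medium"]), len(routed["Default Output"]))
# 1 1 1
```

Note that the row with `amount` 5 matches both predicates but lands only in "Small" -- condition order matters, just as in a CASE structure.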
  • 135. MULTICAST • Sends a single data input to multiple output paths • Similar to the Conditional Split • Both transformations send data to multiple outputs • The Multicast sends all the rows down every output path, whereas the Conditional Split conditionally sends each row down exactly one output path
  • 136. MERGE • Merges data from two paths into a single output • Is useful when you • wish to break out your Data Flow into a path that handles certain errors and then merge it back into the main Data Flow downstream after the errors have been handled • wish to merge data from two Data Sources • But it has some restrictions: • The data must be sorted first • You can do this by using the Sort Transformation prior to the merge or by specifying an ORDER BY clause in the source connection • The metadata must be the same between both paths • For example, the CustomerID column can’t be a numeric column in one path and a character column in another path • If you have more than two paths, you should choose the Union All Transformation
  • 137. UNION ALL • Similar to the Merge Transformation • Merges data from two or more paths into a single output • Does not require sorted data • This transformation fixes minor metadata issues • For example, if one input is a 20-character string and another is 50 characters, the output of the Union All Transformation will be the longer 50-character column • You need to open the Union All Transformation Editor only if the inputs that feed the transformation have different column names
  • 138. MERGE JOIN • Merges the output of two inputs and performs an INNER or OUTER join on the data • If both inputs come from the same database, it is faster to perform the join in T-SQL at the OLE DB Source level than to use this transformation • Useful when you have two different Data Sources you wish to merge
  • 139. LOOKUP • Performs lookups by joining data in input columns with columns in a reference dataset • You use the lookup to access additional information in a related table that is based on values in common columns • Lookup Caching mechanism • Full-Cache Mode: stores all the rows resulting from a specified query in memory • No-Cache Mode: nothing is cached; for each input row the component sends a request to the reference table in the database server to ask for a match • Partial-Cache Mode: caches only the most recently used data in memory; as soon as the cache grows too big, the least-used cache data is thrown away
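The partial-cache idea can be sketched as a small least-recently-used (LRU) cache in Python. This is a hypothetical model of the behavior described above, not the actual SSIS implementation; the reference dictionary stands in for the database table:

```python
# Partial-cache sketch: keep only the most recently used lookup results,
# and evict the least-recently-used entry once the cache is full.
from collections import OrderedDict

class PartialCacheLookup:
    def __init__(self, reference, capacity):
        self.reference = reference        # stands in for the reference table
        self.capacity = capacity
        self.cache = OrderedDict()
        self.db_hits = 0                  # how often we "query the server"

    def lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        self.db_hits += 1                 # cache miss: ask the database
        value = self.reference.get(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # throw away least-used entry
        return value

ref = {1: "Bikes", 2: "Components", 3: "Clothing"}
lk = PartialCacheLookup(ref, capacity=2)
for key in [1, 2, 1, 3, 1]:
    lk.lookup(key)
print(lk.db_hits)  # 3: keys 1 and 3 stay cached; key 2 was evicted
```

In the same model, full-cache mode is a cache with unlimited capacity loaded up front, and no-cache mode is a capacity of zero: every row becomes a database round trip.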
  • 140. CACHE • Generates a reference dataset for the Lookup Transformation • by writing data from a connected data source in the data flow to a Cache connection manager. • You can use the Cache connection manager • When you want to configure the Lookup Transformation to run in the full cache mode. • In this mode, the reference dataset is loaded into cache before the Lookup Transformation runs.
  • 141. CDC SPLITTER • Splits a single flow of change rows from a CDC source data flow into different data flows for Insert, Update and Delete operations. • The data flow is split based on the required column __$operation and its standard values in SQL Server 2012 change tables.
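The split can be sketched in Python, assuming the standard `__$operation` values used in SQL Server 2012 change tables (1 = delete, 2 = insert, 3 = update before image, 4 = update after image); the sample rows are invented:

```python
# CDC Splitter sketch: route change rows into insert, update, and
# delete flows based on the __$operation column.
def cdc_split(rows):
    outputs = {"insert": [], "update": [], "delete": []}
    for row in rows:
        op = row["__$operation"]
        if op == 2:
            outputs["insert"].append(row)
        elif op == 1:
            outputs["delete"].append(row)
        elif op in (3, 4):                 # before/after images of an update
            outputs["update"].append(row)
    return outputs

rows = [
    {"__$operation": 2, "id": 1},   # insert
    {"__$operation": 4, "id": 1},   # update, after image
    {"__$operation": 1, "id": 2},   # delete
]
split = cdc_split(rows)
print(len(split["insert"]), len(split["update"]), len(split["delete"]))  # 1 1 1
```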
  • 143. AUDIT • Allows you to add auditing data to your Data Flow • Because of acts such as HIPAA and Sarbanes-Oxley (SOX) governing audits, you often must be able to track who inserted data into a table and when • The task is easy to configure • Simply select the type of data you want to audit in the Audit Type column and then name the column that will be output to the flow • Following are some of the available options: • Execution instance GUID: GUID that identifies the execution instance of the package • Package ID: Unique ID for the package • Package name: Name of the package • Version ID: Version GUID of the package • Execution start time: Time the package began • Machine name: Machine on which the package ran • User name: User who started the package • Task name: Data Flow Task name that holds the Audit Task • Task ID: Unique identifier for the Data Flow Task that holds the Audit Task
  • 144. ROWCOUNT • Counts the rows in the stream that is directed to its input • Places the count into a variable that can be used later in the Control Flow
  • 145. BI Transformations • Slowly Changing Dimension • Fuzzy Grouping • Fuzzy Lookup • Term Extraction • Term Lookup • Data Mining Query • Data Cleansing
  • 146. SLOWLY CHANGING DIMENSION • Provides a great head start in helping to solve a common, classic changing-dimension problem that occurs in the outer edge of your data model, the dimension or lookup tables • A dimension table contains a set of discrete values with a description and often other measurable attributes such as price, weight, or sales territory • The classic problem is what to do in your dimension data when an attribute in a row changes, particularly when you are loading data automatically through an ETL process • This transformation can shave days off your development time compared to creating the load manually through T-SQL, but it can add runtime because of how it queries your destination and how it updates row by row with the OLE DB Command Transformation
  • 147. FUZZY LOOKUP • Performs data cleaning tasks • standardizing data, correcting data, and providing missing values • Unlike the Lookup Transformation, which uses an equi-join to locate matching records in the reference table and therefore returns only exact matches, it uses fuzzy matching to return one or more close matches from the reference table • Usually follows a Lookup Transformation in a package data flow • First, the Lookup Transformation tries to find an exact match • If it fails, the Fuzzy Lookup Transformation provides close matches from the reference table
  • 148. FUZZY GROUPING • Performs data cleaning tasks • by identifying rows of data that are likely to be duplicates and • selecting a canonical row of data to use in standardizing the data • Requires a connection to an instance of SQL Server • to create the temporary SQL Server tables that the transformation algorithm requires to do its work • The connection must resolve to a user who has permission to create tables in the database • Produces one output row for each input row • Each row has the following additional columns: • _key_in: a column that uniquely identifies each row • _key_out: a column that identifies a group of duplicate rows • Rows with the same value in _key_out are part of the same group • The _key_out value for a group corresponds to the value of _key_in in the canonical data row • _score: a value between 0 and 1 that indicates the similarity of the input row to the canonical row
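The `_key_in`/`_key_out`/`_score` mechanics can be sketched in Python. Note that `difflib`'s similarity ratio is only a stand-in for whatever matching algorithm SSIS actually uses, and the sample names are invented:

```python
# Fuzzy Grouping sketch: assign each row a _key_in, group near-duplicates,
# and point _key_out at the canonical row's _key_in.
from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_group(names, threshold=0.8):
    results = []                       # rows with _key_in, _key_out, _score
    canonicals = []                    # (key_in, name) of canonical rows
    for key_in, name in enumerate(names):
        best = max(
            ((k, similarity(name, c)) for k, c in canonicals),
            key=lambda kv: kv[1],
            default=None,
        )
        if best and best[1] >= threshold:
            key_out, score = best                # duplicate of a canonical row
        else:
            key_out, score = key_in, 1.0         # new canonical row
            canonicals.append((key_in, name))
        results.append({"_key_in": key_in, "_key_out": key_out, "_score": score})
    return results

rows = fuzzy_group(["Contoso Ltd", "contoso ltd.", "Fabrikam Inc"])
print([r["_key_out"] for r in rows])   # [0, 0, 2]
```

The second row groups with the first (same `_key_out`, high `_score`), while the third starts its own group, mirroring how the transformation marks canonical rows.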
  • 149. TERM EXTRACTION • A tool to mine free-flowing text for English word and phrase frequency • If you have ever done word and phrase analysis on websites for better search engine placement, you are familiar with the job that this transformation performs • Based on the Term Frequency and Inverse Document Frequency formula • TF-IDF = (frequency of term) * log((# rows in sample) / (# rows with term or phrase)) • Outputs two columns: • a text phrase • a statistical value for the phrase relative to the total input stream
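The quoted formula can be checked with a few lines of Python; the sample rows are invented:

```python
# TF-IDF as quoted above:
# score = (frequency of term) * log((# rows in sample) / (# rows with term))
import math

def tfidf(term, rows):
    frequency = sum(row.lower().count(term) for row in rows)
    rows_with_term = sum(1 for row in rows if term in row.lower())
    return frequency * math.log(len(rows) / rows_with_term)

rows = [
    "data flow task",
    "data conversion",
    "control flow",
    "lookup cache",
]
score = tfidf("data", rows)   # appears once in 2 of the 4 rows
print(round(score, 4))        # 2 * log(4 / 2)
```

A term that appears in every row scores zero (log of 1), while a frequent term confined to few rows scores high, which is exactly the weighting the transformation relies on.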
  • 150. TERM LOOKUP • Uses the same algorithms and statistical models as the Term Extraction Transformation to break up an incoming stream into noun or noun-phrase tokens • It is designed to compare those tokens to a stored word list and output a matching list of terms and phrases with simple frequency counts • Useful for generating statistics on known phrases of importance • A real-world application would be to pull out all the customer service notes that contain a given set of terms or that mention a competitor’s name
  • 151. DATA MINING QUERY • Typically is used to fill in gaps in your data or predict a new column for your Data Flow • Optionally you can add columns • such as the probability of a certain condition being true • Usage Examples • You could take columns such as number of children, household income, and marital status to predict a new column that states whether the person owns a house or not • You could predict what customers would want to buy based on their shopping cart items • You could fill the gaps in your data where customers didn’t enter all the fields in a questionnaire
  • 152. DATA CLEANSING • Performs advanced data cleansing on data • A business analyst creates a series of business rules that declare what good data looks like • Creates domains that define data in your company • such as what a Company Name column should always look like
  • 155. SCRIPT COMPONENT • Enables you to write custom .NET scripts as • Transformations • Sources • Destinations • Some of the things you can do with this transformation • Create a custom transformation that would use a .NET assembly to validate credit card numbers or mailing addresses. • Validate data and skip records that don’t seem reasonable. • Read from a proprietary system for which no standard provider exists. • Write a custom component to integrate with a third-party vendor. • Scripts used as sources can support multiple outputs • You have the option of precompiling the scripts for runtime efficiency.
  • 156. CUSTOM COMPONENT • In a real-world integration solution, you may have requirements that the built-in functionality in SSIS does not meet • Use Visual Studio and the Class Library project template • Reference the following assemblies • Microsoft.SqlServer.DTSPipelineWrap • Microsoft.SqlServer.DTSRuntimeWrap • Microsoft.SqlServer.ManagedDTS • Microsoft.SqlServer.PipelineHost • In addition, the component needs to • Provide a strong name key for signing the assembly • Set the build output location to the PipelineComponents folder • Use a post-build event to install the assembly into the GAC • Set assembly-level attributes in the AssemblyInfo.cs file
  • 158. OPTIMIZING DATA FLOW PERFORMANCE • Optimize queries: • Select only the rows and columns that you need • Avoid unnecessary sorting: • Use presorted data where possible • Set the IsSorted property where applicable • Configure Data Flow task properties: • Buffer size • Temporary storage location • Parallelism • Optimized mode
  • 160. • Introduction to SSIS • SSIS Tools • Variables, Parameter, Expressions • SSIS Tasks • Containers • Data Flows SUMMARY
  • 162. SELECT KNOWLEDGE FROM SQL SERVER