What's new in SQL Server Integration Services 2012?
1. WHAT'S NEW IN SQL SERVER INTEGRATION SERVICES 2012?
Nico Jacobs
Nico@U2U.be
@sqlwaldorf
2. WHAT'S SSIS?
• Extract from source systems
  • SQL Server, Oracle, DB2, flat file, XML, Excel, …
• Transform data
  • Lookup surrogate keys, clean data, reformat, …
• Load it into a destination database
  • Transactions, checkpoints, scalability, …
3. WHAT'S SSIS
• Data flow reads data from source(s)
• Data is pushed through a row-based pipeline
• It optionally passes through one or more preprogrammed or ad hoc transformations
• Streaming transformations improve scalability
• Destination(s) write data to disk, database, …
• Control flow dictates the order in which tasks execute; the data flow is one of these tasks
4. WHAT'S NEW IN 2012?
• A lot!
• New stuff for package developers
• New stuff for package administration
• New stuff for package usage
• Let's get started!
5. 1: GUI IMPROVEMENTS
• Getting started window
• Package visualization
• Zoom
• Undo
• SSIS toolbox
• Data flow source/destination wizard
• Sort packages by name
• Grouping in data flow
6. CHANGE DATA CAPTURE
• An incremental load loads only the rows that have changed since the last load
• How do we know what has changed?
  • Compare every source row with every destination row
  • Last modified date and a trigger to maintain this
  • Change tracking
  • Change data capture!
7. CHANGE DATA CAPTURE
• SQL Server Enterprise edition, 2008 or higher
• Asynchronous process
• Captures all changes
• Maintains time window
• CDC data access via table-valued functions
Books Online: change data capture
8. 2: CDC TASK AND COMPONENTS
• CDC needs to keep track of which changes have already been processed
• CDC task does this by storing LSNs (log sequence numbers) in a tracking table
• CDC Source component reads from the CDC table function, based on the LSN it got from the CDC task
• CDC transformation splits records into new rows, updated rows and deleted rows
• No documentation yet in RC0, check Matt Masson's blog
• Based on Attunity CDC components
9. 3: MAPPING DATA FLOW COLUMNS
• When modifying a data flow, column remapping is sometimes needed
• SSIS 2012 maps columns by name instead of by ID
• It also has an improved remapping dialog
10. 4: ODBC SOURCE AND DESTINATION
• ODBC was not natively supported in 2008
• SSIS 2012 has ODBC Source & Destination
  • Handy for connecting to SQL Azure
  • Essential if SQL Server stops supporting OleDb
• SSIS 2008 could access ODBC via ADO.Net:
  • Has create table option, which ODBC lacks
  • No control on batch inserts
  • Low performance

nr of rows   ODBC     ADO.Net   % Diff
1000         0,42     2,12      405%
10000        4,91     7,84      60%
100000       49,2     78,36     59%
1000000      481,65   781,28    62%
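The % Diff column is not defined on the slide. The figures are consistent with the ADO.Net timing expressed relative to the native ODBC timing, rounded to whole percents; the short Python check below makes that assumption explicit (timings are retyped with decimal points, and the function name is made up for this example).

```python
def pct_diff(baseline, other):
    """Difference of `other` relative to `baseline`, as a rounded percentage."""
    return round(100 * (other - baseline) / baseline)

# (rows, ODBC, ADO.Net) as listed in the table above
timings = [
    (1000, 0.42, 2.12),
    (10000, 4.91, 7.84),
    (100000, 49.2, 78.36),
    (1000000, 481.65, 781.28),
]

diffs = [pct_diff(odbc, ado) for _, odbc, ado in timings]
print(diffs)
```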
11. REPLACE OLEDB WITH ODBC?
• After comparing ODBC with ODBC via ADO.Net, let's test ODBC versus OleDb
• On bulk insert

nr of rows   OleDb    OleDb Fast   ODBC      % Diff
1000         0,15     0,07         0,865     477%
10000        0,32     0,16         4,8       1400%
100000       1,66     0,565        48,13     2799%
1000000      12,485   9,12         483,085   3769%

• On row by row

nr of rows   OleDb    ODBC     % Diff
1000         0,62     0,76     -18%
10000        9,15     6,28     46%
100000       71,21    67,37    6%
1000000      730,16   684,28   7%

Your mileage may vary…
12. 5: SCRIPTING
• Script task and script component now support .Net 4.0
• Breakpoints are supported in script component
• When developing custom components, there is better backpressure support:
  • SupportsBackPressure property, IsInputReady and GetDependentInputs methods
13. 6: EXPRESSION TASK
• The script task can be used to modify variable values… but it's overkill
• Expression task provides a simple task to change variable values
14. DATA QUALITY SERVICES (DQS)
• DQS is a new service to clean domain data
• Domain knowledge base needs to be built
  • Based on rules, positive and negative examples
  • Potentially using external data from Azure Marketplace or other providers
15. 7: DQS CLEANSING TASK
• Cleaning and standardizing data before it is loaded into the data warehouse is essential
• DQS Cleansing task labels data in 4 categories:
  • Correct: a value accepted by the knowledge base
  • Corrected: a value that DQS is confident it can correct to a valid domain value
  • Suggested: a value on which DQS is less confident, but for which it can still suggest a domain value
  • New: DQS has no suggestions for this
• See Koen Verbeeck's session on DQS for more info!
16. 8: PACKAGE CATALOG
• SSIS 2012 can work in the new project mode (default) or in the old package mode (backwards compatibility)
• In project mode, many things change:
  • Project becomes the level of deployment
  • Deployment to SQL Server becomes obligatory
  • Packages not stored in msdb, but in dedicated user database:
    o The package catalog, named SSISDB
  • Logging happens automatically and is done in the package catalog
    o Custom logging still supported
• Projects can be converted from one deployment type to another
17. PACKAGE CATALOG
• Manage via SSMS: Relational engine
• Fixed database name: SSISDB
• Stores projects, versions, logs, 5 reports, 25 views, 42 stored procedures, …
• This makes it possible to run, monitor and manage SSIS projects and packages via T-SQL!
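As an illustration of running a package via T-SQL, the Python helper below composes a batch that calls the catalog procedures `catalog.create_execution` and `catalog.start_execution`, as documented for the released SQL Server 2012 catalog (details may differ in RC0). The folder, project and package names are hypothetical, and the helper only builds the text; sending it to the server is left out.

```python
def tsql_literal(value):
    """Quote a Python string as a T-SQL Unicode literal (doubles single quotes)."""
    return "N'" + value.replace("'", "''") + "'"

def build_execution_batch(folder, project, package):
    """Compose a T-SQL batch that creates and starts one package execution."""
    return "\n".join([
        "DECLARE @execution_id BIGINT;",
        "EXEC SSISDB.catalog.create_execution",
        f"    @folder_name = {tsql_literal(folder)},",
        f"    @project_name = {tsql_literal(project)},",
        f"    @package_name = {tsql_literal(package)},",
        "    @execution_id = @execution_id OUTPUT;",
        "EXEC SSISDB.catalog.start_execution @execution_id;",
    ])

# Hypothetical names, for illustration only.
batch = build_execution_batch("Demo", "LoadDW", "LoadCustomers.dtsx")
print(batch)
```

Creating the execution and starting it are separate steps, which leaves room in between for setting parameter values or attaching other run-time options.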
18. 9: PARAMETERS
• Just two scopes:
  • Package
  • Project!
• Read-only
  • Value is set when scope starts and cannot be changed
  • Can be set from SQL Server Data Tools configurations
  • Often used together with environments
• Does not replace variables
  • It is more a package configuration replacement
• Using the Visual Studio (SSDT) configurations we can configure default values for testing
19. 10: SHARED CONNECTION MANAGERS
• Shared connection manager is defined at project level and is automatically available in every package
  • Not copied as in SSIS 2008
• Shared connection managers can be parameterized as well
• When converting shared connection managers back to regular (package) connection managers, they disappear in all other packages
• Shared cache connection managers are supported as well
  • This allows caching data in memory in one package and reusing it in multiple other packages
20. 11: ENVIRONMENTS
• Environments replace package configurations
  • They can control parameter values and connection strings
• Environments are created in the package catalog
  • They are not deployed to the server, but created on the server
  • Don't forget to reference the environment at the project level
  • Script them while creating, this eases creating multiple environments
• A server might have multiple environments
  • When we execute a package, we can choose which environment we'll use
21. 12: DATA TAPS
• Imagine a data viewer
  • Which can be added on the runtime server
  • Without modifying the package, but using T-SQL
  • Which writes the data to disk instead of visualizing it…
• Voilà, you are now thinking about the data tap :-)
22. 13: AND A LOT MORE…
• .Net API and PowerShell
• Pivot and row count transformation get a user interface
• Flat file supports
  • Embedded qualifiers
  • Variable number of columns (but still fixed metadata)
• Raw file improvements
  • Generate empty raw file
  • Stores sort info
• DTSX files are becoming more readable and "mergeable"
  • Sorted, filtered and pretty-printed
• Merge and merge join improve backpressure handling
23. AND A LOT MORE…
• 4000 char expression length limit lifted
• New expression language keywords
  • LEFT as syntactic sugar for SUBSTRING(,1,)
  • TOKEN and TOKENCOUNT for shredding strings
24. SUMMARY
• Improved GUI
• Change data capture support
• Easy column remapping
• ODBC connections
• .Net 4.0 support & script component debugging
• Expression Task
• Data Quality Cleansing
• Package catalog
• Parameters
• Shared Connection Managers
• Environments
• Data Taps
• And a lot more…