2. • Manager of Data and Development for
Data Diver Technologies in Chandler, AZ
• Managed the Phoenix CFUG
• I’ve worked with ColdFusion and SQL Server since
the late ’90s
• I hate CSS front-end work, and CSS hates me
• How to find me on the Interwebs:
• Email: jeff@garzasixpack.com Twitter: @cf_coder
5. • Extract, Transform and Load
• Can be data from many types of sources
• Flat Text files, Excel, XML, Other Databases
• Destinations can also vary
• Cubes, Flat Text Files, Excel, XML, Other
Databases
• Basic Process
• Get Data, Mess with it and Stuff it somewhere else
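The basic process above can be sketched in plain T-SQL terms. This is only an illustration — the staging table, target table, file path, and column names are all hypothetical:

```sql
-- Get data: bulk load the raw file into a staging table.
-- (StagingCustomers, dbo.Customers, and the file path are assumptions.)
BULK INSERT StagingCustomers
FROM 'C:\import\customers.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- Mess with it and stuff it somewhere else:
-- transform the rows en route to the destination table.
INSERT INTO dbo.Customers (FullName, SignupDate)
SELECT LTRIM(RTRIM(FirstName)) + ' ' + LTRIM(RTRIM(LastName)),
       CONVERT(date, SignupDateText)
FROM StagingCustomers;
```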
• Familiarity with the platform makes it tempting
• Inherently slow
• Reads and transforms data line by line
• “Line by line” processing kills performance
• Doesn’t take advantage of processes inherent
in the DB platform for bulk data processing
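The contrast is easy to see in T-SQL itself. Both snippets below do the same work against a hypothetical dbo.Orders table — the cursor version mimics what line-by-line application code does, while the set-based version lets the engine optimize the whole batch:

```sql
-- Row-by-row (what app-level ETL effectively does): slow.
DECLARE @Id int;
DECLARE cur CURSOR FOR SELECT Id FROM dbo.Orders;
OPEN cur;
FETCH NEXT FROM cur INTO @Id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Total = Quantity * UnitPrice WHERE Id = @Id;
    FETCH NEXT FROM cur INTO @Id;
END
CLOSE cur;
DEALLOCATE cur;

-- Set-based: one statement, the engine processes the whole set at once.
UPDATE dbo.Orders SET Total = Quantity * UnitPrice;
</code>
```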
8. • Native database functionality for bulk data
• BCP: Bulk Copy Program (the bcp utility)
• Clunky
• Requires command text files
• Run from the command line or PowerShell
• Import Data Wizard (SQL Server)
• One off Imports, no transformations
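As a sketch, a minimal bcp import looks like the line below — the database, table, and file names are assumptions, `-T` uses a trusted connection, `-c` means character data, and `-t,` sets a comma field terminator:

```
bcp MyDatabase.dbo.Customers in customers.txt -S localhost -T -c -t,
```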
9. • SQL Server Integration Services (SSIS)
• Runs in BIDS
• Business Intelligence Development Studio
• Multiple inputs and outputs
• Workflow and process based
• Scriptable and dynamic inputs
10. • Starting up the environment
• Project Based
• Can open just files, but projects allow for more
options.
• Integration with source control
• SVN (TortoiseSVN and VisualSVN), Git, others
11. Project Deployment Models
•Package Deployment Model
• Stores packages in a disk based folder structure
•Project Deployment Model
• Deploys packages to a special database
•Either model can be run outside of BIDS.
13. • Connection Managers
• These will set up the input/output files and
database connections.
• Can be set at project or package level.
• Create DB Connection
• Create Connections to any Files
• Can create new files for output or input
16. • Workflow using Pre-Built Components
• Lots of options
• Get Files, run commands, loop over things
• Primarily Execute SQL and Data Flow Tasks
• Connectors
• Go/No Go, evaluation, data viewers
• Red connectors redirect on error.
17. • Variables
• Create at the package level
• Set at the component level
• Supports the basic variable types
• Ints, Strings, Booleans, Dates and Objects
• Objects are typically Recordsets but can also
be Arrays and COM objects.
18. Getting Data Into Your Package
• File Connections
• Multi-file pulls from directories
• Query result sets
• Web Services
• OLEDB/ODBC Connections
19. • FTP Component
• Built in and very simple
• Does not do advanced FTP functions
• HTTP File Requests
• No built in component for this
• Use Script or Command Line (wget)
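An Execute Process Task can shell out to wget for the HTTP case. The URL and output path below are placeholders, and `-O` names the output file:

```
wget -O C:\import\feed.xml http://example.com/data/feed.xml
```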
20. 4/10/14
• Data Flow Task
• Heavy lifting of getting data from A to B
• Utilizes the pre-defined connections
• Can create them on the fly if necessary
• Can transform data in the pipeline en-route to
the destination tables/files
21.
• Change the data in the pipeline
• Can be helpful if you’re not a SQL person
• Editors make choosing transformations easy
• Performance can be impacted
• Anything that impedes the pipeline slows
the data transfer process down.
• Becomes a larger issue with larger datasets
24.
• Component Level Errors
• Can Redirect Flow to another path
• Allows for cleanup of temp tables/files, or
sending email to an administrator
• Data Flow Errors
• Redirection of Bad Rows to an Error File
• Can trap General or Truncation Errors.
26. • For and For Each Containers
• For containers are basically counted loops
• For Each Containers
• Loops over files, arrays, and objects
• Must be used with Variables
• Great way to deal with sequential files
28. • The Marketing Database of Choice!
• Not terribly difficult to work with
• Data comes through as Unicode (NVARCHAR → VARCHAR)
• Need special connection parameters to skip
top level rows
• Make sure the project debug options are set to 32-bit
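A typical ACE OLE DB connection string for an Excel source looks like this — the file path is a placeholder, `HDR=YES` treats the first row as headers, and `IMEX=1` forces mixed-type columns to text:

```
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\import\leads.xlsx;
Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1"
```

To skip top-level rows, point the source at a sheet range instead of the whole sheet, e.g. `SELECT * FROM [Sheet1$A5:F]` (sheet name and range here are assumptions).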
29. • Report Format (multi-row records)
• Most difficult format to work with
• Conditional Split Transformer
• Built in and works with the pipeline
• Cursor and String Parsing
• Allows for keeping rows grouped with IDs
30. • XML
• You must have a schema (XSD)
• SSIS will generate one for you
• Data is in UTF-8 format
• Either Transform to VARCHAR or use
NVARCHAR for result
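The conversion is a simple cast; the staging table and column below are hypothetical:

```sql
-- Cast the Unicode value down to VARCHAR on the way out of staging.
SELECT CONVERT(varchar(100), ProductName) AS ProductName
FROM StagingProducts;
```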
32. • Cannot call packages directly from ColdFusion
• Run manually through BIDS (easiest)
• SQL Agent - Scheduler for SQL Server
• Options for the Agent
• Run the Agent on a schedule and look for work
• Set up the Agent job and then execute it via T-SQL
33. Executing a SQL Agent Job from TSQL
You need to be in msdb to run the Agent job.
Access to msdb is limited to sysadmin roles by default.
USE msdb;
GO
EXEC dbo.sp_start_job N'AgentTaskName';
GO
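Note that sp_start_job returns as soon as the job is queued, not when it finishes. To check on the run afterwards (the job name is the same placeholder as above):

```sql
USE msdb;
GO
-- Show the current run status and last-run outcome for the job.
EXEC dbo.sp_help_job @job_name = N'AgentTaskName';
GO
```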
35. • Data validation – set Delay Validation to “True”
• SSIS checks for the existence of tables/files.
• Create temp tables beforehand or turn it off.
• Excel Files
• Must use 32-bit debug execution mode.
• Data Type conversions
• Try importing to VARCHARs and convert
afterwards if needed.
36. • Learn SQL String Manipulation!
• This is the toughest aspect of dealing with
text files and textual data.
• Very limited number of functions to use
• Left(), Right(), Substring(), Len()
• CharIndex() and PatIndex()
• String Length Math
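A sketch of the kind of string-length math these functions enable — pulling the middle value out of a pipe-delimited line (the sample data and variable names are invented for illustration):

```sql
-- RawLine looks like 'ACCT|1234|Smith, John'; extract the account number.
DECLARE @RawLine varchar(100) = 'ACCT|1234|Smith, John';
DECLARE @FirstPipe int = CHARINDEX('|', @RawLine);
DECLARE @SecondPipe int = CHARINDEX('|', @RawLine, @FirstPipe + 1);

-- Start one past the first pipe; length is the gap between the pipes.
SELECT SUBSTRING(@RawLine, @FirstPipe + 1,
                 @SecondPipe - @FirstPipe - 1) AS AcctNum;
-- Returns '1234'
```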
37. • Temp Tables
• Where is TempDB?
• #Table vs. ##Table vs. Temp_Table
• Usage as part of aggregate queries
• Make sure you have enough space for
TempDB operations.
• When dealing with large data sets, TempDB
can get very large very fast.
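The three flavors of “temp” table differ in scope and location; the table and column names below are placeholders:

```sql
-- #Local: created in TempDB, visible only to this session,
-- dropped automatically when the session ends.
SELECT CustomerId, SUM(Amount) AS Total
INTO #MonthlyTotals
FROM dbo.Orders            -- dbo.Orders is a hypothetical source table
GROUP BY CustomerId;

-- ##Global: created in TempDB, visible to all sessions until the
-- last session referencing it closes.
CREATE TABLE ##SharedScratch (Id int);

-- Temp_Table: an ordinary permanent table used as scratch space;
-- it lives in your own database, not TempDB, and must be dropped manually.
CREATE TABLE dbo.Temp_Staging (Id int);
```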