2. Who Am I?
• Terry Bunio
• Database Administrator
- Oracle, SQL Server 6,6.5,7,2000,2005,2012, Informix, ADABAS
• SharePoint fan
• Data Modeler/Architect
- Investors Group, LPL Financial, Manitoba Blue Cross, Assante
Financial, CI Funds, Mackenzie Financial
- Normalized and Dimensional
• Agilist
- Innovation Gamer, Team Member, SQL Developer, Test writer,
Sticky Sticker, Project Manager, PMO on SAP Implementation
- Currently all of these at the same time on an Agile SAP
implementation for an insurance company
10. Protegra
• In Business 13+ years
• Offices in Winnipeg and San Diego
• Agile and Lean key to our practices
• 70+ consultants
• Software Development and Business Performance
Consulting
• Microsoft Gold Partner
• Culture and Community are cornerstones of the
company
13. Learning Objectives
• Why do we automate tests?
• Why is Automated Database Testing Difficult?
• What was the problem?
• What is tSQLt and SQLtest?
• How did we create a Data-Driven ETL Test Framework?
15. Agile
• Agile software development is a group of software
development methods based on iterative and
incremental development, where requirements and
solutions evolve through collaboration between self-
organizing, cross-functional teams. It promotes adaptive
planning, evolutionary development and delivery, a time-
boxed iterative approach, and encourages rapid and
flexible response to change. It is a conceptual framework
that promotes foreseen interactions throughout the
development cycle.
19. Waterfall
• Do as much as you can before you fall off that cliff
because it is difficult to go back up
- Thorough Business Case
- Detailed Requirements
- Detailed Documentation
- Extensive Construction Phase
- Extensive Test Phase
- Critical Deployment Phase
21. Agile
• Do as little as you can before you can proceed because
we want to deliver asap and are going to have another
cycle anyway…
- Lean Start-up Business Case
- Minimal Requirements (some in the form of Tests)
- Minimal Design
- Minimal Construction
- Minimal Test
- Minimal Deployment
23. Automation
• We usually try to automate tasks that:
- Take a lot of time
- Need to be done frequently
- Are error prone
- Require expertise that can be captured and duplicated
25. Agile Automation
• In Agile, the tasks that lend themselves to automation
are:
- Testing – Automated Testing
- Deployment – Continuous Integration
• This is primarily due to:
- The amount of times they need to be run
- The potential for people to introduce errors
- The ability to test and catch errors early
- The ease of automation: how easily human logic can be transferred
into software
28. Dangers of automated testing
• Automated testing can provide great benefit, but you
need to ensure that it is flexible, nimble, and easy to
change and adapt
• Consider how you can create your tests to make them as
easy as possible to maintain and adapt to change
• If you design for this, the chances are your automated
tests will be more valuable and evolve with the software
- Otherwise some may deem them ‘too expensive’ to modify and
enhance and they will be left behind
30. Automated Database Testing
• Unit Testing databases is hard
- Requires modification of the data in the database
- Complex Querying of Multiple Tables
- Multiple types of objects
• Views
• Functions
• Stored Procedures
• Triggers
- Many types of Database Technologies
- Need to test structure (schema), state (data), and process (stored
procedures and ETL processes)
31. Automated Database Testing
• Lots of existing frameworks out there for automated
testing in programming languages
- Java – JUnit
- .NET – NUnit
- FitNesse
- Etc…
• Although you could use these frameworks to perform
automated database testing, it certainly isn’t easy to
adapt
- Refer to Lion on ice-floe cartoon
32. What we need..
• Is an Automated Database testing framework that
- Doesn’t require another programming language to run
- Handles the complexity of adding and deleting data for tests
- Provides additional functionality that makes database testing
easy
• For example, a table to table comparison test or ability to create fake data
and tables
- Makes it easy to manage the large amount of tests that can be
created in a database
- Allows us to create database tests on:
• Structure
• Data
• Metadata
33. The Project
• Agile Data Warehouse project for SAP Implementation
• Health Benefits Provider
• I’d worked on multiple Data Warehouse projects with
little or no test automation in the past
- #1 item from those retrospectives were that we would create
automated tests next time for the ETL process
34. The Project
• Operational Data Store and Data Warehouse
• SQL Server 2012
- SSIS for ETL process
• We also used the CDC functionality
- SSRS for reporting with Sharepoint as the report portal
• Sharepoint Foundation 2010
• 400+ tables
• 2.5 terabytes of data
- Just for the initial load
• 3640 columns in the Operational Data Store
• 2520 columns in the Data Warehouse
36. Results
• We reviewed several candidates and none of them really
fit what we required
• We looked at
- JUnit
- DbUnit
- FitNesse
- Other SQL specific ones
• Until….
39. tSQLt
• tSQLt is an Open Source database unit testing
framework for Microsoft SQL Server. tSQLt is compatible
with SQL Server 2005 (service pack 2 required) and
above on all editions.
- Currently using it on SQL Server 2012
- No issues
40. tSQLt
• Tests are automatically run within transactions – this
keeps tests independent and reduces any cleanup work
you need
• Tests can be grouped together within a schema –
allowing you to organize your tests and use common
setup methods
- Highly recommended
• Output can be generated in plain text or XML – making it
easier to integrate with a continuous integration tool
• Provides the ability to fake tables and views and create
stored procedure spies – allows you to be notified when
developers aren’t following your standards!!
43. tSQLt Creation and Execution Commands
• NewTestClass
- tSQLt.NewTestClass creates a new test class. A test class is
simply a schema where the user can create test case procedure
and any other related objects.
• DropClass
- tSQLt.DropClass drops a schema and all objects belonging to it.
If the schema does not exist, nothing happens.
• RunAll
- tSQLt.RunAll executes all tests in all test classes created with
tSQLt.NewTestClass in the current database. If the test class
schema contains a stored procedure called SetUp, it is executed
before calling each test case.
44. tSQLt Creation and Execution Commands
• Run
tSQLt.Run is a flexible procedure allowing three different ways of
executing test cases:
1. Providing a test class name executes all tests in that test class. If
a SetUp stored procedure exists in that test class, then it is
executed before each test.
2. Providing a test case name executes that single test.
3. Providing no parameter value executes tSQLt.Run the same way
the previous call to tSQLt.Run was made when a parameter was
provided.
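The commands above can be put together in a short script. This is a minimal sketch, not from the talk: the test class, table, and test names are illustrative.

```sql
-- Create a test class, add one test case, and run it.
EXEC tSQLt.NewTestClass 'testStatus';
GO
CREATE PROCEDURE testStatus.[test that Status contains the Active row]
AS
BEGIN
    -- Assemble: isolate dbo.Status from its constraints and load known data
    EXEC tSQLt.FakeTable 'dbo.Status';
    INSERT INTO dbo.Status (StatusCode, StatusDesc) VALUES ('A', 'Active');

    -- Assert
    DECLARE @cnt INT;
    SELECT @cnt = COUNT(*) FROM dbo.Status WHERE StatusCode = 'A';
    EXEC tSQLt.AssertEquals 1, @cnt;
END;
GO
-- The three ways to run: whole class, single test, repeat last run
EXEC tSQLt.Run 'testStatus';
EXEC tSQLt.Run 'testStatus.[test that Status contains the Active row]';
EXEC tSQLt.Run;
```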
45. tSQLt additional commands
• ApplyConstraint
- We want to be able to test constraints individually. We can use
FakeTable to remove all the constraints on a table, and
ApplyConstraint to add back in the one which we want to test.
- ApplyConstraint in combination with FakeTable allows
constraints to be tested in isolation of other constraints on a
table.
• ApplyTrigger
- We want to be able to test triggers individually. We can use
FakeTable to remove all the constraints and triggers from a
table, and ApplyTrigger to add back in the one which we want to
test.
- ApplyTrigger in combination with FakeTable allows triggers to be
tested in isolation of constraints and other triggers on a table.
46. tSQLt Additional Commands
• FakeTable
- We want to keep our test cases focused and do not want to
insert data which is irrelevant to the current test. However, table
and column constraints can make this difficult.
- FakeTable allows tests to be written in isolation of the constraints
on a table. FakeTable creates an empty version of the table
without the constraints in place of the specified table. Therefore
any statements which access the table during the execution of
the test case are actually working against the fake table with no
constraints. When the test case completes, the original table is
put back in place because of the rollback which tSQLt performs
at the end of each test case.
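Combining FakeTable and ApplyConstraint looks roughly like this. The table, constraint, and column names are hypothetical; only the tSQLt calls are real.

```sql
-- Test one CHECK constraint in isolation: FakeTable strips all
-- constraints, ApplyConstraint puts back only the one under test.
EXEC tSQLt.NewTestClass 'testConstraints';
GO
CREATE PROCEDURE testConstraints.[test CK_Claim_Amount rejects negatives]
AS
BEGIN
    EXEC tSQLt.FakeTable 'dbo.Claim';
    EXEC tSQLt.ApplyConstraint 'dbo.Claim', 'CK_Claim_Amount';

    BEGIN TRY
        -- This insert should violate the re-applied check constraint
        INSERT INTO dbo.Claim (ClaimId, ClaimAmount) VALUES (1, -50.00);
        EXEC tSQLt.Fail 'Expected CK_Claim_Amount to reject a negative amount';
    END TRY
    BEGIN CATCH
        -- Expected path: constraint violation was raised
    END CATCH;
END;
```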
48. tSQLt installation procedures
• Download tSQLt from tSQLt.org.
• Unzip the file to a location on your hard drive.
• Make sure CLR is enabled on your development
server by running the following SQL:
- EXEC sp_configure 'clr enabled', 1;
- RECONFIGURE;
• Execute the Example.sql file from the zip file to create an
example database (tSQLt_Example) with tSQLt and test
cases.
49. tSQLt installation
• Your database must be set to trustworthy for tSQLt to
run. Execute the following script in your development
database:
- DECLARE @cmd NVARCHAR(MAX);
- SET @cmd = 'ALTER DATABASE ' + QUOTENAME(DB_NAME()) + ' SET TRUSTWORTHY ON;';
- EXEC(@cmd);
• Execute the tSQLt.class.sql script (included in the zip
file) in your development database.
51. Test Demonstration
• Old School Testing
- Open Enterprise Manager
- Open /SDEC12/SDEC solution
- Open solution explorer
- 1 testStatus1.sql
- 2 testStatusAll.sql
• Boring…
52. Test Demonstration
• testTable
- Traditional test
• testTableNew
- tSQLt test
• AssertEqualsTable test success
• Insert extra record
• AssertEqualsTable test failure
55. SQLTest
• Developed by Red Gate
• SQL Test is a unit test add-in for SQL Server
Management Studio
• Part of the SQL Developer Bundle
• $369
56. SQLTest Demo
• Open SQL Test
• Test Class TstContainer_sdec12
• TestCase1 – count rows
• TestCase2 – Fake Table and Assert Empty Table
• TestCase3 – Fake Table and Assert Empty Table - Error
• TestCase4 - AssertLike
• Run All
57. SQLTest Additional Features
• SQLCop Tests
- Procedures with SP_
- VarChar Size Problems
- Decimal Size Problem
- Undocumented Procedures
- Procedures without SET NOCOUNT ON
- Procedures with SET ROWCOUNT
- Procedures with @@Identity
- Procedures with dynamic SQL
- Procedures using dynamic SQL without sp_executesql
58. SQLTest Additional Features
• Template for adding tests
• Easier graphical interface to enter and review tests
• One click to run tests
- Individually
- Schema
- Database
• Click through on TestCases to debug and modify
60. tSQLtclr Weird Error
• It is possible that when you restore a database from a
different server you can get this issue when trying to load
the tSQLtclr assembly
• KB link : http://support.microsoft.com/kb/918040
• Easiest solution is to ensure sa is the database owner on
the database running the tSQLt tests
- EXEC sp_changedbowner 'sa'
62. ETL Test Categories
- Data completeness - validates that all data is loaded correctly. This
involves a field-by-field comparison of expected versus actual
results.
- Data transformation – validates that all data is transformed
correctly according to specifications.
- Data quality - validates that the ETL application correctly
corrects, ignores, or rejects critical data errors, substitutes
default values, and reports invalid data.
- Process Integrity – validates that the ETL process correctly
executes, reports data errors, generates log messages, and is
able to restart and recover from an abnormal termination
correctly.
- Test Automation – validates that the ETL testing process can
be re-run at any time in an automated fashion to validate that the
process is operating as required.
63. ETL Testing Additional Value
- Performance and scalability - validates that the ETL process
performs within expected time frames and that the technical
architecture is scalable and extendable.
- Integration testing - validates that the ETL process functions
well within the entire environment. Validates that other
applications can interact with the ETL process and databases as
required.
- Regression testing – validates that the automated tests can be
utilized to confirm existing functionality continues to operate as
required for each new release.
64. ETL Test Objectives
• We need an Agile Test Framework so it is easy to adapt
to changes
• We can’t afford for the Test Framework to take material
time to manage on top of the Data Models and Databases
- Data Driven
• Although SQLTest is good, you can imagine how long it
would take to add tests for every field
- Not feasible
- Would also create a maintenance nightmare as the schema
changes
• So what is a DBA to do?
65. ETL Test Objectives
• To create an automated ETL test framework that is Data
Driven by
- NEGATIVE TESTING - Information_schema metadata in SQL
Server to loop through the tables and fields and execute health
checks
• Table level
• Have we missed tables or columns in our mapping
- POSITIVE TESTING - By Data Mapping metadata in SQL Server
to loop through all fields and validate the ETL process
• Column level
• Have we mapped tables and columns correctly
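The negative-testing half of this idea can be sketched with a cursor over INFORMATION_SCHEMA. This is a simplified illustration of the approach, not the project's actual framework code; schema and filter choices are assumptions.

```sql
-- Walk every column in the target schema and flag any column the
-- ETL never populated (an all-null column).
DECLARE @schema SYSNAME, @table SYSNAME, @column SYSNAME, @sql NVARCHAR(MAX);
DECLARE col_cur CURSOR FOR
    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo';
OPEN col_cur;
FETCH NEXT FROM col_cur INTO @schema, @table, @column;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'IF NOT EXISTS (SELECT 1 FROM ' + QUOTENAME(@schema) + N'.'
             + QUOTENAME(@table) + N' WHERE ' + QUOTENAME(@column)
             + N' IS NOT NULL) PRINT ''All-null column: '
             + @table + N'.' + @column + N'''';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM col_cur INTO @schema, @table, @column;
END;
CLOSE col_cur; DEALLOCATE col_cur;
```

Because the loop is driven entirely by metadata, new tables and columns are tested automatically as the schema grows.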
67. ETL Smells
• Larger smells that indicate problems in the ETL
- Empty tables
- Empty columns
- Columns with all Nulls
- Columns with only 1 value
- Numeric columns with all zeros
• Usually detected by negative testing
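The first smell, empty tables, can be caught cheaply from the catalog views (a sketch assuming SQL Server 2005 or later):

```sql
-- List every user table with zero rows, using partition metadata
-- rather than a COUNT(*) per table.
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
GROUP BY s.name, t.name
HAVING SUM(p.rows) = 0;
```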
68. Negative Tests
• TstTableCount: Compares record counts between
source data and target data. This will be done on table to
table basis. This will be done starting from the target
table and comparing to the associated source table or
tables.
• TstTableColumnDistinct: Compares counts on distinct
values of columns. This is a valuable technique that
points out a variety of possible data errors without doing
a full validation on all fields.
• TstTableColumnNull: Generates a report of all columns
where the contents of the column are all null. This typically
highlights situations where the column was not
assigned in the ETL process.
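A test like TstTableCount could be driven from the table_data_mapping table described on the later slides. The sketch below is an assumption about how that loop might look; the mapping column names are inferred, not taken from the project.

```sql
-- For each source/target pair in table_data_mapping, compare row
-- counts and report any mismatch.
DECLARE @src SYSNAME, @tgt SYSNAME, @sql NVARCHAR(MAX);
DECLARE map_cur CURSOR FOR
    SELECT source_table, target_table FROM dbo.table_data_mapping;
OPEN map_cur;
FETCH NEXT FROM map_cur INTO @src, @tgt;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'DECLARE @s INT, @t INT; '
             + N'SELECT @s = COUNT(*) FROM ' + QUOTENAME(@src) + N'; '
             + N'SELECT @t = COUNT(*) FROM ' + QUOTENAME(@tgt) + N'; '
             + N'IF @s <> @t PRINT ''Count mismatch: '
             + @src + N' vs ' + @tgt + N''';';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM map_cur INTO @src, @tgt;
END;
CLOSE map_cur; DEALLOCATE map_cur;
```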
70. Positive Tests
• TstColumnDataMapping: Compares columns directly
assigned from a source column on a field by field basis
for specified rows in the target table. More rows can be
selected depending on the risk and complexity of the
data transformation.
• TstColumnConstantMapping: Compares columns
directly assigned from a constant on a field by field basis
for specified rows in the target table. More rows can be
selected depending on the risk and complexity of the
data transformation.
71. Positive Tests
• TstColumnFKDataMapping: Compares whether
transformed columns in source and target are both not
null. This is another key indicator that Foreign Key
Values for related tables were unable to be located in the
target database
• TstInvalidColumnDataMapping: Tests that an Invalid
value results in the value being assigned a default value
or Null. This record will be added to the staging table to
test the process.
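For a single "=" mapping, the comparison behind TstColumnDataMapping amounts to joining source to target through the cross_reference table and checking the column values agree. All database, table, and column names in this sketch are illustrative assumptions.

```sql
-- Any rows returned are mapping failures: the source value did not
-- arrive unchanged in the target column for the matched row.
SELECT s.member_id,
       s.last_name AS source_value,
       t.last_name AS target_value
FROM source_db.dbo.member s
JOIN target_db.dbo.cross_reference x
     ON x.source_system_val = CAST(s.member_id AS NVARCHAR(50))
    AND x.target_table_name = 'member'
JOIN target_db.dbo.member t
     ON t.member_id = x.target_id
WHERE s.last_name <> t.last_name;
```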
72. Advanced Negative Tests – not yet implemented
• TstInvalidParentFKColumn: Tests that an Invalid
Parent FK value results in the records being logged and
bypassed. This record will be added to the staging table
to test the process.
• TstInvalidFKColumn: Tests that an Invalid FK value
results in the value being assigned a default value or
Null. This record will be added to the staging table to test
the process.
74. Data Mapping Framework Design
• We can create two tables that when used in conjunction
with Information_schema can generate all negative tests
and simple positive tests
- table_data_mapping
- column_data_mapping
• We can then create two more tables that can be used to
generate more complicated positive tests
- etl_test_case
- etl_test_case_transform
• The database also requires a cross_reference table that
converts old PK values to new PK values
76. table_data_mapping
- Table Data Mapping ID: Surrogate Key
- Source Database: Database that the source data exists in
- Source Table: Table that the source data exists in
- Source Primary Key: Source Primary Key Column Name
- Target Database: Database that the data will be loaded into
- Target Table: Table that the data will be loaded into
- Target Primary Key: Primary Key Column Name
- Target Table Type: Type of Table. Valid values: “Main” (Main Table), “Reference” (Reference Table)
• Purpose: Defines Table-to-Table ETL Mapping
77. column_data_mapping
- Column Data Mapping ID: Surrogate Key
- Table Data Mapping ID: ID of the related Table Data Mapping record
- Source Column: Column that the source data exists in
- Source Constant: Constant value that is to be assigned/expected in the target column
- Target Column: Column that the data will be loaded into
- Data Mapping Type: Formula for the column assignment. Typical examples: “=” (Assignment), “Constant” (Literal Assignment), “Transform” (Column Transformation), “Generated” (Column Generated)
• Purpose: Defines Column-to-Column ETL Mapping
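The two driver tables could be declared roughly as follows. The data types and check constraints are assumptions inferred from the column descriptions above, not the project's actual DDL.

```sql
CREATE TABLE dbo.table_data_mapping (
    table_data_mapping_id INT IDENTITY PRIMARY KEY,
    source_database       SYSNAME NOT NULL,
    source_table          SYSNAME NOT NULL,
    source_primary_key    SYSNAME NOT NULL,
    target_database       SYSNAME NOT NULL,
    target_table          SYSNAME NOT NULL,
    target_primary_key    SYSNAME NOT NULL,
    target_table_type     VARCHAR(10) NOT NULL
        CHECK (target_table_type IN ('Main', 'Reference'))
);

CREATE TABLE dbo.column_data_mapping (
    column_data_mapping_id INT IDENTITY PRIMARY KEY,
    table_data_mapping_id  INT NOT NULL
        REFERENCES dbo.table_data_mapping (table_data_mapping_id),
    source_column          SYSNAME NULL,      -- NULL for constant mappings
    source_constant        NVARCHAR(100) NULL,
    target_column          SYSNAME NOT NULL,
    data_mapping_type      VARCHAR(10) NOT NULL
        CHECK (data_mapping_type IN ('=', 'Constant', 'Transform', 'Generated'))
);
```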
78. Data Mapping Complex Framework Design
• We need two additional tables that can be used to
generate the more complex test cases
• These tables are used to store primary key values that
select which rows we will use as test cases
- We can then choose to test 1 row or 1,000 rows
79. etl_test_case
- ETL Test Case ID: Surrogate Key
- Table Data Mapping ID: ID of the related Table Data Mapping record
- Target Primary Key Value: Target Primary Key Value for the test
- Successful Test Ind: Indicates if the test was successful
• Purpose: Defines row-to-row ETL Mapping for a test case
80. etl_test_case_transform
- ETL Test Case Transform ID: Surrogate Key
- ETL Test Case ID: ID of the related Test Case record
- Target Column: Target Column name
- Expected Value: Expected value in target column
- Successful Test Ind: Indicates if the test was successful
• Purpose: Defines the expected target column value for each test case defined in etl_test_case
81. One more table… cross_reference
- cross_reference_id: Surrogate key
- source_system_code: Source system name
- source_system_table: Table the row originated from in the source system
- source_system_val: Primary Key value from the source system
- target_table_name: Table the row was stored to in the target system
- target_id: Primary key value from the target system
- updated_date: Date the row was updated
- modified_by: User that modified the row
- deleted_ind: Indicates if the row is deleted
82. Data Mapping table in Excel that is used to
populate tables
- Target_db: Target database that the target column exists in
- Target_table: Target table that the target column exists in
- Target_table_type: Target table type. Values are Main or Reference
- Target_column: Target column that is being loaded by the ETL process
- Source_db: Source database that the source column is being extracted from
- Source_table: Source table that the source column is being extracted from
- Source_column: Source column that is being extracted
- Source_PK: Primary Key of the source table that the source column exists in. This field is used in conjunction with the cross_reference table to determine the target_pk value so equivalent records can be compared between the source and target databases
- Mapping_type: Type of data mapping for the column. Values: Constant/“=”/Not Populated/Transform/Generated
- Data_type: Data type of the target column. Valid values: integer/bigint/nvarchar/datetime/indicator
- Nullable_ind: Indicates whether the target column is nullable
- Invalid_test: Contains the values that can be tested to cause exceptions