ML
PM Tutorial
9/30/2013 1:00:00 PM

"Testing the Data Warehouse—
Big Data, Big Problems"
Presented by:
Geoff Horne
iSQA

Brought to you by:

340 Corporate Way, Suite 300, Orange Park, FL 32073
888-268-8770 ∙ 904-278-0524 ∙ sqeinfo@sqe.com ∙ www.sqe.com
Geoff Horne
NZTester Magazine
Geoff Horne has an extensive background in test program/project directorship and management, architecture, and general consulting. In New Zealand, Geoff established and ran ISQA, a testing consultancy with a local and international clientele in Australia, the US, and the United Kingdom. He has held senior test management roles across a number of diverse industry sectors and is editor and publisher of the recently launched NZTester magazine. Geoff has authored a variety of white papers on software testing and is a regular speaker at the STAR conferences.
NZTester
Testing the Data
Warehouse
Geoff Horne, NZTester Magazine
ed@nztester.co.nz
April 2013

Examples:
• Walmart handles 1 million transactions per hour, imported into databases containing 2.5 petabytes of data
• Google processes 25 petabytes of data per day (= ~25,600
terabytes)
• AT&T transfers 30 petabytes per day
• 90 trillion emails are sent per year
• World of Warcraft uses 1.3 petabytes of storage
• Facebook stores 2.5+ petabytes of user data including 50 billion
photos and processes 50+ terabytes per day
Examples:
• Wayback Machine stores 3 petabytes of data and processes 100
terabytes per day
• eBay stores 6.5 petabytes of data and processes 100 terabytes per
month
• CERN’s Large Hadron Collider generates 15 petabytes per year
• NASA Center for Climate Simulation stores 32 petabytes of climate observations
• Amazon.com handles millions of back-end operations every day
and operates the three largest Linux databases in the world
Source: Wikipedia, TheBigDataGroup.com

Characteristics – the 3 + 1 Vs:
• Volume: more data than ever before, most of the world’s data is
un-, semi- or multi-structured
• Variety: more sources than ever before – social, web logs, machine
logs, photos, documents, geotags, video….
• Velocity: some data only has value for a short space of time –
relevance engines, financial fraud sensors, early warning sensors….
• Vitality: agility is required in analytics, with the ability to adapt quickly to changing business needs

Enterprise Involvement:

• Awareness is high; however, 75% are still wondering what it’s all about
• Usual answer – we don’t know what the business case is!

Worldwide Data Growth:

Challenges:
• How can we understand and use Big Data when it comes in an unstructured format, e.g. text or video?
• How can we capture the most important data as it happens and
deliver that to the right people in real-time?
• How can we store the data?
• How can we analyse and understand it given its size and our
computational capacity?
• How will we cater for the increasing data deluge?

Opportunities:
• McKinsey calls Big Data “the next frontier for innovation,
competition and productivity”.

• We can answer questions with Big Data that were beyond our reach
in the past.
• We can extract insight and knowledge, identify trends and use the
data to improve productivity, gain competitive advantage and
create substantial value.
• The challenges with Big Data are limited compared to the potential
benefits, which are limited only by our creativity and ability to
make connections among the trillions of bytes of data we have
access to.
So, how is all that data to be divvied up?

Data Warehousing:
• Pre-1990s: innovations by ACNielsen, Sperry & Teradata
• 1990 – Ralph Kimball & Red Brick Systems
• Businesses becoming increasingly dependent on timely intelligence
• Fast growing requirement for faster, more stable, reliable, flexible & easily
accessible intelligence repositories
• Big Data revolution will create exponential pressure to deliver quality solutions
• Will current toolsets be able to cope in terms of speed & reliability?
• New innovations, products, technologies will undoubtedly emerge and….

Data Warehousing:
If you take over the world, you’re gonna need lawyers!

Data Warehousing:
If you develop & deliver faster, more stable, reliable, flexible & easily accessible
intelligence repositories, you’re gonna need testers!


Why Test?
• Source data is often huge in volume and obtained from varied types of data repositories, e.g. application databases, spreadsheets, flat files, data feeds
• Source data quality cannot be assumed and should be profiled and cleaned
• Source data may be inconsistent, and redundancy may be present
• Source data records may be rejected by ETL procedures, and logs will contain error messages that will need addressing
• Source field values may be missing where they should be present
• Source data history, business rules and audits of source data may not be available
• Enterprise-wide data knowledge and business rules may not be available to verify data

Why Test? (2)
• There may be multi-phased ETL procedures, and a high level of data variety may exist
• Data sources (e.g. mainframe, spreadsheets, databases, flat files) will be updated over time
• Transaction-level traceability is difficult to attain during ETL
• The data warehouse will be a strategic enterprise resource and heavily relied upon


What to Test?
• Data Completeness – all expected data is correctly loaded via ETL procedures
• Data Transformation – all data is transformed correctly according to business rules and design specifications
• Data Quality – the ETL application correctly rejects, remedies, ignores, substitutes and reports on invalid data
• Performance and Scalability – data loads and queries perform within expected time frames and the technical architecture is scalable
• Integration Testing – the ETL process accommodates all required upstream and downstream processes
• User Acceptance Testing – the end result meets or exceeds business stakeholder and user expectations
• Regression Testing – existing functionality remains intact each time a new release of code is completed
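Checks like these can usually be expressed as simple queries. The sketch below illustrates the data quality item; the table and column names (dwh_customer, dim_country, country_code) are hypothetical, and the point is the pattern: values with no match in the reference data should have been rejected, remedied or substituted by the ETL.

-- Data quality check sketch (hypothetical tables and columns)
SELECT t.customer_id,
       t.country_code
FROM   dwh_customer t
LEFT JOIN dim_country c
       ON c.country_code = t.country_code
WHERE  c.country_code IS NULL;  -- expect zero rows if invalid data was handled correctly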
Where to Test?
• Primary
• Secondary
• Tertiary

Test Order?
• Primary
• Secondary
• Tertiary
Useful Skills for Testing:
• Good understanding of the fundamental concepts of data warehousing and its place in an information management environment
• Understanding the role of the testing process as part of data warehouse development
• Development of data warehouse test strategies, test plans, and test cases – what they are and how to develop them, specifically for data warehouse and decision-support systems
• Creating effective test cases and scenarios based on technical and business/user requirements
• Able to participate in reviews of the data models, data mapping documents, ETL design, and ETL coding; provide feedback to designers and developers

Useful Skills for Testing (2):
• Able to participate in the change management process and document relevant changes to decision support requirements
• A good understanding of data modelling and source-to-target data mappings
• Skills and experience with SQL, stored procedures, database management and ETL tools
• Data profiling experience
• Microsoft Excel etc. for data analysis
• Understanding how data from the data warehouse is used by the business and the business processes it is related to

Typical Data Warehouse Issues:
• Inadequate ETL and stored procedure design documentation to aid in test planning
• Field values are null when specified as Not Null
• Field constraints and SQL not coded correctly for the ETL tool
• Excessive ETL errors discovered after entry to formal QA – lack of unit testing
• Source data does not meet table mapping specifications (e.g. dirty data)
• Source-to-target mappings: (1) often not reviewed before implementation, (2) are in error or (3) not consistently maintained throughout the development life cycle
• Data models are not adequately maintained during the development life cycle
Typical Data Warehouse Issues (2):
• Duplicate field values are found in either source or target data when defined in mapping specifications to be distinct
• ETL SQL/transformation errors leading to missing rows and invalid field values
• Constraint violations exist in source (perhaps could be found through data profiling)
• Target data is incorrectly stored in non-standard formats
• Primary or foreign key values are incorrect for important relationship linkages
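Several of these issues reduce to simple SQL checks. A sketch follows, using assumed table and column names (dwh_orders, dwh_fact_sales, dwh_dim_customer); the patterns, not the names, are what matter.

-- 1. Null values in a field specified as Not Null
SELECT COUNT(*) AS null_violations
FROM   dwh_orders
WHERE  order_date IS NULL;

-- 2. Duplicates where the mapping specifies distinct values
SELECT order_number, COUNT(*) AS occurrences
FROM   dwh_orders
GROUP  BY order_number
HAVING COUNT(*) > 1;

-- 3. Foreign keys that do not resolve to a dimension row
SELECT f.order_key, f.customer_key
FROM   dwh_fact_sales f
LEFT JOIN dwh_dim_customer d
       ON d.customer_key = f.customer_key
WHERE  d.customer_key IS NULL;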

Transformation rules:
• Specify source table elements from all data sources including metadata
• Specify Data Warehouse destination table elements:
• Dimensions – reference data, keys etc.
• Facts – data assets
• Specify how the source table elements map onto the destination table
elements
• Form the basis of unit test cases

Transformation rules:

Source_Database_1
  SD1_Table_1: SD1_T1_Attr_1, SD1_T1_Attr_2, SD1_T1_Attr_3, SD1_T1_Attr_4
  SD1_Table_2: SD1_T2_Attr_1, SD1_T2_Attr_2, SD1_T2_Attr_3, SD1_T2_Attr_4

Dest_Database_DWH
  DWH_Dim:  DD1_T1_Attr_1, DD1_T1_Attr_2, DD1_T1_Attr_3
  DWH_Fact: DD1_T2_Attr_1, DD1_T2_Attr_2, DD1_T2_Attr_3

Transformation Rules
  DD1_T1_Attr_1 = SD1_T1_Attr_1
  DD1_T1_Attr_2 = SD1_T1_Attr_2
  DD1_T1_Attr_3 = SD1_T1_Attr_3 + SD1_T1_Attr_4
  DD1_T2_Attr_1 = (SD1_T2_Attr_1 * SD1_T2_Attr_3)/52
  DD1_T2_Attr_2 = SD1_T2_Attr_3 + " " + SD1_T2_Attr_4
  DD1_T2_Attr_3 = DD1_T1_Attr_3/SD1_T2_Attr_4
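A unit test case derived from such rules can often be expressed directly as a query. The sketch below uses the names above and assumes, purely for illustration, that rule 1 (DD1_T1_Attr_1 = SD1_T1_Attr_1) provides the join key between source and destination.

-- Validate rule: DD1_T1_Attr_3 = SD1_T1_Attr_3 + SD1_T1_Attr_4
SELECT s.SD1_T1_Attr_1,
       s.SD1_T1_Attr_3 + s.SD1_T1_Attr_4 AS expected_value,
       d.DD1_T1_Attr_3                   AS actual_value
FROM   SD1_Table_1 s
JOIN   DWH_Dim d
       ON d.DD1_T1_Attr_1 = s.SD1_T1_Attr_1
WHERE  d.DD1_T1_Attr_3 <> s.SD1_T1_Attr_3 + s.SD1_T1_Attr_4;  -- expect zero rows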

From Source to Data Warehouse – Unit Testing:
• Know your transformation rules!
• Test cases should cover each transformation rule and include positive and negative situations
• Row counts: Source = Destination + Rejected (every source row is either loaded or rejected; see the sketch below)
• Correctly access all required data including metadata
• Cross-reference Data Warehouse Dimensions to source tables
• All computations are correct, especially those based on business rules
• Database queries: expected vs actual results
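A minimal row-count reconciliation sketch follows. The staging, dimension and reject table names are assumptions, and it presumes the ETL writes rejected rows to a reject/error table.

-- Row counts: source rows should equal destination rows plus rejected rows
SELECT (SELECT COUNT(*) FROM stg_customer)         AS source_rows,
       (SELECT COUNT(*) FROM dwh_dim_customer)     AS destination_rows,
       (SELECT COUNT(*) FROM etl_rejects_customer) AS rejected_rows;
-- Investigate if source_rows <> destination_rows + rejected_rows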

From Source to Data Warehouse – Unit Testing:
• Rejects are correctly handled and conform to business rules
• Slowly changing data, e.g. address, marital status
• Correctness of surrogate keys, e.g. time zones, currencies in Fact tables
• Opportunities for automation
• Dual drive:
  • Source table driven – data ends up in the right place
  • Destination table driven – contains the right result
• Risk-based testing
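For the slowly changing data point, a sketch of a Type 2 dimension check is shown below; the dimension name, business key, surrogate key, current flag and validity-date columns are all assumptions about how such a dimension might be modelled.

-- Each business key should have exactly one current row
SELECT customer_bk, COUNT(*) AS current_rows
FROM   dwh_dim_customer
WHERE  is_current = 'Y'
GROUP  BY customer_bk
HAVING COUNT(*) <> 1;

-- No overlapping validity periods for the same business key
SELECT a.customer_bk, a.customer_sk AS row_1, b.customer_sk AS row_2
FROM   dwh_dim_customer a
JOIN   dwh_dim_customer b
       ON  b.customer_bk = a.customer_bk
       AND b.customer_sk <> a.customer_sk
       AND b.valid_from  <  a.valid_to
       AND b.valid_to    >  a.valid_from;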

From Source to Data Warehouse – Integration Testing:
Once all extract, transformation and load unit tests have been successfully executed, the ETL process needs to be executed end-to-end:
• Job sequences and dependencies
• Errors in one job that impact subsequent jobs
• Error log generation
• Restarting the ETL process in case of failure:
  • Does it have to be started over?
  • Can it start from where it failed?
  • Restores required?
  • Auto/manual?
• Impact of failure on subsequent jobs
• Processing of rejected records
• Reprocessing of already processed records
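One reconciliation that is easy to automate after a restart or rerun is a double-load check. The sketch below assumes the fact table carries a load batch identifier and that the business key should appear only once; both are illustrative assumptions.

-- Rows loaded more than once, possibly under different batches, after a rerun
SELECT order_number,
       COUNT(*)                      AS rows_loaded,
       COUNT(DISTINCT load_batch_id) AS batches
FROM   dwh_fact_sales
GROUP  BY order_number
HAVING COUNT(*) > 1;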
Data Warehouse Testing – Continually Changing Source
Systems
• Source data quality = garbage in/garbage out
• The inherent nature of a Data Warehouse is continually updating data and source systems, so testing must allow for both
• New Source data/schema/application = retesting/regression testing
• Data Warehouse systems are always high maintenance
• Will always find new issues
• Opportunities for automation
• Package test suites modularly for ease of repeatability

Planning for Data Warehouse Testing
• Source data quality = garbage in/garbage out
• Business requirements document
• Data models for source and target schemas
• Source-to-target mappings
• ETL design documents
• Configuration management system
• Project schedule
• Data quality verification process
• Incident and error handling system

Planning for Data Warehouse Testing (2)
• QA staff resources estimates and training needs
• Testing environment budget and plan
• Test tools
• Test objectives
• QA roles and responsibilities
• Test deliverables
• Test tasks
• Defect reporting requirements
• Entrance criteria that should be met before formal testing commences
• Exit criteria that should be met before formal testing is completed
Planning Tests for Common Data Warehouse Issues
• Inadequate ETL and stored procedure design documentation to aid in test planning
• Field values are null when specified as Not Null
• Field constraints and SQL not coded correctly for the ETL tool
• Excessive ETL errors discovered after entry to formal QA
• Source data does not meet table mapping specifications (e.g. dirty data)
• Source-to-target mappings: (1) often not reviewed before implementation, (2) are in error or (3) not consistently maintained throughout the development life cycle
• Data models are not adequately maintained during the development life cycle
Planning Tests for Common Data Warehouse Issues (2)
• Duplicate field values are found in either source or target data when defined in mapping specifications to be distinct
• ETL SQL/transformation errors leading to missing rows and invalid field values
• Constraint violations exist in source (perhaps could be found through data profiling)
• Target data is incorrectly stored in non-standard formats
• Primary or foreign key values are incorrect for important relationship linkages

Some data mapping and data movement best practice goals:
• Introduce common, consistent data movement analysis, design, and coding patterns
• Develop reusable, enterprise-wide analysis, design, and construction components through data movement modelling processes using data movement tools, to ensure an acceptable level of data quality per business specifications
• Introduce best practices and consistency in coding and naming standards
• Reduce costs to develop and maintain analysis, design and source code deliverables
• Integrate controls into the data movement process to ensure data quality and integrity
• An ETL conceptual data movement model should be created as part of the information management strategy. This model is part of the business model and shows what data flows into, within, and out of the organization.
Those involved in test planning should consider the following verifications
as primary among those planned for various phases of the data warehouse
loading project.
• Verify data mappings, source to target
• Verify that all tables and specified fields were loaded from source to staging
• Verify that primary and foreign keys were properly generated using sequence
generator or similar
• Verify that not-null fields were populated
• Verify no data truncation in each field
• Verify data types and formats are as specified in design phase
• Verify no unexpected duplicate records in target tables.
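Two of these verifications are sketched below against assumed staging and target tables: a truncation check comparing maximum field lengths, and a simple format check (LENGTH is LEN on SQL Server).

-- No data truncation: the longest value should not shrink between staging and target
SELECT (SELECT MAX(LENGTH(customer_name)) FROM stg_customer)     AS max_len_staging,
       (SELECT MAX(LENGTH(customer_name)) FROM dwh_dim_customer) AS max_len_target;
-- Investigate if max_len_target < max_len_staging

-- Format as specified in design, e.g. an account number expected to be 8 characters
SELECT account_number
FROM   dwh_dim_customer
WHERE  LENGTH(account_number) <> 8;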

Those involved in test planning should consider the following verifications
as primary among those planned for various phases of the data warehouse
loading project. (2)
• Verify transformations based on data table low level design (LLDs—usually
text documents describing design direction and specifications)
• Verify that numeric fields are populated with correct precision
• Verify that each ETL session completed with only planned exceptions
• Verify all cleansing, transformation, error and exception handling
• Verify stored procedure calculations and data mappings
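For the numeric-precision point, a sketch comparing loaded values against the source cast to the precision assumed in the design (DECIMAL(18,2) here; table and column names are also assumptions):

SELECT s.order_id,
       CAST(s.amount AS DECIMAL(18,2)) AS expected_amount,
       t.amount                        AS loaded_amount
FROM   stg_orders s
JOIN   dwh_fact_orders t
       ON t.order_id = s.order_id
WHERE  t.amount <> CAST(s.amount AS DECIMAL(18,2));  -- expect zero rows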

Common QA Tasks for the Data Warehouse Team
During the data warehouse testing life cycle, many of the following tasks may typically be executed by the QA team. It is important to plan for those tasks below that are key to the project’s success.
• Complete test data acquisition and baseline all test data.
• Create test environments.
• Document test cases.
• Create and validate test scripts.
• Conduct unit testing and confirm that each component is functioning
correctly.
• Conduct testing to confirm that each group of components meets specification.
Common QA Tasks for the Data Warehouse Team (2)
• Conduct quality assurance testing to confirm that the solution meets
requirements.
• Perform load testing, or performance testing, to confirm that the system is
operating correctly and can handle the required data volumes and that data
can be loaded in the available load window.
• Specify and conduct reconciliation tests to manually confirm the validity of
data.
Common QA Tasks for the Data Warehouse Team (3)
• Conduct testing to ensure that the new software does not cause problems
with existing software.
• Conduct user acceptance testing to ensure that business intelligence reports
work as intended.
• Carefully manage scope to ensure that perceived defects are actually
requirement defects and not something that would be “nice to have, but we
forgot to ask.”
• Conduct a release test and production readiness test.
• Ensure that ongoing defect management and reporting are effective.
• Manage testing to ensure that testing procedures and software testing best practices are followed.
Common QA Tasks for the Data Warehouse Team (4)
• Establish standard business terminology and value standards for each subject
area.
• Develop a business data dictionary that is owned and maintained by a series
of business-side data stewards. These individuals should ensure that all
terminology is kept current and that any associated rules are documented.
• Document the data in your core systems and how it relates to the standard
business terminology. This will include data transformation and conversion
rules.

Common QA Tasks for the Data Warehouse Team (5)
• Establish a set of data acceptance criteria and correction methods for your
standard business terminology. This should be identified by the business-side
data stewards and implemented against each of your core systems (where
practical).
• Implement a data profiling program as a production process. You should consider regularly measuring the data quality (and value accuracy) of the data contained within each of your core operational systems.

Considerations for Selecting Data Warehouse Testers
Members of the QA staff who will plan and execute data warehouse testing
should have many of the following skills and experiences.
• Over five years of experience in testing and development in the fields of data warehousing and client-server technologies, including extensive experience with Informatica, SSIS or other ETL tools.
• Strong experience in Informatica or SQL Server, stored procedure and SQL
testing.
• Expertise in unit and integration testing of the associated ETL or stored
procedure code.

Considerations for Selecting Data Warehouse Testers (2)
• Experience in creating data verification unit and integration test plans and
test cases based on technical specifications.
• Demonstrated ability to write complex multi-table SQL queries.
• Excellent skills with OLAP, ETL, and business intelligence.
• Experience with dimensional data modelling using Erwin: star join schema/snowflake modelling, fact and dimension tables, physical and logical data modelling.
• Experience in OLAP reporting tools like Business Objects, SSRS, OBIEE or
Cognos.
• Expertise in data migration, data profiling, data cleansing.

Considerations for Selecting Data Warehouse Testers (3)
• Hands-on experience with source-to-target mapping in an enterprise data warehouse environment. Responsible for QA tasks in all phases of the system development life cycle (SDLC), from requirements definition through implementation, on large-scale, mission-critical processes; excellent understanding of business requirements development, data analysis, relational database design, systems development methodologies, business/technical liaising, workflow and quality assurance.
• Experienced in business analysis, source system data analysis, architectural
reviews, data validation, data testing, resolution of data discrepancies and
ETL architecture. Good knowledge of QA processes.

Considerations for Selecting Data Warehouse Testers (4)
• Familiarity with performance tuning of target databases and source systems.
• Extensive work on both UNIX (AIX/HP/Sun Solaris) and Windows (Windows SQL Server) platforms.
• Good knowledge of UNIX shell scripting and understanding of Perl scripting.
• Experience in Oracle 10g/9i/8i, PL/SQL, SQL, TOAD, Stored Procedures,
Functions and Triggers.

Analyze Source Data before and after Extraction to Staging
Process Description:
• Extract representative samples of data from each source or staging table.
• Parse the data for the purpose of profiling.
• Verify that not-null fields are populated as expected.
• Structure discovery—Does the data match the corresponding metadata? Do
field attributes of the data match expected patterns? Does the data adhere
to appropriate uniqueness and null value rules?
• Data discovery—Are the data values complete, accurate and unambiguous?
• Relationship discovery—Does the data adhere to specified required key
relationships across columns and tables? Are there inferred relationships
across columns, tables or databases? Is there redundant data?
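A basic column-profiling query covers several of these discovery questions at once. The sketch below profiles one assumed staging table (stg_customer) for row count, null rate, uniqueness and value range.

SELECT COUNT(*)                    AS total_rows,
       COUNT(customer_id)          AS non_null_ids,
       COUNT(DISTINCT customer_id) AS distinct_ids,
       MIN(created_date)           AS min_created,
       MAX(created_date)           AS max_created
FROM   stg_customer;
-- If customer_id is a unique, not-null key: distinct_ids = non_null_ids = total_rows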
Analyze Source Data before and after Extraction to Staging (2)
• Verify that all required data from the source was extracted. Verify that the extraction process did not extract more or less data from the source than it should have.
• Verify or write defects for exceptions and errors discovered during the ETL process.
• Verify that the extraction process did not extract duplicate data from the source (usually this happens in repeatable processes where at point zero we need to extract all data from the source file, but during the next intervals we only need to capture the modified and new rows).
• Validate that no data truncation occurred during staging.
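For the duplicate-extraction point, a sketch over an assumed staging table that carries a source row identifier and source modification timestamp: the same source version of a row should not be staged more than once across incremental extracts.

SELECT source_row_id,
       source_modified_ts,
       COUNT(*) AS times_extracted
FROM   stg_customer
GROUP  BY source_row_id, source_modified_ts
HAVING COUNT(*) > 1;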

Verify Corrected, Cleaned, Source Data in Staging
This step works to improve the quality of existing data in source files, including defects that meet source specs but must be corrected before load.
Inputs:
• Files or tables (staging) that require cleansing; data definition and business rule documents; data map of source files and fields; business rules; and data anomalies discovered in earlier steps of this process.
• Fixes for data defects that would otherwise result in data that does not meet specifications for the application DWH.

Verify Corrected, Cleaned, Source Data in Staging (2)
Outputs: Defect reports, cleansed data, rejected or uncorrectable data.
Techniques and Tools: Data reengineering, transformation, and cleansing tools,
MS Access, Excel filtering.
Process Description: In this step, data with missing values, known errors, and suspect values is corrected. Automated tools may be identified to best locate and clean/correct large volumes of data.

Verify Corrected, Cleaned, Source Data in Staging (3)
• Document the type of data cleansing approach taken for each data type in the repository.
• Determine how uncorrectable or suspect data is processed, rejected, or maintained for corrective action. SMEs and stakeholders should be involved in the decision.
• Review ETL defect reports to assess rejected data excluded from source files or the information group targeted for the warehouse.
• Determine if data not meeting quality rules was accepted.
• Document in defect reports the records and important fields that cannot be easily corrected.
Verify Corrected, Cleaned, Source Data in Staging (4)
• Document records that were corrected and how corrected.
• Certification Method: Validation of data cleansing processes could be a tricky
proposition, but certainly doable. All data cleansing requirements should be
clearly identified. The QA team should learn all data cleansing tools available
and their methods. QA should create various conditions as specified in the
requirements for the data cleansing tool to support and validate its results.
QA will run a volume of real data through each tool to validate accuracy as
well as performance.

Verifying Matched and Consolidated Data
There are often ETL processes where data has been consolidated from various files into a single occurrence of records. The cleaned and consolidated data can be assessed to verify matched and consolidated data.
Much of the ETL heavy lifting occurs in the transform step, where data is combined, quality issues are handled, data is updated, surrogate keys are assigned, and aggregates are built.
Inputs: Analysis of all files or databases for each entity type.

Verifying Matched and Consolidated Data (2)
Outputs:
• Report of matched, consolidated, related data that is suspect or in error.
• List of duplicate data records or fields.
• List of duplicate data suspects.
Techniques and Tools: Data matching techniques or tools; data cleansing
software with matching and merging capabilities.

Verifying Matched and Consolidated Data (3)
Process Description:
• Establish match criteria for data. Select attributes to become the basis for
possible duplicate occurrences (e.g., names, account numbers).
• Determine the impact of incorrectly consolidated records. If the negative
impact of consolidating two different occurrences such as different
customers into a single customer record exists, submit defect reports. The fix
should be higher controls to help avoid such consolidations in the future.
• Determine the matching techniques to be used: exact character match in two corresponding fields, wild card match, key words, close match, etc.

Verifying Matched and Consolidated Data (4)
• Compare match criteria for a specific record with all other records within a given file to look for intra-file duplicate records (see the sketch below).
• Compare match criteria for a specific record with all records in another file to seek inter-file duplicate records.
• Evaluate potential matched occurrences to assure they are, in fact, duplicates.
• Verify that data consolidated into single occurrences is correct.
• Examine and re-relate data related to old records being consolidated to the new occurrence-of-reference record. Validate that no related data was overlooked.
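A minimal intra-file duplicate-candidate sketch using an exact match on assumed criteria columns (customer name and account number); real matching would typically layer close-match or key-word techniques on top of this.

SELECT UPPER(TRIM(customer_name)) AS name_key,
       account_number,
       COUNT(*)                   AS occurrences
FROM   stg_customer
GROUP  BY UPPER(TRIM(customer_name)), account_number
HAVING COUNT(*) > 1;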

Verify Transformed/Enhanced/Calculated Data to
Target Tables
At this stage, base data is being prepared for loading into the application
operational tables and the data mart. This includes converting and formatting
cleansed, consolidated data into the new data architecture, and possibly
enhancing internal operational data with external data licensed from service
providers.
The objective is to successfully map the cleaned, corrected and consolidated
data into the DWH environment.

Verify Transformed/Enhanced/Calculated Data to
Target Tables (2)
Inputs: Cleansed, consolidated data; external data from service providers;
business rules governing the source data; business rules governing the target
DWH data; transformation rules governing the transformation process; DWH or
target data architecture; data map of source data to standardized data.
Output: Transformed, calculated, enhanced data; updated data map of source
data to standardized data; data map of source data to target data architecture.

Verify Transformed/Enhanced/Calculated Data to
Target Tables (3)
Techniques and Tools: Data transformation software; external or online or
public databases.
Process Description:
• Verify that the data warehouse construction team is using the data map of source data to the DWH standardized data; verify the mapping.
• Verify that the data transformation rules and routines are correct.
• Verify the data transformations to the DWH and assure that the processes
were performed according to specifications.

Verify Transformed/Enhanced/Calculated Data to
Target Tables (4)
• Verify that data loaded in the operational tables and data mart meets the
definition of the data architecture including data types, formats, accuracy,
etc.
• Develop scenarios to be covered in Load Integration Testing.
• Count Validation: record count verification, running DWH back-end/reporting queries against source and target as an initial check (see the sketch below).
• Dimensional Analysis: data integrity exists between the various source tables and parent/child relationships.
• Statistical Analysis: validation for various calculations.
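A count and statistical validation sketch, comparing row counts and a summed measure between an assumed staging table and the loaded fact table:

SELECT 'staging' AS layer, COUNT(*) AS row_count, SUM(amount) AS total_amount
FROM   stg_orders
UNION ALL
SELECT 'target'  AS layer, COUNT(*) AS row_count, SUM(amount) AS total_amount
FROM   dwh_fact_orders;
-- The two rows should agree, or differ only by the documented reject count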

Verify Transformed/Enhanced/Calculated Data to
Target Tables (5)
• Data Quality Validation: Check for missing data, negatives and consistency.
Field-by-field data verification will be done to check the consistency of
source and target data.
• Granularity: Validate at the lowest granular level possible (lowest in the
hierarchy, e.g., Country-City-Sector—start with test cases).
• Dynamic Transformation Rules and Tables: Such methods need to be
checked continuously to ensure the correct transformation routines are
executed. Verify that dynamic mapping tables and dynamic mapping rules
provide an easy, documented, and automated way for transforming values
from one or more sources into a standard value presented in the DWH.
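For dynamic mapping tables, one check worth running continuously is mapping coverage: source values with no entry in the mapping table cannot be transformed to a standard value. A sketch with assumed names (map_country as the dynamic mapping table):

SELECT DISTINCT s.country_code
FROM   stg_customer s
LEFT JOIN map_country m
       ON m.source_value = s.country_code
WHERE  m.source_value IS NULL;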
Verify Transformed/Enhanced/Calculated Data to
Target Tables (6)
• Verification Method: The QA team will identify the detailed requirements as they relate to transformation and validate the dynamic transformation rules and tables against DWH records. Utilizing SQL and related tools, the team will identify unique values in source data files that are subject to transformation. The QA team then identifies the results from the transformation process and validates that such transformations have accurately taken place.

Questions?

NZTester
Testing the Data
Warehouse
Geoff Horne, NZTester Magazine
ed@nztester.co.nz
April 2013

Databases in a Continuous Integration/Delivery Process
 
Mobile Testing: What—and What Not—to Automate
Mobile Testing: What—and What Not—to AutomateMobile Testing: What—and What Not—to Automate
Mobile Testing: What—and What Not—to Automate
 
Cultural Intelligence: A Key Skill for Success
Cultural Intelligence: A Key Skill for SuccessCultural Intelligence: A Key Skill for Success
Cultural Intelligence: A Key Skill for Success
 
Turn the Lights On: A Power Utility Company's Agile Transformation
Turn the Lights On: A Power Utility Company's Agile TransformationTurn the Lights On: A Power Utility Company's Agile Transformation
Turn the Lights On: A Power Utility Company's Agile Transformation
 

Recently uploaded

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 

Recently uploaded (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 

Testing the Data Warehouse—Big Data, Big Problems

  • 1. ML PM Tutorial 9/30/2013 1:00:00 PM "Testing the Data Warehouse— Big Data, Big Problems" Presented by: Geoff Horne iSQA Brought to you by: 340 Corporate Way, Suite 300, Orange Park, FL 32073 888-268-8770 ∙ 904-278-0524 ∙ sqeinfo@sqe.com ∙ www.sqe.com
  • 2. Geoff Horne NZTester Magazine Geoff Horne has an extensive background in test program/project directorship and management, architecture, and general consulting. In New Zealand Geoff established and ran ISQA as a testing consultancy which enjoys a local and international clientele in Australia, the US, and the United Kingdom. He has held senior test management roles across a number of diverse industry sectors, and is editor and publisher of the recently launched NZTester magazine. Geoff has authored a variety of white papers on software testing and is a regular speaker at the STAR conferences.
  • 3. NZTester Testing the Data Warehouse Geoff Horne, NZTester Magazine ed@nztester.co.nz April 2013 www.nztester.co.nz + + NZTester 2 1
  • 5. www.nztester.co.nz Examples: Source: Wikipedia NZTester 5 www.nztester.co.nz Examples: • Walmart handles 1m transactions per hour imported into databases containing 2.5 petabytes of data • Google processes 25 petabytes of data per day (= ~25,600 terabytes) • AT&T transfers 30 petabytes per day • 90 trillion emails are sent per year • World of Warcraft uses 1.3 petabytes of storage • Facebook stores 2.5+ petabytes of user data including 50 billion photos and processes 50+ terabytes per day NZTester 6 3
  • 6. www.nztester.co.nz Examples: • Wayback Machine stores 3 petabytes of data and processes 100 terabytes per day • eBay stores 6.5 petabytes of data and processes 100 terabytes per month • CERN’s Large Hydron Collider generates 15 petabytes per year • NASA Center for Climate Simulation store 32 petabytes of climate observations • Amazon.com handles millions of back-end operations every day and operates the three largest Linux databases in the world Source: Wikipedia, TheBigDataGroup.com NZTester 7 www.nztester.co.nz Characteristics – the 3 + 1 Vs: • Volume: more data than ever before, most of the world’s data is un-, semi- or multi-structured • Variety: more sources than ever before – social, web logs, machine logs, photos, documents, geotags, video…. • Velocity: some data only has value for a short space of time – relevance engines, financial fraud sensors, early warning sensors…. • Vitality: agility is required in analytics, able to adapt quickly to changing business needs NZTester 8 4
  • 7. www.nztester.co.nz Enterprise Involvement: • Awareness is high however 75% still wondering what its all about • Usual answer – we don’t know what the business case is! NZTester 9 NZTester 10 www.nztester.co.nz Worldwide Data Growth: 5
  • 8. www.nztester.co.nz Challenges: • How can we understand and use Big Data when it comes in an unstructured format eg text or video? • How can we capture the most important data as it happens and deliver that to the right people in real-time? • How can we store the data? • How can we analyse and understand it given its size and our computational capacity? • How will we cater for the increasing data deluge? NZTester 11 www.nztester.co.nz Opportunities: • McKinsey calls Big Data “the next frontier for innovation, competition and productivity”. • We can answer questions with Big Data that were beyond our reach in the past. • We can extract insight and knowledge, identify trends and use the data to improve productivity, gain competitive advantage and create substantial value. • The challenges with Big Data are limited compared to the potential benefits, which are limited only by our creativity and ability to make connections among the trillions of bytes of data we have access to. NZTester 12 6
  • 9. www.nztester.co.nz So, how is all that data to be divvied up? NZTester 13 NZTester 14 www.nztester.co.nz So, how? + 7
  • 10. www.nztester.co.nz Data Warehousing: • Pre-1990s: innovations by ACNielsen, Sperry & Teradata • 1990 – Ralph Kimball & Red Brick Systems • Businesses becoming increasingly dependent on timely intelligence • Fast-growing requirement for faster, more stable, reliable, flexible & easily accessible intelligence repositories • Big Data revolution will create exponential pressure to deliver quality solutions • Will current toolsets be able to cope in terms of speed & reliability? • New innovations, products, technologies will undoubtedly emerge and….
  • 11. www.nztester.co.nz Data Warehousing: If you take over the world, you’re gonna need lawyers! Data Warehousing: If you develop & deliver faster, more stable, reliable, flexible & easily accessible intelligence repositories, you’re gonna need testers!
  • 12. www.nztester.co.nz Why Test? • Source data is often huge in volume and obtained from varied types of data repositories eg. application databases, spreadsheets, flat files, data feeds etc • Source data quality cannot be assumed and should be profiled and cleaned • Source data may be inconsistent and redundancy may be present • Source data records may be rejected by ETL procedures, and logs will contain error messages that will need addressing • Source field values may be missing where they should be present. • Source data history, business rules and audits of source data may not be available. • Enterprise-wide data knowledge and business rules may not be available to verify data. Why Test (2)? • There may be multi-phased ETL procedures and a high level of data variety may exist. • Data sources (eg. mainframe, spreadsheets, databases, flat files) will be updated over time • Transaction-level traceability is difficult to attain during ETL • The data warehouse will be a strategic enterprise resource and heavily relied upon.
  • 13. www.nztester.co.nz What to Test? • Data Completeness – all expected data is correctly loaded via ETL procedures (a completeness-check sketch follows below) • Data Transformation – all data is transformed correctly according to business rules and design specifications • Data Quality – the ETL application correctly rejects, remedies, ignores, substitutes and reports on invalid data • Performance and Scalability – data loads and queries perform within expected time frames and the technical architecture is scalable • Integration Testing – the ETL process accommodates all required upstream and downstream processes • User Acceptance Testing – the end result meets or exceeds business stakeholder and user expectations • Regression Testing – existing functionality remains intact each time a new release of code is completed. Where to Test? Primary
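A minimal completeness-check sketch, assuming hypothetical table and column names (src_orders, dwh_fact_orders, order_amount) that are not from the presentation; it compares row counts and a simple control total between source and target after a load:

  -- Row counts and a control total should reconcile between source and target
  SELECT 'source' AS side, COUNT(*) AS row_count, SUM(order_amount) AS total_amount
  FROM src_orders
  UNION ALL
  SELECT 'target' AS side, COUNT(*) AS row_count, SUM(order_amount) AS total_amount
  FROM dwh_fact_orders;
  -- Any difference in row_count or total_amount flags a completeness defect to investigate.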
  • 17. www.nztester.co.nz Useful Skills for Testing: • Good understanding of the fundamental concepts of data warehousing and its place in an information management environment. • Understanding the role of the testing process as part of data warehouse development. • Development of data warehouse test strategies, test plans, and test cases: what they are and how to develop them, specifically for data warehouse and decision-support systems. • Creating effective test cases and scenarios based on technical and business/user requirements. • Able to participate in reviews of the data models, data mapping documents, ETL design, and ETL coding; provide feedback to designers and developers. Useful Skills for Testing (2): • Able to participate in the change management process and document relevant changes to decision support requirements. • A good understanding of data modelling and source-to-target data mappings • Skills and experience with SQL, stored procedures, database management and ETL tools • Data profiling experience • Microsoft Excel etc for data analysis • Understanding how data from the data warehouse is used by the business and the business processes it is related to
  • 18. www.nztester.co.nz Typical Data Warehouse Issues: • Inadequate ETL and stored procedure design documentation to aid in test planning. • Field values are null when specified as Not Null. • Field constraints and SQL not coded correctly for the ETL tool. • Excessive ETL errors discovered after entry to formal QA - lack of unit testing. • Source data does not meet table mapping specifications (ex. dirty data). • Source-to-target mappings: (1) often not reviewed before implementation, (2) are in error or (3) not consistently maintained throughout the development life cycle. • Data models are not adequately maintained during the development life cycle. NZTester 31 www.nztester.co.nz Typical Data Warehouse Issues(2): • Duplicate field values are found in either source or target data when defined in mapping specifications to be distinct. • ETL SQL/transformation errors leading to missing rows and invalid field values. • Constraint violations exist in source (perhaps could be found through data profiling). • Target data is incorrectly stored in non-standard formats. • Primary or foreign key values are incorrect for important relationship linkages. NZTester 32 16
  • 19. www.nztester.co.nz Transformation rules: • Specify source table elements from all data sources including metadata • Specify Data Warehouse destination table elements: • Dimensions – reference data, keys etc. • Facts – data assets • Specify how the source table elements map onto the destination table elements • Form the basis of unit test cases (see the worked example and test sketch below) Transformation rules example:
  Source_Database_1 – SD1_Table_1: SD1_T1_Attr_1, SD1_T1_Attr_2, SD1_T1_Attr_3, SD1_T1_Attr_4; SD1_Table_2: SD1_T2_Attr_1, SD1_T2_Attr_2, SD1_T2_Attr_3, SD1_T2_Attr_4
  Dest_Database_DWH – DWH_Dim: DD1_T1_Attr_1, DD1_T1_Attr_2, DD1_T1_Attr_3; DWH_Fact: DD1_T2_Attr_1, DD1_T2_Attr_2, DD1_T2_Attr_3
  Transformation Rules:
  DD1_T1_Attr_1 = SD1_T1_Attr_1
  DD1_T1_Attr_2 = SD1_T1_Attr_2
  DD1_T1_Attr_3 = SD1_T1_Attr_3 + SD1_T1_Attr_4
  DD1_T2_Attr_1 = (SD1_T2_Attr_1 * SD1_T2_Attr_3)/52
  DD1_T2_Attr_2 = SD1_T2_Attr_3 + " " + SD1_T2_Attr_4
  DD1_T2_Attr_3 = DD1_T1_Attr_3/SD1_T2_Attr_4
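As a worked illustration of turning one of the rules above into a unit test, the sketch below independently recomputes the rule DD1_T2_Attr_1 = (SD1_T2_Attr_1 * SD1_T2_Attr_3)/52 and compares it with what the ETL actually loaded. It assumes a traceability key (sd1_t2_key) carried from SD1_Table_2 into DWH_Fact; the slide does not specify such a key, so treat it as a placeholder:

  -- Recompute the expected value from the source and compare with the loaded fact value
  SELECT s.sd1_t2_key,
         (s.SD1_T2_Attr_1 * s.SD1_T2_Attr_3) / 52 AS expected_value,
         f.DD1_T2_Attr_1                          AS actual_value
  FROM SD1_Table_2 s
  JOIN DWH_Fact f
    ON f.sd1_t2_key = s.sd1_t2_key               -- assumed traceability key
  WHERE ABS((s.SD1_T2_Attr_1 * s.SD1_T2_Attr_3) / 52 - f.DD1_T2_Attr_1) > 0.01;
  -- Rows returned are transformation defects: expected vs actual differ beyond a small tolerance.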
  • 20. www.nztester.co.nz Transformation rules: NZTester 35 www.nztester.co.nz From Source to Data Warehouse – Unit Testing: • Know your transformation rules! • Test cases should cover each transformation rule and include positive and negative situations • Row counts: Destination = Source + Rejected • Correctly access all required data including metadata • Cross reference Data Warehouse Dimensions to source tables • All computations are correct especially those based on business rules • Database queries, expected vs actual results NZTester 36 18
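One way to automate the row-count rule above (Destination = Source + Rejected) is a three-way count comparison. This sketch assumes the ETL writes rejected rows to a reject log table; the names src_orders, dwh_fact_orders and etl_reject_log are illustrative only and will differ per tool:

  -- The three counts should satisfy: source_rows = destination_rows + rejected_rows
  SELECT COUNT(*) AS source_rows      FROM src_orders;
  SELECT COUNT(*) AS destination_rows FROM dwh_fact_orders;
  SELECT COUNT(*) AS rejected_rows    FROM etl_reject_log WHERE source_table = 'src_orders';
  -- A shortfall on the destination side that is not explained by rejects indicates rows silently lost in the ETL.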
  • 21. www.nztester.co.nz From Source to Data Warehouse – Unit Testing: • Rejects are correctly handled and conform to business rules • Slowly changing data eg. address, marital status • Correctness of surrogate keys eg. time zones, currencies in Fact tables • Opportunities for automation • Dual drive: • Source table driven – data ends up in the right place • Destination table driven – contains the right result • Risk-based testing From Source to Data Warehouse – Integration Testing: Once all extract, transformation and load unit tests have been successfully executed, the ETL process needs to be executed from end-to-end: • Job sequences and dependencies • Errors in one job that impact subsequent jobs • Error log generation • Restarting the ETL process in case of failure: • Does it have to be started over? • Can it start from where it failed? • Restores required? • Auto/manual? • Impact of failure on subsequent jobs • Processing of rejected records • Reprocessing of already processed records (a restart control-table sketch follows below)
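Restart behaviour is much easier to test (and to build) when every job records its progress in a batch control table. The sketch below is an assumption about how such a table might look rather than a feature of any particular ETL tool; it shows how a restart point could be located after a failure:

  -- Hypothetical batch control table maintained by every ETL job
  CREATE TABLE etl_batch_control (
      batch_id     INTEGER      NOT NULL,
      job_name     VARCHAR(100) NOT NULL,
      job_sequence INTEGER      NOT NULL,
      status       VARCHAR(20)  NOT NULL,   -- e.g. 'SUCCEEDED', 'FAILED', 'RUNNING'
      started_at   TIMESTAMP,
      finished_at  TIMESTAMP
  );

  -- On restart, find the first job in the failed batch that did not succeed
  SELECT job_name, job_sequence, status
  FROM etl_batch_control
  WHERE batch_id = 20130930                 -- illustrative batch id
    AND status <> 'SUCCEEDED'
  ORDER BY job_sequence
  FETCH FIRST 1 ROW ONLY;                   -- use TOP 1 / ROWNUM = 1 on databases without FETCH FIRST

Test cases can then fail a mid-sequence job deliberately and verify that a restart resumes from that job rather than reprocessing (or skipping) earlier ones.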
  • 22. www.nztester.co.nz Data Warehouse Testing – Continually Changing Source Systems • Source data quality = garbage in/garbage out • Inherent nature of the Data Warehouse is continually updating data and source systems, so testing must allow for both • New source data/schema/application = retesting/regression testing • Data Warehouse systems are always high maintenance • Will always find new issues • Opportunities for automation • Package test suites modularly for ease of repeatability Planning for Data Warehouse Testing • Source data quality = garbage in/garbage out • Business requirements document • Data models for source and target schemas • Source-to-target mappings • ETL design documents • Configuration management system • Project schedule • Data quality verification process • Incident and error handling system
  • 23. www.nztester.co.nz Planning for Data Warehouse Testing (2) • QA staff resources estimates and training needs • Testing environment budget and plan • Test tools • Test objectives • QA roles and responsibilities • Test deliverables • Test tasks • Defect reporting requirements • Entrance criteria that should be met before formal testing commences • Exit criteria that should be met before formal testing is completed NZTester 41 www.nztester.co.nz Planning Tests for Common Data Warehouse Issues • Inadequate ETL and stored procedure design documentation to aid in test planning. • Field values are null when specified as Not Null. • Field constraints and SQL not coded correctly for the ETL tool. • Excessive ETL errors discovered after entry to formal QA. • Source data does not meet table mapping specifications (ex. dirty data). • Source-to-target mappings: (1) often not reviewed before implementation, (2) are in error or (3) not consistently maintained throughout the development life cycle. • Data models are not adequately maintained during the development life cycle. NZTester 42 21
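For the issue above where field values are null despite being specified as Not Null, a simple probe per mandatory column surfaces offenders before formal QA. The staging table stg_customer and its columns are hypothetical names used only for illustration:

  -- Mandatory columns should contain no NULLs, and no empty strings masquerading as values
  SELECT COUNT(*) AS null_customer_ids
  FROM stg_customer
  WHERE customer_id IS NULL;

  SELECT COUNT(*) AS blank_or_null_emails
  FROM stg_customer
  WHERE email IS NULL OR TRIM(email) = '';
  -- Both counts should be zero; non-zero counts become defect reports against the source or the ETL.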
  • 24. www.nztester.co.nz Planning Tests for Common Data Warehouse Issues (2) • Duplicate field values are found in either source or target data when defined in mapping specifications to be distinct. • ETL SQL/transformation errors leading to missing rows and invalid field values. • Constraint violations exist in source (perhaps could be found through data profiling). • Target data is incorrectly stored in nonstandard formats. • Primary or foreign key values are incorrect for important relationship linkages. NZTester 43 www.nztester.co.nz Some data mapping and data movement best practice goals: • Introduce common, consistent data movement analysis, design, and coding patterns, • Develop reusable, enterprise-wide analysis, design, and construction components through data movement modelling processes using data movement tools, to ensure an acceptable level of data quality per business specifications, • Introduce best practices and consistency in coding and naming standards. • Reduce costs to develop and maintain analysis, design and source code deliverables, and • Integrate controls into the data movement process to ensure data quality and integrity. • An ETL conceptual data movement model should be created as part of the information management strategy. This model is part of the business model and shows what data flows into, within, and out of the organization. NZTester 44 22
  • 25. www.nztester.co.nz Those involved in test planning should consider the following verifications as primary among those planned for various phases of the data warehouse loading project. • Verify data mappings, source to target • Verify that all tables and specified fields were loaded from source to staging • Verify that primary and foreign keys were properly generated using sequence generator or similar • Verify that not-null fields were populated • Verify no data truncation in each field • Verify data types and formats are as specified in design phase • Verify no unexpected duplicate records in target tables. NZTester 45 www.nztester.co.nz Those involved in test planning should consider the following verifications as primary among those planned for various phases of the data warehouse loading project. (2) • Verify transformations based on data table low level design (LLDs—usually text documents describing design direction and specifications) • Verify that numeric fields are populated with correct precision • Verify that each ETL session completed with only planned exceptions • Verify all cleansing, transformation, error and exception handling • Verify stored procedure calculations and data mappings NZTester 46 23
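Two of the verifications above, no unexpected duplicates and no truncation, lend themselves to quick SQL probes. The tables and columns below (stg_customer, dwh_dim_customer, customer_id, address_line1) are assumptions for illustration, not names from the presentation:

  -- Unexpected duplicate natural keys in the target dimension
  SELECT customer_id, COUNT(*) AS occurrences
  FROM dwh_dim_customer
  GROUP BY customer_id
  HAVING COUNT(*) > 1;

  -- Possible truncation: target values shorter than their source counterparts
  SELECT s.customer_id
  FROM stg_customer s
  JOIN dwh_dim_customer d ON d.customer_id = s.customer_id
  WHERE LENGTH(d.address_line1) < LENGTH(s.address_line1);   -- LENGTH may be LEN or CHAR_LENGTH depending on the database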
  • 26. www.nztester.co.nz Common QA Tasks for the Data Warehouse Team During the data warehouse testing life cycle, many of the following tasks may typically be executed by the QA team. It is important to plan for those tasks below that are key to the project’s success. • Complete test data acquisition and baseline all test data. • Create test environments. • Document test cases. • Create and validate test scripts. • Conduct unit testing and confirm that each component is functioning correctly. • Conduct testing to confirm that each group of components meets specification. Common QA Tasks for the Data Warehouse Team (2) • Conduct quality assurance testing to confirm that the solution meets requirements. • Perform load testing, or performance testing, to confirm that the system is operating correctly and can handle the required data volumes and that data can be loaded in the available load window. • Specify and conduct reconciliation tests to manually confirm the validity of data.
  • 27. www.nztester.co.nz Common QA Tasks for the Data Warehouse Team (3) • Conduct testing to ensure that the new software does not cause problems with existing software. • Conduct user acceptance testing to ensure that business intelligence reports work as intended. • Carefully manage scope to ensure that perceived defects are actually requirement defects and not something that would be “nice to have, but we forgot to ask.” • Conduct a release test and production readiness test. • Ensure that the on-going defect management and reporting is effective. • Manage testing to ensure that each follows testing procedures and software testing best practices. NZTester 49 www.nztester.co.nz Common QA Tasks for the Data Warehouse Team (4) • Establish standard business terminology and value standards for each subject area. • Develop a business data dictionary that is owned and maintained by a series of business-side data stewards. These individuals should ensure that all terminology is kept current and that any associated rules are documented. • Document the data in your core systems and how it relates to the standard business terminology. This will include data transformation and conversion rules. NZTester 50 25
  • 28. www.nztester.co.nz Common QA Tasks for the Data Warehouse Team (5) • Establish a set of data acceptance criteria and correction methods for your standard business terminology. This should be identified by the business-side data stewards and implemented against each of your core systems (where practical). • Implement a data profiling program as a production process. You should consider regularly measuring the data quality (and value accuracy) of the data contained within each of your core operational systems. Considerations for Selecting Data Warehouse Testers Members of the QA staff who will plan and execute data warehouse testing should have many of the following skills and experiences. • Over five years of experience in testing and development in the fields of data warehousing and client server technologies, including over five years of extensive experience in data warehousing with Informatica, SSIS or other ETL tools. • Strong experience in Informatica or SQL Server, stored procedure and SQL testing. • Expertise in unit and integration testing of the associated ETL or stored procedure code.
  • 29. www.nztester.co.nz Considerations for Selecting Data Warehouse Testers (2) • Experience in creating data verification unit and integration test plans and test cases based on technical specifications. • Demonstrated ability to write complex multi-table SQL queries. • Excellent skills with OLAP, ETL, and business intelligence. • Experience with dimensional data modelling using Erwin: star join schema/snowflake modelling, fact and dimension tables, physical and logical data modelling. • Experience in OLAP reporting tools like Business Objects, SSRS, OBIEE or Cognos. • Expertise in data migration, data profiling, data cleansing. Considerations for Selecting Data Warehouse Testers (3) • Hands-on experience with source-to-target mapping in an enterprise data warehouse environment. Responsible for QA tasks in all phases of the system development life cycle (SDLC), from requirements definition through implementation, on large-scale, mission-critical processes; excellent understanding of business requirements development, data analysis, relational database design, systems development methodologies, business/technical liaising, workflow and quality assurance. • Experienced in business analysis, source system data analysis, architectural reviews, data validation, data testing, resolution of data discrepancies and ETL architecture. Good knowledge of QA processes.
  • 30. www.nztester.co.nz Considerations for Selecting Data Warehouse Testers (4) • Familiarity with performance tuning of target databases and source systems. • Has worked extensively on both UNIX (AIX/HP/Sun Solaris) and Windows (Windows SQL Server) platforms. • Good knowledge of UNIX shell scripting and an understanding of Perl scripting. • Experience in Oracle 10g/9i/8i, PL/SQL, SQL, TOAD, stored procedures, functions and triggers. Analyze Source Data before and after Extraction to Staging – Process Description: • Extract representative samples of data from each source or staging table. • Parse the data for the purpose of profiling. • Verify that not-null fields are populated as expected. • Structure discovery – does the data match the corresponding metadata? Do field attributes of the data match expected patterns? Does the data adhere to appropriate uniqueness and null value rules? • Data discovery – are the data values complete, accurate and unambiguous? • Relationship discovery – does the data adhere to specified required key relationships across columns and tables? Are there inferred relationships across columns, tables or databases? Is there redundant data? (profiling query sketches follow below)
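Structure and data discovery can start with a handful of profiling queries run against each extracted sample. The stg_customer table and its columns are again hypothetical and stand in for whatever source or staging table is being profiled:

  -- Basic column profile: population, cardinality and value length range
  SELECT COUNT(*)                     AS total_rows,
         COUNT(customer_id)           AS populated_ids,     -- non-null count
         COUNT(DISTINCT customer_id)  AS distinct_ids,      -- uniqueness check
         MIN(LENGTH(postal_code))     AS min_postcode_len,  -- length/pattern discovery
         MAX(LENGTH(postal_code))     AS max_postcode_len
  FROM stg_customer;

  -- Value distribution of a low-cardinality column, to spot unexpected codes
  SELECT customer_status, COUNT(*) AS row_count
  FROM stg_customer
  GROUP BY customer_status
  ORDER BY row_count DESC;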
  • 31. www.nztester.co.nz Analyze Source Data before and after Extraction to Staging (2) • Verify that all required data from the source was extracted. Verify that the extraction process did not extract more or less data from the source than it should have. • Verify or write defects for exceptions and errors discovered during the ETL process. • Verify that the extraction process did not extract duplicate data from the source (usually this happens in repeatable processes where at point zero we need to extract all data from the source file, but during subsequent intervals we only need to capture the modified and new rows). • Validate that no data truncation occurred during staging. Verify Corrected, Cleaned, Source Data in Staging This step works to improve the quality of existing data in source files, or of defects that meet source specs but must be corrected before load. Inputs: • Files or tables (staging) that require cleansing; data definition and business rule documents; data map of source files and fields; business rules; and data anomalies discovered in earlier steps of this process. • Fixes for data defects that will result in data that does not meet specifications for the application DWH.
  • 32. www.nztester.co.nz Verify Corrected, Cleaned, Source Data in Staging (2) Outputs: Defect reports, cleansed data, rejected or uncorrectable data. Techniques and Tools: Data reengineering, transformation, and cleansing tools; MS Access; Excel filtering. Process Description: In this step, data with missing values, known errors, and suspect data is corrected. Automated tools may be identified to best locate and clean/correct large volumes of data. Verify Corrected, Cleaned, Source Data in Staging (3) • Document the type of data cleansing approach taken for each data type in the repository. • Determine how uncorrectable or suspect data is processed, rejected, or maintained for corrective action. SMEs and stakeholders should be involved in the decision. • Review ETL defect reports to assess rejected data excluded from source files or the information group targeted for the warehouse. • Determine if data not meeting quality rules was accepted. • Document in defect reports the records and important fields that cannot be easily corrected.
  • 33. www.nztester.co.nz Verify Corrected, Cleaned, Source Data in Staging (4) • Document records that were corrected and how they were corrected. • Certification Method: Validation of data cleansing processes could be a tricky proposition, but certainly doable. All data cleansing requirements should be clearly identified. The QA team should learn all data cleansing tools available and their methods. QA should create various conditions as specified in the requirements for the data cleansing tool to support and validate its results. QA will run a volume of real data through each tool to validate accuracy as well as performance. Verifying Matched and Consolidated Data There are often ETL processes where data has been consolidated from various files into a single occurrence of records. The cleaned and consolidated data can then be assessed to verify matched and consolidated data. Much of the ETL heavy lifting occurs in the transform step, where data is combined, quality issues are resolved, updates are applied, surrogate keys are assigned and aggregates are built. Inputs: Analysis of all files or databases for each entity type.
  • 34. www.nztester.co.nz Verifying Matched and Consolidated Data (2) Outputs: • Report of matched, consolidated, related data that is suspect or in error. • List of duplicate data records or fields. • List of duplicate data suspects. Techniques and Tools: Data matching techniques or tools; data cleansing software with matching and merging capabilities. Verifying Matched and Consolidated Data (3) Process Description: • Establish match criteria for data. Select attributes to become the basis for possible duplicate occurrences (e.g., names, account numbers). • Determine the impact of incorrectly consolidated records. If two different occurrences, such as two different customers, have been incorrectly consolidated into a single customer record, submit defect reports; the fix should be higher controls to help avoid such consolidations in the future. • Determine the matching techniques to be used: exact character match in two corresponding fields, wild card match, key words, close match, etc. (a duplicate-matching sketch follows below)
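A first-pass, exact-match version of duplicate detection can be written directly in SQL; close or fuzzy matching normally needs a matching tool, so this sketch covers only the exact-criteria case. The match attributes customer_name and account_number on a hypothetical stg_customer table are illustrative:

  -- Candidate duplicate occurrences: more than one record sharing the chosen match criteria
  SELECT customer_name, account_number, COUNT(*) AS occurrences
  FROM stg_customer
  GROUP BY customer_name, account_number
  HAVING COUNT(*) > 1
  ORDER BY occurrences DESC;
  -- Each group returned is a set of suspected duplicates to be evaluated before consolidation.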
  • 35. www.nztester.co.nz Verifying Matched and Consolidated Data (4) • Compare match criteria for a specific record with all other records within a given file to look for intra-file duplicate records. • Compare match criteria for a specific record with all records in another file to seek inter-file duplicate records. • Evaluate potential matched occurrences to assure they are, in fact, duplicates. • Verify that data consolidated into single occurrences is correct. • Examine and re-relate data related to old records being consolidated to the new occurrence-of-reference record. Validate that no related data was overlooked. Verify Transformed/Enhanced/Calculated Data to Target Tables At this stage, base data is being prepared for loading into the application operational tables and the data mart. This includes converting and formatting cleansed, consolidated data into the new data architecture, and possibly enhancing internal operational data with external data licensed from service providers. The objective is to successfully map the cleaned, corrected and consolidated data into the DWH environment.
  • 36. www.nztester.co.nz Verify Transformed/Enhanced/Calculated Data to Target Tables (2) Inputs: Cleansed, consolidated data; external data from service providers; business rules governing the source data; business rules governing the target DWH data; transformation rules governing the transformation process; DWH or target data architecture; data map of source data to standardized data. Output: Transformed, calculated, enhanced data; updated data map of source data to standardized data; data map of source data to target data architecture. NZTester 67 www.nztester.co.nz Verify Transformed/Enhanced/Calculated Data to Target Tables (3) Techniques and Tools: Data transformation software; external or online or public databases. Process Description: • Verify that the data warehouse construction team is using the data map of source data to the DWH standardized data, verify the mapping. • Verify that the data transformation rules and routines are correct. • Verify the data transformations to the DWH and assure that the processes were performed according to specifications. NZTester 68 34
  • 37. www.nztester.co.nz Verify Transformed/Enhanced/Calculated Data to Target Tables (4) • Verify that data loaded in the operational tables and data mart meets the definition of the data architecture, including data types, formats, accuracy, etc. • Develop scenarios to be covered in Load Integration Testing. • Count Validation: record count verification of DWH back-end/reporting queries against source and target as an initial check. • Dimensional Analysis: verify that data integrity exists between the various source tables and parent/child relationships (an orphan-key sketch follows below). • Statistical Analysis: validation for various calculations. Verify Transformed/Enhanced/Calculated Data to Target Tables (5) • Data Quality Validation: check for missing data, negatives and consistency. Field-by-field data verification will be done to check the consistency of source and target data. • Granularity: validate at the lowest granular level possible (lowest in the hierarchy, e.g., Country-City-Sector; start with test cases). • Dynamic Transformation Rules and Tables: such methods need to be checked continuously to ensure the correct transformation routines are executed. Verify that dynamic mapping tables and dynamic mapping rules provide an easy, documented, and automated way for transforming values from one or more sources into a standard value presented in the DWH.
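Dimensional analysis of parent/child integrity reduces to an orphan check: every foreign key in a fact table should resolve to a row in its dimension. The fact and dimension names and keys below are assumptions for illustration:

  -- Fact rows whose customer key has no matching row in the customer dimension
  SELECT f.order_id, f.customer_key
  FROM dwh_fact_orders f
  LEFT JOIN dwh_dim_customer d
    ON d.customer_key = f.customer_key
  WHERE d.customer_key IS NULL;
  -- Any rows returned are orphaned facts, i.e. broken parent/child relationships.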
  • 38. www.nztester.co.nz Verify Transformed/Enhanced/Calculated Data to Target Tables (6) • Verification Method: The QA team will identify the detailed requirements as they relate to transformation and validate the dynamic transformation rules and tables against DWH records. Utilizing SQL and related tools, the team will identify unique values in source data files that are subject to transformation. The QA team identifies the results from the transformation process and validates that such transformations have accurately taken place. Questions?
  • 39. NZTester Testing the Data Warehouse Geoff Horne, NZTester Magazine ed@nztester.co.nz April 2013 37