SlideShare ist ein Scribd-Unternehmen logo
1 von 8
Downloaden Sie, um offline zu lesen
page 1copyright Real-Time Technology Solutions, Inc., updated January, 2015
Enterprise Business Intelligence
& Data Warehousing:
The Data Quality Conundrum
A study by RTTS
For Business and IT Professionals
By Bill Hayduk
page 2copyright Real-Time Technology Solutions, Inc., updated January, 2015
Enterprise BI & Data Warehousing: the Data Quality Conundrum
Poor data quality is a huge and exponentially growing problem. Big Data is causing massive increases in
the volume (amount of data), velocity (speed of data in and out), and variety (range of data types and
sources) of data. Therefore, concern for poor data quality or ‘bad data’ is now a critical issue.
According to Gartner, “the average organization loses $8.2 million annually through poor data quality,
with 22% estimated their annual losses resulting from bad data at $20 million and 4% put that figure as
high as an astounding $100 million”. And InformationWeek found that “46% of companies cite data
quality as a barrier for adopting Business Intelligence products”.
We recently performed a study that included responses from over 200 companies interested in
improving the data quality in their Business Intelligence, ETL software and Enterprise Data Warehouse
implementations. Below are our findings, along with context around these results.
SECTION I.
Our first section polled customers on their architecture: specifically on their data warehouse, ETL and
business intelligence software and vendors.
Enterprise Data Warehouse Software
The top data warehouse vendors are
Oracle (plus MySQL and Exadata) as
number one and Teradata (and Aster Data)
as number two and every other one far
behind. This is in sync with research by
most analyst firms that track this platform.
Oracle and Teradata dominate the space
with both their loyal customer base and
their innovation.
Analyst firm Gartner, in its 2013 Magic
Quadrant for Data Warehouse Database
Management Systems report, projected a
10% growth in the database management
system market and it pinpointed a
significant increase in organizations
seeking to deploy data warehouses for the
first time.
page 3copyright Real-Time Technology Solutions, Inc., updated January, 2015
Business Intelligence Software
IBM leads the BI space with Cognos,
followed by the surprising performance of
Microsoft at second and Oracle’s combined
offering at 3rd. Unexpectedly, ‘other
vendors’ were chosen 18% of the time,
meaning there is still room for growth in the
BI space.
Analyst firm IDC stated that the market is
now forecast to continue to grow at a 9.8%
compound annual growth rate through
2016. One key observation they made was
“the media attention on Big Data has put
broader business analytics on the agenda of
more senior executives.”
ETL (Extract/Transform/Load) Software
Here we see another surprising result of our
survey. Microsoft finished first in the ETL
Vendor software survey. Informatica
PowerCenter, the most widely known vendor
finished second, closely followed by IBM’s
combined offering at third. It is interesting to
note that ‘other’ came in first above
Microsoft. There is still a large contingent of
companies using home-grown and open
source software in the marketplace.
According to analyst firm Forrester, “The
enterprise ETL market continues to grow at a
healthy pace as more enterprises replace
manual scripts with packaged ETL solutions.
This migration…toward ETL tools is driven by
the need to support growing and increasingly
complex data management requirements.”
page 4copyright Real-Time Technology Solutions, Inc., updated January, 2015
Current Data Warehouse Size
We inquired as to the current size of the data warehouse implementations. We discovered that 91%
were less than 100 Terabytes and 52% were less than 500 Gigabytes. Interestingly, the largest sector
(33%) of implementations was between 1 Terabyte and 100 Terabytes. This is a significant increase in
data size when the largest sector in our poll 2 years ago was measured in Gigabytes.
SECTION II.
In Section II, we analyzed the current quality situation of firms to determine their effectiveness.
page 5copyright Real-Time Technology Solutions, Inc., updated January, 2015
Current Testing Strategy
We inquired as to which test strategy was
being implemented: (1) testing across ETL legs
(source-to-target DWH, DWH to data mart, etc.), (2)
utilizing Minus Queries (see full definition of minus
queries here http://bit.ly/13lmp8N), and/or (3)
comparing row counts from source-to-target.
The preferred strategy is to utilize a
combination of the three, depending upon the
role (tester, ETL developer, operations). But if
one were going to choose a single strategy to
check for data quality, it would be (1). We
found that 30% of companies polled only verify
row counts and only 7% implemented all 3 as
their preferred strategy. Many singled out a
lack of automated testing and a lack of testing
resources as reasons they did not deploy a
more rigorous testing strategy.
Current Test Execution Method
When surveying customers on their current
test execution method, we found that 60% of
testing is currently performed manually.
Manual testing consists of extracting data from
the source databases, files and XML and also
extracting data from the data warehouse after
it goes through the ETL process and then
comparing these data sets manually, by eye.
This is quite extraordinary when considering
that the average data warehouse is measured
in gigabytes and typical tests return millions of
rows and upwards of hundreds of columns,
meaning millions of sets of data to compare.
Therefore, testers can only sample the
comparisons for practical purposes. Vendor
tools come in second and a home-grown
finished third.
Data Quality and Testing Challenges
When customers were asked about their biggest challenges when it came to testing the data for
accuracy, the clear top choice was ‘No Automation’. This goes hand-in-hand with the 2nd biggest
challenge – that testing manually is very time consuming. This was followed by the lack of a test
management process and/or tool, not enough data coverage and no reporting on the testing effort.
page 6copyright Real-Time Technology Solutions, Inc., updated January, 2015
Percent of data coverage by current test process
When determining the amount of
data coverage that companies’
current test process provides, it is
clear that there is not much data
coverage at all. Of those companies
surveyed, 84% had less than 50
percent data coverage, 58% had less
than 25 percent coverage, 33% had
less than 5 percent and 29% of
companies had less than 1 percent.
The 14% who had 100 percent
coverage had data warehouse
implementations less than 500GB in
size and were interested in finding a
data warehouse testing tool that could speed up the testing cycle. Also, the coverage represents the
amount of data brought back by SQL queries, not the amount compared. Since comparisons (as noted
above) are typically performed by visually reconciling the 2 data sets, it is impractical for more than 5-
10% of the data to be compared.
Ratio of Developers to Testers
Average ratio: 2.1 to 1
Median ratio: 1.7 to 1
Highest ratio: 20 to 1
Lowest ratio: 2 to 3
We found that, for the most part, the ratio of developer to tester was standard in principle. The issue
that most firms surveyed stated was that they could not get enough data coverage. The reason: ETL
developers are utilizing some form of tool to make their jobs faster and more efficient while most
testers are not.
page 7copyright Real-Time Technology Solutions, Inc., updated January, 2015
Effects of Bad Data
We surveyed customers on the impact of bad data on their organizations. Of the ones who answered,
100 percent said that they experienced some form of bad data in their data warehouses. Below are
samplings of their free-form answers on the effects that bad data caused them.
 Incorrect business intelligence reports
 Poor delivery quality & customer dissatisfaction resulting in re-work
 Missing revenue opportunities
 Critical business decisions rely on underlying bad data
 Bad quality of projects
 Negative business Impact
 SLA Issue with our customers
 Major embarrassments to our team
 Long working hours to fix bad data
Conclusion
Many companies are using Business Intelligence (BI) to make strategic decisions in the hope of gaining a
competitive advantage in a tough business landscape. But Bad Data will cause them to make decisions that
will cost their firms millions of dollars. It is clear from the results of this survey that companies are not
providing the level of data quality that C-level executives need to make a strategic decisions. Most firms
test far less than 10% of their data by sampling the data and the comparisons. Therefore, at least 90% of
data remains untested. Since bad data exists in all databases, firms need to test closer to 100% of their data
and guarantee that this critical information is accurate.
There is no practical way for testers to verify this level of coverage without the use of automated testing
tools. An automated testing solution will speed up the process, provide much more data coverage,
perform comparisons automated, and provide reports for audit trails. It is clear that as data grows
exponentially, a more complete solution is needed to keep enterprise-level data clean.
page 8copyright Real-Time Technology Solutions, Inc., updated January, 2015
About The Author
Bill Hayduk
Founder, CEO, President
Bill founded software and services firm RTTS in 1996. Under Bill's
guidance, RTTS has supported over 600 projects at over 400
corporations, from Fortune 500 to midsize firms. Bill is also the
business leader on QuerySurge, RTTS’ industry-leading data warehouse
testing tool.
Bill holds an MS in Computer Information Systems from the Zicklin
School of Business (Baruch College) and a BA in Economics from
Villanova University.
References
 Gartner: Magic Quadrant for Data Warehouse Database Management Systems (January 31, 2013)
 IDC: Worldwide Business Analytics Software 2012-2016 Forecast and 2011 Vendor Shares (June 2012)
 The Forrester Wave™: Enterprise ETL, Q1 2012 (February 27, 2012)
 InformationWeek: 2012 BI and Information Management Trends (November 2011)
 RTTS 2013 Client Survey on BI and Data Warehouse Quality

Weitere ähnliche Inhalte

Was ist angesagt?

Survey Results Age Of Unbounded Data June 03 10
Survey Results Age Of Unbounded Data June 03 10Survey Results Age Of Unbounded Data June 03 10
Survey Results Age Of Unbounded Data June 03 10nhaque
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...Cognizant
 
Analytics in-action-survey
Analytics in-action-surveyAnalytics in-action-survey
Analytics in-action-surveyAnjan Das
 
"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015yann le gigan
 
Datacenter industry survey 2015
Datacenter industry survey 2015Datacenter industry survey 2015
Datacenter industry survey 2015sher lion
 
Analytics solution
Analytics solutionAnalytics solution
Analytics solutioncamssguide
 
NI Automated Test Outlook 2016
NI Automated Test Outlook 2016NI Automated Test Outlook 2016
NI Automated Test Outlook 2016Hank Lydick
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformCognizant
 
2019 Data Trends Survey Results
2019 Data Trends Survey Results2019 Data Trends Survey Results
2019 Data Trends Survey ResultsPrecisely
 
Augmented Data Management
Augmented Data ManagementAugmented Data Management
Augmented Data ManagementFORMCEPT
 
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?Capgemini
 
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...polenumerique33
 
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...AnthonyOtuonye
 
To Become a Data-Driven Enterprise, Data Democratization is Essential
To Become a Data-Driven Enterprise, Data Democratization is EssentialTo Become a Data-Driven Enterprise, Data Democratization is Essential
To Become a Data-Driven Enterprise, Data Democratization is EssentialCognizant
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...Cognizant
 
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...Impact of Data Analytics in Changing the Future of Business and Challenges Fa...
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...IJSRP Journal
 

Was ist angesagt? (19)

Survey Results Age Of Unbounded Data June 03 10
Survey Results Age Of Unbounded Data June 03 10Survey Results Age Of Unbounded Data June 03 10
Survey Results Age Of Unbounded Data June 03 10
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
 
Analytics in-action-survey
Analytics in-action-surveyAnalytics in-action-survey
Analytics in-action-survey
 
"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015
 
Ibm business trends
Ibm business trendsIbm business trends
Ibm business trends
 
Datacenter industry survey 2015
Datacenter industry survey 2015Datacenter industry survey 2015
Datacenter industry survey 2015
 
Analytics solution
Analytics solutionAnalytics solution
Analytics solution
 
NI Automated Test Outlook 2016
NI Automated Test Outlook 2016NI Automated Test Outlook 2016
NI Automated Test Outlook 2016
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data Platform
 
2019 Data Trends Survey Results
2019 Data Trends Survey Results2019 Data Trends Survey Results
2019 Data Trends Survey Results
 
Augmented Data Management
Augmented Data ManagementAugmented Data Management
Augmented Data Management
 
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?
 
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...
 
Innovating with analytics
Innovating with analyticsInnovating with analytics
Innovating with analytics
 
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
 
To Become a Data-Driven Enterprise, Data Democratization is Essential
To Become a Data-Driven Enterprise, Data Democratization is EssentialTo Become a Data-Driven Enterprise, Data Democratization is Essential
To Become a Data-Driven Enterprise, Data Democratization is Essential
 
Unlocking big data
Unlocking big dataUnlocking big data
Unlocking big data
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
 
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...Impact of Data Analytics in Changing the Future of Business and Challenges Fa...
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...
 

Ähnlich wie Data Quality Conundrum: Automated Testing Needed

Going Big : Why Companies Need to Focus on Operational Analytics
Going Big : Why Companies Need to Focus on Operational Analytics Going Big : Why Companies Need to Focus on Operational Analytics
Going Big : Why Companies Need to Focus on Operational Analytics Capgemini
 
Big data-analytics-2013-peer-research-report
Big data-analytics-2013-peer-research-reportBig data-analytics-2013-peer-research-report
Big data-analytics-2013-peer-research-reportAravindharamanan S
 
Leveraging Automated Data Validation to Reduce Software Development Timeline...
Leveraging Automated Data Validation  to Reduce Software Development Timeline...Leveraging Automated Data Validation  to Reduce Software Development Timeline...
Leveraging Automated Data Validation to Reduce Software Development Timeline...Cognizant
 
Big Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderBig Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderJennifer Walker
 
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...Lora Cecere
 
Building an Effective Data Management Strategy
Building an Effective Data Management StrategyBuilding an Effective Data Management Strategy
Building an Effective Data Management StrategyHarley Capewell
 
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...Stephan ZODER
 
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...Stephan ZODER
 
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing RiskModernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing RiskWorldwide Business Research
 
The Digital Procurement Era
The Digital Procurement EraThe Digital Procurement Era
The Digital Procurement EraTejari
 
AI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdfAI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdfarifulislam946965
 
Infographic Things You Should Know About Big Data Testing
Infographic Things You Should Know About Big Data TestingInfographic Things You Should Know About Big Data Testing
Infographic Things You Should Know About Big Data TestingKiwiQA
 
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsInfographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsSapience Analytics
 
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Science Council of America
 
Move It Don't Lose It: Is Your Big Data Collecting Dust?
Move It Don't Lose It: Is Your Big Data Collecting Dust?Move It Don't Lose It: Is Your Big Data Collecting Dust?
Move It Don't Lose It: Is Your Big Data Collecting Dust?Jennifer Walker
 
Data Trends for 2019: Extracting Value from Data
Data Trends for 2019: Extracting Value from DataData Trends for 2019: Extracting Value from Data
Data Trends for 2019: Extracting Value from DataPrecisely
 
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...Vasu S
 

Ähnlich wie Data Quality Conundrum: Automated Testing Needed (20)

Going Big : Why Companies Need to Focus on Operational Analytics
Going Big : Why Companies Need to Focus on Operational Analytics Going Big : Why Companies Need to Focus on Operational Analytics
Going Big : Why Companies Need to Focus on Operational Analytics
 
Big data-analytics-2013-peer-research-report
Big data-analytics-2013-peer-research-reportBig data-analytics-2013-peer-research-report
Big data-analytics-2013-peer-research-report
 
Leveraging Automated Data Validation to Reduce Software Development Timeline...
Leveraging Automated Data Validation  to Reduce Software Development Timeline...Leveraging Automated Data Validation  to Reduce Software Development Timeline...
Leveraging Automated Data Validation to Reduce Software Development Timeline...
 
Big Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderBig Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not Harder
 
Integrated_Insights_Platform_web
Integrated_Insights_Platform_webIntegrated_Insights_Platform_web
Integrated_Insights_Platform_web
 
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...
 
Building an Effective Data Management Strategy
Building an Effective Data Management StrategyBuilding an Effective Data Management Strategy
Building an Effective Data Management Strategy
 
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
 
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
 
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing RiskModernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
 
The Digital Procurement Era
The Digital Procurement EraThe Digital Procurement Era
The Digital Procurement Era
 
dq_fail.pdf
dq_fail.pdfdq_fail.pdf
dq_fail.pdf
 
AI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdfAI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdf
 
Infographic Things You Should Know About Big Data Testing
Infographic Things You Should Know About Big Data TestingInfographic Things You Should Know About Big Data Testing
Infographic Things You Should Know About Big Data Testing
 
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsInfographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
 
The Face of the New Enterprise
The Face of the New EnterpriseThe Face of the New Enterprise
The Face of the New Enterprise
 
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
 
Move It Don't Lose It: Is Your Big Data Collecting Dust?
Move It Don't Lose It: Is Your Big Data Collecting Dust?Move It Don't Lose It: Is Your Big Data Collecting Dust?
Move It Don't Lose It: Is Your Big Data Collecting Dust?
 
Data Trends for 2019: Extracting Value from Data
Data Trends for 2019: Extracting Value from DataData Trends for 2019: Extracting Value from Data
Data Trends for 2019: Extracting Value from Data
 
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...
 

Mehr von RTTS

Automated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsRTTS
 
QuerySurge AI webinar
QuerySurge AI webinarQuerySurge AI webinar
QuerySurge AI webinarRTTS
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023RTTS
 
TestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingTestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingRTTS
 
Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentRTTS
 
RTTS Postman and API Testing Webinar Slides.pdf
RTTS Postman and API Testing Webinar  Slides.pdfRTTS Postman and API Testing Webinar  Slides.pdf
RTTS Postman and API Testing Webinar Slides.pdfRTTS
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP TestingRTTS
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 Webinar - QuerySurge and Azure DevOps in the Azure Cloud Webinar - QuerySurge and Azure DevOps in the Azure Cloud
Webinar - QuerySurge and Azure DevOps in the Azure CloudRTTS
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyRTTS
 
Implementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectImplementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectRTTS
 
An introduction to QuerySurge webinar
An introduction to QuerySurge webinarAn introduction to QuerySurge webinar
An introduction to QuerySurge webinarRTTS
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryRTTS
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessRTTS
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World DistilledRTTS
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOpsRTTS
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaRTTS
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...RTTS
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and DatabasesRTTS
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingRTTS
 

Mehr von RTTS (20)

Automated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
 
QuerySurge AI webinar
QuerySurge AI webinarQuerySurge AI webinar
QuerySurge AI webinar
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
 
TestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingTestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data Testing
 
Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing Assignment
 
RTTS Postman and API Testing Webinar Slides.pdf
RTTS Postman and API Testing Webinar  Slides.pdfRTTS Postman and API Testing Webinar  Slides.pdf
RTTS Postman and API Testing Webinar Slides.pdf
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 Webinar - QuerySurge and Azure DevOps in the Azure Cloud Webinar - QuerySurge and Azure DevOps in the Azure Cloud
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing Strategy
 
Implementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectImplementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing Project
 
An introduction to QuerySurge webinar
An introduction to QuerySurge webinarAn introduction to QuerySurge webinar
An introduction to QuerySurge webinar
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical Industry
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOps
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE Vertica
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and Databases
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programming
 

Kürzlich hochgeladen

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 

Kürzlich hochgeladen (20)

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 

Data Quality Conundrum: Automated Testing Needed

  • 1. page 1copyright Real-Time Technology Solutions, Inc., updated January, 2015 Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum A study by RTTS For Business and IT Professionals By Bill Hayduk
  • 2. page 2copyright Real-Time Technology Solutions, Inc., updated January, 2015 Enterprise BI & Data Warehousing: the Data Quality Conundrum Poor data quality is a huge and exponentially growing problem. Big Data is causing massive increases in the volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources) of data. Therefore, concern for poor data quality or ‘bad data’ is now a critical issue. According to Gartner, “the average organization loses $8.2 million annually through poor data quality, with 22% estimated their annual losses resulting from bad data at $20 million and 4% put that figure as high as an astounding $100 million”. And InformationWeek found that “46% of companies cite data quality as a barrier for adopting Business Intelligence products”. We recently performed a study that included responses from over 200 companies interested in improving the data quality in their Business Intelligence, ETL software and Enterprise Data Warehouse implementations. Below are our findings, along with context around these results. SECTION I. Our first section polled customers on their architecture: specifically on their data warehouse, ETL and business intelligence software and vendors. Enterprise Data Warehouse Software The top data warehouse vendors are Oracle (plus MySQL and Exadata) as number one and Teradata (and Aster Data) as number two and every other one far behind. This is in sync with research by most analyst firms that track this platform. Oracle and Teradata dominate the space with both their loyal customer base and their innovation. Analyst firm Gartner, in its 2013 Magic Quadrant for Data Warehouse Database Management Systems report, projected a 10% growth in the database management system market and it pinpointed a significant increase in organizations seeking to deploy data warehouses for the first time.
  • 3. page 3copyright Real-Time Technology Solutions, Inc., updated January, 2015 Business Intelligence Software IBM leads the BI space with Cognos, followed by the surprising performance of Microsoft at second and Oracle’s combined offering at 3rd. Unexpectedly, ‘other vendors’ were chosen 18% of the time, meaning there is still room for growth in the BI space. Analyst firm IDC stated that the market is now forecast to continue to grow at a 9.8% compound annual growth rate through 2016. One key observation they made was “the media attention on Big Data has put broader business analytics on the agenda of more senior executives.” ETL (Extract/Transform/Load) Software Here we see another surprising result of our survey. Microsoft finished first in the ETL Vendor software survey. Informatica PowerCenter, the most widely known vendor finished second, closely followed by IBM’s combined offering at third. It is interesting to note that ‘other’ came in first above Microsoft. There is still a large contingent of companies using home-grown and open source software in the marketplace. According to analyst firm Forrester, “The enterprise ETL market continues to grow at a healthy pace as more enterprises replace manual scripts with packaged ETL solutions. This migration…toward ETL tools is driven by the need to support growing and increasingly complex data management requirements.”
  • 4. page 4copyright Real-Time Technology Solutions, Inc., updated January, 2015 Current Data Warehouse Size We inquired as to the current size of the data warehouse implementations. We discovered that 91% were less than 100 Terabytes and 52% were less than 500 Gigabytes. Interestingly, the largest sector (33%) of implementations was between 1 Terabyte and 100 Terabytes. This is a significant increase in data size when the largest sector in our poll 2 years ago was measured in Gigabytes. SECTION II. In Section II, we analyzed the current quality situation of firms to determine their effectiveness.
  • 5. page 5copyright Real-Time Technology Solutions, Inc., updated January, 2015 Current Testing Strategy We inquired as to which test strategy was being implemented: (1) testing across ETL legs (source-to-target DWH, DWH to data mart, etc.), (2) utilizing Minus Queries (see full definition of minus queries here http://bit.ly/13lmp8N), and/or (3) comparing row counts from source-to-target. The preferred strategy is to utilize a combination of the three, depending upon the role (tester, ETL developer, operations). But if one were going to choose a single strategy to check for data quality, it would be (1). We found that 30% of companies polled only verify row counts and only 7% implemented all 3 as their preferred strategy. Many singled out a lack of automated testing and a lack of testing resources as reasons they did not deploy a more rigorous testing strategy. Current Test Execution Method When surveying customers on their current test execution method, we found that 60% of testing is currently performed manually. Manual testing consists of extracting data from the source databases, files and XML and also extracting data from the data warehouse after it goes through the ETL process and then comparing these data sets manually, by eye. This is quite extraordinary when considering that the average data warehouse is measured in gigabytes and typical tests return millions of rows and upwards of hundreds of columns, meaning millions of sets of data to compare. Therefore, testers can only sample the comparisons for practical purposes. Vendor tools come in second and a home-grown finished third. Data Quality and Testing Challenges When customers were asked about their biggest challenges when it came to testing the data for accuracy, the clear top choice was ‘No Automation’. This goes hand-in-hand with the 2nd biggest challenge – that testing manually is very time consuming. This was followed by the lack of a test management process and/or tool, not enough data coverage and no reporting on the testing effort.
  • 6. page 6copyright Real-Time Technology Solutions, Inc., updated January, 2015 Percent of data coverage by current test process When determining the amount of data coverage that companies’ current test process provides, it is clear that there is not much data coverage at all. Of those companies surveyed, 84% had less than 50 percent data coverage, 58% had less than 25 percent coverage, 33% had less than 5 percent and 29% of companies had less than 1 percent. The 14% who had 100 percent coverage had data warehouse implementations less than 500GB in size and were interested in finding a data warehouse testing tool that could speed up the testing cycle. Also, the coverage represents the amount of data brought back by SQL queries, not the amount compared. Since comparisons (as noted above) are typically performed by visually reconciling the 2 data sets, it is impractical for more than 5- 10% of the data to be compared. Ratio of Developers to Testers Average ratio: 2.1 to 1 Median ratio: 1.7 to 1 Highest ratio: 20 to 1 Lowest ratio: 2 to 3 We found that, for the most part, the ratio of developer to tester was standard in principle. The issue that most firms surveyed stated was that they could not get enough data coverage. The reason: ETL developers are utilizing some form of tool to make their jobs faster and more efficient while most testers are not.
  • 7. page 7copyright Real-Time Technology Solutions, Inc., updated January, 2015 Effects of Bad Data We surveyed customers on the impact of bad data on their organizations. Of the ones who answered, 100 percent said that they experienced some form of bad data in their data warehouses. Below are samplings of their free-form answers on the effects that bad data caused them.  Incorrect business intelligence reports  Poor delivery quality & customer dissatisfaction resulting in re-work  Missing revenue opportunities  Critical business decisions rely on underlying bad data  Bad quality of projects  Negative business Impact  SLA Issue with our customers  Major embarrassments to our team  Long working hours to fix bad data Conclusion Many companies are using Business Intelligence (BI) to make strategic decisions in the hope of gaining a competitive advantage in a tough business landscape. But Bad Data will cause them to make decisions that will cost their firms millions of dollars. It is clear from the results of this survey that companies are not providing the level of data quality that C-level executives need to make a strategic decisions. Most firms test far less than 10% of their data by sampling the data and the comparisons. Therefore, at least 90% of data remains untested. Since bad data exists in all databases, firms need to test closer to 100% of their data and guarantee that this critical information is accurate. There is no practical way for testers to verify this level of coverage without the use of automated testing tools. An automated testing solution will speed up the process, provide much more data coverage, perform comparisons automated, and provide reports for audit trails. It is clear that as data grows exponentially, a more complete solution is needed to keep enterprise-level data clean.
  • 8. page 8copyright Real-Time Technology Solutions, Inc., updated January, 2015 About The Author Bill Hayduk Founder, CEO, President Bill founded software and services firm RTTS in 1996. Under Bill's guidance, RTTS has supported over 600 projects at over 400 corporations, from Fortune 500 to midsize firms. Bill is also the business leader on QuerySurge, RTTS’ industry-leading data warehouse testing tool. Bill holds an MS in Computer Information Systems from the Zicklin School of Business (Baruch College) and a BA in Economics from Villanova University. References  Gartner: Magic Quadrant for Data Warehouse Database Management Systems (January 31, 2013)  IDC: Worldwide Business Analytics Software 2012-2016 Forecast and 2011 Vendor Shares (June 2012)  The Forrester Wave™: Enterprise ETL, Q1 2012 (February 27, 2012)  InformationWeek: 2012 BI and Information Management Trends (November 2011)  RTTS 2013 Client Survey on BI and Data Warehouse Quality