© 2014 Real-Time Technology Solutions, Inc. All Rights Reserved. 
(212) 240-9050 | info@rttsweb.com 
Common Defects in Big Data & Data Warehouses 
Below are common defects that QuerySurge typically finds in Big Data and Data Warehouse projects. These defects put bad data into your project and, ultimately, into your Business Intelligence and Analytics reports. Because C-level executives base their strategic decisions on this data, the result could be a loss of millions of dollars. According to Gartner, bad data costs companies $8.2 million annually.
Each entry below lists, in order: Issue, Description, Possible Causes, Example(s).
Missing Data 
Data that does not make it into the target database 
- Invalid or incorrect lookup table in the transformation logic
- Bad data from the source database (needs cleansing)
- Invalid joins
The lookup table contains the field value “High”, which maps to “Critical”. However, the source data field contains “Hig” (missing the final “h”), so the lookup fails and the target data field ends up null. If this occurs on a key field, a join could be missed and the entire row could fall out.
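The failed lookup in this example can be reproduced in a few lines. This is only a minimal sketch: the dictionary `SEVERITY_LOOKUP` and both function names are hypothetical stand-ins for a real lookup table and ETL step.

```python
# Hypothetical lookup table from the example: "High" maps to "Critical".
SEVERITY_LOOKUP = {"High": "Critical"}


def translate(source_values):
    """Mimic an ETL lookup: unmatched source values become None (null)."""
    return [SEVERITY_LOOKUP.get(value) for value in source_values]


def failed_lookups(source_values):
    """Return the source values that are absent from the lookup table."""
    return [value for value in source_values if value not in SEVERITY_LOOKUP]


source = ["High", "Hig"]
print(translate(source))       # the misspelled "Hig" yields None
print(failed_lookups(source))  # values a tester would flag for cleansing
```

Listing the failed values, rather than only counting nulls in the target, points directly at the source rows that need cleansing.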
Truncation of Data 
Data lost because the data field is truncated
- Invalid field lengths on target database
- Transformation logic not taking into account field lengths from source
The source field value “New Mexico City” is truncated to “New Mexico C” because the target data field is not long enough to hold the entire value.
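A truncation check can compare each source value with its target value and the target field length. The sketch below assumes a target length of 12 characters, which is what the “New Mexico C” example implies; the function name is hypothetical.

```python
def find_truncations(pairs, target_length):
    """Return (source, target) pairs where the target is a cut-off prefix of the source."""
    return [
        (source, target)
        for source, target in pairs
        if len(source) > target_length and target == source[:target_length]
    ]


# Assumed target field length of 12 characters.
rows = [("New Mexico City", "New Mexico C"), ("Boston", "Boston")]
print(find_truncations(rows, target_length=12))
```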
Data Type Mismatch 
Data types not set up correctly on the target database
Source data field not configured correctly
The source data field was required to be a date but, when initially configured, was set up as a VarChar.
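When a date is stored as text, one way to surface the problem is to try parsing every value as a date. The sketch below assumes an ISO `YYYY-MM-DD` format, which the source does not specify, and the function name is hypothetical.

```python
from datetime import datetime


def invalid_dates(values, fmt="%Y-%m-%d"):
    """Return the text values that cannot be parsed as dates in the given format."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            bad.append(value)
    return bad


# Values as they might sit in a VarChar column (illustrative data).
print(invalid_dates(["2014-03-01", "03/01/2014", "n/a"]))
```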
Null Translation 
Null source values not being transformed to correct target values 
Development team did not include the null translation in the transformation logic 
A null source data field was supposed to be transformed to ‘None’ in the target data field. However, the logic was not implemented, so the target data field contains null values.
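A test for this defect can pair each source value with its target value and flag nulls that were not translated. This is a sketch only: Python's `None` stands in for a database null, the string `"None"` is the expected translation from the example, and the function name is hypothetical.

```python
def null_translation_failures(pairs, expected="None"):
    """Return row indexes where the source is null but the target is not `expected`."""
    return [
        index
        for index, (source, target) in enumerate(pairs)
        if source is None and target != expected
    ]


# (source_value, target_value) pairs; row 1 was left as null in the target.
pairs = [(None, "None"), (None, None), ("A", "A")]
print(null_translation_failures(pairs))
```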
Wrong Translation 
The opposite of the Null Translation error: a field that should be null is populated with a non-null value, or a field that should be populated holds the wrong value
Development team incorrectly translated the source field for certain values
Ex. 1) The target field should be populated only when the source field contains certain values; otherwise it should be set to null, but it is populated anyway
Ex. 2) The target field should be “Odd” when the source value is an odd number, but the target field is “Even” (a very basic example)
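The odd/even case in Ex. 2 can be checked by recomputing the expected label independently and comparing it with the target. The function names below are hypothetical and the data is illustrative.

```python
def expected_parity(number):
    """Independently derive the label the target field should hold."""
    return "Odd" if number % 2 else "Even"


def parity_mismatches(rows):
    """Return (source_value, target_label) rows where the target label is wrong."""
    return [
        (number, label)
        for number, label in rows
        if label != expected_parity(number)
    ]


print(parity_mismatches([(3, "Even"), (4, "Even"), (7, "Odd")]))
```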
Misplaced Data 
Source data fields not being transformed to the correct target data field 
Development team inadvertently mapped the source data field to the wrong target data field 
A source data field was supposed to be mapped to the target data field ‘Last_Name’. However, the development team inadvertently mapped it to ‘First_Name’.
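One way to spot a mis-mapping is to look up which target field actually holds the source value. In the sketch below the target column names come from the example; the sample surname and the function name are hypothetical.

```python
def fields_holding(value, target_row):
    """Return the names of the target fields whose value equals the source value."""
    return [name for name, stored in target_row.items() if stored == value]


source_surname = "Smith"  # illustrative source value
target_row = {"First_Name": "Smith", "Last_Name": None}
print(fields_holding(source_surname, target_row))  # expected mapping was Last_Name
```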
Extra Records 
Records that should not be in the ETL are included in the ETL
Development team did not include a filter in their code
If a case has the deleted field populated, the case and any data related to it should not be in any ETL.
Not Enough Records 
Records that should be in the ETL are not included in the ETL
Development team had a filter in their code that should not have been there
If a case was in a certain state, it should be ETL’d over to the data warehouse but not the data mart.
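Both the Extra Records and Not Enough Records defects come down to comparing the set of keys that should have been loaded with the set that was loaded. The sketch below assumes each case has an `id` key and a boolean `deleted` flag; those names and `compare_keys` are hypothetical.

```python
def compare_keys(expected_keys, target_keys):
    """Report keys present in the target but not expected, and vice versa."""
    expected, actual = set(expected_keys), set(target_keys)
    return {
        "extra": sorted(actual - expected),
        "missing": sorted(expected - actual),
    }


source_cases = [
    {"id": 1, "deleted": False},
    {"id": 2, "deleted": True},   # deleted, so it should be filtered out
    {"id": 3, "deleted": False},
]
expected_ids = [case["id"] for case in source_cases if not case["deleted"]]
target_ids = [1, 2]  # illustrative load result: case 2 slipped in, case 3 is absent
print(compare_keys(expected_ids, target_ids))
```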
Transformation Logic Errors/Holes 
Testing can sometimes uncover “holes” in the transformation logic or reveal that the logic is unclear
Development team did not take special cases into account. For example, international cities whose names contain language-specific characters might need special handling in the ETL code
Ex. 1) Most cases fall into a certain branch of the transformation logic, but a small subset of cases (sometimes with unusual data) may not fall into any branch. The testers’ and developers’ code may handle these cases differently (and both may end up being wrong), so the logic is changed to accommodate them.
Ex. 2) Tester and developer interpret the transformation logic differently, which results in different values. This leads to the logic being rewritten to be clearer.
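A hole like the one in Ex. 1 can be found by checking which values match none of the branches. The two rules below are purely hypothetical and deliberately leave the value 10 uncovered.

```python
# Hypothetical branching rules with a gap: 10 is neither "small" nor "large".
RULES = {
    "small": lambda n: n < 10,
    "large": lambda n: n > 10,
}


def unmatched(values):
    """Return the values that do not fall into any branch of the rules."""
    return [v for v in values if not any(rule(v) for rule in RULES.values())]


print(unmatched([5, 10, 15]))
```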
Simple/Small Errors 
Capitalization, spacing and other small errors 
Development team did not add a space after each comma when populating the target field.
Product names on a case should be separated by a comma and then a space, but the target field separates them with a comma only.
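A small pattern check can catch this kind of formatting slip. The sketch below assumes the rule is "every comma is followed by exactly one space and then more text"; the function name and product names are hypothetical.

```python
import re


def has_bad_comma_spacing(value):
    """True if any comma is not followed by exactly one space and a non-space character."""
    return re.search(r",(?! \S)", value) is not None


print(has_bad_comma_spacing("Widget,Gadget"))   # comma with no space
print(has_bad_comma_spacing("Widget, Gadget"))  # correctly separated
```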
Sequence Generator 
Ensuring that the sequence numbers of reports are in the correct order is very important when processing follow-up reports or responding to an audit
Development team did not configure the sequence generator correctly, resulting in records with duplicate sequence numbers
Duplicate records in the sales report were doubling up several sales transactions which skewed the report significantly 
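Sequence numbers can be checked for both duplicates and ordering. This is a minimal sketch with hypothetical function names and illustrative data.

```python
from collections import Counter


def duplicate_sequence_numbers(sequence):
    """Return the sequence numbers that occur more than once, in ascending order."""
    counts = Counter(sequence)
    return sorted(number for number, count in counts.items() if count > 1)


def is_strictly_increasing(sequence):
    """True if every sequence number is greater than the one before it."""
    return all(a < b for a, b in zip(sequence, sequence[1:]))


numbers = [1, 2, 2, 3, 5, 5]
print(duplicate_sequence_numbers(numbers))
print(is_strictly_increasing(numbers))
```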
Undocumented Requirements 
Requirements that are “understood” but not actually documented anywhere
Several members of the development team did not understand the “understood” undocumented requirements.
A restriction in the “where” clause limited how certain reports were brought over. It was used in mappings that were understood to be necessary but were not actually in the requirements. Occasionally, it turns out that the understood requirements are not what the business wanted.
Duplicate Records 
Duplicate records are two or more records that contain the same data 
Development team did not add the appropriate code to filter out duplicate records 
Duplicate records in the sales report were doubling up several sales transactions which skewed the report significantly 
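Counting identical rows shows both which records are duplicated and how much they inflate a total. In this sketch each sale is a hypothetical `(transaction_id, amount)` tuple.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return each row that appears more than once, with its occurrence count."""
    counts = Counter(rows)
    return {row: count for row, count in counts.items() if count > 1}


# Illustrative sales rows: transaction T1 was loaded twice.
sales = [("T1", 100), ("T2", 50), ("T1", 100)]
print(duplicate_rows(sales))
print(sum(amount for _, amount in sales))       # total inflated by the duplicate
print(sum(amount for _, amount in set(sales)))  # total after de-duplication
```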
Numeric Field Precision 
Numbers that are not formatted to the correct number of decimal places or not rounded per specifications
Development team rounded the numbers to the wrong decimal place
The sales data did not have the correct precision, and all sales were being rounded to the whole dollar
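Precision checks are best done with exact decimal arithmetic rather than floats. The sketch below assumes the specification calls for two decimal places with half-up rounding; that rule and the function names are assumptions, not stated in the source.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_cents(amount):
    """Round a decimal string to two places (assumed rule: half-up)."""
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def precision_mismatches(pairs):
    """Return (source, target) pairs where the target is not the source rounded to cents."""
    return [
        (source, target)
        for source, target in pairs
        if round_to_cents(source) != Decimal(target)
    ]


# The second sale was rounded to the whole dollar in the target.
print(precision_mismatches([("19.995", "20.00"), ("4.49", "4")]))
```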
Rejected Rows 
Data rows that get rejected due to data issues 
Development team did not take into account data conditions that break the ETL for a particular row 
Missing data rows on the sales table caused major issues with the end of year sales report 
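Rejected rows are easier to investigate when the load step records them with a reason instead of silently dropping them. This is a sketch under assumptions: the `id` and `amount` column names and the conversion rules are hypothetical.

```python
def load_rows(rows):
    """Convert each row; collect rows that fail conversion along with the reason."""
    loaded, rejected = [], []
    for row in rows:
        try:
            loaded.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except (KeyError, TypeError, ValueError) as exc:
            rejected.append((row, repr(exc)))
    return loaded, rejected


rows = [
    {"id": "1", "amount": "9.5"},
    {"id": "2", "amount": "n/a"},  # not a number
    {"id": "3"},                   # amount column missing
]
loaded, rejected = load_rows(rows)
print(len(loaded), "loaded;", len(rejected), "rejected")
```

Comparing the rejected count against the source row count gives a quick reconciliation check before any report is built on the table.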
For more information on QuerySurge or to download a trial, visit www.querysurge.com 
QuerySurge is the collaborative Data Testing solution for Big Data that finds bad data and provides a holistic view of your data’s health.