How to Test Big Data Systems
The definition of Big Data
| Big Data is often perceived as simply a huge amount of data and information
| However, it is a lot more than this
| Big Data can be described as a whole set of approaches, tools and
methods for processing large volumes of unstructured as well as
structured data
| Big Data is defined by three parameters: volume, variety and velocity
| These describe how you have to process an enormous amount of
data in different formats at different rates
The three parameters on which Big Data is defined
Testing Big Data can be quite a challenge for organizations
| Traditional analysis techniques have certain limitations
| Dealing with such large data sets is difficult owing to their complexity
| Testing Big Data is especially challenging for organizations with very little knowledge
of what to test and how to test it
| There are certain basic aspects of Big Data processing
| Further testing procedures can be determined on the basis of these aspects
Aspects of Big Data Testing
Risk of failing
| Failure in Big Data Testing can have negative consequences. It may result in:
| Production of poor-quality data
| Delays in testing
| Increased cost of testing
| Big Data Testing can be performed in two ways: functional and non-functional testing
| Very strong test data management and test environment management are required to ensure
error-free processing of data
Functional Testing
| Functional Testing is performed in three stages:
| Pre-Hadoop Process Testing
| MapReduce Process Validation
| Extract-Transform-Load Process Validation and Report Testing
Pre-Hadoop Process Testing
| HDFS stands for Hadoop Distributed File System
| HDFS lets you store huge amounts of data on a cluster of machines
| When data is extracted from various sources such as web logs, social media and
RDBMS and uploaded into HDFS, an initial stage of testing is carried out
Initial stage of Testing
| Verifying that the data acquired from the original source is not corrupted
| Validating that data files were uploaded into the correct HDFS location
| Checking the file partitioning and then copying the files to different data units
| Determining the complete set of data to be checked
| Verifying that the source data is in sync with the data uploaded into
HDFS
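The corruption and sync checks above can be sketched with checksums. This is only an illustration under stated assumptions: the landing area is treated as an ordinary directory (for example a local copy or mount of the HDFS target), and the names file_checksum and compare_directories are hypothetical, not part of any Hadoop API.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_directories(source_dir, landing_dir):
    """Compare every source file with its copy in the landing directory.

    Assumption: landing_dir is a plain directory standing in for the
    HDFS location. Returns a dict mapping relative file names to
    "missing" (no copy found) or "corrupted" (checksums differ).
    """
    source_dir, landing_dir = Path(source_dir), Path(landing_dir)
    problems = {}
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        copy = landing_dir / relative
        if not copy.is_file():
            problems[str(relative)] = "missing"
        elif file_checksum(source_file) != file_checksum(copy):
            problems[str(relative)] = "corrupted"
    return problems
```

An empty result means every source file has an identical copy in the landing area. Any other result lists the files to investigate before MapReduce validation starts.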
MapReduce Process Validation
| MapReduce is a data processing concept used to condense a massive
amount of data into practical, aggregated, compact data sets. Validation involves:
| Testing the business logic first on a single node, then on multiple nodes
| Validating the MapReduce process to ensure that “key-value” pairs are generated correctly
| After the “reduce” operation, validating the aggregation and consolidation of data
| Comparing the generated output data with the input files to make sure the
generated output file meets all the requirements
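As a rough single-node illustration of the key-value and aggregation checks above, the sketch below runs a tiny in-memory map and reduce and then validates both stages. The record layout ("page,bytes") and all function names are assumptions made for the example. A real job would run on Hadoop rather than in plain Python.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, value) pair per record: (page, bytes_served)."""
    for line in records:
        page, size = line.split(",")
        yield page.strip(), int(size)


def reduce_phase(pairs):
    """Sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def pairs_are_valid(pairs):
    """Check each pair: non-empty string key, non-negative integer value."""
    return all(
        isinstance(key, str) and key != "" and isinstance(value, int) and value >= 0
        for key, value in pairs
    )


def aggregation_is_consistent(pairs, totals):
    """Check that reducing lost no keys and preserved the grand total."""
    same_keys = {key for key, _ in pairs} == set(totals)
    same_sum = sum(value for _, value in pairs) == sum(totals.values())
    return same_keys and same_sum


# Hypothetical input records in "page,bytes" form.
records = ["/home,120", "/about,80", "/home,30"]
pairs = list(map_phase(records))
totals = reduce_phase(pairs)
```

Running the same checks on one node first and then on a cluster gives a baseline: if the single-node totals are correct but the multi-node totals differ, the fault probably lies in partitioning or shuffling rather than in the business logic.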
Extract-Transform-Load Process Validation and Report Testing
| ETL stands for Extract, Transform, Load. This is the last stage of testing, in which data
generated by the previous stage is first unloaded and then loaded into the downstream
repository system, i.e. the Enterprise Data Warehouse (EDW), where reports are generated,
or into a transactional system where analysis is done for further processing.
Purposes of ETL Process Validation & Report Testing
| To check that transformation rules are applied correctly
| To inspect data aggregation to ensure that the data is not distorted and that it is loaded
into the target system
| To ensure there is no data corruption by comparing the target data with the HDFS file system data
| To validate that reports include the required data and that all indicators are displayed
correctly
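The transformation-rule and corruption checks above amount to recomputing what the target should contain and comparing it with what was loaded. The sketch below assumes rows are plain dictionaries keyed by "id". The transformation rule itself (normalising a country code and flagging large amounts) is invented for illustration.

```python
def transform(row):
    """Hypothetical transformation rule applied between HDFS and the warehouse."""
    return {
        "id": row["id"],
        "country": row["country"].strip().upper(),
        "amount_cents": row["amount_cents"],
        "is_large": row["amount_cents"] >= 10000,
    }


def validate_load(source_rows, target_rows):
    """Compare the expected transformed rows with the rows found in the target.

    Returns the ids that are missing from the target, present in the
    target without a source row, or loaded with different values.
    """
    expected = {row["id"]: transform(row) for row in source_rows}
    loaded = {row["id"]: row for row in target_rows}
    return {
        "missing": sorted(expected.keys() - loaded.keys()),
        "unexpected": sorted(loaded.keys() - expected.keys()),
        "mismatched": sorted(
            row_id
            for row_id in expected.keys() & loaded.keys()
            if expected[row_id] != loaded[row_id]
        ),
    }
```

Three empty lists indicate that every source row reached the target with the rule applied as specified. In practice the row sets would come from queries against HDFS and the warehouse, which are outside the scope of this sketch.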
Non-Functional Testing
| Hadoop processes large volumes of data of varying variety and velocity
| Hence it is imperative to perform architectural testing of
Big Data systems to ensure the success of the projects in question
| This non-functional testing is performed in two ways:
| 1) Performance Testing
| 2) Failover Testing
Performance Testing
| Performance Testing measures:
| Job completion time
| Memory utilization
| Data throughput of Big Data systems
| The main objective of performance testing is not only to establish
how the application performs
| but also to improve the performance of the Big Data system as a whole
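The three measurements listed above can be collected around any callable that stands in for a job. The sketch below is a minimal single-process illustration using Python's standard time and tracemalloc modules. The name measure_job is hypothetical, and tracemalloc only sees memory allocated by Python itself, so a real cluster would need its own monitoring tools.

```python
import time
import tracemalloc


def measure_job(job, records):
    """Run job(records) and report completion time, throughput and peak memory."""
    tracemalloc.start()
    started = time.perf_counter()
    result = job(records)
    elapsed = time.perf_counter() - started
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "result": result,
        "completion_seconds": elapsed,
        # Guard against a zero reading from the timer on very fast jobs.
        "records_per_second": len(records) / elapsed if elapsed > 0 else float("inf"),
        "peak_memory_bytes": peak_bytes,
    }


# Example: a sort stands in for the job under test.
metrics = measure_job(sorted, list(range(1000, 0, -1)))
```

Repeating the measurement with growing input sizes shows where completion time or memory stops scaling, which is the starting point for tuning.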
Performance Testing Process
| Obtain the performance metrics of Big Data systems, e.g. response time, maximum
data processing capacity, speed of data consumption, etc.
| Determine the conditions that cause performance problems, i.e. assess the conditions
that limit performance
| Verify the speed with which MapReduce processing (sorts, merges) is executed
| Verify the storage of data at different nodes
| Test JVM parameters such as heap size, GC algorithms, etc.
| Test the values for connection timeout, query timeout, etc.
Failover Testing
| Failover testing is done to verify seamless processing of data in case of failure of data
nodes
| It validates the recovery process and the processing of data when switched to other
data nodes
| Two types of metrics are observed during this testing:
| 1) Recovery Time Objective
| 2) Recovery Point Objective
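A Recovery Time Objective is the longest acceptable time to restore processing after a failure, and a Recovery Point Objective is the largest acceptable window of lost data, expressed as time. A failover test can compare what was observed against both. The sketch below assumes the three timestamps are taken from the test log. The function names and the example objectives are illustrative only.

```python
from datetime import datetime, timedelta


def failover_metrics(failure_at, service_restored_at, last_replicated_at):
    """Derive the observed recovery time and data-loss window of one failover."""
    return {
        "recovery_time": service_restored_at - failure_at,
        "data_loss_window": failure_at - last_replicated_at,
    }


def meets_objectives(metrics, rto, rpo):
    """True if the observed values stay within the agreed RTO and RPO."""
    return metrics["recovery_time"] <= rto and metrics["data_loss_window"] <= rpo


# Hypothetical timestamps from a test in which one data node was stopped.
observed = failover_metrics(
    failure_at=datetime(2024, 1, 1, 12, 0, 0),
    service_restored_at=datetime(2024, 1, 1, 12, 4, 0),
    last_replicated_at=datetime(2024, 1, 1, 11, 59, 30),
)
within_limits = meets_objectives(
    observed, rto=timedelta(minutes=5), rpo=timedelta(minutes=1)
)
```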
Big Data Testing Process
Conclusion
| Many big firms, including cloud enablers and various project management tool
platforms, are using Big Data
| The main challenge faced by such organizations today is how to test Big Data and how
to improve the performance and processing power of Big Data systems
| The testing described above is performed to ensure all is working well: the data
extracted and processed is undistorted and in sync with the original data
| Big Data processing can be batch, real-time or interactive
| Hence, when dealing with such huge amounts of data, Big Data testing becomes
imperative as well as inevitable
www.QualiTestGroup.com
Thank You!
