Bug bites Elephant?
Test-driven Quality Assurance
in Big Data Application Development
Dr. Dominik Benz, inovex GmbH
2013/06/03, Berlin Buzzwords
? TDD!
Write/execute tests, specify acceptance criteria, …
2
Who speaks… … the Elephant language?
Class A extends Mapper…
ROI, $$, …
apt-get install…
3
The road… … to Big Data QA
our Big Data QA problem
the FitNesse approach
test data definition / selection
job & workflow control
result inspection
4
QA problem Web Intelligence @ 1&1
DWH
Hadoop Cluster
~ 1 billion log events / day, ~ 1 TB (thrift) logfiles
chains of MR jobs, running on 20 nodes / 8 cores / 96 GB RAM (CDH)
BI reporting, web analytics, …
5
QA problem An exemplary workflow
Log Files (thrift) → MR job 1 → Intermediate result (avro) → MR job 2 → … → DWH (RDBMS)
? create (sample) input data
? inspect (binary) formats
? control workflows
6
QA problem Existing Approaches
| method | tests what? | issues for our use case |
| JUnit | isolated functions | no integration, Java syntax |
| MRUnit | 1 mapper + 1 reducer | "little" integration, Java syntax |
| iTest | Hadoop jobs/workflows | Java / Groovy syntax |
| Scripts/CLI | (manual) scripting/inspection | "script chaos", syntax |
FitNesse as suitable addition / solution!
7
The road… … to Big Data QA
Big Data QA is different!
the FitNesse approach
test data definition / selection
job & workflow control
result inspection
8
FitNesse In a nutshell
"fully integrated standalone wiki and acceptance testing framework"
• "executable" wiki pages (returning test results)
• (almost) natural language test specification
• connection to SUT via (Java) "Fixtures"
9
FitNesse Architecture Overview
| script |
| check | num results | 3 |
Browser
FitNesse Server
Fixtures: public int numResults() { ... }
System under Test
"calling Java methods from wiki", compare return values
Integrates with REST, Jenkins…
10
FitNesse An Exemplary Test
11
FitNesse Exemplary Test Source
!path /home/inovex/lib/*.jar
| script | Hadoop |
| upload | viewLog.csv | to hdfs | /testdata/ |
| hadoop job from jar | viewLog.jar | [...] |
| show | job output |
| check | number of output files | 3 |
12
FitNesse Hadoop Fixture Java Code
public class Hadoop {
    public boolean uploadToHdfs(String localFile, String remoteFile) {...}
    public boolean hadoopJobFromJar(String jar, String input, String output) {...}
    public String jobOutput() {...}
    public String numberOfOutputFiles() {...}
}
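The wiki rows in the test source map onto the fixture methods by name. A minimal sketch of that naming convention, assuming the usual script-table rule that alternating cells form the method name and the cells in between are the arguments. `ScriptRowSketch` is a hypothetical helper for illustration only; it is not FitNesse code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the naming convention only, not FitNesse internals.
class ScriptRowSketch {

    /** Joins cells 0, 2, 4, ... into a camelCase method name. */
    static String methodName(String... cells) {
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < cells.length; i += 2) {
            for (String word : cells[i].trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                char first = word.charAt(0);
                // The first word starts lower case, every later word is capitalized.
                name.append(name.length() == 0
                        ? Character.toLowerCase(first)
                        : Character.toUpperCase(first));
                name.append(word.substring(1));
            }
        }
        return name.toString();
    }

    /** Collects cells 1, 3, 5, ... as the method arguments. */
    static List<String> arguments(String... cells) {
        List<String> args = new ArrayList<>();
        for (int i = 1; i < cells.length; i += 2) {
            args.add(cells[i].trim());
        }
        return args;
    }

    public static void main(String[] args) {
        String[] row = {"upload", "viewLog.csv", "to hdfs", "/testdata/"};
        // prints "uploadToHdfs[viewLog.csv, /testdata/]"
        System.out.println(methodName(row) + arguments(row));
    }
}
```

Rows that start with a keyword such as `check` or `show` would have that keyword cell dropped before the name is built; the sketch leaves that step out.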
13
The road… … to Big Data QA
Big Data QA is different!
FitNesse Wiki test execution!
test data definition / selection
job & workflow control
result inspection
14
Test Data CSV
15
Test Data Thrift
‣ Big Data: efficient data transfer among heterogeneous sources
‣ Define the interface via an IDL; compilers generate code for many languages
16
Test Data Real World Data
‣ Dev/Test Hadoop cluster: same hardware as production, but fewer nodes
‣ (random/biased) sampling, e.g. on a daily basis
‣ Feedback loop:
  ‣ identify "special cases" from real data
  ‣ include them in the (manual) data definition
‣ Gradually increase test coverage / artefact quality
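One way to draw a random daily sample is reservoir sampling, which picks a fixed number of events uniformly from a stream of unknown length. This is a sketch under assumptions: the slides do not say which sampling method was used, and `LogSamplerSketch`, its parameters and the fixed seed are hypothetical. Biased sampling is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sampler; not the tooling used in the talk.
class LogSamplerSketch {

    /** Reservoir sampling (Algorithm R): returns at most k events, chosen uniformly. */
    static <T> List<T> sample(Iterable<T> events, int k, long seed) {
        Random random = new Random(seed); // fixed seed keeps test data reproducible
        List<T> reservoir = new ArrayList<>();
        long seen = 0;
        for (T event : events) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k events.
                reservoir.add(event);
            } else {
                // Keep the new event with probability k / seen.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, event);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> events = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            events.add(i);
        }
        System.out.println(sample(events, 10, 42L));
    }
}
```

"Special cases" found in the sample can then be copied into the manually defined test data, as the feedback loop describes.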
17
The road… … to Big Data QA
Big Data QA is different!
FitNesse Wiki test execution!
Define CSV / thrift / real-world test data!
job & workflow control
result inspection
18
Job Control Swiss Army Knife: Shell
‣ Execute arbitrary (shell) commands
‣ Mainly a wrapper around org.apache.commons.exec.CommandLine
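A minimal sketch of what such a shell fixture could look like. It swaps in the JDK's `ProcessBuilder` for Commons Exec to stay dependency-free, and it assumes a POSIX `sh` is available. The class and method names are assumptions, not the fixture from the talk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical fixture sketch. A real FitNesse fixture class and its methods would be public.
class ShellSketch {
    private String lastOutput = "";
    private int lastExitCode = -1;

    /** Runs the command via "sh -c" and reports whether it exited with code 0. */
    boolean execute(String command) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            byte[] raw = process.getInputStream().readAllBytes();
            lastOutput = new String(raw, StandardCharsets.UTF_8).trim();
            lastExitCode = process.waitFor();
            return lastExitCode == 0;
        } catch (IOException e) {
            lastOutput = e.getMessage();
            lastExitCode = -1;
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            lastExitCode = -1;
            return false;
        }
    }

    /** Combined stdout/stderr of the last command. */
    String output() {
        return lastOutput;
    }

    /** Exit code of the last command, or -1 if it could not be run. */
    int exitCode() {
        return lastExitCode;
    }

    public static void main(String[] args) {
        ShellSketch shell = new ShellSketch();
        shell.execute("echo hello");
        System.out.println(shell.output());
    }
}
```

Returning a boolean lets a wiki row pass or fail on the command's exit status, while `output` and `exitCode` stay available for `show` and `check` rows.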
19
Job Control Hadoop Fixture
‣ Hide complexity from test authors
‣ "define" appropriate test language via (Java) method names
‣ re-use other fixtures (Shell, …) internally
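The idea of hiding the command line behind readable method names can be sketched as follows. The injected command runner stands in for a shell fixture and is a hypothetical design choice; `hadoop fs -put` and `hadoop jar` are the standard Hadoop CLI calls, but the argument layout of the real fixture is not shown on the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: method names define the test language, the runner does the work.
class HadoopFixtureSketch {
    // Runs one command line and returns true on success (e.g. backed by a shell fixture).
    private final Predicate<String> shell;

    HadoopFixtureSketch(Predicate<String> shell) {
        this.shell = shell;
    }

    /** Wiki row: | upload | viewLog.csv | to hdfs | /testdata/ | */
    boolean uploadToHdfs(String localFile, String remoteDir) {
        return shell.test("hadoop fs -put " + localFile + " " + remoteDir);
    }

    /** Starts a job from a jar; the order of input and output is an assumption. */
    boolean hadoopJobFromJar(String jar, String input, String output) {
        return shell.test("hadoop jar " + jar + " " + input + " " + output);
    }

    public static void main(String[] args) {
        // Record the commands instead of running them, so no cluster is needed.
        List<String> commands = new ArrayList<>();
        HadoopFixtureSketch hadoop = new HadoopFixtureSketch(command -> {
            commands.add(command);
            return true;
        });
        hadoop.uploadToHdfs("viewLog.csv", "/testdata/");
        System.out.println(commands);
    }
}
```

Because the runner is injected, the fixture itself can be tested without a cluster, and test authors only ever see the method names.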
20
Job Control Workflows & Suites
‣ FitNesse allows grouping tests into suites
‣ Can be used to simulate MR processing chains
‣ SuiteSetUp / SuiteTearDown pages for creating / destroying test conditions
‣ Tests can still be executed individually
MR job 1
MR job 2
21
The road… … to Big Data QA
Big Data QA is different!
FitNesse Wiki test execution!
Define CSV / thrift / real-world data!
Use suites & fixtures for jobs/workflows!
result inspection
22
Results Data Warehouse / Hive
‣ Validate RDBMS contents (via JDBC)
‣ E.g. for checking the final result
‣ Or use Hive + Hive-Server to query raw data
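Checking the final result over JDBC could look roughly like this. It is a sketch under assumptions: the fixture name, the method names and the table name `daily_report` are hypothetical, and the connection is expected to come from whatever JDBC driver the warehouse (or Hive) provides.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical fixture sketch for row-count checks against the DWH.
class WarehouseFixtureSketch {
    private final Connection connection;

    WarehouseFixtureSketch(Connection connection) {
        this.connection = connection;
    }

    /** Builds the count query; rejects anything that is not a plain table name. */
    static String countQuery(String table) {
        if (!table.matches("[A-Za-z_][A-Za-z0-9_.]*")) {
            throw new IllegalArgumentException("not a table name: " + table);
        }
        return "SELECT COUNT(*) FROM " + table;
    }

    /** Wiki row (assumed): | check | number of rows in | daily_report | 3 | */
    long numberOfRowsIn(String table) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet result = statement.executeQuery(countQuery(table))) {
            result.next(); // COUNT(*) always yields exactly one row
            return result.getLong(1);
        }
    }

    public static void main(String[] args) {
        System.out.println(countQuery("daily_report"));
    }
}
```

The table name is validated because identifiers cannot be bound as JDBC parameters and the value comes straight from a wiki page.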
23
Results Pig
‣ Execute arbitrary Pig commands from a wiki page
‣ Inspect e.g. binary intermediate results (avro, …)
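24
Results Pig Fixture extends PigServer
public class PigConsole extends PigServer {
    // The constructor and the AVRO_STORAGE_LOADER constant are not shown on the slide.
    public void loadAvroFileUsingAlias(String filename, String alias)
            throws IOException {
        // Builds: <alias> = LOAD '<filename>' USING <loader>;
        this.registerQuery(
            alias + " = LOAD '" + filename + "' USING " +
            AVRO_STORAGE_LOADER + ";");
    }
}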
25
Results Server Infrastructure
FitNesse Master
TestEnvironments: ProjA, ProjB
TestConfigurations: ProjA, ProjB (each with dev, qs, live)
ProjA Slaves: Dev, QS, Live
Import / edit tests remotely
Import / edit config remotely
26
Thank you! dominik.benz@inovex.de
Big Data QA is different!
FitNesse Wiki test execution!
Define CSV / thrift / real-world data!
Use suites & fixtures for jobs/workflows!
Inspect results via Pig/Hive
27
Want more? Inovex trains you!
Android Developer Training (3 days, Karlsruhe/München)
Hadoop Developer Training (3 days, Karlsruhe/Köln)
Certified Scrum Developer Training (5 days, Köln)
Pentaho Data Integration Training (4 days, München/Köln)
Liferay Portal-Admin Training (3 days, Karlsruhe)
Liferay Portal-Developer Training (4 days, Karlsruhe)
information and registration at
www.inovex.de/offene-trainings
28
Inovex @bbuzz
Stefan Kathrin Bernhard
Jörg
Andrew
Christian Christian

Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development

  • 1. Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development. Dr. Dominik Benz, inovex GmbH. 2013/06/03, Berlin Buzzwords
  • 2. Who speaks… the Elephant language? TDD: write/execute tests, specify acceptance criteria, … Speech bubbles: "Class A extends Mapper…", "ROI, $$, …", "apt-get install…"
  • 3. The road… to Big Data QA: our Big Data QA problem; the FitNesse approach; test data definition / selection; job & workflow control; result inspection
  • 4. QA problem: Web Intelligence @ 1&1. Input: ~1 billion log events / day, ~1 TB (thrift) logfiles. Hadoop cluster: chains of MR jobs, running on 20 nodes / 8 cores / 96 GB RAM (CDH). DWH: BI reporting, web analytics, …
  • 5. QA problem: An exemplary workflow. Log files (thrift) → MR job 1 → intermediate result (avro) → MR job 2 → … → DWH (RDBMS). Open questions: how to create (sample) input data? How to inspect (binary) formats? How to control workflows?
  • 6. QA problem: Existing Approaches
      method | tests what? | issues for our use case
      JUnit | isolated functions | no integration, Java syntax
      MRUnit | 1 mapper + 1 reducer | "little" integration, Java syntax
      iTest | hadoop jobs/workflows | Java / Groovy syntax
      Scripts/CLI | (manual) scripting/inspect. | "script chaos", syntax
      FitNesse as suitable addition / solution!
  • 7. The road… to Big Data QA: Big Data QA is different! the FitNesse approach; test data definition / selection; job & workflow control; result inspection
  • 8. FitNesse: In a nutshell. "Fully integrated standalone wiki and acceptance testing framework": "executable" wiki pages (returning test results); (almost) natural language test specification; connection to SUT via (Java) "fixtures"
  • 9. FitNesse: Architecture Overview. Browser, FitNesse server, fixtures, system under test. The wiki table | script | with the row | check | num results | 3 | calls the fixture method public int numResults() { ... }. "Calling Java methods from wiki", then compare return values. Integrates with REST, Jenkins…
  • 11. FitNesse: Exemplary Test Source
      !path /home/inovex/lib/*.jar
      | script | Hadoop |
      | upload | viewLog.csv | to hdfs | /testdata/ |
      | hadoop job from jar | viewLog.jar | [...] |
      | show | job output |
      | check | number of output files | 3 |
  • 12. FitNesse: Hadoop Fixture Java Code
      public class Hadoop {
          public boolean uploadToHdfs(String localFile, String remoteFile) {...}
          public boolean hadoopJobFromJar(String jar, String input, String output) {...}
          public String jobOutput() {...}
          public String numberOfOutputFiles() {...}
      }
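The wiki rows on slide 11 line up with the fixture methods on slide 12 because, in a script table, cells alternate between fragments of the method name and its arguments. The following is a minimal sketch of that naming rule only, not FitNesse's actual implementation; the class `ScriptRow` is a hypothetical helper, and keywords such as `check` or `show` are assumed to have been stripped from the row already.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns the cells of one script-table row into a method name plus arguments. */
final class ScriptRow {
    final String methodName;
    final List<String> arguments;

    private ScriptRow(String methodName, List<String> arguments) {
        this.methodName = methodName;
        this.arguments = arguments;
    }

    /** Cells at even positions are name fragments, cells at odd positions are arguments. */
    static ScriptRow parse(String... cells) {
        StringBuilder phrase = new StringBuilder();
        List<String> args = new ArrayList<>();
        for (int i = 0; i < cells.length; i++) {
            if (i % 2 == 0) {
                phrase.append(' ').append(cells[i].trim());
            } else {
                args.add(cells[i].trim());
            }
        }
        return new ScriptRow(camelCase(phrase.toString().trim()), args);
    }

    /** "number of output files" becomes "numberOfOutputFiles". */
    static String camelCase(String phrase) {
        StringBuilder name = new StringBuilder();
        for (String word : phrase.split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            char first = word.charAt(0);
            name.append(name.length() == 0
                    ? Character.toLowerCase(first)
                    : Character.toUpperCase(first));
            name.append(word.substring(1));
        }
        return name.toString();
    }
}
```

With this rule, the row `| upload | viewLog.csv | to hdfs | /testdata/ |` resolves to `uploadToHdfs("viewLog.csv", "/testdata/")`, which is why test authors can write near-natural language while developers only expose ordinary Java methods.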
  • 13. The road… to Big Data QA: Big Data QA is different! FitNesse Wiki test execution! test data definition / selection; job & workflow control; result inspection
  • 15. Test Data: Thrift. Big Data: efficient data transfer among heterogeneous sources; define the interface via an IDL, with a compiler for many languages
  • 16. Test Data: Real World Data. Dev/test Hadoop cluster: hardware identical to prod, but fewer nodes; (random/biased) sampling, e.g. on a daily basis; feedback loop: identify "special cases" from real data and include them in the (manual) data definition; gradually increase test coverage / artefact quality
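The slides do not say how the daily sample is drawn. As one possible sketch, the code below uses reservoir sampling (Algorithm R, swapped in here as an assumption) to take a uniform random sample from a log stream of unknown length, and then puts the manually curated special cases in front of it. `LogSampler`, `sample` and `buildTestData` are hypothetical names; a fixed seed keeps the sample reproducible between test runs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for assembling test data from real log events. */
final class LogSampler {

    /** Reservoir sampling: a uniform sample of at most k events from a stream of unknown length. */
    static <T> List<T> sample(Iterator<T> events, int k, long seed) {
        Random random = new Random(seed);
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (events.hasNext()) {
            T event = events.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k events.
                reservoir.add(event);
            } else {
                // Afterwards, the new event replaces a random slot with probability k / seen.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, event);
                }
            }
        }
        return reservoir;
    }

    /** Test data: the hand-picked special cases first, followed by the random sample. */
    static <T> List<T> buildTestData(Iterator<T> events, List<T> specialCases, int k, long seed) {
        List<T> data = new ArrayList<>(specialCases);
        data.addAll(sample(events, k, seed));
        return data;
    }
}
```

Each time a new special case turns up in production data, it would be appended to `specialCases`, so the feedback loop grows the curated part while the sampled part keeps tracking real traffic.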
  • 17. The road… to Big Data QA: Big Data QA is different! FitNesse Wiki test execution! Define CSV / thrift / real-world test data! job & workflow control; result inspection
  • 18. Job Control: Swiss Army Knife: Shell. Execute arbitrary (shell) commands; mainly a wrapper around org.apache.commons.exec.CommandLine
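The shell fixture itself is not shown in the slides. The sketch below illustrates the idea using only the JDK's `ProcessBuilder` instead of Apache Commons Exec, so it is a stand-in for the approach and not the author's code. The class `ShellFixture` and its method names are hypothetical, and it assumes a POSIX `sh` is available on the test host. To be used from a wiki page, the class and its methods would need to be declared `public` in their own file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

/** Hypothetical fixture: runs a shell command and exposes its result to the wiki page. */
final class ShellFixture {
    private String output = "";
    private int exitCode = -1;

    /** Runs the command via "sh -c" and returns true if it exits with status 0. */
    boolean execute(String command) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Drain the output before waiting, so a chatty command cannot block on a full pipe.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                output = reader.lines().collect(Collectors.joining("\n"));
            }
            exitCode = process.waitFor();
            return exitCode == 0;
        } catch (IOException e) {
            output = e.getMessage();
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    String output() {
        return output;
    }

    int exitCode() {
        return exitCode;
    }
}
```

Returning a boolean instead of throwing lets a failing command show up as a failed row on the wiki page, while `output()` can be displayed for diagnosis.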
  • 19. Job Control: Hadoop Fixture. Hide complexity from test authors; "define" an appropriate test language via (Java) method names; re-use other fixtures (Shell, …) internally
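To make the "re-use other fixtures internally" point concrete, here is a minimal sketch of a Hadoop fixture that only assembles command lines and delegates execution to something else. The `CommandRunner` interface and the class `HadoopJobFixture` are hypothetical. The sketch assumes the `hadoop` CLI is on the path and that the job jar's main class takes an input and an output path as its arguments. A fixture instantiated from the wiki would create its own runner rather than receive one through the constructor; the constructor parameter is there so the sketch can be tried without a cluster.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical abstraction over "something that runs a command", e.g. a shell fixture. */
interface CommandRunner {
    boolean run(List<String> command);
}

/** Hypothetical fixture: method names form the test language, the runner does the work. */
final class HadoopJobFixture {
    private final CommandRunner runner;

    HadoopJobFixture(CommandRunner runner) {
        this.runner = runner;
    }

    /** Wiki row: | upload | viewLog.csv | to hdfs | /testdata/ | */
    boolean uploadToHdfs(String localFile, String remoteDir) {
        return runner.run(Arrays.asList("hadoop", "fs", "-put", localFile, remoteDir));
    }

    /** Submits a job jar; assumes the jar's main class expects input and output paths. */
    boolean hadoopJobFromJar(String jar, String input, String output) {
        return runner.run(Arrays.asList("hadoop", "jar", jar, input, output));
    }
}
```

Test authors only ever see phrases like "upload … to hdfs …"; the exact CLI invocation can change without touching any wiki page.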
  • 20. Job Control: Workflows & Suites. FitNesse allows tests to be grouped into suites; suites can be used to simulate MR processing chains (MR job 1 → MR job 2); SuiteSetUp / SuiteTearDown pages create / destroy the test conditions; tests can still be executed individually
  • 21. The road… to Big Data QA: Big Data QA is different! FitNesse Wiki test execution! Define CSV / thrift / real-world data! Use suites & fixtures for jobs/workflows! result inspection
  • 22. Results: Data Warehouse / Hive. Validate RDBMS contents (via JDBC), e.g. for checking the final result; or use Hive + Hive-Server to query raw data
  • 23. Results: Pig. Execute arbitrary Pig commands from a wiki page; inspect e.g. binary intermediate results (avro, …)
  • 24. Results: Pig Fixture extends PigServer
      public class PigConsole extends PigServer {
          // Registers "<alias> = LOAD '<filename>' USING <loader>;" with the Pig server.
          public void loadAvroFileUsingAlias(String filename, String alias) throws IOException {
              this.registerQuery(
                  alias + " = LOAD '" + filename + "' USING " + AVRO_STORAGE_LOADER + ";");
          }
      }
  • 25. Results: Server Infrastructure. A FitNesse master holds TestEnvironments (ProjA, ProjB) and TestConfigurations (ProjA, ProjB; each with dev / qs / live). Per-project slaves (Dev ProjA, QS ProjA, Live ProjA) import / edit tests remotely and import / edit config remotely.
  • 26. Thank you! dominik.benz@inovex.de. Big Data QA is different! FitNesse Wiki test execution! Define CSV / thrift / real-world data! Inspect results via Pig/Hive! Use suites & fixtures for jobs/workflows!
  • 27. Want more? Inovex trains you! Android Developer Training (3 days, Karlsruhe/München); Hadoop Developer Training (3 days, Karlsruhe/Köln); Certified Scrum Developer Training (5 days, Köln); Pentaho Data Integration Training (4 days, München/Köln); Liferay Portal-Admin Training (3 days, Karlsruhe); Liferay Portal-Developer Training (4 days, Karlsruhe). Information and registration at www.inovex.de/offene-trainings
  • 28. Inovex @bbuzz: Stefan, Kathrin, Bernhard, Jörg, Andrew, Christian, Christian