Agile Analysis Pipeline
Andy Brown
New Pipeline Development
Who Are We?
One of the world's largest DNA Sequencing Centres
Second largest compute centre in Europe, after CERN
What Do We Do?
Human, Mouse, Zebrafish and Pathogen Genome Projects
Post-sequencing analysis, annotation and maintenance
(It's never truly finished!)
Who Am I?
Tracking systems and analysis pipeline for Next Generation Sequencing Technologies
Perl, Web Technologies, Moose
Next Generation Sequencing?
Massively Parallel DNA Sequencing
Producing Millions of Reads per run
~38 instruments
~5Tb of data a day
Managing quick turnaround on staging of 320Tb of data a month
Analysis
Convert Images to Bases
Obtain quality values
Recalibrate quality
Separate out DNA sequences from different projects
Do this in parallel
Be able to extend this
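Separating sequences from different projects is what the "Split by Tag" steps in the brief do. The sketch below is a hypothetical illustration, not the pipeline's code: it assumes each read already carries its tag (barcode) sequence, matches tags exactly against a tag-to-project table, and sends unmatched reads to a `rejects` bin. The record layout and all names are assumptions. Each bin can then be handed to its own job, which is where the parallelism comes in.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical read record: (read_id, tag_sequence, read_sequence).
Read = Tuple[str, str, str]


def split_by_tag(reads: Iterable[Read],
                 tag_to_project: Dict[str, str]) -> Dict[str, List[Read]]:
    """Group reads into per-project bins by exact tag match.

    Reads whose tag is not in tag_to_project go into the "rejects" bin.
    """
    bins: Dict[str, List[Read]] = defaultdict(list)
    for read in reads:
        _, tag, _ = read
        bins[tag_to_project.get(tag, "rejects")].append(read)
    return dict(bins)


if __name__ == "__main__":
    tags = {"ACGT": "project_a", "TTAG": "project_b"}
    reads = [("r1", "ACGT", "GATTACA"),
             ("r2", "TTAG", "CCGGAAT"),
             ("r3", "GGGG", "TTTTTTT")]
    for project, members in split_by_tag(reads, tags).items():
        print(project, [read_id for read_id, _, _ in members])
```

Exact matching keeps the sketch short; a production demultiplexer would typically also decide how many tag mismatches to tolerate.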
Analysis
The current script for running the analysis was unable to cope with changing demands
What Did I Have?
A Brief
[Figure: flowchart of the brief, continued over two slides. Stages shown include Run Completes, Bustard, CIF, Qseq/Sig2, Adaptor Removal, Split by Tag, Create Cal Table, Calibrate Scores, Cal-Qseq, Consent Align, K-mer Error Correction, Create Fastq, Align to Ref, Create SRF and Initial Product Creation, with Control Refs and Sample Refs as inputs and index files for tags, rejects and consent. Products: SRF, Sig2, Index, fastq, BAM. Gray boxes may be pass-through.]
QC and Archival
Run Summary (Summary.htm stuff)
IVC Plots
Q20 Counts
Fastqcheck
Insert Size Histogram
Error rates and QQ-Plots
Heatmaps
SNP Finder
... And Anything Else You Can Think Of
[Figure: Human QC, Fuse and Archive stages.]
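Of the QC checks listed, the Q20 count is the simplest to pin down: the number of bases with a Phred quality of at least 20. This is a minimal sketch, not the pipeline's own QC code. It assumes plain 4-line FASTQ records and Phred+33 quality encoding. Older Illumina pipeline versions used Phred+64, so the offset is a parameter. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def q20_counts(fastq_lines: Iterable[str], offset: int = 33,
               threshold: int = 20) -> Tuple[int, int]:
    """Return (bases with Phred quality >= threshold, total bases).

    Assumes plain 4-line FASTQ records (header, sequence, '+', quality),
    so the quality strings are every fourth line starting at index 3.
    """
    lines: List[str] = [line.rstrip("\n") for line in fastq_lines]
    passing = total = 0
    for quality in lines[3::4]:
        # Each character encodes one base's Phred score as chr(score + offset).
        scores = [ord(ch) - offset for ch in quality]
        passing += sum(1 for score in scores if score >= threshold)
        total += len(scores)
    return passing, total


if __name__ == "__main__":
    records = ["@r1", "ACGT", "+", "II5+",   # scores 40, 40, 20, 10
               "@r2", "ACGT", "+", "####"]   # scores 2, 2, 2, 2
    print(q20_counts(records))  # (3, 8)
```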
Working in an Agile Manner
Current approach: still close to Cascade (waterfall), with some idea of iterations
I wanted more agility: defined iterations
Got close
First Iteration - It1
Chop the brief down into stories
Spoke with the creator of the brief, my boss and the team about what was needed
Pluggable, Automatic, Auto QC
It1: First bit of Coding
Read the old code: anything I can steal? Yes!
Write some 'in principle' tests to get an idea of the way to go.
Write some code to pass those tests.
It1: Prototype
Launch next
LaunchSelforFinish
LSF DEPENDENCIES
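The slide gives only the outline of the prototype. One way to express "launch the next step when this one finishes" with LSF is to submit every step up front and let the scheduler enforce the order through job dependencies. This is a minimal sketch assuming `bsub`'s `-J` (job name), `-q` (queue) and `-w "done(name)"` (dependency) options. The step names and scripts are hypothetical, and the function only builds the command lines; it does not submit them.

```python
import shlex
from typing import List, Optional, Sequence, Tuple

# Hypothetical step description: (job_name, command_tokens).
Step = Tuple[str, Sequence[str]]


def bsub_chain(steps: Sequence[Step], queue: Optional[str] = None) -> List[str]:
    """Build one bsub command line per step.

    Every job after the first waits until the previous job has finished
    successfully, using LSF's done(job_name) dependency condition.
    """
    commands: List[str] = []
    previous: Optional[str] = None
    for job_name, command in steps:
        args = ["bsub", "-J", job_name]
        if queue is not None:
            args += ["-q", queue]
        if previous is not None:
            args += ["-w", f"done({previous})"]
        args += list(command)
        # Quote each token so the line is safe to pass to a shell.
        commands.append(" ".join(shlex.quote(arg) for arg in args))
        previous = job_name
    return commands


if __name__ == "__main__":
    for line in bsub_chain([
        ("bustard", ["run_bustard.pl", "--id_run", "1234"]),
        ("split_by_tag", ["split_by_tag.pl", "--id_run", "1234"]),
    ]):
        print(line)
```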
It1: Fail
In principle (tests): worked
In reality: too unwieldy
It1: Evaluation
Too much wrapping
Too much could go wrong with lots of parts
Out the Window!
Second Iteration - It2
So, I'm agile. I don't see this as a setback.
Opportunity to try a different approach.
I sketch it out.
[Figure: the Flag Waver calls Function a to Function e; each function uses an object to launch its component (Component a to Component e).]
It2: Second lot of Coding
Again, start off with 'in principle' tests
Write some code to pass those tests
Select a real-world part of the pipeline to apply it to
It2: Pass
This real-world bit works
All jobs are launched as expected
Replace the old section with this bit
It still works :) A perfect replacement
It2: Evaluation
Success :)
The Flag Waver model: functions that know what to do, but have no knowledge of other functions
This should make it pluggable
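The Flag Waver idea can be sketched in a few lines. The flag waver holds an ordered list of functions and calls each in turn. Each function knows how to launch its own component but knows nothing about the others. This is a Python sketch under stated assumptions, not the pipeline's Perl code. `Launcher` is a hypothetical stand-in for job submission that only records launches. The component names are illustrative, and dependency handling between jobs is left out.

```python
from typing import Callable, List, Tuple


class Launcher:
    """Hypothetical stand-in for a job submitter (e.g. an LSF wrapper).

    It only records what would be launched and returns fake job ids.
    """

    def __init__(self) -> None:
        self.launched: List[Tuple[str, int]] = []

    def submit(self, component: str, id_run: int) -> int:
        self.launched.append((component, id_run))
        return len(self.launched)


# Each function knows how to launch its own component and nothing else.
def launch_bustard(launcher: Launcher, id_run: int) -> int:
    return launcher.submit("bustard", id_run)


def launch_split_by_tag(launcher: Launcher, id_run: int) -> int:
    return launcher.submit("split_by_tag", id_run)


def launch_create_fastq(launcher: Launcher, id_run: int) -> int:
    return launcher.submit("create_fastq", id_run)


def launch_qc_q20(launcher: Launcher, id_run: int) -> int:
    return launcher.submit("qc_q20", id_run)


Step = Callable[[Launcher, int], int]


class FlagWaver:
    """Calls each step in order; the steps never call one another."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = list(steps)

    def run(self, launcher: Launcher, id_run: int) -> List[int]:
        return [step(launcher, id_run) for step in self.steps]


if __name__ == "__main__":
    launcher = Launcher()
    # Plugging in a QC check means adding one entry to the list.
    pipeline = FlagWaver([launch_bustard, launch_split_by_tag,
                          launch_create_fastq, launch_qc_q20])
    print(pipeline.run(launcher, 1234))  # [1, 2, 3, 4]
    print(launcher.launched)
```

In this shape, adding a QC step means adding a function to the list, and moving a step means reordering the list. No other function has to change.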
It2: Evaluation
Bulky data is being generated multiple times over: needs more DRYness
It3: Some new requests
It would be easier to code if we didn't have users of the applications!
The first new request comes in: some automated QC
Just launch the QC jobs at the correct time
It3: Scrum
So, I scrum.
The objective: work out priorities for this iteration.
There are many 'stories'; I decide on the following.
It3: Scrum
Write something to make data construction and passing more DRY
Write another replacement pipeline section
Try to incorporate one QC check into the previous pipeline section
It3: Tests
I write some tests to assess launching the analysis pipeline
I write some tests to incorporate a QC launch into the post-analysis pipeline
I run the tests, which fail
It3: Code
I decide to add the QC launch first
My boss wants to start getting the data
I get a quick view of how pluggable the system actually is
It is good :)
It3: Code
The analysis guys want their pipeline to start showing up
Good reason: a new version of the scripts has appeared, and they don't want to patch the old one
This takes the rest of the iteration
It3: Release
The most important release so far
Completely replaced the old code with the new
Took about 2 days, including bug fixing
It3: Evaluation
Bugs on release: tests don't always prove everything!
No time to DRY out the code
Successful product in production
Old code has gone to 'silicon heaven'
It4: Scrum
I scrum again
So far, iterations have been quite quick
To let some time pass with the pipeline in production, I decide to do refactoring this time
It4: Scrum
Utilising more code reuse (using Moose roles)
Create an external role to translate attributes without building hashes each time
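One reading of "translate attributes without building hashes each time" is this: an object can hand its attributes to a new object, or render them as command-line options for a job it launches, without anyone assembling a hash by hand. Moose roles have no direct Python equivalent. This sketch uses a mixin over dataclasses as a rough analogue. The class, attribute and method names are all hypothetical and are not the API of the module released to CPAN.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


class AttributeCloner:
    """Hypothetical mixin, loosely analogous to a Moose role.

    Only works when mixed into a dataclass.
    """

    def attribute_values(self) -> Dict[str, Any]:
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def new_with_cloned_attributes(self, target_cls, **overrides: Any):
        """Build a target_cls instance from the attributes both classes share.

        Unset (None) values are skipped so the target's defaults apply;
        keyword overrides win over cloned values.
        """
        wanted = {f.name for f in fields(target_cls)}
        values = {name: value for name, value in self.attribute_values().items()
                  if name in wanted and value is not None}
        values.update(overrides)
        return target_cls(**values)

    def as_command_options(self) -> List[str]:
        """Render set attributes as --name=value options for a launched job."""
        return [f"--{name}={value}"
                for name, value in self.attribute_values().items()
                if value is not None]


@dataclass
class RunAnalysis(AttributeCloner):
    id_run: int
    runfolder_path: str
    verbose: Optional[bool] = None


@dataclass
class QcCheck(AttributeCloner):
    id_run: int
    runfolder_path: str
    check: str = "q20"


if __name__ == "__main__":
    analysis = RunAnalysis(id_run=1234, runfolder_path="/staging/run_1234")
    print(analysis.new_with_cloned_attributes(QcCheck, check="insert_size"))
    print(analysis.as_command_options())
```

The point of putting this in one shared mixin (or role) is that every pipeline class gets the same translation for free. No caller repeats the attribute-by-attribute copying.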
It4: In Brief
After 2 weeks
» a nicely refactored pipeline
» an external role to DRY out data (released to CPAN)
» time to monitor how the pipeline was running
Release and go
The next few iterations
Iterations continue, releasing every 2-3 weeks :)
Until it all broke :(
The Broken Pipeline Iteration
Up until now, the pipeline had been behaving itself.
New analysis code came from our supplier; our R&D team would test it, then I would throw the switch and release.
The Broken Pipeline Iteration
However, the supplier changed something we didn't find in testing.
Runs with multiplexed lanes broke, as they have an extra 'barcode' read
The Broken Pipeline Iteration
Luckily, this is where being agile really helped.
Although I had just 'scrummed' to decide my priorities, I simply dropped them
New priority: fix the pipeline
The Broken Pipeline Iteration
Pluggable, so could a function or two be moved to help?
Yes! Moving one function would halve the problem.
Run on an example: expected outcome
The Broken Pipeline Iteration
Now to fix the 3-read / 2-read problem
Again: write tests, test, code, test, run on an example, write tests for bugs, test, code, test, run on an example...
By the end of this iteration, I was able to release a fully fixed pipeline
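The slides do not show the fix itself. As a hypothetical illustration of the 3-read / 2-read problem: a step that equates "two reads" with "paired-end" misreads a multiplexed single-end run, and a step that expects at most two reads fails on a multiplexed paired-end run. The sketch below derives each read's role from both the read count and whether the lane is multiplexed. The layouts it encodes, with the index read following the first read, are an assumption for illustration and are not stated in the talk.

```python
from typing import List


def read_roles(num_reads: int, multiplexed: bool) -> List[str]:
    """Hypothetical: name the role of each read in a run, in order.

    Assumed layouts (not taken from the talk):
      plain:       1 read  -> forward
                   2 reads -> forward, reverse
      multiplexed: 2 reads -> forward, index
                   3 reads -> forward, index, reverse
    """
    layouts = {
        (1, False): ["forward"],
        (2, False): ["forward", "reverse"],
        (2, True): ["forward", "index"],
        (3, True): ["forward", "index", "reverse"],
    }
    try:
        return list(layouts[(num_reads, multiplexed)])
    except KeyError:
        raise ValueError(
            f"unexpected layout: {num_reads} reads, multiplexed={multiplexed}"
        ) from None


if __name__ == "__main__":
    print(read_roles(2, multiplexed=False))  # ['forward', 'reverse']
    print(read_roles(2, multiplexed=True))   # ['forward', 'index']
```

Keeping the layouts in one lookup table means a new read layout is one more entry, rather than a change to every step that cares about reads.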
The Broken Pipeline Iteration
Evaluation:
Being agile, both in project management and design, helped here.
How?
The Broken Pipeline Iteration
Design:
Plugin design of the pipeline: half the problem was solved just by moving something.
The other half, just by writing a new module.
It just worked!
The Broken Pipeline Iteration
Project Management:
Changing an iteration's priorities so that the urgently required fix could be done...
...barely disrupting the flow of work on feature requests
What has happened since?
Development has settled into a 2-3 week release cycle
The team knows where development stands
This made it easier for them to cover for me
What else happened since?
Acknowledgements
David Jackson
Guoying Qi
John O'Brien
Marina Gourtovaia
Sri Deevi
Tom Skelly
Irina Abnizova
Steve Leonard
Tony Cox
You
Contact Me!
http://software-east.net/profile/AndyBrown
setitesuk@gmail.com
http://vampiresoftware.blogspot.com
http://twitter.com/setitesuk
http://www.slideshare.net/setitesuk
http://github.com/setitesuk

Agile analysis development

  • 3. Who Are We? One of the world's largest DNA Sequencing Centres Second largest compute centre after CERN in Europe
  • 4. What Do We Do? Human, Mouse, Zebrafish and Pathogen Genome Projects Post sequencing analysis, annotation and maintenance (It's never truly finished!)
  • 5. Who Am I? Tracking systems and analysis pipeline for Next Generation Sequencing Technologies Perl, Web Technologies, Moose
  • 6. Next Generation Sequencing? Massively Parallel DNA Sequencing Producing Millions of Reads per run ~38 instruments ~5Tb of data a day Managing quick turnaround on Staging of 320Tb data a month
  • 7. Analysis Convert Images to Bases Obtain quality values Recalibrate quality Separate up DNA sequences from different projects Do this in parallel Be able to extend this
  • 8. Analysis Current analysis running script was unable to cope with changing demands
  • 9. What Did I Have?
  • 12. [Diagram: Initial Product Creation. Stages: Run Completes, Bustard, Adaptor Removal, Split by Tag, Create Cal Table and Calibrate Scores, Consent Align, K-mer Error Correction, Create Fastq, Align to Ref (Control Refs, Sample Refs), Create SRF. Products: CIF, Qseq, Sig2, Cal-Qseq, Fastq, SRF and BAM, with indexes for tags, consent and rejects. Gray boxes may be pass-through. Continued on the next page.]
  • 13. [Diagram: QC and Archival. Files: Control Refs, SRF, Sig2, Index, fastq, BAM. Checks: Run Summary (Summary.htm stuff), IVC Plots, Q20 Counts, Fastqcheck, Insert Size Histogram, Error rates and QQ-Plots, Heatmaps, SNP Finder ... And Anything Else You Can Think Of. Then Human QC and Fuse Archive.]
  • 16. Working in an Agile Manner Current manner – still close to Cascade, some idea of iterations I wanted more agility – defined iterations Got close
  • 17. First Iteration - It1 Chop down the brief into stories Spoke with creator of the brief, my boss & team about what was needed Pluggable, Automatic, Auto QC
  • 18. It1: First bit of Coding Read old code – anything I can steal – yes! Write some 'in principle' tests to get an idea of the way to go. Write some code for those tests.
  • 20. It1: Fail Test: Principle – Worked; Reality – Too Unwieldy
  • 21. It1: Evaluation Too much wrapping Too much could go wrong with lots of parts Out the Window!
  • 22. Second Iteration - It2 So, I'm Agile. I don't see this as a set back. Opportunity to try a different approach. I sketch it out.
  • 23. [Diagram: the Flag Waver calls Functions a to e in turn; each Function uses an Object to Launch (Ca to Ce), which launches the matching Component (a to e).]
  • 24. It2: Second lot of Coding Again, start off with in principle tests Write some code to pass those tests Select a bit of real world to apply it to
  • 25. It2: Pass This real world bit works All jobs are launched as expected Replace the old section with this bit It still works :) A perfect replacement
  • 26. It2: Evaluation Success :) The Flag Waver model - functions that know what to do, but no knowledge of other functions This should make it pluggable
  • 27. It2: Evaluation Bulky data getting generated multiple times over – Needs more DRYness
  • 28. It3: Some new requests It would be easier to code if we didn't have users of the applications! The first new request comes in for some automated QC Just launch them at the correct time
  • 29. It3: Scrum So, I scrum. The objective: Work out priorities for this iteration. There are many 'stories', I decide on the following.
  • 30. It3: Scrum Write something to make data construction and passing more DRY Write another replacement pipeline section Try to incorporate 1 QC into previous pipeline section
  • 31. It3: Tests I write some tests to assess launching the analysis pipeline I write some tests to incorporate a QC launch into the post analysis pipeline I run the tests, which fail
  • 32. It3: Code I decide first to add the QC launch My boss wants to start getting the data I get a quick view of how pluggable the system actually is It is good :)
  • 33. It3: Code The analysis guys want their pipeline to start showing up Good reason - a new version of the scripts has appeared, and they don't want to patch the old This takes the rest of the iteration
  • 34. It3: Release The most important release so far Completely replace old code with new Took about 2 days, with bug fixing
  • 35. It3: Evaluation Bugs on Release - tests don't always prove everything! No time to DRY out the code Successful product into production Old code has gone to 'silicon heaven'
  • 36. It4: Scrum I again scrum So far, iterations have been quite quick To give the pipeline some time running in production, I decide to do refactoring this time
  • 37. It4: Scrum Utilising more Inheritance (using Moose Roles) Create external role to translate attributes without building hashes each time
  • 38. It4: In Brief After 2 weeks » a nicely refactored pipeline » external role to DRY out data (released to CPAN) » time to have monitored how the pipeline was running Release and go
  • 39. The next few iterations Iterations continue, releasing every 2-3 weeks :) Until it all broke :(
  • 40. The Broken Pipeline Iteration Up until now, the pipeline had been behaving itself. New analysis code came from our supplier, our R&D team would test, then I would throw the switch and release.
  • 41. The Broken Pipeline Iteration However, they changed something we didn't find in testing. Runs with multiplexed lanes broke, as they have an extra 'barcode' read
  • 42. The Broken Pipeline Iteration Luckily, here is where being agile really helped. Whilst I had just 'scrummed' to decide my priorities, I just dropped them New Priority – Fix the Pipeline
  • 43. The Broken Pipeline Iteration Pluggable, so could a function or two be moved to help? Yes! 1 function move would halve the problem. Run on example – expected outcome
  • 44. The Broken Pipeline Iteration Now to fix the 3 read / 2 read problem Again, write tests, test, code, test, run on example, write tests for bugs, test, code, test, run on example .... End of this iteration, able to release a fully fixed pipeline
  • 45. The Broken Pipeline Iteration Evaluation: Being Agile, both in project management and design, helped here. How?
  • 46. The Broken Pipeline Iteration Design: Plugin design of the pipeline - half the problem was solved just by moving something. The other part just by writing a new module. It just worked!
  • 47. The Broken Pipeline Iteration Project Management: Changing an iteration's priorities so that the urgently required fix could be done... ...barely disrupting the flow of work on feature requests
  • 48. What has happened since? Development has settled into a 2-3 week release cycle Team knows development position Made it easier for them to cover me
  • 54. Acknowledgements David Jackson Guoying Qi John O'Brien Marina Gourtovaia Sri Deevi Tom Skelly Irina Abnizova Steve Leonard Tony Cox You
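Slides 27 and 37 describe bulky data being generated multiple times over, and a Moose role used to DRY it out. The sketch below illustrates that general idea in Python rather than Perl: a mixin, standing in for a Moose role, exposes a lazily built attribute, so the bulky mapping is constructed once per object and then reused on every later call. It is not the role the author released to CPAN; `LaneInfoMixin`, `QCLauncher`, the `qc_check` command and the `/runs/<run_id>/lane<n>` path layout are all hypothetical. In Moose the closest equivalent would be an attribute declared with `lazy => 1` and a `builder`.

```python
from functools import cached_property


class LaneInfoMixin:
    """Hypothetical mixin playing the part of a Moose role.

    Consuming classes must provide ``run_id`` and ``lanes``. The bulky
    per-lane mapping is built once, on first access, and then reused.
    """

    @cached_property
    def lane_info(self) -> dict:
        # Count how many times the mapping is built (demonstration only).
        self.lane_info_builds = getattr(self, "lane_info_builds", 0) + 1
        return {
            lane: {
                "run_id": self.run_id,
                "lane": lane,
                "path": f"/runs/{self.run_id}/lane{lane}",  # hypothetical layout
            }
            for lane in self.lanes
        }


class QCLauncher(LaneInfoMixin):
    """Hypothetical launcher that consumes the mixin."""

    def __init__(self, run_id: int, lanes: list):
        self.run_id = run_id
        self.lanes = lanes

    def commands(self) -> list:
        # Every call reads the cached mapping instead of rebuilding it.
        return [
            f"qc_check --lane {info['lane']} {info['path']}"
            for info in self.lane_info.values()
        ]


if __name__ == "__main__":
    launcher = QCLauncher(1234, [1, 2])
    print(launcher.commands())
    print(launcher.commands())
    print(launcher.lane_info_builds)  # 1
```

Any number of launcher classes can consume the same mixin, so the construction logic lives in one place and each object builds its data at most once.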

Editor's Notes

  1. Introduce myself. I'm here to talk about the agile development process of the analysis pipeline I developed.
  2. We are the Wellcome Trust Sanger Institute. Here is a picture of our campus, which is in Hinxton, south of Cambridge.
  3. We are one of the world's largest DNA sequencing centres. Until fairly recently we were the largest, but in the last few years we have been overtaken by some centres in America, and then the Chinese blew all the competition away. We also have the largest compute centre in Europe after CERN. Biology has very much moved into the informatics domain. Unlike disciplines such as physics, which have had time to develop their compute infrastructure over the 15+ years it takes to design the rest of an experiment, biology has moved so fast that we are sometimes lucky to get 1.5 months.
  4. Originally set up to sequence one third of the Human Genome, we also worked on the Mouse Genome. We have also sequenced other organisms and pathogens ourselves. We are also involved in post-sequencing analysis, annotation and sequence maintenance. Contrary to popular belief, the sequences are never truly finished.
  5. I'm a software developer in a group responsible for producing the tracking systems and running the primary analysis pipeline for the Next Generation Sequencing Instruments. I mostly develop using Perl, Web Technologies and Moose
  6. What is Next Generation Sequencing? The Human Genome cost millions of dollars and took around 15 years to complete. It was very costly, as it used a sequencing technique which handled just a few strands of DNA at a time. NGS is massively parallel sequencing of strands, sequencing millions at a time. However, with approximately 38 instruments producing around 5Tb of data a day, that is a lot for us to deal with. We have a quick turnaround of 320Tb of data a month.
  7. Here is an outline of the primary analysis requirements, which need to be done within 2 weeks of the run completing on an instrument.
  8. Our current analysis pipeline running script was unable to cope with the changing demands.
  9. Time – I had a little bit of time to look at how to approach this, and the best way to structure it
  10. A suggestion of what the whole pipeline would need to do
  11. Vision, Ideas and Enthusiasm
  12. A desire to develop in a more agile way, even if the rest of my team's focus wasn't quite in line with that
  13. We had always had visions of working in an agile manner, but in reality it was still quite close to cascade. We had tried to apply some idea of iterations, but it mostly meant "I've done a feature; if no-one has objections, I'll release it." I wanted more than that: I wanted agility and the idea of defined iterations. I got close with this.
  14. For my first iteration, I decided to look at the brief, chop it down into manageable chunks, or stories, look at what we wanted, and then prototype it. I spoke with the creator of the brief (who had been running the previous script up until now) and my boss, and got some ideas from my team. We wanted something that was pluggable, automatic, and able to produce some QC.
  15. So, in my first bout of coding, I started by reading the old, and by now convoluted, script, mostly looking for anything I could steal for a prototype. I wrote some tests of how I wanted it to work. I then wrote my prototype.
  16. I had heard of someone else's pipeline which involved a daemon running constantly, polling for finished jobs before launching the next. We had already decided that this was not feasible for us, since it would involve too much overhead: a daemon per run, approximately 5 new daemons per day. I hit upon the idea of a script which would: know the order to launch jobs; know which job had just been launched, via a command-line parameter; launch the next job; and relaunch itself, with that state and a dependency on the previous job, either to go on to launch the next one or simply to finish.
  17. In principle, it worked. In reality, however, it was an epic fail. It got too unwieldy as more parts were added, and that was before we wanted to parallelise jobs.
  18. It also had too much wrapping code dealing with processing what had gone before it. We decided that too much could go wrong with it, so we threw it out of the window.
  19. Part of being agile is not to worry. I had only spent a week on this, so I didn't see it as a setback. It was a prototype and a first attempt, and I learnt from it. This surprised my boss, since I'm normally quite particular about my work, and I don't like things going wrong. Well, unless I deliberately plan it that way: I spent 3 weeks trying to prove that, for us, message queues might be unreliable, and finally did. I took this as a chance to try a different approach. I sketched it out.
  20. This approach got termed the flag waver. A central function would launch, in turn, smaller functions which would call out to objects. Those objects would know how to launch other programs from the pipeline, and would feed back only the minimal amount of information required to launch the next one. Using LSF, this only needed to be the job IDs of the launched programs, which would then be used as dependency conditions for the next launched job. This fitted nicely with the idea of it being pluggable, as the different functions should be loosely coupled. The order might be important (a before b), but it might not matter if c got put in between them.
  21. With this idea, I get back to coding. I write some in-principle tests, and some code to pass those tests. I then try it on a bit of the real world to see if it is still OK, as before.
  22. This time the real world works! All my jobs launch and go through an example as expected. I replace the old section I had taken (some post analysis analysis) and actually install it. WooHoo, it is a perfect replacement.
  23. Evaluating again – success! I now have a model which is loosely coupled enough to make it pluggable – I hope. I'm also happy enough with this as a prototype to move it into a production scale framework
  24. However, I notice something about the way the functions are getting called: some data is in the form of bulky hashrefs, and is generated multiple times over.
  25. My boss and other team members have read some of my books
  26. We scrum daily
  27. Sprints and feature requests are tracked using RT
  28. Our productivity has increased
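Note 20 describes the flag waver: a central function calls small, loosely coupled functions in turn, each of which asks an object to launch its component and hands back only the LSF job IDs, which then become the dependency for whatever is launched next. The following is a minimal Python sketch of that model under stated assumptions, not the author's Perl implementation. `FakeScheduler` stands in for LSF and only records the `bsub` arguments it would use (`-J` for the job name, `-w "done(ID) && ..."` for the dependency expression) instead of submitting anything; the step functions and the commands `calibrate_scores`, `create_fastq`, `qc_check` and `archive_run` are hypothetical.

```python
from itertools import count
from typing import Callable, List, Tuple


class FakeScheduler:
    """Stand-in for LSF: records the bsub arguments instead of submitting jobs."""

    def __init__(self, first_id: int = 100):
        self._ids = count(first_id)
        self.submitted: List[Tuple[int, List[str]]] = []

    def submit(self, name: str, command: str, depends_on: List[int]) -> int:
        args = ["bsub", "-J", name]
        if depends_on:
            # LSF dependency expression: start only when every listed job is done.
            args += ["-w", " && ".join(f"done({j})" for j in depends_on)]
        args.append(command)
        job_id = next(self._ids)
        self.submitted.append((job_id, args))
        return job_id


class Launcher:
    """Hypothetical 'object to launch': knows how to start one component."""

    def __init__(self, name: str, command: str):
        self.name = name
        self.command = command

    def launch(self, scheduler: FakeScheduler, depends_on: List[int]) -> List[int]:
        return [scheduler.submit(self.name, self.command, depends_on)]


# Each step receives the previous job IDs and returns its own; no step
# knows which step ran before or after it.
Step = Callable[[FakeScheduler, List[int]], List[int]]


def calibrate(scheduler: FakeScheduler, prev: List[int]) -> List[int]:
    return Launcher("calibrate", "calibrate_scores").launch(scheduler, prev)


def make_fastq(scheduler: FakeScheduler, prev: List[int]) -> List[int]:
    return Launcher("fastq", "create_fastq").launch(scheduler, prev)


def qc_per_lane(scheduler: FakeScheduler, prev: List[int]) -> List[int]:
    # Parallel jobs: one per lane, all waiting on the same previous jobs.
    ids: List[int] = []
    for lane in (1, 2):
        ids += Launcher(f"qc_lane{lane}", f"qc_check --lane {lane}").launch(scheduler, prev)
    return ids


def archive(scheduler: FakeScheduler, prev: List[int]) -> List[int]:
    return Launcher("archive", "archive_run").launch(scheduler, prev)


def flag_waver(steps: List[Step], scheduler: FakeScheduler) -> List[int]:
    """Call each step in order, passing along only the job IDs it returns."""
    job_ids: List[int] = []
    for step in steps:
        job_ids = step(scheduler, job_ids)
    return job_ids


if __name__ == "__main__":
    sched = FakeScheduler()
    flag_waver([calibrate, make_fastq, qc_per_lane, archive], sched)
    for job_id, args in sched.submitted:
        print(job_id, args)
```

Because no step knows about any other, plugging in a new step means inserting a function into the list passed to `flag_waver`; only steps whose outputs are needed downstream constrain the order. The final `archive` step waits on both per-lane QC jobs, which is how parallel jobs fan back in through the dependency expression.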