Pistoia Alliance Sequence Squeeze

Using a competition model to spur development of novel open-source algorithms

Richard Holland (Eagle/Pistoia), Nick Lynch (AZ/Pistoia)

BOSC, July 2012

©Eagle Genomics Ltd.
  
Order of Service

•  What/who is the Pistoia Alliance?
•  What is/was Sequence Squeeze?
•  Who won, how, and why?
•  Why did Pistoia do this?
•  Why is this good for BOSC delegates?
•  Will it happen again?


Pistoia Alliance Sequence Squeeze    ©Eagle Genomics Ltd    July 14, 2012
  
What/who is the Pistoia Alliance?


Who is Pistoia?

•  The Pistoia Alliance is
   –  global
   –  not-for-profit
   –  precompetitive alliance
   –  life science companies, vendors, publishers, and academic groups
   –  aims to lower barriers to innovation
   –  by improving the interoperability of R&D business processes.
•  We differ from standards groups because
   –  we bring together the key constituents to identify the root causes that lead to R&D inefficiencies
   –  develop best practices and technology pilots to overcome common obstacles.



What is/was Sequence Squeeze?


The NGS problem

•  Storing millions of NGS reads and their quality scores uncompressed is impractical, yet current compression technologies are becoming inadequate.
•  There is a need for a new and novel method of compressing sequence reads and their quality scores in a way that preserves 100% of the information whilst achieving much-improved linear (or, even better, non-linear) compression ratios.
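
A minimal Python sketch of what "preserves 100% of the information" and "compression ratio" can mean in practice. It assumes the ratio is the compressed size as a percentage of the original size, which is how the percentages later in this talk appear to be quoted. The FASTQ record is made up, `compression_ratio` is a hypothetical helper, and bz2 is only a stand-in for a general-purpose compressor, not a contest entry.

```python
import bz2


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Compressed size as a percentage of the original size (smaller is better)."""
    return 100.0 * len(compressed) / len(original)


# Made-up FASTQ record: identifier, bases, "+" separator,
# then one quality character per base.
record = b"@read1\nGATTACAGATTACAGATTACA\n+\nIIIIIIIIIIHHHHHGGGFFF\n"
fastq = record * 1000

compressed = bz2.compress(fastq)
restored = bz2.decompress(compressed)

# The "100% of the information" requirement: a byte-for-byte round trip.
is_lossless = restored == fastq
ratio = compression_ratio(fastq, compressed)
print(f"lossless={is_lossless} ratio={ratio:.2f}%")
```

Repeating one record makes this sample far more compressible than real sequencing data, so the printed ratio says nothing about real-world performance.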
  



What was Sequence Squeeze?

•  Contest to find a better FASTQ compression algorithm
   –  easiest format for ranking entries in an automated setting.
•  Open source, non-restrictive licence required for entries
   –  benefit the whole community.
•  Entries tested on an extract of the 1000 Genomes data stored in AWS.
•  Prize fund of US$15,000 to the best algorithm submitted before the closing date of 15 March 2012.
•  Winner was announced at the Pistoia Alliance Conference in Boston, MA on 24 April 2012
   –  more on that story later.
•  Organised and administered by Eagle under contract to Pistoia.




Who entered?

•  108 distinct entries.
•  But all these from only 12 entrants!
   –  some entrants were groups or consortia but most were individuals.
•  Public leaderboard encouraged fiercer competition.
•  Entrants seemingly driven to outdo their competitors.

Who judged?

•  Yingrui Li – Duty Operation Officer of Science & Technology Department of the BGI-Shenzhen.
•  Nick Lynch – President of the Pistoia Alliance (2009-11).
•  Guy Coates – leader of the Informatics Systems Group at the Wellcome Trust Sanger Institute.
•  Tim Fennell – Assistant Director for Sequencing Pipeline Informatics at the Broad Institute.

Who won, how, and why?


What were the results?

•  Entrants were judged by
   –  compression ratio
   –  compression time and memory
   –  decompression time and memory
   –  accuracy (lossiness – 100% target)
   –  manual review for code quality, scalability, and other factors.
•  The same three people showed up at the top of every category
   –  in a different order
   –  with different versions of their entries.
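
The automated part of these criteria can be approximated with a small Python harness. This is a sketch under stated assumptions, not the contest's actual scoring code: the three standard-library compressors stand in for entries, `measure`, `benchmark` and `leaderboard` are hypothetical names, the sample data is made up, and `tracemalloc` peaks are only a rough proxy for real process memory.

```python
import bz2
import lzma
import time
import tracemalloc
import zlib


def measure(func, data):
    """Run func(data) once; return (result, seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(data)
    seconds = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak


def benchmark(name, compress, decompress, data):
    """Collect one row of scores for a single compressor."""
    packed, c_time, c_mem = measure(compress, data)
    unpacked, d_time, d_mem = measure(decompress, packed)
    return {
        "name": name,
        "ratio_pct": 100.0 * len(packed) / len(data),
        "compress_s": c_time,
        "decompress_s": d_time,
        "compress_peak_bytes": c_mem,
        "decompress_peak_bytes": d_mem,
        "lossless": unpacked == data,  # accuracy: 100% target
    }


# Stand-ins for contest entries (assumption, for illustration only).
CANDIDATES = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def leaderboard(data, key):
    """Rank all candidates on one category; lower is better for every key used here."""
    rows = [benchmark(n, c, d, data) for n, (c, d) in CANDIDATES.items()]
    return sorted(rows, key=lambda row: row[key])


sample = b"@read1\nGATTACAGATTACAGATTACA\n+\nIIIIIIIIIIHHHHHGGGFFF\n" * 2000
by_ratio = leaderboard(sample, "ratio_pct")
for row in by_ratio:
    print(row["name"], f'{row["ratio_pct"]:.2f}%', row["lossless"])
```

Calling `leaderboard` with a different key, such as `"compress_s"` or `"decompress_peak_bytes"`, may reorder the same candidates, which is the effect described on this slide.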
  



Who won, and why?

•  James Bonfield won overall
   –  majority of top places in each category
   –  using various versions of his entry
   –  forming a suite of suitable tools.
•  11.41% compression ratio (test data ~6GB)
   –  or 109.90 seconds compression time
   –  or 100.91 seconds decompression time
   –  or 35.76MB compression memory usage
   –  or 16.01MB decompression memory usage
   –  but not all at once!



Implications of winning entry

•  The approach is very simple – essentially:
   –  convert the FASTQ to BAM alignments against a reference genome, preserving quality scores.
   –  compress the BAM files.
•  Many other entries followed the same pattern:
   –  convert to some other format then compress using standard techniques.
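
The general "convert, then compress with a standard technique" pattern can be sketched in Python as follows. This is not any entrant's method and involves no alignment or BAM: as a simple stand-in conversion it splits FASTQ into separate identifier, base and quality streams and compresses each with bz2. The function names are hypothetical, and the code assumes ASCII input with "\n" line endings, four lines per record, a bare "+" separator line, and a newline after every record.

```python
import bz2


def split_streams(fastq: str):
    """Split 4-line FASTQ records into identifier, base and quality streams."""
    lines = fastq.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    names, bases, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or plus != "+":
            raise ValueError(f"unsupported record starting at line {i + 1}")
        names.append(header[1:])
        bases.append(seq)
        quals.append(qual)
    # Each stream is newline-terminated so it can be split back unambiguously.
    return tuple("".join(item + "\n" for item in stream)
                 for stream in (names, bases, quals))


def join_streams(names: str, bases: str, quals: str) -> str:
    """Inverse of split_streams: rebuild the FASTQ text."""
    columns = (s.split("\n")[:-1] for s in (names, bases, quals))
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in zip(*columns))


def pack(fastq: str):
    """Convert to three streams, then compress each with a standard tool."""
    return tuple(bz2.compress(s.encode("ascii")) for s in split_streams(fastq))


def unpack(packed) -> str:
    """Decompress the three streams and restore the original FASTQ."""
    return join_streams(*(bz2.decompress(b).decode("ascii") for b in packed))


example = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nFFHH\n"
print(unpack(pack(example)) == example)
```

Grouping like with like (names, bases, qualities) is one common reason such conversions can help a general-purpose compressor, though whether it does depends on the data.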
  



Other interesting results

•  Matt Mahoney (Dell) submitted a specialised version of the standard tool paq which performed extremely well.
•  Even vanilla paq wasn’t too bad.
•  Discarding the quality scores entirely gets a compression ratio of 2.87% vs. the original FASTQ (not FASTA).
•  If this contest truly represented the latest and greatest ideas in the field, then NGS storage must therefore either be
   –  highly compressed, very slow access,
   –  or less compressed, relatively fast access.
•  It’s quite hard to beat bzip2.
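
A rough Python sketch of the quality-score comparison, on synthetic data. It assumes "discarding the quality scores" means keeping only the identifier and base lines, and it measures the compressed result against the size of the original FASTQ, as the slide does. The reads and qualities are random, so the percentages it prints will not match the 2.87% quoted above; `make_fastq`, `drop_qualities` and `ratio_vs_original` are hypothetical helpers.

```python
import bz2
import random


def make_fastq(n_reads: int, read_len: int, seed: int = 0) -> str:
    """Generate random reads with random quality strings (synthetic data)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(chr(33 + rng.randint(2, 40)) for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records)


def drop_qualities(fastq: str) -> str:
    """Keep only the identifier and base lines of each 4-line record."""
    lines = fastq.splitlines()
    kept = []
    for i in range(0, len(lines), 4):
        kept.extend(lines[i:i + 2])
    return "".join(line + "\n" for line in kept)


def ratio_vs_original(original: str, payload: str) -> float:
    """Compressed payload size as a percentage of the ORIGINAL FASTQ size."""
    packed = bz2.compress(payload.encode("ascii"))
    return 100.0 * len(packed) / len(original.encode("ascii"))


fastq = make_fastq(2000, 100)
lossless_pct = ratio_vs_original(fastq, fastq)
no_quality_pct = ratio_vs_original(fastq, drop_qualities(fastq))
print(f"with qualities: {lossless_pct:.1f}%  without: {no_quality_pct:.1f}%")
```

Even on random data the gap between the two figures suggests how much of the compressed size the quality scores account for.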
  




And unexpected benefits

David Flanders (Eagle CEO) and John Wise (Pistoia chairman) present James Bonfield with his prize.

James Bonfield donated his entire prize fund – US$15,000 – to charity.
   50% to the Wellcome Trust Sanger Institute.
   50% to the British Heart Foundation.

Publication

•  Formal paper being written at the moment by James Bonfield
   –  in collaboration with close-second Matt Mahoney
   –  and judge Nick Lynch
   –  and the authors of other significant entries.
•  Source code of ALL entries is available at www.sequencesqueeze.org
   –  all under BSD licence
   –  all hosted at SourceForge or similar
   –  click entry names to be taken to download page.
•  Interviews with entrants at the Pistoia blog www.pistoiaalliance.org/blog
   –  search for articles with the tag ‘compression algorithms’.




Why did Pistoia do this?


Why did Pistoia do this?

•  Encouraging innovation through prize-backed contests.
•  Open innovation model allows industry to state its requirements
   –  then let the free market decide how to deliver something that satisfies these.



Why did Pistoia do this?

•  Typical bioinformatics open-source hackers do things because they enjoy them
   –  but sometimes also because of the challenge, the kudos, the satisfaction of solving a real-world problem.
•  James’ charity donation is a great example of this
   –  he wasn’t in it for the money
   –  but the prize fund created a tangible goal to aim at.
•  Amazon kindly sponsored vouchers for all participants that should have covered the cost of developing and submitting an entry
   –  contest was AWS-based
   –  entries had to be submitted as S3 buckets.




Why did Pistoia do this?

•  Leaderboard encouraged competition
   –  one-upmanship
   –  innovation.
•  Does not discourage collaboration
   –  James and Matt both discussed their entries with the data compression community at encode.ru
  



Why did Pistoia do this?

•  BSD-licence requirement ensured that the winning entry was not going to be available only to those willing to pay a fee.
•  Entire community benefits, not just Pistoia members or those with deep pockets to pay for software licence agreements.


Why is this good for BOSC delegates?


Why is this good for BOSC delegates?

•  If the entries had been closed/commercial then only organisations willing to pay to license/buy the resulting products would benefit.
•  But this way the entire community benefits from results, for free, without restriction.
•  Beneficiaries include big pharma and other large corporations that commissioned the contest
   –  but also all universities
   –  all non-profits
   –  all small businesses in biotech
   –  and everyone else involved in NGS work.
•  Pistoia is about pre-competitive alliance
   –  there is no reason to make the Alliance’s output exclusive
   –  they are there to develop and share ideas, not to build an empire.



Will it happen again?


Will it happen again?

•  Pleased with outcome and level of interest.
•  So, yes.
•  Goal is to run two such contests a year.
•  But, your community needs you!
   –  we need a topic/subject/idea that can be rationally/objectively judged/ranked
   –  and that is relevant to the research activities of life science companies and other Pistoia members.
•  Ideas can be sent to Pistoia Ops team c/o execdirector@pistoiaalliance.org
  


Credits

•  Pistoia Alliance for the idea and funding.
•  Eagle for organising and administering.
•  All contestants for entering.
•  1000 Genomes for the test data.
•  AWS for sponsoring participants.
•  BOSC/OBF for accepting this talk.


www.pistoiaalliance.org
www.sequencesqueeze.org
richard.holland@eaglegenomics.com
+44 (0)1223 654481 x3
(ideas to: execdirector@pistoiaalliance.org)

www.eaglegenomics.com
blog.eaglegenomics.com
facebook.com/eaglegenomics
www.pistoiaalliance.org/blog
@eaglegen
@sequencesqueeze
@pistoiaalliance

Eagle® is a registered trademark no. 010418135 of Eagle Genomics Ltd.
Postal address: Eagle Genomics Ltd., Babraham Research Campus, Cambridge CB22 3AT, United Kingdom.





B Kinoshita - Creating biology pipelines with BioUnoB Kinoshita - Creating biology pipelines with BioUno
B Kinoshita - Creating biology pipelines with BioUno
 
Holland R - Pistoia Alliance Sequence Squeeze

  • 1. Pistoia Alliance Sequence Squeeze
      Using a competition model to spur development of novel open-source algorithms
      Richard Holland (Eagle/Pistoia), Nick Lynch (AZ/Pistoia)
      BOSC, July 2012
      ©Eagle Genomics Ltd.
  • 2. Order of Service
      • What/who is the Pistoia Alliance?
      • What is/was Sequence Squeeze?
      • Who won, how, and why?
      • Why did Pistoia do this?
      • Why is this good for BOSC delegates?
      • Will it happen again?
      Pistoia Alliance Sequence Squeeze, ©Eagle Genomics Ltd, July 14, 2012
  • 3. What/who is the Pistoia Alliance?
  • 4. Who is Pistoia?
      • The Pistoia Alliance is
          – global
          – not-for-profit
          – precompetitive alliance
          – life science companies, vendors, publishers, and academic groups
          – aims to lower barriers to innovation
          – by improving the interoperability of R&D business processes.
      • We differ from standards groups because
          – we bring together the key constituents to identify the root causes that lead to R&D inefficiencies
          – develop best practices and technology pilots to overcome common obstacles.
  • 5. What is/was Sequence Squeeze?
  • 6. The NGS problem
      • Storing millions of NGS reads and their quality scores uncompressed is impractical, yet current compression technologies are becoming inadequate.
      • There is a need for a new and novel method of compressing sequence reads and their quality scores in a way that preserves 100% of the information whilst achieving much-improved linear (or, even better, non-linear) compression ratios.
  • 7. What was Sequence Squeeze?
      • Contest to find a better FASTQ compression algorithm
          – easiest format for ranking entries in an automated setting.
      • Open source, non-restrictive licence required for entries
          – benefit the whole community.
      • Entries tested on an extract of the 1000 genomes data stored in AWS.
      • Prize fund of US$15,000 to the best algorithm submitted before the closing date of 15 March 2012.
      • Winner was announced at the Pistoia Alliance Conference in Boston MA on 24 April 2012
          – more on that story later.
      • Organised and administered by Eagle under contract to Pistoia.
  • 8. Who entered?
      • 108 distinct entries.
      • But all these from only 12 entrants!
          – some entrants were groups or consortia but most were individuals.
      • Public leaderboard encouraged fiercer competition.
      • Entrants seemingly driven to outdo their competitors.
  • 9. Who judged?
      • Yingrui Li – Duty Operation Officer of Science & Technology Department of the BGI-Shenzhen.
      • Nick Lynch – President of the Pistoia Alliance (2009-11).
      • Guy Coates – leader of the Informatics Systems Group at the Wellcome Trust Sanger Institute.
      • Tim Fennell – Assistant Director for Sequencing Pipeline Informatics at the Broad Institute.
  • 10. Who won, how, and why?
  • 11. What were the results?
      • Entrants were judged by
          – compression ratio
          – compression time and memory
          – decompression time and memory
          – accuracy (lossiness – 100% target)
          – manual review for code quality, scalability, and other factors.
      • The same three people showed up at the top of every category
          – in a different order
          – with different versions of their entries.
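The automated part of the judging criteria above can be illustrated with a minimal sketch. It is not the contest's actual scoring harness: the `evaluate` function, the synthetic record and the choice of `bz2` as the compressor are assumptions made purely for illustration, and memory usage (which the contest also measured) is left out.

```python
import bz2
import time


def evaluate(data: bytes) -> dict:
    """Score one compressor (bz2 here) on ratio, timings and round-trip fidelity."""
    t0 = time.perf_counter()
    packed = bz2.compress(data)
    t1 = time.perf_counter()
    restored = bz2.decompress(packed)
    t2 = time.perf_counter()
    return {
        # Compressed size as a percentage of the original size (assumed definition).
        "ratio_percent": 100.0 * len(packed) / len(data),
        "compress_seconds": t1 - t0,
        "decompress_seconds": t2 - t1,
        # The 100% accuracy target: the round trip must be byte-identical.
        "lossless": restored == data,
    }


# A tiny synthetic FASTQ record repeated many times; not real contest data.
record = b"@read1\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
sample = record * 1000
result = evaluate(sample)
```

Ranking entries then amounts to sorting such result dictionaries by whichever metric a category cares about, which is consistent with different versions of an entry topping different categories.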
  • 12. Who won, and why?
      • James Bonfield won overall
          – majority of top places in each category
          – using various versions of his entry
          – forming a suite of suitable tools.
      • 11.41% compression ratio (test data ~6GB)
          – or 109.90 seconds compression time
          – or 100.91 seconds decompression time
          – or 35.76MB compression memory usage
          – or 16.01MB decompression memory usage
          – but not all at once!
  • 13. Implications of winning entry
      • The approach is very simple – essentially:
          – convert the FASTQ to BAM alignments against a reference genome, preserving quality scores.
          – compress the BAM files.
      • Many other entries followed the same pattern:
          – convert to some other format then compress using standard techniques.
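The "convert to some other format, then compress using standard techniques" pattern can be sketched as follows. This is not the winning entry and involves no alignment or BAM: as a stand-in conversion it simply splits a FASTQ file into separate name, sequence and quality streams and compresses each with `bz2`. The function names are hypothetical, and the sketch assumes non-empty input made of strict four-line records with a bare `+` separator line.

```python
import bz2


def split_streams(fastq: str):
    """Convert 4-line FASTQ records into three homogeneous streams."""
    lines = fastq.splitlines()
    names = lines[0::4]   # "@..." header lines
    seqs = lines[1::4]    # base calls
    quals = lines[3::4]   # quality strings (line 2 of each record is the bare "+")
    return names, seqs, quals


def pack(fastq: str):
    """Compress each stream separately with a standard compressor."""
    return [bz2.compress("\n".join(s).encode("ascii")) for s in split_streams(fastq)]


def unpack(blobs) -> str:
    """Invert pack(): decompress the streams and re-interleave the records."""
    names, seqs, quals = (bz2.decompress(b).decode("ascii").split("\n") for b in blobs)
    return "".join(f"{n}\n{s}\n+\n{q}\n" for n, s, q in zip(names, seqs, quals))


fastq = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nII#I\n"
blobs = pack(fastq)
roundtrip = unpack(blobs)
```

Grouping like with like gives a general-purpose compressor more regular input; whether that actually beats compressing the raw file depends on the data.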
  • 14. Other interesting results
      • Matt Mahoney (Dell) submitted a specialised version of the standard tool paq which performed extremely well.
      • Even vanilla paq wasn’t too bad.
      • Discarding the quality scores entirely gets a compression ratio of 2.87% vs. the original FASTQ (not FASTA).
      • If this contest truly represented the latest and greatest ideas in the field, then NGS storage must therefore either be
          – highly compressed, very slow access,
          – or less compressed, relatively fast access.
      • It’s quite hard to beat bzip2.
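The effect of discarding quality scores can be explored with a small experiment. The sketch below uses randomly generated reads and qualities (an assumption, so its percentages say nothing about the contest's 2.87% figure), a hypothetical `drop_qualities` helper, and `bz2` as the compressor; both ratios are taken relative to the original FASTQ size, as on the slide.

```python
import bz2
import random


def drop_qualities(fastq: str) -> str:
    """Keep only the name and sequence lines of each 4-line FASTQ record (lossy)."""
    lines = fastq.splitlines()
    kept = []
    for i in range(0, len(lines), 4):
        kept.extend(lines[i:i + 2])
    return "\n".join(kept) + "\n"


# Synthetic data: 2000 random 50-base reads with random Phred+33 qualities.
rng = random.Random(0)
records = []
for n in range(2000):
    seq = "".join(rng.choice("ACGT") for _ in range(50))
    qual = "".join(chr(33 + rng.randrange(40)) for _ in range(50))
    records.append(f"@read{n}\n{seq}\n+\n{qual}\n")
fastq = "".join(records)

original = len(fastq.encode("ascii"))
lossless = len(bz2.compress(fastq.encode("ascii")))
lossy = len(bz2.compress(drop_qualities(fastq).encode("ascii")))

# Both ratios are relative to the original FASTQ size.
lossless_pct = 100.0 * lossless / original
lossy_pct = 100.0 * lossy / original
```

On data like this the quality strings account for most of the compressed size, which gives a feel for why dropping them shrinks the output so sharply, at the cost of no longer meeting the 100% accuracy target.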
  • 15. And unexpected benefits
      [Photo: David Flanders (Eagle CEO) and John Wise (Pistoia chairman) present James Bonfield with his prize.]
      James Bonfield donated his entire prize fund – US$15,000 – to charity.
      50% to the Wellcome Trust Sanger Institute.
      50% to the British Heart Foundation.
  • 16. Publication
      • Formal paper being written at the moment by James Bonfield
          – in collaboration with close-second Matt Mahoney
          – and judge Nick Lynch
          – and the authors of other significant entries.
      • Source code of ALL entries is available at www.sequencesqueeze.org
          – all under BSD licence
          – all hosted at SourceForge or similar
          – click entry names to be taken to download page.
      • Interviews with entrants at the Pistoia blog www.pistoiaalliance.org/blog
          – search for articles with the tag ‘compression algorithms’.
  • 17. Why did Pistoia do this?
  • 18. Why did Pistoia do this?
      • Encouraging innovation through prize-backed contests.
      • Open innovation model allows industry to state its requirements
          – then let the free market decide how to deliver something that satisfies these.
  • 19. Why did Pistoia do this?
      • Typical bioinformatics open-source hackers do things because they enjoy them
          – but sometimes also because of the challenge, the kudos, the satisfaction of solving a real-world problem.
      • James’ charity donation is a great example of this
          – he wasn’t in it for the money
          – but the prize fund created a tangible goal to aim at.
      • Amazon kindly sponsored vouchers for all participants that should have covered the cost of developing and submitting an entry
          – contest was AWS-based
          – entries had to be submitted as S3 buckets.
  • 20. Why did Pistoia do this?
      • Leaderboard encouraged competition
          – one-upmanship
          – innovation.
      • Does not discourage collaboration
          – James and Matt both discussed their entries with the data compression community at encode.ru
  • 21. Why did Pistoia do this?
      • BSD-licence requirement ensured that the winning entry was not going to be available only to those willing to pay a fee.
      • Entire community benefits, not just Pistoia members or those with deep pockets to pay for software licence agreements.
  • 22. Why is this good for BOSC delegates?
  • 23. Why is this good for BOSC delegates?
      • If the entries had been closed/commercial then only organisations willing to pay to licence/buy the resulting products would benefit.
      • But this way the entire community benefits from results, for free, without restriction.
      • Beneficiaries include big pharma and other large corporations that commissioned the contest
          – but also all universities
          – all non-profits
          – all small businesses in biotech
          – and everyone else involved in NGS work.
      • Pistoia is about pre-competitive alliance
          – there is no reason to make the Alliance’s output exclusive
          – they are there to develop and share ideas, not to build an empire.
  • 24. Will it happen again?
  • 25. Will it happen again?
      • Pleased with outcome and level of interest.
      • So, yes.
      • Goal is to run two such contests a year.
      • But, your community needs you!
          – we need a topic/subject/idea that can be rationally/objectively judged/ranked
          – and that is relevant to the research activities of life science companies and other Pistoia members.
      • Ideas can be sent to Pistoia Ops team c/o execdirector@pistoiaalliance.org
  • 26. Credits
      • Pistoia Alliance for the idea and funding.
      • Eagle for organising and administering.
      • All contestants for entering.
      • 1000 Genomes for the test data.
      • AWS for sponsoring participants.
      • BOSC/OBF for accepting this talk.
  • 27. Contact
      www.pistoiaalliance.org
      www.pistoiaalliance.org/blog
      @pistoiaalliance
      (ideas to: execdirector@pistoiaalliance.org)
      www.sequencesqueeze.org
      @sequencesqueeze
      richard.holland@eaglegenomics.com
      +44 (0)1223 654481 x3
      www.eaglegenomics.com
      blog.eaglegenomics.com
      facebook.com/eaglegenomics
      @eaglegen
      Eagle® is a registered trademark no. 010418135 of Eagle Genomics Ltd.
      Postal address: Eagle Genomics Ltd., Babraham Research Campus, Cambridge CB22 3AT, United Kingdom.
      ©Eagle Genomics Ltd.