2018 Bio-IT World Agile in Wet Labs Speeds Big Data

2018 Bio-IT World Using Agile Techniques in Wet Labs to Speed the Creation of Even More Big Data, with Kendra West

Published in: Technology

  1. 1. Using Agile Techniques in Wet Labs to Speed the Creation of Even More Big Data Bruce Kozuma, Principal Systems Analyst Kendra West, Scrum Master, Data Sciences and Data Engineering Thursday 2018/05/17, Bio-IT World
  2. 2. About the Authors • Bruce Kozuma is a Principal Systems Analyst in IT • Connect via LinkedIn: https://linkedin.com/in/bkozuma • Kendra West is a Scrum Master in Data Sciences and Data Engineering • Connect via LinkedIn: https://linkedin.com/in/kendraleighwest
  3. 3. Core Members ~10 Institute Members ~38 Associate Members ~322 Employees ~1000 Post-Docs, Fellows & Scholars in Residence ~100 Visiting Scientists, Staff & Researchers ~750 Students ~550 Post-Docs/Partner Institutions ~600 Over 3,400 Broadies working together
  4. 4. About the Broad Institute of MIT and Harvard • Propelling the understanding and treatment of disease • Collaborating deeply • Reaching globally • Empowering scientists • Building partnerships • Sharing data and knowledge • Promoting inclusion
  5. 5. The Agile Manifesto Individuals & Interactions > Processes & Tools *Delivering Value > Comprehensive Documentation Customer Collaboration > Contract Negotiation Responding to Change > Following a Plan *adapted to fit organizational needs
  6. 6. What is the Agile approach? • We follow Twelve Agile Principles behind the Manifesto: • Our highest priority is to satisfy the customer through early and continuous delivery of value • Welcome changing requirements, even late in development; Agile processes harness change for the customer's competitive advantage • Deliver frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale • Value delivery is the primary measure of progress Frequent delivery and feedback
  7. 7. What is the Agile approach? • We follow Twelve Agile Principles behind the Manifesto: • Business people and developers must work together daily throughout the project • The most efficient and effective method of conveying information to and within a development team is face-to-face conversation • The best architectures, requirements, and designs emerge from self-organizing teams • At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly Teams communicating openly
  8. 8. What is the Agile approach? • We follow Twelve Agile Principles behind the Manifesto: • Build projects around motivated individuals; Give them the environment and support they need, and trust them to get the job done • Agile processes promote sustainable development; The sponsors, developers, and users should be able to maintain a constant pace indefinitely • Continuous attention to technical excellence and good design enhances agility • Simplicity – the art of maximizing the amount of work not done – is essential Doing our best work
  9. 9. What is Scrum? • An Agile framework • Born in Boston • 90% of Agile teams worldwide use Scrum • Borrows its name from rugby
  10. 10. Scrum Values, Pillars, and Elements Scrum values OpeneSs Courage Respect FocUs ComMitment Scrum pillars • Transparency • Inspection • Adaptation Scrum team • Product Owner • Scrum Master • Development Team Scrum events • The Sprint • Sprint Planning • Daily Scrum • Sprint Review • Sprint Retrospective Scrum artifacts • Product Backlog • Sprint Backlog • Increment • Definition of Done
  11. 11. The Broad’s mission embodies many Agile values! Broad Mission • Propelling the understanding and treatment of disease • Collaborating deeply • Reaching globally • Empowering scientists • Building partnerships • Sharing data and knowledge • Promoting inclusion Agile themes • Frequent delivery & feedback • Teams communicating openly • Doing our best work Too many arrows!
  12. 12. How to measure Big Data? • The classic way is via Doug Laney’s Volume, Velocity, Variety model • Volume: size of data (e.g., total size of a data set, number of records, number of files, size of files) • Velocity: rate at which data is produced and changed (e.g., production of BAMs, changes in UCSC genome releases GRCh37 vs hg17) • Variety: • Diversity of formats (e.g., FASTQ, BAM, VCF, CRAM) • Non-aligned data structures (e.g., CDISC) • Inconsistent data semantics (e.g., cell line names)
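The Volume and Variety measures listed on this slide can be approximated for a directory of files with a few lines of Python. This is a minimal sketch, not something from the talk: the function name `summarize_dataset` and the use of file extensions as a stand-in for "format" are assumptions made for illustration. Velocity is not covered, since it would require observing the data set over time.

```python
from collections import Counter
from pathlib import Path


def summarize_dataset(root):
    """Return simple volume and variety figures for the files under root.

    Hypothetical helper for illustration; not part of any Broad tooling.
    """
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    sizes = [p.stat().st_size for p in files]
    # Variety proxy (assumption): count files per lower-cased extension,
    # e.g. ".bam", ".vcf", ".cram". Files without an extension go to "(none)".
    formats = Counter(p.suffix.lower() or "(none)" for p in files)
    return {
        "file_count": len(files),                     # volume: number of files
        "total_bytes": sum(sizes),                    # volume: total size
        "largest_file_bytes": max(sizes, default=0),  # volume: size of files
        "formats": dict(formats),                     # variety: format diversity
    }
```

Note that `Path.suffix` only sees the last extension, so a compressed file such as `reads.fastq.gz` would be counted under `.gz`; a real inventory would need a smarter format detector.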
  13. 13. Thesis of this talk! • Using Agile techniques in wet labs and computational science speeds production of big data in multiple dimensions • Volume • Increases number of samples sequenced • Lowers cost of sequencing and analysis, and barriers to clinical sequencing • Velocity • Reduces cycle time of physical sample preparation prior to sequencing • Improves use of people and resources in lab work • Variety • Increases types of samples being sequenced (e.g., types of cells, diseases, ethnic and geographic diversity, nomenclatures, APIs, and repositories)
  14. 14. Volume – Size of sequenced sample x # samples • 100,000 genomes ~ 70 PB of data ~ 825K BAM files ~ 1.2 billion hours of streaming music • Timeline of the Broad Institute (axis years: 2002, 2004, 2006, 2007, 2008, 2009, 2010, 2012, 2013, 2014, 2015) • Two major research groups come together: Whitehead/MIT Center for Genome Research; Harvard Institute of Chemistry and Cell Biology • Broad Institute launched: Initial $100M gift from Broad Foundations; a 10-year “experiment” in collaborative science • Broad doubles in size: Governed by MIT-Harvard leadership; administratively managed within MIT • Headquarters building opens: 250,000 sq. ft. at 415 Main Street • Broads double initial gift to $200M: Unrestricted for Broad research and operations • Creation of Stanley Center: Founding $100M, 10-year gift from Stanley Medical Research Institute • “Experiment” declared a success: Broads announce new endowment of $400 million; combined $600M Current Use + Endowment Gift • Broad Institute, Inc. established: 501(c)3 formed 9/08; operations begin 7/09 • Carlos Slim Foundation provides $65M: New initiative in genomic disease research; 1st U.S. collaboration to receive funding • Stanley building opens at 75 Ames Street • Second gift of $74M: Slim Initiative for Genomic Medicine for the Americas • 10th anniversary: $100M gift from Broad Foundations to launch next decade of science • Creation of the Klarman Cell Observatory: Klarman Family Foundation gift of $33M • Commitment of $650M: Ted Stanley invests in psychiatric research • Broad Genomics: GP and DSP align • Genomics Platform: BSP Arrays and Sequencing merge
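The Volume figures on this slide (100,000 genomes, ~70 PB, ~825K BAM files, ~1.2 billion hours of streaming music) can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes), which the slide does not specify, and treats all four figures as describing the same data.

```python
PB = 10**15  # assumption: decimal petabyte; the slide does not state the unit

total_bytes = 70 * PB
genomes = 100_000
bam_files = 825_000
music_hours = 1.2e9

# Average storage per genome and per BAM file, in decimal gigabytes.
gb_per_genome = total_bytes / genomes / 10**9
gb_per_bam = total_bytes / bam_files / 10**9

# Audio bitrate implied by the streaming-music comparison, in kbit/s.
music_kbit_per_s = total_bytes * 8 / (music_hours * 3600) / 1000

print(f"{gb_per_genome:.0f} GB per genome")             # 700 GB per genome
print(f"{gb_per_bam:.0f} GB per BAM file")              # 85 GB per BAM file
print(f"{music_kbit_per_s:.0f} kbit/s implied bitrate")  # 130 kbit/s implied bitrate
```

Under these assumptions the numbers are mutually consistent: roughly 700 GB of stored data per genome, roughly 85 GB per BAM file, and a music bitrate close to a common 128 kbit/s stream.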
  15. 15. Velocity • Sequencing cost per genome has fallen to ~$1K • Cost to analyze a genome has also fallen, to ~$5 • Why does this matter? Precision/personalized medicine involves more sequencing • Assertion: Agile increases the velocity of cost reduction via shorter cycle times, cheaper reagents, reusable software, better use of people, etc.
  16. 16. Velocity – Sample preparation and sequencing • Reduces cycle time of physical sample preparation prior to sequencing • Improves use of people and resources in lab work • How? Using Dynamic Work Design • Principle #1: Constant reconciliation of intent and activity • Principle #2: Regular use of structured problem solving • Principle #3: Optimal challenge • Principle #4: Connect the human chain
  17. 17. Velocity – Sample preparation and sequencing • Genomics Platform achieves these results through better technology: • Instruments • Software • Reagents • Training • Organization
  18. 18. Velocity – Sample preparation and sequencing • Dynamic Work Design shares many similarities with Agile/Scrum and uses many of the same techniques: • Visual management • Morning production meeting • Pull system (Kanban)
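The pull system (Kanban) named on this slide can be illustrated with a small model: work moves to the next stage only when that stage has free capacity under its work-in-progress (WIP) limit. This is a minimal sketch under assumed names; `PullStage`, `pull`, the stage names, and the sample IDs are hypothetical and do not describe the Genomics Platform's actual process.

```python
from collections import deque


class PullStage:
    """One column of a Kanban board with a work-in-progress (WIP) limit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.items = deque()

    def has_capacity(self):
        return len(self.items) < self.wip_limit


def pull(upstream, downstream):
    """Move the oldest upstream item downstream only if downstream has room."""
    if upstream.items and downstream.has_capacity():
        downstream.items.append(upstream.items.popleft())
        return True
    return False


# Hypothetical stages and sample IDs, for illustration only.
received = PullStage("received", wip_limit=100)
prep = PullStage("sample prep", wip_limit=2)
received.items.extend(["S1", "S2", "S3"])

# Sample prep pulls work until it reaches its WIP limit; S3 stays upstream.
while pull(received, prep):
    pass
```

The design choice is that the downstream stage controls the flow: nothing is pushed onto a stage that is already at its limit, which makes bottlenecks visible as queues in front of the constrained stage.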
  19. 19. Velocity – People and resources
  20. 20. Velocity – People and resources
  21. 21. Velocity – People and resources • PRISM, for multiplexed screening of compounds against cancer cell lines (wet lab) • Dependency Map, a public portal for cancer data (wet lab, COTS software, software development) • Agile practices used • Retrospectives • Standups • Sprints • Kaizen • Visual board
  22. 22. Velocity – People and resources • Improving use of people and resources in data science by enabling reuse • Data Biosphere: modular and interoperable components that can be assembled into diverse data environments. The Data Biosphere should be based on four governing principles. It should be: • (1) modular, composed of functional components with well-specified interfaces • (2) community-driven, created by many groups to foster a diversity of ideas; • (3) open, developed under open-source licenses that enable extensibility and reuse, with users able to add custom, proprietary modules as needed • (4) standards-based, consistent with standards developed by coalitions such as the Global Alliance for Genomics and Health (GA4GH) Agile values • Deliver value • Work together • Self-organizing teams • Simplicity
  23. 23. Variety • Increases types of samples being sequenced in additional dimensions, e.g., • Types and sources of cells • Types of diseases • Ethnic and geographic diversity • Nomenclatures, APIs, and repositories • Agile practices being applied in each case, speeding the processing of samples and the creation of both sample metadata and genomic data
  24. 24. Variety – Types and sources of cells • Agile principles being used by Broad labs involved in Human Cell Atlas to manage wet lab work (e.g., visual boards, retrospectives) • Agile used to develop portals to enable patients, at scale, to sign up and consent for studies, and for sample processing
  25. 25. Variety – Ethnic and geographic diversity • In 2016, 81% of participants in Genome-Wide Association Studies (GWAS) were of European descent, while participants of African, Latin American, and native or indigenous descent made up less than 4% • Agile practices used to further studies in under-represented populations (e.g., visual management, short delivery cycles)
  26. 26. Variety – Types of diseases • Agile practices used to aid the study of a wider range of diseases, e.g., • The Sabeti Lab uses Agile practices in their work on infectious diseases to enable real-time sharing of genomic data
  27. 27. Variety – Nomenclatures, APIs, and repositories • Nomenclatures are critically important to sharing data and promoting collaboration (e.g., cell lines) • Broad scientists, both wet lab and data, are key contributors to organizations and alliances that have and promote sharing of data through public (and coordinated) APIs • Agile practices used by both groups in their daily work!
  28. 28. How the Broad encourages adoption of Agile • Encourages collaboration within the Broad, e.g., • Platforms (e.g., Genomics, Data Sciences) • Programs (e.g., Cancer, Infectious Disease and Microbiome) • Academic labs (e.g., Sabeti Lab, Regev Lab) • Employs Agile within scientific groups and administration, e.g., • Data Sciences Platform has Agile coaches, Scrum Masters, and Product Owners as job descriptions/titles • Broad Information Technology Services employs Scrum for specific projects • Supports affinity groups and offers related training • Agile Academia, focused specifically on educating and spreading use of Agile • PM@Broad, focused on traditional project management, but PMI embracing Agile… • People Development workshops (e.g., Influencing without Authority, Matrix Management)
  29. 29. Recapitulation – Thesis of this talk! • Using Agile techniques in wet labs and computational science speeds production of big data in multiple dimensions • Volume • Increases number of samples sequenced • Lowers cost of sequencing and analysis, and barriers to clinical sequencing • Velocity • Reduces cycle time of physical sample preparation prior to sequencing • Improves use of people and resources in lab work • Variety • Increases types of samples being sequenced (e.g., types of cells, diseases, ethnic and geographic diversity, nomenclatures, APIs, and repositories)
  30. 30. Acknowledgements • Thank you to the many people who helped pave the way for current and future success! A few notable individuals: • Mark Baker • Michelle Campo • Jean Chang • Raymond Coderre • Sheila Dodge • Vicky Guo • Andrew Hollinger • Eric Jones • Jen Lapan • Yenarae Lee • Anthony Losada • William Mayo • Peter Ragone • Jennifer Roth • Katie Shakun • David Siedzik • Rocky Stroud • Diolinda Vaz • Sarah Winnicki • Broad Alumni • Sadiya Akasha • Zeyna Haddad
