Computing Workflows for Biologists: An Overview
tracykteal
Talk based on the paper by Shade and Teal, Computing Workflows for Biologists: A Roadmap
1.
Computing Workflows for Biologists
Based on: Shade & Teal, Computing Workflows for Biologists: A Roadmap, PLOS Biology
Data Carpentry data organization lessons
2.
• How many people here plan to analyze data with a computer in their work?
• Are you working with other people on this analysis?
• Do other people need to understand your analysis?
• Do you need to remember and understand your analysis?
3.
Elements of computing
• How data was generated (metadata)
• Data
• Data cleaning steps
• Data analysis steps
• Final plots and charts
4.
Data!
• Keep raw data raw
• Use meaningful names
• Organize your data so computers can read it
5.
Keep raw data raw
• What is raw data?
• Why should I leave it alone?
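One way to make "keep raw data raw" concrete, sketched in Python as an illustration rather than anything from the talk or paper: record a checksum of each raw file when it arrives, remove its write permission, and re-check the checksum before each analysis. The helper names (`sha256_of`, `protect_raw_file`, `raw_file_unchanged`) are hypothetical. On Windows, `chmod` only toggles the read-only flag, so the protection is weaker there.

```python
import hashlib
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def protect_raw_file(path: Path) -> str:
    """Remove write permission from a raw data file and return its checksum for your notes."""
    mode = stat.S_IMODE(path.stat().st_mode)
    # Clear the owner, group, and other write bits so the file is not edited by accident.
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return sha256_of(path)


def raw_file_unchanged(path: Path, recorded_checksum: str) -> bool:
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_checksum
```

Writing the checksum into your notes alongside the file name gives you a simple way to show later that the analysis started from the untouched raw data.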
6.
Use meaningful names
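Whether a name is meaningful is a human judgment, but code can catch file and column names that are awkward for computers (spaces, brackets, leading digits). This Python sketch assumes one common convention (letters, digits, and underscores, starting with a letter), which is a choice for illustration and not a rule from the talk; `check_names` and `suggest_name` are hypothetical helpers.

```python
import re

# Assumed convention: letters, digits, and underscores only, starting with a letter.
MACHINE_FRIENDLY = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_names(names):
    """Return the names (file or column) that break the assumed convention."""
    return [name for name in names if not MACHINE_FRIENDLY.match(name)]


def suggest_name(name):
    """Suggest a lowercase, underscore-separated version of a name."""
    # Replace each run of spaces, brackets, and other symbols with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", name.strip()).strip("_").lower()
    # Names starting with a digit get a prefix so they start with a letter.
    return cleaned if cleaned[:1].isalpha() else "x_" + cleaned


headers = ["weight_g", "Weight (g)", "2nd measure", "max temp"]
for bad in check_names(headers):
    print(f"{bad!r} -> {suggest_name(bad)!r}")
```

The suggestions are only a starting point: `x_2nd_measure` is machine-friendly but still not very meaningful, so a person should pick the final name.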
7.
Organize your data so computers can read it (let’s talk about spreadsheets)
http://www.datacarpentry.org/spreadsheet-ecology-lesson/00-intro.html
… also avoid formatting errors
8.
Organizing data in spreadsheets
The cardinal rules of using spreadsheet programs for data:
• Put all your variables in columns: the thing you're measuring, like 'weight' or 'temperature'.
• Put each observation in its own row.
• Don't combine multiple pieces of information in one cell. Sometimes it just seems like one thing, but consider whether that's the only way you'll ever want to use or sort that data.
• Leave the raw data raw: don't mess with it!
• Export the cleaned data to a text-based format like CSV. This ensures that anyone can use the data, and it is the format required by most data repositories.
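A minimal Python sketch of two of these rules, using only the standard library: a hypothetical `species_sex` column that packs two pieces of information into one cell (for example `DM-F`) is split into separate `species` and `sex` columns, and the result is exported as CSV text. The column names and values are made up for illustration.

```python
import csv
import io


def split_combined_column(rows, combined, new_names, sep="-"):
    """Split one column holding several pieces of information into separate columns.

    rows: list of dicts, one per observation (row)
    combined: name of the column to split, e.g. "species_sex"
    new_names: names for the resulting columns, e.g. ("species", "sex")
    Returns new dicts; the input rows are not modified.
    """
    tidy = []
    for row in rows:
        parts = row[combined].split(sep)
        if len(parts) != len(new_names):
            raise ValueError(f"Cannot split {row[combined]!r} into {new_names}")
        new_row = {key: value for key, value in row.items() if key != combined}
        new_row.update(zip(new_names, parts))
        tidy.append(new_row)
    return tidy


def to_csv(rows):
    """Export rows (dicts sharing the same keys) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


raw = [
    {"plot": "1", "species_sex": "DM-F", "weight_g": "42"},
    {"plot": "2", "species_sex": "PP-M", "weight_g": "17"},
]
tidy = split_combined_column(raw, "species_sex", ("species", "sex"))
print(to_csv(tidy))
```

For the two example rows this prints the header `plot,weight_g,species,sex` followed by one line per observation. Returning new rows instead of editing `raw` in place keeps the raw data raw.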
9.
10.
Formatting problems
http://www.datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes.html
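A sketch of an automated check for two common formatting mistakes: units or notes typed into numeric cells, and blank cells whose meaning is ambiguous. It assumes `NA` is the agreed code for a missing value; `find_problem_cells` and the example rows are hypothetical (in practice the rows could come from `csv.DictReader`). Python's `float()` also accepts strings such as `nan` and `inf`, so this check does not flag those.

```python
def find_problem_cells(rows, numeric_columns, missing_values=("NA",)):
    """Return (row_number, column, value) for cells that should be numbers but are not.

    Row numbers count data rows from 1 (the header row is not counted).
    """
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column in numeric_columns:
            value = (row.get(column) or "").strip()
            if value in missing_values:
                continue  # an explicit, agreed-upon missing-value code is fine
            try:
                float(value)
            except ValueError:
                problems.append((row_number, column, value))
    return problems


rows = [
    {"plot": "1", "weight_g": "42"},
    {"plot": "2", "weight_g": "17 g"},  # unit typed into the cell
    {"plot": "3", "weight_g": ""},      # blank: zero, or not measured?
    {"plot": "4", "weight_g": "NA"},    # explicit missing value
]
print(find_problem_cells(rows, ["weight_g"]))  # [(2, 'weight_g', '17 g'), (3, 'weight_g', '')]
```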
11.
A Roadmap for the Computing Biologist
• Consider the overarching goals of the analysis
• Adopt an Iterative, Branching Pattern to Systematically Explore Options
• Reproducibility Checkpoints
• Taking Notes for Computational Analysis
• Shared Responsibility: The Team Approach to Reproducibility and Data Management
Shade and Teal, Computing Workflows for Biologists: A Roadmap
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002303
12.
Consider the Overarching Goals of the Analysis
• Working to address a given hypothesis will motivate different analysis strategies than conducting data exploration
13.
Reproducibility Checkpoints
Reproducibility checkpoints are places in a workflow devoted to scrutinizing its integrity:
- the workflow (or a step in the workflow) can be used seamlessly (it doesn’t crash halfway or return error messages)
- the outcomes are consistent and validated across multiple identical iterations
- the results make biological sense
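The first two checkpoint criteria can be partly automated. This Python sketch runs a hypothetical analysis step several times, raises an error if a result fails a sanity check, and compares fingerprints of the results. The "biological sense" test is stood in for by a made-up rule (mean body weights must be positive); `analysis_step`, `checkpoint`, and the weights are illustrative assumptions. Fixing the random seed is what makes the bootstrap step repeatable.

```python
import hashlib
import random


def analysis_step(weights, seed=2016):
    """Hypothetical step: bootstrap the mean body weight, with a fixed random seed."""
    rng = random.Random(seed)  # same seed -> same resamples on every run
    means = []
    for _ in range(200):
        sample = [rng.choice(weights) for _ in weights]
        means.append(sum(sample) / len(sample))
    return sorted(means)


def fingerprint(result):
    """Hash a result so runs can be compared exactly."""
    return hashlib.sha256(repr(result).encode()).hexdigest()


def checkpoint(step, data, runs=3, sanity=lambda result: True):
    """Run a step several times and report whether every run gave the same result.

    An exception from `step` means the step does not yet run seamlessly;
    a failed `sanity` check raises ValueError.
    """
    fingerprints = set()
    for _ in range(runs):
        result = step(data)
        if not sanity(result):
            raise ValueError("result fails the sanity check")
        fingerprints.add(fingerprint(result))
    return len(fingerprints) == 1


weights = [42.0, 17.5, 38.1, 45.3, 12.9]
# Stand-in for "makes biological sense": mean body weights must be positive.
print(checkpoint(analysis_step, weights, sanity=lambda means: all(m > 0 for m in means)))  # True
```

A step that draws unseeded random numbers would return `False` here, which is exactly the kind of inconsistency a checkpoint is meant to surface.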
14.
Adopt an Iterative, Branching Pattern to Systematically Explore Options
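One way to keep the branches of an exploratory analysis systematic is to enumerate every combination of options and label each run, so no branch is tried once and then forgotten. This Python sketch is an illustration, not the paper's method; the read-filtering step `count_passing_reads`, its `min_quality` and `trim` parameters, and the quality scores are hypothetical.

```python
import itertools


def explore_branches(step, data, options):
    """Run `step` once per combination of options and log each branch.

    options maps a parameter name to the values to try,
    e.g. {"min_quality": [20, 30], "trim": [True, False]}.
    """
    names = sorted(options)
    log = []
    for values in itertools.product(*(options[name] for name in names)):
        params = dict(zip(names, values))
        # A readable label, e.g. "min_quality-20_trim-True"
        branch = "_".join(f"{key}-{value}" for key, value in params.items())
        log.append({"branch": branch, "params": params, "result": step(data, **params)})
    return log


def count_passing_reads(qualities, min_quality, trim):
    """Hypothetical step: count reads whose mean quality score reaches min_quality.

    If trim is True, the last score of each read is dropped first.
    """
    kept = 0
    for read in qualities:
        scores = read[:-1] if trim and len(read) > 1 else read
        if sum(scores) / len(scores) >= min_quality:
            kept += 1
    return kept


reads = [[30, 32, 10], [20, 22, 25], [35, 36, 38]]
options = {"min_quality": [20, 30], "trim": [True, False]}
for entry in explore_branches(count_passing_reads, reads, options):
    print(entry["branch"], entry["result"])
```

With the example data the four branches keep 3, 3, 2, and 1 reads, respectively. The branch labels can double as output folder names, so each result stays tied to the options that produced it.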
15.
Taking Notes for Computational Analysis
• Take notes like you would for experimental work
• Comment your code
• Use version control (Git, hosted on GitHub or GitLab)
16.
What needs to go in notes:
- Software versions used
- Description of what the software is doing/goal of that step
- Brief notes on deviations from default options
- Workflows can include different software (e.g., PANDAseq to QIIME to R), and the notes should also include all “formatting steps” needed to move between tools. Ideally you won't need much manual formatting; avoid it where possible.
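Software versions are easy to forget and easy to record automatically. This Python sketch builds one notes entry for an analysis step with the date, goal, deviations from defaults, and the versions of the Python interpreter and any named packages, using `importlib.metadata` from the standard library (Python 3.8+). The entry layout and the example step are assumptions; the entry complements written notes rather than replacing them.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def software_versions(packages):
    """Return the Python version plus the installed version of each named package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def notes_entry(step, goal, deviations, packages):
    """Build one notes entry for an analysis step (layout is an assumption)."""
    return {
        "date": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "step": step,
        "goal": goal,
        "deviations_from_defaults": deviations,
        "software": software_versions(packages),
    }


# Hypothetical example step; replace with your own tools and settings.
entry = notes_entry(
    step="filter_reads",
    goal="Remove low-quality reads before clustering",
    deviations={"min_quality": 30},
    packages=["numpy"],
)
print(json.dumps(entry, indent=2))
```

Tools run from the command line (for example, the steps between PANDAseq, QIIME, and R) need their versions recorded by hand or from each tool's own version output.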
17.
Shared Responsibility: The Team Approach to Reproducibility and Data Management
We posit that integrity in computational analysis of biological data is enhanced if there is a sense of shared responsibility for ensuring reproducible workflows. Research teams that work together to develop and debug code, perform internal reproducibility checkpoints for each other, and generally hold one another accountable for high-quality results likely will enjoy a low manuscript retraction rate, high level of confidence in their results, and strong sense of collaboration.
You, your lab mates, and your PI need to value the time it takes to do analyses reproducibly and correctly.
18.
Shared responsibility
• Shared storage and workspace can facilitate access to all group data
• Version control repositories (e.g., GitHub) and shared folders (e.g., Dropbox) can provide access to code and documentation
• Setting expectations for ‘reproducibility checkpoints’ (team “hackathons”: open-computer group meetings dedicated to analysis)
• Paper reviews
• Looking for help/support outside the lab (bioinformatics or user groups, office hours, Stack Overflow)
19.
Looking for help
https://github.com/mblmicdiv/course2016/blob/master/bioinfo-resources.md
You are not alone
Survey responses
20.
Exercise
http://tinyurl.com/mbl-workflows