Yannick Pouliot, PhD
Biocomputational scientist
Butte Laboratory
04/04/2012
Databases, Web Services and
Tools For Systems Immunology
What You Need For Systems Immunology
1. A real hypothesis
▫ No fuzzy brain stuff
2. An understanding of statistics and data mining
▫ …one never understands enough statistics…
3. A lot of data, typically from different “levels” of
reality: organismal, molecular/static,
molecular/functional, etc
▫ … and therefore, databases of some sort
4. Software tools and programming expertise
5. Computing power
1: Hypothesis
Developing a Hypothesis Suitable for Data Mining
• Possibly the hardest step
▫ Must have a measurable metric that can be tested
statistically
• A real hypothesis (H1) looks like this:
▫ H1: Drugs with increased frequency of adverse drug
reactions can be identified from patterns of reactivity
in PubChem Bioassays screens.
• Strictly speaking, a statistical test tries to reject the null
hypothesis (H0), which looks like this:
▫ H0: Bioactivity patterns in PubChem Bioassays do not
distinguish drugs with increased frequency of adverse
drug reactions.
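A minimal sketch of what testing H0 can look like in practice, assuming made-up activity scores for two groups of drugs (the values and variable names are illustrative, not taken from PubChem). It uses an exact permutation test: if H0 holds, the group labels are interchangeable, so the p-value is the fraction of relabelings whose difference in means is at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical bioassay activity scores (toy values for illustration only).
high_adr_drugs = [10, 11, 12, 13]
low_adr_drugs = [1, 2, 3, 4]
p_value = permutation_p_value(high_adr_drugs, low_adr_drugs)
print(f"p = {p_value:.4f}")
```

Full enumeration is only practical for small groups; with larger groups one would typically sample random relabelings instead.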
2: Statistics & Data Mining
Understanding Statistics: Essential
• Not easy; counter-intuitive
• Critical, because with large volumes of data comes the
guarantee that you will always find “something”
▫ … except that it will most likely be purely artifactual
Q: Ever heard of multiple testing correction?
If not, read Bill Noble's excellent description: Noble, W. S.
How does multiple testing correction work? Nat Biotech
27, 1135-1137 (2009).
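A minimal sketch of two common corrections, Bonferroni and Benjamini-Hochberg (false discovery rate), written in plain Python. The raw p-values are made up for illustration; in practice R's `p.adjust` is the usual tool.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five tests (toy values).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

With these toy values, the third and fourth tests pass an uncorrected 0.05 threshold but fail it after either correction, which is the "you will always find something" problem in miniature.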
Learning About Statistics
Introductory
• Norman, Streiner (2008): Biostatistics, the Bare Essentials;
Hamilton.
More advanced
• Vittinghoff, Eric. (2005): Regression methods in biostatistics;
Springer.
• Gentleman et al., (2005): Bioinformatics and Computational
Biology Solutions Using R and Bioconductor; Springer.
Advanced
• Doncaster & Davey (2007): Analysis of Variance and
Covariance: How to Choose and Construct Models for the
Life Sciences; Cambridge University Press.
Understanding Data Mining
• Data mining uses statistical techniques + other
techniques that are uniquely “computational”
• Key to Systems Immunology
• Resources:
▫ Excellent introduction, Weka-specific:
 Witten & Frank (2005): Data Mining: Practical Machine
Learning Tools and Techniques, Second Edition; Morgan
Kaufmann
▫ Nisbet et al., (2009): Handbook of Statistical Analysis and
Data Mining Applications; Wiley
• Tools: coming up
3: Data
Huge Numbers of Databases
• Many need to be licensed ($)
▫ Ingenuity Pathways Analysis (IPA)
 Excellent but pricey
▫ MetaCore
 competitor to IPA
 available from Lane Library
• Many more freely available
▫ DAVID: similar to IPA and MetaCore
▫ Typically dirtier than commercial products, but
sometimes much more comprehensive
▫ Consult Nucleic Acids Research’s yearly database issue
The Bad News
• To be useful in Systems Medicine, databases
need to offer one of the following:
▫ Be downloadable (FTP) in text or other form
▫ Be accessible programmatically over the Internet
(e.g., Web service)
Clicking on Web interfaces just doesn’t cut it…
This means knowing about databases and having
programming skills (more later)
A Small Sample of DBs Crucial to
Systems Immunology
• NCBI: Entrez, GEO, PubMed, Gene, Genome,
RefSeq, dbSNP
• EBI: Array Express, Gene Expression Atlas,
ENSEMBL
• Mouse Genome Database
• DrugBank
• BioGPS
• HapMap
• STITCH: interactions between compounds and
proteins
• UMLS (Unified Medical Language System)
Unified Medical Language System
• Developed by National Library of Medicine
• = data files and software that brings together
multiple biomedical vocabularies and ontologies
to enable semantic interoperability.
▫ repository of terms, definitions and concepts in
biomedicine, complete with cross-referencing and
ontological relationships
• Essential but complex and large
• Requires free license
Immunology Databases
The ImmPort Database: The Only DB of Its Kind in
Immunology
• http://immport.niaid.nih.gov/
• Stores results from huge range of assays
▫ HAI, flow cytometry phenotyping
▫ Genotyping
▫ Sequencing
▫ Gene expression
▫ etc
• Intended to be the primary repository for all NIAID
“center” grants
• Can access pre-publication data if given access by PI
• Caveat: volume of PUBLIC datasets is currently
limited
Stanford’s Human Immune Monitoring
Center (HIMC) Database
• Stanford Data Miner is HIMC’s data mining
database
• Stores many of the assays run by HIMC
• Ask HIMC for access to data from researchers who
use HIMC (will require their consent)
Next Level Up: Relational Databases
Take Your Pick
Why Relational Databases?
• Much faster access to data
• Data are safe
• Completely robust query answers
• Good scaling
• Highly integrative
▫ Cross-database querying: essential!
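A minimal sketch of cross-database querying, driven from Python. SQLite stands in for a full database server here so the example runs on its own; the database, table, and column names and all values are hypothetical. Two separate databases are attached to one connection and joined in a single query.

```python
import sqlite3

# SQLite stands in for a database server so the sketch needs no setup.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS assays")

# Hypothetical tables: subject metadata in one database, results in another.
conn.execute("CREATE TABLE main.subjects (subject_id TEXT PRIMARY KEY, cohort TEXT)")
conn.execute("CREATE TABLE assays.titers (subject_id TEXT, titer REAL)")
conn.executemany(
    "INSERT INTO main.subjects VALUES (?, ?)",
    [("S1", "vaccinated"), ("S2", "vaccinated"), ("S3", "control")],
)
conn.executemany(
    "INSERT INTO assays.titers VALUES (?, ?)",
    [("S1", 160.0), ("S2", 320.0), ("S3", 20.0)],
)

# One query joins across both databases and aggregates per cohort.
rows = conn.execute(
    """
    SELECT s.cohort, COUNT(*) AS n, AVG(t.titer) AS mean_titer
    FROM main.subjects AS s
    JOIN assays.titers AS t ON t.subject_id = s.subject_id
    GROUP BY s.cohort
    ORDER BY s.cohort
    """
).fetchall()
print(rows)
conn.close()
```

The same idea carries over to a server-based system, where tables are addressed as `database.table` within one query.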
Recommendation: MySQL
• Nothing magical about MySQL
▫ Widest usage in bioinformatics
▫ Free (community edition)
▫ Runs on everything (Linux, Win, Mac)
▫ Easiest relational DB (short of MS Access)
• Resources
▫ Moes (2005): Beginning MySQL; Wiley
▫ DuBois (2007): MySQL Cookbook; O’Reilly
▫ Dyer (2008): MySQL in a Nutshell; O’Reilly
4: Software tools and programming
expertise
The Good News
• Free software!
• Free algorithms!
• Pre-coded algorithms (i.e., packages)!
• Very cheap computing power!
The Bad News
• Dunno how to use
• “Not talented”
• “Not enough time”
• Can’t be bothered
▫ e.g., reading the paper describing the software tool
one is relying on
More Good News
• Not that hard
• Lots and lots of good resources
• Read a book, dammit
• Find a buddy
• Use Cloud instances (preconfigured machines)
▫ Can even be free!
“Gateway Drugs” to Programming:
Workflow Systems
• GenePattern
▫ Predominantly oriented toward gene expression
analysis
▫ Public server available
• Galaxy
▫ Predominantly oriented toward sequence (NGS)
analysis
▫ Public server available
• Weka
▫ Easiest way to learn data mining
But Seriously: Why Programming?
• Address small problems that can nail you
• Address bigger problems by standing on the
shoulders of giants
• Flexibility: If you’re doing “real” science, off-the-
shelf software will fail you every time
▫ 80% rule…
Example:
Don’t Try This
With Excel
• Millions of sequence
reads compared against
mouse transcriptome
• Determining number of
distinct species and
frequency of members in
each
• Summarize using plots
for each codon
How it’s
done
SQL + R
Another Example…
Languages/Systems You Can’t Do
Without
• SQL
▫ To talk to MySQL
• Perl or Python
▫ To glue things together
• R (“R Project”)
▫ To perform heavy-duty statistical analysis
• Weka
▫ To apply machine learning algorithms
The Inside Scoop of Making
Programming Work for You
• Diagram or write down your process
▫ Don’t just sit down and write code
• Write comments
▫ “I’m doing this because of this special case over here”
• Use meaningful variable names
▫ $c = not good
• Use development tools
▫ RStudio (for writing R code)
▫ Eclipse (for writing in almost any language)
▫ HeidiSQL (SQL browser)
1. download subject <--> group mapping table
2. download drug treatment data for each subject ID
create two sets of subjects
ImmunoTreatedSubjects
NonImmunoTreatedSubjects
3. download gene type data
ImmunoGenes
NonImmunoGenes
4. calculate variance of each gene set for each subject
5. create data frame to store (4) --> varGeneSetForEachSubject
6. compute t test to determine whether mean is significantly different
first test: generate statistic for each individual subject
--> compare variance of ImmunoGenes vs. variance of NonImmunoGenes for
each subject
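One way steps 3 to 6 of the outline above might translate into code, as a minimal sketch. The gene lists, subject IDs, and expression values are hypothetical placeholders for the downloaded tables, and the paired t statistic is only one reading of the "first test"; the treated/untreated split from step 2 is left out.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Step 3: hypothetical gene sets standing in for the downloaded gene type data.
immuno_genes = ["IL2", "IFNG", "CD4"]
non_immuno_genes = ["ACTB", "GAPDH", "ALB"]

# Toy expression values: subject ID -> gene -> expression level.
expression = {
    "subj1": {"IL2": 2.0, "IFNG": 8.0, "CD4": 5.0, "ACTB": 5.0, "GAPDH": 5.5, "ALB": 4.5},
    "subj2": {"IL2": 1.0, "IFNG": 5.0, "CD4": 3.0, "ACTB": 4.0, "GAPDH": 5.0, "ALB": 6.0},
    "subj3": {"IL2": 3.0, "IFNG": 7.0, "CD4": 5.0, "ACTB": 4.0, "GAPDH": 4.0, "ALB": 4.0},
}

# Steps 4-5: variance of each gene set for each subject, stored per subject.
var_gene_set_for_each_subject = {
    subject: (
        variance([values[g] for g in immuno_genes]),
        variance([values[g] for g in non_immuno_genes]),
    )
    for subject, values in expression.items()
}

# Step 6: paired t statistic on (immuno variance - non-immuno variance).
differences = [imm - non for imm, non in var_gene_set_for_each_subject.values()]
n = len(differences)
t_statistic = mean(differences) / (stdev(differences) / sqrt(n))
print(var_gene_set_for_each_subject)
print(f"t = {t_statistic:.3f} with {n - 1} degrees of freedom")
```

Turning the statistic into a p-value requires the t distribution, which is where a statistics package such as R's `t.test` takes over.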
Programming Without Programming: NCBI’s
Ebot
• Uses NCBI e-utilities (“Web services”)
▫ Programmatic access to NCBI databases,
including PubMed
▫ VERY useful for data mining
• Ebot codes the particular kind of service you
want to use
• Still, it only gets you so far, but at least the
heaviest lifting has been done (and it is heavy…)
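For a sense of what an e-utilities call looks like underneath, here is a minimal Python sketch of an ESearch request against PubMed. The helper names are made up for illustration, and the JSON layout assumed in `parse_id_list` should be checked against the current E-utilities documentation. The network call is defined but not executed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch URL that asks for a JSON list of matching record IDs."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_id_list(payload):
    """Pull the record IDs out of an ESearch JSON response (assumed layout)."""
    return json.loads(payload)["esearchresult"]["idlist"]


def search_pubmed(term, retmax=20):
    """Run the search over the network (not called in this sketch)."""
    with urlopen(build_esearch_url(term, retmax=retmax)) as response:
        return parse_id_list(response.read().decode("utf-8"))


url = build_esearch_url("systems immunology", retmax=5)
print(url)
```

Keeping URL construction and response parsing separate from the network call makes each piece easy to check offline before sending real queries to NCBI.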
Heavy Lifting: R
R: Why It Hurts So Good
• The “R Project” (aka R) is the premier Open Source
statistical and data mining language and suite of
libraries.
• Pros
▫ Free, runs on everything
▫ Very flexible statistical computing
▫ Dominant standard in biocomputing
▫ Big user community at Stanford
▫ Many key libraries written at Stanford
• Cons
▫ Non-trivial learning curve
▫ Documentation is of variable quality
Key R Resources
Three essentials:
• RStudio
▫ Integrated development environment
 don’t code R w/o it!
• Crawley (2007): The R Book; Wiley
• Matloff (2011): The Art of R Programming; No
Starch Press
• Teetor (2011): The R Cookbook; O’Reilly
• Wickham (2009): ggplot2; Springer
Lighter Lifting: Weka
WEKA Data Mining Suite
• Machine learning/data mining software written
in Java (distributed under the GNU General Public
License)
• Main features:
▫ Comprehensive set of data pre-processing tools,
learning algorithms and evaluation methods
▫ Graphical user interfaces (incl. data visualization)
▫ Environment for comparing learning algorithms
• Heavily referenced in “Data Mining” (Witten &
Frank)
Perl, Python
• Either is a great language for bioinformatics
• Run on anything
• Use it to quickly glue systems together, e.g.,
▫ Integrate MySQL and R together
▫ Run Web services queries
• Python has more growth potential
▫ Preferable over Perl
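A minimal sketch of the "glue" role, assuming the goal is to hand query results to R as a CSV file. SQLite stands in for MySQL so the example is self-contained, and the table, column names, and values are made up.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql):
    """Run a query and return the result as CSV text with a header row."""
    cursor = conn.execute(sql)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # cursor.description holds one tuple per column; the name comes first.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


# SQLite stands in for MySQL; the table and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (subject_id TEXT, gene TEXT, value REAL)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [("S1", "IL2", 2.5), ("S2", "IL2", 3.5)],
)
csv_text = query_to_csv(
    conn, "SELECT subject_id, gene, value FROM expression ORDER BY subject_id"
)
conn.close()
print(csv_text)
```

Written to a file, this text can be loaded in R with `read.csv`, which is about the simplest way to pass data between the two systems.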
5: Computing Power
Why The Cloud Matters For Biologists
• You are purchasing computing power, not
machines
▫ Never outdated
• You can purchase as much/little as you need
▫ You don’t have to run/manage what you don’t use
• Can easily migrate from one machine type to
another (minutes)
• Can add storage in seconds
• Accessible from anywhere
• Easy to share e.g., (large) datasets with others
Why Own When You Can Rent?
Welcome To the Cloud…
For biomedical computing, Amazon
Cloud is ideal because it provides
highly flexible storage and compute
power sold on a use basis
Another Example: PathSeq
• Compare millions of short-read sequences
against all genomic + transcriptomic sequences
for all microbes (!)
Amazon Cloud
“Management Console”
Q: What does working with a Cloud
machine feel like?
A: It’s not materially different from
accessing a machine on our cluster,
except you can do anything you want
Main Services Provided by Amazon Cloud
• Storage
▫ Traditional disk volumes
▫ S3 buckets (“Simple Storage Service”)
• Computing (EC2 – “Elastic Compute Cloud”)
▫ Single machine instances
▫ Clusters of various types
• Machine types
▫ Compute servers
▫ Database servers
▫ Cluster
▫ Specialized architectures
▫ Variety of operating systems (Linux flavors, Windows)
Costs
• You pay for (almost) everything you do
▫ Data transfers (out)
▫ Storage
▫ CPU cycles (depends on instance type; one
instance is free)
• Can purchase cycles at below average market
price
▫ Can provide access to vast amounts of computing
power at a price you can afford
• Research grants from Amazon
Questions?

Ontologies for Semantic Normalization of Immunological Data
 
Predicting Adverse Drug Reactions Using PubChem Screening Data
Predicting Adverse Drug Reactions Using PubChem Screening DataPredicting Adverse Drug Reactions Using PubChem Screening Data
Predicting Adverse Drug Reactions Using PubChem Screening Data
 
Repositioning Old Drugs For New Indications Using Computational Approaches
Repositioning Old Drugs For New Indications Using Computational ApproachesRepositioning Old Drugs For New Indications Using Computational Approaches
Repositioning Old Drugs For New Indications Using Computational Approaches
 

Databases, Web Services and Tools For Systems Immunology

  • 1. Yannick Pouliot, PhD Biocomputational scientist Butte Laboratory 04/04/2012 Databases, Web Services and Tools For Systems Immunology
  • 2. What You Need For Systems Immunology 1. A real hypothesis ▫ No fuzzy brain stuff 2. An understanding of statistics and data mining ▫ …one never understands enough statistics… 3. A lot of data, typically from different “levels” of reality: organismal, molecular/static, molecular/functional, etc ▫ … and therefore, databases of some sort 4. Software tools and programming expertise 5. Computing power
  • 4. Developing a Hypothesis Suitable for Data Mining • Possibly the hardest step ▫ Must have a measurable metric that can be tested statistically • A real hypothesis (H1) looks like this: ▫ H1: Drugs with increased frequency of adverse drug reactions can be identified from patterns of reactivity in PubChem Bioassays screens. • Strictly speaking, a statistical test tries to reject the null hypothesis (H0), which looks like this: ▫ H0: Bioactivity patterns in PubChem Bioassays do not distinguish drugs with increased frequency of adverse drug reactions
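The H1/H0 framing on slide 4 can be made concrete with a permutation test, which asks how often a difference at least as large as the observed one would arise if group labels were meaningless (that is, if H0 were true). This is only a sketch using the Python standard library; the function name, the choice of "difference in means" as the metric, and the example scores are assumptions for illustration, not the method used in the talk.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Under H0 the group labels carry no information, so shuffling them
    shows how large a difference arises by chance alone.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical bioactivity scores for two sets of drugs (made-up numbers).
high_adr = [101, 102, 103, 104, 105]
low_adr = [1, 2, 3, 4, 5]
p = permutation_p_value(high_adr, low_adr, n_permutations=2000)
```

A small p-value is evidence against H0; it does not by itself prove H1, which is why the hypothesis has to be stated in measurable terms before the data are examined.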
  • 5. 2: Statistics & Data Mining
  • 6. Understanding Statistics: Essential • Not easy; counter-intuitive • Critical, because with large volumes of data comes the guarantee that you will always find “something” ▫ … except that it will most likely be purely artifactual Q: ever heard of multiple testing correction? If not, read Bill Noble’s excellent description: Noble, W. S. How does multiple testing correction work? Nat Biotech 27, 1135-1137 (2009).
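To illustrate the multiple testing correction mentioned on slide 6, here is a minimal sketch of the Benjamini-Hochberg procedure, one common way of controlling the false discovery rate. The slide does not say which correction to use, so the choice of Benjamini-Hochberg, the function names and the example p-values are all assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, fdr=0.05):
    """Flag the tests that pass at the requested false discovery rate."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]


# Made-up raw p-values from four hypothetical tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

With thousands of tests, many raw p-values below 0.05 are expected by chance alone; the adjusted values are the ones to compare against the chosen threshold.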
  • 7. Learning About Statistics Introductory • Norman & Streiner (2008): Biostatistics, the Bare Essentials; Hamilton. More advanced • Vittinghoff, Eric. (2005): Regression methods in biostatistics; Springer. • Gentleman et al., (2005): Bioinformatics and Computational Biology Solutions Using R and Bioconductor; Springer. Advanced • Doncaster & Davey (2007): Analysis of Variance and Covariance: How to Choose and Construct Models for the Life Sciences; Cambridge University Press.
  • 8. Understanding Data Mining • Data mining uses statistical techniques + other techniques that are uniquely “computational” • Key to Systems Immunology • Resources: ▫ Excellent introduction, Weka-specific: Witten & Frank (2005): Data Mining: Practical Machine Learning Tools and Techniques, Second Edition; Morgan Kaufmann ▫ Nisbet et al., (2009): Handbook of Statistical Analysis and Data Mining Applications; Wiley • Tools: coming up
  • 10. Huge Numbers of Databases • Many need to be licensed ($) ▫ Ingenuity Pathways Analysis (IPA): excellent but pricey ▫ MetaCore: competitor to IPA; available from Lane Library • Many more freely available ▫ DAVID: similar to IPA and MetaCore ▫ Typically dirtier than commercial products, but sometimes much more comprehensive ▫ Consult Nucleic Acids Research’s yearly database issue
  • 11. The Bad News • To be useful in Systems Medicine, databases need to offer at least one of the following: ▫ Be downloadable (FTP) in text or other form ▫ Be accessible programmatically over the Internet (e.g., Web service) Clicking on Web interfaces just doesn’t cut it… This means knowing about databases and having programming skills (more later)
  • 12. A Small Sample of DBs Crucial to Systems Immunology • NCBI: Entrez, GEO, PubMed, Gene, Genome, RefSeq, dbSNP • EBI: Array Express, Gene Expression Atlas, ENSEMBL • Mouse Genome Database • DrugBank • BioGPS • HapMap • STITCH: interactions between compounds and proteins • UMLS (Unified Medical Language System)
  • 13. Unified Medical Language System • Developed by National Library of Medicine • = data files and software that brings together multiple biomedical vocabularies and ontologies to enable semantic interoperability. ▫ repository of terms, definitions and concepts in biomedicine, complete with cross-referencing and ontological relationships • Essential but complex and large • Requires free license
  • 15. The ImmPort Database: The Only DB of Its Kind in Immunology • http://immport.niaid.nih.gov/ • Stores results from huge range of assays ▫ HAI ▫ Flow cytometry phenotyping ▫ Genotyping ▫ Sequencing ▫ Gene expression ▫ etc • Intended to be the primary repository for all NIAID “center” grants • Can access pre-publication data if given access by PI • Caveat: volume of PUBLIC datasets is currently limited
  • 16. Stanford’s Human Immune Monitoring Center (HIMC) Database • Stanford Data Miner is HIMC’s data mining database • Stores many of the assays run by HIMC • Ask HIMC for access to data from researchers who use HIMC (will require their consent)
  • 17. Next Level Up: Relational Databases Take Your Pick
  • 18. Why Relational Databases? • Much faster access to data • Data are safe • Completely robust query answers • Good scaling • Highly integrative ▫ Cross-database querying: essential!
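The cross-database querying that slide 18 calls essential can be sketched in a few lines. The example below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; the deck itself recommends MySQL, where tables in another database are likewise addressed with a `database.table` prefix. The table names, column names and values are invented for illustration.

```python
import sqlite3


def build_demo_connection():
    """Create two separate in-memory databases that share subject IDs."""
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent database under the name "assays".
    conn.execute("ATTACH DATABASE ':memory:' AS assays")
    conn.execute(
        "CREATE TABLE main.subjects "
        "(subject_id INTEGER PRIMARY KEY, treatment_group TEXT)"
    )
    conn.execute("CREATE TABLE assays.titers (subject_id INTEGER, titer REAL)")
    # Hypothetical rows, not real study data.
    conn.executemany(
        "INSERT INTO main.subjects VALUES (?, ?)",
        [(1, "treated"), (2, "treated"), (3, "control")],
    )
    conn.executemany(
        "INSERT INTO assays.titers VALUES (?, ?)",
        [(1, 40.0), (2, 80.0), (3, 20.0)],
    )
    return conn


def mean_titer_by_group(conn):
    """Join across the two databases and average the titer per group."""
    rows = conn.execute(
        """
        SELECT s.treatment_group, AVG(t.titer)
        FROM main.subjects AS s
        JOIN assays.titers AS t ON t.subject_id = s.subject_id
        GROUP BY s.treatment_group
        ORDER BY s.treatment_group
        """
    ).fetchall()
    return dict(rows)


conn = build_demo_connection()
result = mean_titer_by_group(conn)
```

The point of the sketch is that one declarative query combines clinical groupings held in one database with assay results held in another, which is hard to do reliably by copying and pasting between spreadsheets.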
  • 19. Recommendation: MySQL • Nothing magical about MySQL ▫ Widest usage in bioinformatics ▫ Free (community edition) ▫ Runs on everything (Linux, Win, Mac) ▫ Easiest relational DB (short of MS Access) • Resources ▫ Moes (2005): Beginning MySQL; Wiley ▫ DuBois (2007): MySQL Cookbook; O’Reilly ▫ Dyer (2008): MySQL in a Nutshell; O’Reilly
  • 20. 4: Software tools and programming expertise
  • 21. • Free software! • Free algorithms! • Pre-coded algorithms (i.e., packages)! • Very cheap computing power! The Good News
  • 22. The Bad News • Dunno how to use • “Not talented” • “Not enough time” • Can’t be bothered ▫ e.g., reading the paper describing the software tool one is relying on
  • 23. More Good News • Not that hard • Lots and lots of good resources • Read a book, dammit • Find a buddy • Use Cloud instances (preconfigured machines) ▫ Can even be free!
  • 24. “Gateway Drugs” to Programming: Workflow Systems • GenePattern ▫ Predominantly oriented toward gene expression analysis ▫ Public server available • Galaxy ▫ Predominantly oriented toward sequence (NGS) analysis ▫ Public server available • Weka ▫ Easiest way to learn data mining
  • 25. But Seriously: Why Programming? • Address small problems that can nail you • Address bigger problems by standing on the shoulders of giants • Flexibility: If you’re doing “real” science, off-the-shelf software will fail you every time ▫ 80% rule…
  • 26. Example: Don’t Try This With Excel • Millions of sequence reads compared against mouse transcriptome • Determining number of distinct species and frequency of members in each • Summarize using plots for each codon
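The counting step on slide 26 (number of distinct species and how many reads fall into each) is the kind of task that is awkward in a spreadsheet but short in code. The sketch below assumes a hypothetical tab-separated hits file with a read ID in the first column and the matched species in the second; that layout and the identifiers are assumptions, not the format used in the original analysis.

```python
import io
from collections import Counter


def tally_species(lines):
    """Count how many reads were assigned to each species.

    `lines` can be any iterable of text lines, including an open file,
    so millions of reads are processed one line at a time.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        read_id, species = line.split("\t")[:2]
        counts[species] += 1
    return counts


# Tiny made-up input standing in for a very large hits file.
demo = io.StringIO("r1\tNM_A\nr2\tNM_B\nr3\tNM_A\n\n")
counts = tally_species(demo)
n_distinct = len(counts)
```

In practice the same function could be called on `open("hits.tsv")` (a hypothetical file name), and `counts.most_common(10)` would list the ten most frequent species.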
  • 30. Languages/Systems You Can’t Do Without • SQL ▫ To talk to MySQL • Perl or Python ▫ To glue things together • R (“R Project”) ▫ To perform heavy-duty statistical analysis • Weka ▫ To apply machine learning algorithms
  • 31. The Inside Scoop of Making Programming Work for You • Diagram or write down your process ▫ Don’t just sit down and write code • Write comments ▫ “I’m doing this because of this special case over here” • Use meaningful variable names ▫ $c = not good • Use development tools ▫ RStudio (for writing R code) ▫ Eclipse (for writing in almost any language) ▫ HeidiSQL (SQL browser) • Example of a written-down process:
      1. download subject <--> group mapping table
      2. download drug treatment data for each subject ID; create two sets of subjects: ImmunoTreatedSubjects, NonImmunoTreatedSubjects
      3. download gene type data: ImmunoGenes, NonImmunoGenes
      4. calculate variance of each gene set for each subject
      5. create data frame to store (4) --> varGeneSetForEachSubject
      6. compute t test to determine whether mean is significantly different; first test: generate statistic for each individual subject --> compare variance of ImmunoGenes vs. variance of NonImmunoGenes for each subject
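Steps 4 to 6 of the process written down on slide 31 could be sketched as follows. This is one possible reading of those notes, using only the Python standard library: the nested-dictionary layout, the function names and the expression values are assumptions, and the test shown is a paired t statistic across subjects rather than whatever exact test the author ran.

```python
from math import sqrt
from statistics import mean, stdev, variance


def variance_by_gene_set(expression, immuno_genes):
    """Steps 4-5: per-subject variance of immune and non-immune genes.

    `expression` maps subject ID -> {gene symbol: expression value}.
    Returns subject ID -> (variance of ImmunoGenes, variance of NonImmunoGenes).
    """
    var_gene_set_for_each_subject = {}
    for subject, genes in expression.items():
        immuno = [v for g, v in genes.items() if g in immuno_genes]
        non_immuno = [v for g, v in genes.items() if g not in immuno_genes]
        var_gene_set_for_each_subject[subject] = (
            variance(immuno),
            variance(non_immuno),
        )
    return var_gene_set_for_each_subject


def paired_t_statistic(pairs):
    """Step 6: t statistic for the mean within-subject difference."""
    diffs = [a - b for a, b in pairs]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up expression values for a single hypothetical subject.
expression = {"s1": {"IL2": 1.0, "IFNG": 3.0, "ACTB": 5.0, "GAPDH": 5.0}}
per_subject = variance_by_gene_set(expression, immuno_genes={"IL2", "IFNG"})

# Made-up (immune variance, non-immune variance) pairs for three subjects.
t_stat = paired_t_statistic([(3.0, 1.0), (5.0, 2.0), (7.0, 3.0)])
```

Turning the t statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which is not in the standard library; in practice that step would be handed to R or a statistics package.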
  • 32. Programming Without Programming: NCBI’s Ebot • Uses NCBI e-utilities (“Web services”) ▫ Programmatic access to NCBI databases, including PubMed ▫ VERY useful for data mining • Ebot codes the particular kind of service you want to use • Still, it only gets you so far, but at least the heaviest lifting has been done (and it is heavy…)
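As a rough idea of what programmatic access through the NCBI E-utilities looks like, the sketch below builds an ESearch request URL for PubMed and extracts the list of record IDs from a JSON response. The helper names and the sample response are invented for illustration; the sample only mimics the shape of an ESearch JSON reply, and its IDs are placeholders rather than real PubMed records.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch URL that asks for matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_id_list(response_text):
    """Pull the list of record IDs out of an ESearch JSON response."""
    return json.loads(response_text)["esearchresult"]["idlist"]


url = build_esearch_url("adverse drug reaction", retmax=5)

# Placeholder response text with the same structure as a real reply.
sample_response = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
ids = parse_id_list(sample_response)
```

To run a real query, the URL could be fetched with `urllib.request.urlopen(url).read().decode()` and the result passed to `parse_id_list`; NCBI asks users to keep request rates low, so bulk jobs should pause between calls.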
  • 34. R: Why It Hurts So Good • The “R Project” (aka R) is the premier Open Source statistical and data mining language and suite of libraries. • Pros ▫ Free, runs on everything ▫ Very flexible statistical computing ▫ Dominant standard in biocomputing ▫ Big user community at Stanford ▫ Many key libraries written at Stanford • Cons ▫ Non-trivial learning curve ▫ Documentation is of variable quality
  • 35. Key R Resources Three essentials: • RStudio ▫ Integrated development environment: don’t code R w/o it! • Crawley (2007): The R Book; Wiley • Matloff (2011): The Art of R Programming; No Starch Press • Teetor (2011): R Cookbook; O’Reilly • Wickham (2009): ggplot2; Springer
  • 37. WEKA Data Mining Suite • Machine learning/data mining software written in Java (distributed under the GNU General Public License) • Main features: ▫ Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods ▫ Graphical user interfaces (incl. data visualization) ▫ Environment for comparing learning algorithms • Heavily referenced in “Data Mining” (Witten & Frank)
  • 40. Perl, Python • Either is a great language for bioinformatics • Run on anything • Use it to quickly glue systems together, e.g., ▫ Integrate MySQL and R together ▫ Run Web services queries • Python has more growth potential ▫ Preferable over Perl
  • 42. Why The Cloud Matters For Biologists • You are purchasing computing power, not machines ▫ Never outdated • You can purchase as much/little as you need ▫ You don’t have to run/manage what you don’t use • Can easily migrate from one machine type to another (minutes) • Can add storage in seconds • Accessible from anywhere • Easy to share e.g., (large) datasets with others
  • 43. Why Own When You Can Rent? Welcome To the Cloud…
  • 44. For biomedical computing, Amazon Cloud is ideal because it provides highly flexible storage and compute power sold on a pay-per-use basis
  • 45. Another Example: PathSeq • Compare millions of short-read sequences against all genomic + transcriptomic sequences for all microbes (!) Amazon Cloud “Management Console”
  • 46. Q: What does working with a Cloud machine feel like? A: It’s not materially different from accessing a machine on our cluster, except you can do anything you want
  • 47. Main Services Provided by Amazon Cloud • Storage ▫ Traditional disk volumes ▫ S3 buckets (“Simple Storage Service”) • Computing (EC2 – “Elastic Compute Cloud”) ▫ Single machine instances ▫ Clusters of various types • Machine types ▫ Compute servers ▫ Database servers ▫ Cluster ▫ Specialized architectures ▫ Variety of operating systems (Linux flavors, Windows)
  • 48. Costs • You pay for (almost) everything you do ▫ Data transfers (out) ▫ Storage ▫ CPU cycles (depends on instance type; one instance is free) • Can purchase cycles at below average market price ▫ Can provide access to vast amounts of computing power at a price you can afford • Research grants from Amazon

Editor's Notes

  1. Multiple testing correction adjusts for random events that falsely appear significant when many hypotheses are tested at once
  2. PPT document created using combination of Perl, MySQL and R
  3. Blue=services I’ve used
  4. Mention cost calculator: http://calculator.s3.amazonaws.com/calc5.html