BioStatFlow –© INRA DJ 2014
PMFB –UMR 1332, INRA, F-33140 Villenave d’Ornon
djacob@bordeaux.inra.fr
http://biostatflow.org
A web-based tool for Statistical Analysis
BioStatFlow is a web application for the statistical analysis of "omics" data, including
metabolomics data. It handles data sets generated from experiments.
Omics experiments yield large amounts of data, too much to be interpreted by the human
eye. A combination of multivariate and univariate data analyses is therefore essential to
extract and visualize the information of interest. Biologists need basic knowledge of the
statistics employed in order to contribute critically to, and evaluate, their experimental
design, protocols, and results.
Nevertheless, there is still a lack of useful, fast, and easy-to-use online statistical tools for those
who are not experts in statistics. BioStatFlow has been developed to meet this need.
Motivation of the design of BioStatFlow
1. The main goal of BioStatFlow is to make statistical tools accessible to biologists who are
not specialists in statistics. It is designed to execute statistical analyses sequentially, i.e. as a linear chain
of statistical processing steps, called a workflow in BioStatFlow. Based on a set of identified use cases
(mainly around omics data), BioStatFlow follows the typical workflow described here:
A set of analyses is first proposed as a fixed sequence in order to normalize the
dataset. At this stage, users have to follow the order of the sequence. Missing
value estimation and data scaling are helpful pre-processing steps, because
experimental issues with the technical equipment can leave the levels of some
analytical variables (features) undetermined, and because different
experiments may need to be compared. This is the default use case (default
workflow). Users can then choose any additional methods, depending on
the dataset and the corresponding experimental design (i.e. factors), in order to i)
visualize the whole dataset, ii) reveal biomarkers, iii) analyse interactions
between factors, iv) discriminate groups, and so on.
The input of each treatment is the output of the previous treatment.
If a treatment generates a data table (matrix) as output, that table is used as
input to the next step. Otherwise, if the treatment only generates results (texts
and images) and does not change the input array, the input is passed on
unchanged as the output.
Each treatment can be written as an R script (most common) or as a Perl
script that embeds binary tools (such as compiled Matlab scripts).
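The chaining rule described above can be illustrated with a short Python sketch. This is not BioStatFlow's code (its treatments are R or Perl scripts); the function names `run_workflow`, `center_columns`, and `summarize` are hypothetical, and a treatment is assumed to return either a new matrix or `None` when it only produces results.

```python
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]
# Assumption: a treatment returns (new matrix or None, list of result texts).
Treatment = Callable[[Matrix], Tuple[Optional[Matrix], List[str]]]


def run_workflow(data: Matrix, treatments: List[Treatment]) -> Tuple[Matrix, List[str]]:
    """Run treatments as a linear chain, feeding each one the previous output."""
    current = data
    results: List[str] = []
    for treatment in treatments:
        output, produced = treatment(current)
        results.extend(produced)
        # A treatment that produces no matrix leaves the input unchanged,
        # so the same matrix is handed to the next step.
        if output is not None:
            current = output
    return current, results


def center_columns(matrix: Matrix) -> Tuple[Optional[Matrix], List[str]]:
    """Hypothetical pre-processing step: subtract each column's mean."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    centered = [[value - mean for value, mean in zip(row, means)] for row in matrix]
    return centered, ["columns centered"]


def summarize(matrix: Matrix) -> Tuple[Optional[Matrix], List[str]]:
    """Hypothetical reporting step: produces text only, no new matrix."""
    return None, [f"{len(matrix)} rows x {len(matrix[0])} columns"]


final_matrix, all_results = run_workflow(
    [[1.0, 2.0], [3.0, 6.0]], [center_columns, summarize]
)
```

Here `summarize` does not modify the data, so the centered matrix produced by `center_columns` is also the final output of the chain.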
Tutorial: Overview of how to use BioStatFlow
http://biostatflow.org/doc/pg?id=tutorial:start
STEP1: Input dataset
The user uploads a correctly formatted dataset file, then clicks "Next Step".
STEP2: Workflow selection
Modify parameters and/or add another analysis, then click "Launch".
STEP3: Visualization of Results
Select a result, Zoom In/Out, or Download
2. BioStatFlow allows bioinformaticians to easily integrate a new statistical analysis
method into a workflow, or even to create their own workflows. To this end, the analysis scripts and
the workflow definition files are stored in separate catalogs of the application, and
configuration files enable integration without modifying the application source code.
Architecture
BioStatFlow consists of three software components:
1. The BioStatFlow core, which is responsible for:
• managing input and output through the GUI (datasets, workflows, parameters of each analysis, and results),
• creating batch scripts from the workflow definition files,
• launching the analysis scripts,
• managing the persistent sessions (including access management).
2. The workflow and statistical analysis catalogs. These catalogs may be enriched at any time by adding statistical
analyses or even a new workflow.
3. The repository of persistent sessions. To save your work in a persistent session, you must register first.
Workflow and Statistical Analysis catalogs
Catalog's Root
    Workflow 1
        def             definition files (PCA.def, …)
        doc             documentation files (PCA.xml, …)
        scripts         script files (PCA.R, …)
        workflow.def    workflow definition file
    Workflow 2
    …
    Workflow n
• A workflow is implemented as a directory containing three sub-directories plus one definition file.
• The 'def' sub-directory:
• contains the analysis definition files, which are used to automatically build the GUI input masks
for the analysis parameters (with default values) and the header of each R script, which
initializes the parameters with the values given by the user.
• The 'doc' sub-directory:
• contains the analysis documentation files, which describe the analysis parameters shown in the
input mask.
• The 'scripts' sub-directory:
• contains the analysis scripts themselves (without the initialisation of their
parameters, since the automatically generated header of each script takes care of
that part).
• The 'workflow.def' file:
• contains the list of all analyses within the workflow.
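As a rough illustration of this layout, the following Python sketch checks that a workflow directory contains the three files expected for each analysis. The format of workflow.def is not given in these slides, so the sketch assumes, purely for illustration, one analysis name per line; the helper names `missing_files` and `make_demo_workflow` and the directory name `Workflow1` are also hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List

# Sub-directory and file extension for each file type, as listed above.
EXPECTED = (("def", ".def"), ("doc", ".xml"), ("scripts", ".R"))


def missing_files(workflow_dir: Path) -> List[str]:
    """Return the relative paths the layout expects but that are absent."""
    definition = workflow_dir / "workflow.def"
    if not definition.is_file():
        return ["workflow.def"]
    # ASSUMPTION: workflow.def holds one analysis name per non-empty line.
    names = [line.strip() for line in definition.read_text().splitlines() if line.strip()]
    missing = []
    for name in names:
        for subdir, extension in EXPECTED:
            relative = Path(subdir) / f"{name}{extension}"
            if not (workflow_dir / relative).is_file():
                missing.append(relative.as_posix())
    return missing


def make_demo_workflow(root: Path) -> Path:
    """Create a small demo workflow in which doc/Scaling.xml is left out."""
    workflow = root / "Workflow1"
    for subdir, _ in EXPECTED:
        (workflow / subdir).mkdir(parents=True)
    (workflow / "workflow.def").write_text("PCA\nScaling\n")
    for relative in ("def/PCA.def", "doc/PCA.xml", "scripts/PCA.R",
                     "def/Scaling.def", "scripts/Scaling.R"):
        (workflow / relative).write_text("")
    return workflow


with tempfile.TemporaryDirectory() as tmp:
    demo_report = missing_files(make_demo_workflow(Path(tmp)))
```

A check of this kind could be run by a provider before adding a new analysis to a catalog, so that a missing definition or documentation file is noticed early.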
An example: PCA
Overview of the interaction mechanism of the different file types
[Figure: PCA.def is associated with the header of the R script (automatically generated); PCA.R holds the R script (written by the provider). Other labels in the diagram: Params, Results, PCA.xml, dataInMat, dataInFact, dataOutMat, dataOutFact.]
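The idea of an automatically generated script header can be sketched as follows. The actual format of a .def file and of the generated header is not shown in this transcript, so the sketch simply assumes a dictionary of parameter defaults and emits one R assignment per parameter; the parameter names (`ncomp`, `scale`, `title`) and the functions `r_literal` and `build_r_header` are hypothetical.

```python
from typing import Any, Dict


def r_literal(value: Any) -> str:
    """Render a Python value as an R literal (booleans, numbers, strings)."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_r_header(defaults: Dict[str, Any], user_values: Dict[str, Any]) -> str:
    """Merge user values over the defaults and emit R assignment lines."""
    # Only parameters declared in the defaults are accepted from the user.
    accepted = {name: value for name, value in user_values.items() if name in defaults}
    merged = {**defaults, **accepted}
    return "\n".join(f"{name} <- {r_literal(value)}" for name, value in merged.items())


header = build_r_header(
    {"ncomp": 2, "scale": True, "title": "PCA"},  # hypothetical defaults
    {"ncomp": 3, "unknown": 1},                   # values typed into the GUI
)
```

Keeping parameter initialization in a generated header means the provider's script only has to use the parameter names, not parse user input itself.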
An example: PCA (continued)
[Screenshots: (1) PCA.def, with the automatically generated GUI and the automatically generated header of the R code; (2) PCA.R, the R code written by the provider; (3) the results.]
Repository of persistent sessions
[Figure: directory tree. The repository's root contains one directory per session (Session 1, Session 2, …, Session n). Entries shown inside a session: query, bswf, imported_matrix_file.csv, p0 (Data Formatting), p1 (Split names), …, p5 (Scaling), and sessparams (session parameters). The diagram labels one sub-directory as holding the input data and another as holding the analysis results.]
3. BioStatFlow helps disseminate the results of statistical analyses by saving them in a
persistent session so that they can be fully restored. One can thus provide the session
identifier when publishing results (see the tutorial).
To share your data and the associated statistical analyses, communicate a URL of the form:
http://biostatflow.org/view/<SESSION ID>
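The URL pattern above can be written as a tiny helper. This is only a sketch of the pattern as stated in the slide; the function name `session_url` is hypothetical, and percent-encoding the identifier is a precaution added here, not something the slides specify.

```python
from urllib.parse import quote

BASE_VIEW_URL = "http://biostatflow.org/view/"


def session_url(session_id: str) -> str:
    """Build the view URL for a persistent session identifier."""
    session_id = session_id.strip()
    if not session_id:
        raise ValueError("session ID must not be empty")
    # Percent-encode the identifier in case it contains unexpected characters.
    return BASE_VIEW_URL + quote(session_id, safe="")


example_url = session_url("G633")
```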
Motivation of the design of BioStatFlow: Dissemination
Example of a session ID: G633 (http://biostatflow.org/view/G633)
[Screenshot labels: datasets, R code, results of statistical analyses.]
Some Links
A Spotlight on BioStatFlow in MetaboNews
http://www.metabonews.ca/Feb2015/MetaboNews_Feb2015.htm#spotlight
BioStatFlow is available online:
http://biostatflow.org
A Tutorial on BioStatFlow
http://biostatflow.org/doc/pg?id=tutorial:start