SlideShare a Scribd company logo
1 of 26
Download to read offline
Elegant Graphics for Data
  Analysis with ggplot2
      Yann Abraham
     BaselR 28.04.2010
Who is Yann Abraham
• Biochemist by training
• Bioinformatician by trade
• Pharma/Biotech
  – Cellzome AG
  – Novartis Pharma AG


• http://ch.linkedin.com/in/yannabraham
How to Represent Data
“A Picture is Worth a Thousand Words”
• Visualization is a critical component of data
  analysis
• Graphics are the most efficient way to digest
  large volumes of data & identify trends
• Graphical design is a mixture of mathematical
  and perceptual science
A Straightforward Way to Create
              Visualizations
• Grammar of Graphics provides a framework to
  streamline the description and creation of graphics
• For a given dataset to be displayed:
   – Map variables to aesthetics
   – Define Layers
      • A representation (a ‘geom’) ie line, boxplot, histogram,…
      • Associated statistical transformation ie counts, model,…
   – Define Scales
      • Color, Shape, axes,…
   – Define Coordinates
   – Define Facets
Why Use ggplot2?
• Simple yet powerful syntax
• Provides a framework for creating any type of
  graphics
• Implements basic graphical design rules by
  default
An Example
• 4 cell lines where treated with a compound
  active against a class of enzymes
• Proteins where extracted and quantified using
  mass spectrometry
• Is there anything interesting?!?
Excel…
R…
R…
R…
R… (this could go on for hours)
…ggplot2!
ggplot2!
ggplot2!
ggplot2!
ggplot2!



qplot(Experiment_1,Experiment_2,data=comp,color=CELLLINE,facets=ISTARGET~CELLLINE)+
         coord_equal()+
         scale_x_log10()+
         scale_y_log10()
When Visualization Alone is Not
              Enough
• Some datasets are large multidimensional
  data structures
• Representing data from such structure
  requires data transformation
• R is good at handling large sets
• R functions for handling multidimensional sets
  are complex to use
Easy Data Transformation With plyr
• plyr provides wrappers around typical R
  operations
  – Split
  – Apply
  – Combine
• plyr functions are similar to the by() function
Why use plyr?
• Simple syntax
• Predictable output
• Tightly integrated into ggplot2

• This comes at a price – somewhat slower than
  apply
An Example
• Given a set of raw data from a High
  Throughput Screen, compute the plate-
  normalized effect
The standard R way
    plate.mean <- aggregate(hts.data$RAW,
         list(hts.data$PLATE_ID),mean)

names(plate.mean) <- c(‘PLATE_ID’,’PLATE_MEAN’)

     hts.data <- merge(hts.data,plate.mean)

               hts.data$NORM<-
      hts.data$RAW/hts.data$PLATE_MEAN
The plyr way
             hts.data <-
ddply(hts.data,.(PLATE_ID),function(df) {
   df$NORM<-df$RAW/mean(df$RAW)
             return(df) }
                   )
Benefits of Using plyr & ggplot2
• Compact, straightforward syntax
  – Good basic output, complex options only required
    for polishing
• Shifts focus from plotting to exploring
  – Presentation graphics can be created from there
    at minimal cost
  – Data transformation is intuitive
• Powerful statistics available
  – It’s R!
Some links…
• The Grammar of Graphics book by Leland
  Wilkinson
• The ggplot2 book by Hadley Wickham
  – And the corresponding website
• A presentation about plyr by JD Long
  – And his initial blog post
THANK YOU FOR YOUR
    ATTENTION!

More Related Content

What's hot

모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
r-kor
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)
Ha Phuong
 

What's hot (20)

R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in Heaven
 
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
 
Chapter 9 ds
Chapter 9 dsChapter 9 ds
Chapter 9 ds
 
Vectors data frames
Vectors data framesVectors data frames
Vectors data frames
 
Machine Learning - Neural Networks - Perceptron
Machine Learning - Neural Networks - PerceptronMachine Learning - Neural Networks - Perceptron
Machine Learning - Neural Networks - Perceptron
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)
 
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개RUCK 2017 MxNet과 R을 연동한 딥러닝 소개
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개
 
Pandas data transformational data structure patterns and challenges final
Pandas   data transformational data structure patterns and challenges  finalPandas   data transformational data structure patterns and challenges  final
Pandas data transformational data structure patterns and challenges final
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
Python data structures - best in class for data analysis
Python data structures -   best in class for data analysisPython data structures -   best in class for data analysis
Python data structures - best in class for data analysis
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for Trees
 
5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
 
Self taught clustering
Self taught clusteringSelf taught clustering
Self taught clustering
 
chapter1
chapter1chapter1
chapter1
 
Chapter 3 ds
Chapter 3 dsChapter 3 ds
Chapter 3 ds
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
DATA VISUALIZATION WITH R PACKAGES
DATA VISUALIZATION WITH R PACKAGESDATA VISUALIZATION WITH R PACKAGES
DATA VISUALIZATION WITH R PACKAGES
 
PCA and SVD in brief
PCA and SVD in briefPCA and SVD in brief
PCA and SVD in brief
 

Similar to Elegant Graphics for Data Analysis with ggplot2

Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learning
my6305874
 
Week-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxWeek-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docx
helzerpatrina
 

Similar to Elegant Graphics for Data Analysis with ggplot2 (20)

Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 
R Programming - part 1.pdf
R Programming - part 1.pdfR Programming - part 1.pdf
R Programming - part 1.pdf
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
R programmingmilano
R programmingmilanoR programmingmilano
R programmingmilano
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learning
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Week-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxWeek-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docx
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 
Python for data science
Python for data sciencePython for data science
Python for data science
 
Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
 
Matplotlib Review 2021
Matplotlib Review 2021Matplotlib Review 2021
Matplotlib Review 2021
 
Matplotlib_Complete review_2021_abridged_version
Matplotlib_Complete review_2021_abridged_versionMatplotlib_Complete review_2021_abridged_version
Matplotlib_Complete review_2021_abridged_version
 
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Elegant Graphics for Data Analysis with ggplot2

  • 1. Elegant Graphics for Data Analysis with ggplot2 Yann Abraham BaselR 28.04.2010
  • 2. Who is Yann Abraham • Biochemist by training • Bioinformatician by trade • Pharma/Biotech – Cellzome AG – Novartis Pharma AG • http://ch.linkedin.com/in/yannabraham
  • 4. “A Picture is Worth a Thousand Words” • Visualization is a critical component of data analysis • Graphics are the most efficient way to digest large volumes of data & identify trends • Graphical design is a mixture of mathematical and perceptual science
  • 5. A Straightforward Way to Create Visualizations • Grammar of Graphics provides a framework to streamline the description and creation of graphics • For a given dataset to be displayed: – Map variables to aesthetics – Define Layers • A representation (a ‘geom’) ie line, boxplot, histogram,… • Associated statistical transformation ie counts, model,… – Define Scales • Color, Shape, axes,… – Define Coordinates – Define Facets
  • 6. Why Use ggplot2? • Simple yet powerful syntax • Provides a framework for creating any type of graphics • Implements basic graphical design rules by default
  • 7. An Example • 4 cell lines where treated with a compound active against a class of enzymes • Proteins where extracted and quantified using mass spectrometry • Is there anything interesting?!?
  • 10. R…
  • 11. R…
  • 12. R… (this could go on for hours)
  • 18. When Visualization Alone is Not Enough • Some datasets are large multidimensional data structures • Representing data from such structure requires data transformation • R is good at handling large sets • R functions for handling multidimensional sets are complex to use
  • 19. Easy Data Transformation With plyr • plyr provides wrappers around typical R operations – Split – Apply – Combine • plyr functions are similar to the by() function
  • 20. Why use plyr? • Simple syntax • Predictable output • Tightly integrated into ggplot2 • This comes at a price – somewhat slower than apply
  • 21. An Example • Given a set of raw data from a High Throughput Screen, compute the plate- normalized effect
  • 22. The standard R way plate.mean <- aggregate(hts.data$RAW, list(hts.data$PLATE_ID),mean) names(plate.mean) <- c(‘PLATE_ID’,’PLATE_MEAN’) hts.data <- merge(hts.data,plate.mean) hts.data$NORM<- hts.data$RAW/hts.data$PLATE_MEAN
  • 23. The plyr way hts.data <- ddply(hts.data,.(PLATE_ID),function(df) { df$NORM<-df$RAW/mean(df$RAW) return(df) } )
  • 24. Benefits of Using plyr & ggplot2 • Compact, straightforward syntax – Good basic output, complex options only required for polishing • Shifts focus from plotting to exploring – Presentation graphics can be created from there at minimal cost – Data transformation is intuitive • Powerful statistics available – It’s R!
  • 25. Some links… • The Grammar of Graphics book by Leland Wilkinson • The ggplot2 book by Hadley Wickham – And the corresponding website • A presentation about plyr by JD Long – And his initial blog post
  • 26. THANK YOU FOR YOUR ATTENTION!