Discovering Data Science Design Patterns with Examples from R and Python
Dmitrij Petrov
Autumn 2017
30/11/2017 Dmitrij Petrov - Master Thesis Presentation - Autumn 2017
Outlining Master Thesis
Motivation
• Design patterns capture best solutions to recurring issues in
  • Architecture
    • Started the Pattern Language Movement
  • Object-Oriented Programming
    • Seminal work for software analysis, design and implementation
  • Cloud Computing, Database Modelling, etc.
  • Data Science
Research Questions
• RQ1: What exactly do the terms software ecosystem, data science and design pattern mean?
• RQ2: Which data science-oriented design patterns can be recognized?
• RQ3: Which specific FOSS R and Python tools can be used for solving common data mining problems?
Methodology – 3D2P framework
Pattern prospecting
- Literature Sources
Pattern mining
- General Inductive Approach & Open/Axial Coding
- Discovery of patterns (i.e. best practices and their relationships)
Pattern writing
- Follow PW guidelines for their documentation
Relevant works of: Thomas (‘06), Inventado & Scupelli (‘15), Meszaros & Doble (‘96)
A Pattern Example – “Build Me Dataset”
1. Pattern Name & Sketch
2. Context: you want to process data from multiple data sources/formats
3. Problem: extracting/storing data in a common data structure
4. Solution: “table” → “data frame”
5. Consequences: can be very simple but also slow
6. Known uses: modelling, visualization…
7. Examples: from R & Python ecosystem
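As a rough illustration of the pattern above, the following Python sketch reads records from two formats and stores them in one common table. It is not taken from the thesis: the sample inputs, the column names and the helper functions (`rows_from_csv`, `rows_from_json`, `build_dataset`) are all hypothetical, and a plain list of dictionaries stands in for the "data frame" so that the example needs only the standard library. In practice a data frame type such as R's `data.frame` or a pandas `DataFrame` would likely play this role.

```python
import csv
import io
import json

# Hypothetical inputs: the same kind of record arriving in two formats.
CSV_TEXT = "id,name,score\n1,Ada,9.5\n2,Linus,7.0\n"
JSON_TEXT = '[{"id": 3, "name": "Grace", "score": 8.25}]'


def rows_from_csv(text):
    """Parse CSV text into a list of dicts (all values arrive as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def build_dataset(*sources):
    """Merge rows from every source into one table with a common schema."""
    table = []
    for rows in sources:
        for row in rows:
            # Coerce each field so rows from all formats share the same types.
            table.append({
                "id": int(row["id"]),
                "name": str(row["name"]),
                "score": float(row["score"]),
            })
    return table


dataset = build_dataset(rows_from_csv(CSV_TEXT), rows_from_json(JSON_TEXT))
for record in dataset:
    print(record)
```

The row-by-row loop reflects the stated consequence: the approach is simple to write, but it can become slow for large inputs.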
Expected Outcomes
1. Aim to formulate Data Science design patterns
2. Data Science R and Python Toolkit Matrix
   • A holistic map of tools can simplify the knowledge discovery process