SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Data Science meets Software Engineering
Vienna Semantic Web Meetup
2016-03-01
Bernhard Haslhofer
Who am I?
• Scientist at AIT’s Digital Insight Lab
• Specialization
• Network Analytics
• Machine Learning
• Text Mining
• PhD in Computer Science
Plan for tonight
• Build an example service
• Approach problem from
• software engineering perspective
• data science perspective
• Look at gap & propose solution
Example Service
sports
politics
art
business
Text Classification
API
Approaching the Problem
Software
Engineering
Software
Engineering
Steps
• Identify use cases / features
• Choose framework
• Implement functionality
• Ensure quality: test functionality, scalability etc…
• Deploy service
Ensure quality
public classify(Document document) {
….
}
@Test(timeout=100)
public test_classify(…) {
d = new Document(…)
c = classifier.classify(d)
assertNotNull(c)
assert(c in [sports, politics, …])
}
Result / Quality Expectation
• A service
• implementing defined use case(s)
• passing all tests (unit, integration, functional)
• fulfilling scalability needs
Approaching the Problem
Data
Science
Data
Science
Steps
• Define problem / hypothesis
• Collect data
• Design approach / model
• Ensure quality: evaluate model, compare
• Prototype algorithm (in R, Matlab, Octave, etc.)
Ensure quality
• Split dataset into training / test / cross-validation
dataset
• Train model using training dataset
• Evaluate using test (and cross-validation) dataset
• Report and investigate metrics
• precision, recall, F1, …
What ???
Software Engineering Data Science
Overall Goal Build the service Build the service
Technical Goal
Implement software features,
deploy working service
Find the right model features, get
the model right
Quality
assurance
Unit, functional, integration tests
Evaluate model, report metrics, re-
design model
What ???
• The overall (business) goal can be the same
• Different technical approach
• language issues (what is a “feature” !?)
• lack of understanding differences and necessities
• Different quality assurance
• notion of “testing” is different
• different “success factors” (passing test vs. metrics)
Possible solution
Define Goal
Collect
Ground Truth
Implement
Model and
Functions
Test &
Evaluate
Analyze
Errors
Deploy
Service
Metrics Driven Software Engineering
Tool support
@Test(precision >= 0.8)
@Test(timeout=100)
public test_classify(…) {
d = new Document(…)
c = classifier.classify(d)
assertNotNull(c)
assert(c in [sports, politics, …])
}
Thank You!
bernhard.haslhofer@ait.ac.at

Weitere ähnliche Inhalte

Ähnlich wie Mind the Gap - Data Science Meets Software Engineering

An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
DotNetCampus
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Sri Ambati
 
Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16
Andy Lathrop
 

Ähnlich wie Mind the Gap - Data Science Meets Software Engineering (20)

Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to Practice
 
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with GraphsNeo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
 
Machine learning
Machine learningMachine learning
Machine learning
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
 
Cloud-based Modelling Solutions Empowering Tool Integration
Cloud-based Modelling Solutions Empowering Tool IntegrationCloud-based Modelling Solutions Empowering Tool Integration
Cloud-based Modelling Solutions Empowering Tool Integration
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
Data Mining 2008
Data Mining 2008Data Mining 2008
Data Mining 2008
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
 
Expanding the idea of static analysis from code check to other development pr...
Expanding the idea of static analysis from code check to other development pr...Expanding the idea of static analysis from code check to other development pr...
Expanding the idea of static analysis from code check to other development pr...
 
GraphTalk Wien - Intelligente Lösungen mit Graphen erstellen
GraphTalk Wien - Intelligente Lösungen mit Graphen erstellenGraphTalk Wien - Intelligente Lösungen mit Graphen erstellen
GraphTalk Wien - Intelligente Lösungen mit Graphen erstellen
 
Assessing Your Company's Cloud Readiness
Assessing Your Company's Cloud ReadinessAssessing Your Company's Cloud Readiness
Assessing Your Company's Cloud Readiness
 
Neo4j GraphTalk Basel - Building intelligent Software with Graphs
Neo4j GraphTalk Basel - Building intelligent Software with GraphsNeo4j GraphTalk Basel - Building intelligent Software with Graphs
Neo4j GraphTalk Basel - Building intelligent Software with Graphs
 
Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16
 
Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
 

Mehr von Bernhard Haslhofer

Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur... Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
Bernhard Haslhofer
 
The value of open data and the OpenGLAM network
The value of open data and the OpenGLAM networkThe value of open data and the OpenGLAM network
The value of open data and the OpenGLAM network
Bernhard Haslhofer
 
Offene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische PerspektiveOffene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische Perspektive
Bernhard Haslhofer
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
Bernhard Haslhofer
 
OpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup GrazOpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup Graz
Bernhard Haslhofer
 

Mehr von Bernhard Haslhofer (20)

Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
 
Token Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate CurrenciesToken Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate Currencies
 
Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?
 
Measurements in Cryptocurrency Networks
Measurements in Cryptocurrency NetworksMeasurements in Cryptocurrency Networks
Measurements in Cryptocurrency Networks
 
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur... Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
 
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency AnalyticsO Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
 
GraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency EcosystemsGraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency Ecosystems
 
BITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection StrategiesBITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection Strategies
 
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing DevelopmentsBitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
 
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
 
The value of open data and the OpenGLAM network
The value of open data and the OpenGLAM networkThe value of open data and the OpenGLAM network
The value of open data and the OpenGLAM network
 
Things, not Strings
Things, not StringsThings, not Strings
Things, not Strings
 
Offene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische PerspektiveOffene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische Perspektive
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
Semantic Tagging on Historical Maps
Semantic Tagging on Historical MapsSemantic Tagging on Historical Maps
Semantic Tagging on Historical Maps
 
The Story behind Maphub
The Story behind MaphubThe Story behind Maphub
The Story behind Maphub
 
OpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup GrazOpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup Graz
 
Semantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the WebSemantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the Web
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Mind the Gap - Data Science Meets Software Engineering

  • 1. Data Science meets Software Engineering Vienna Semantic Web Meetup 2016-03-01 Bernhard Haslhofer
  • 2. Who am I? • Scientist at AIT’s Digital Insight Lab • Specialization • Network Analytics • Machine Learning • Text Mining • PhD in Computer Science
  • 3.
  • 4. Plan for tonight • Build an example service • Approach problem from • software engineering perspective • data science perspective • Look at gap & propose solution
  • 7. Steps • Identify use cases / features • Choose framework • Implement functionality • Ensure quality: test functionality, scalability etc… • Deploy service
  • 8. Ensure quality public classify(Document document) { …. } @Test(timeout=100) public test_classify(…) { d = new Document(…) c = classifier.classify(d) assertNotNull(c) assert(c in [sports, politics, …]) }
  • 9. Result / Quality Expectation • A service • implementing defined use case(s) • passing all tests (unit, integration, functional) • fulfilling scalability needs
  • 11. Steps • Define problem / hypothesis • Collect data • Design approach / model • Ensure quality: evaluate model, compare • Prototype algorithm (in R, Matlab, Octave, etc.)
  • 12. Ensure quality • Split dataset into training / test / cross-validation dataset • Train model using training dataset • Evaluate using test (and cross-validation) dataset • Report and investigate metrics • precision, recall, F1, …
  • 13. What ??? Software Engineering Data Science Overall Goal Build the service Build the service Technical Goal Implement software features, deploy working service Find the right model features, get the model right Quality assurance Unit, functional, integration tests Evaluate model, report metrics, re- design model
  • 14. What ??? • The overall (business) goal can be the same • Different technical approach • language issues (what is a “feature” !?) • lack of understanding differences and necessities • Different quality assurance • notion of “testing” is different • different “success factors” (passing test vs. metrics)
  • 15. Possible solution Define Goal Collect Ground Truth Implement Model and Functions Test & Evaluate Analyze Errors Deploy Service Metrics Driven Software Engineering
  • 16. Tool support @Test(precision >= 0.8) @Test(timeout=100) public test_classify(…) { d = new Document(…) c = classifier.classify(d) assertNotNull(c) assert(c in [sports, politics, …]) }