Knowledge Discovery in Production
André Karpištšenko
Knowledge Discovery
Requires Automation
Growth of information and devices per knowledge worker
1. Digital universe x3.8 in size in 2020. Focus on the highest-value subset.*
2. 26.3B devices in 2020, up +61% from 2015 with x2.7 IP traffic increase.**
3. 700M knowledge workers***, automation worth $5.2T to $6.7T****
* IDC, Apr 2014
** Cisco, Jun 2016
*** Teleport.org, Jun 2016
**** McKinsey, Jun 2016
Core Dataflow
Model Engine
Preprocessing Dataflow
System Composition:
Networked Intelligence
Mature
Nascent
Emerging
networked.ai
Infrastructure, Data & IoT Platforms, Advanced Analytics Platforms
Input
Data
Info
Merger
Data Curator Preparer & Explorer
Base Library
Selector
Executor
Self-improvement
Interpreter
Output Interfaces Core Human Interfaces
Knowledge
Manager
Knowledge
Manager
Predictive Modeling Flow Example
DashOpt
Feature
Engineering
Raw
Data
Raw
Features
Labels
Feature
Integration
Features
with Labels
Data
Partitioning
Training
Data
Validation
Data
Testing
Data
Model Training
Evaluate for
model selection
Compute offline
evaluation metrics
Best model
Offline scoring
and indexing
Online/offline
systems
Online A/B test
Label
preparation
Log data
Scoring
features
Raw features
Feature
integration
Model
Performance
Test Results
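A minimal sketch of the offline part of this flow (partitioning, training, model selection on validation data, evaluation on testing data), using only the Python standard library. The synthetic data, the 60/20/20 split, the k-nearest-neighbour candidates and all function names are assumptions made for illustration; the slide does not specify them.

```python
import random

def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle and split rows into training, validation and testing data."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def train_knn(training, k):
    """Training a k-NN model only stores the data; returns a predict function."""
    def predict(x):
        nearest = sorted(training, key=lambda row: abs(row[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > len(nearest) else 0
    return predict

def accuracy(model, rows):
    """Offline evaluation metric: share of correctly predicted labels."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def select_and_evaluate(rows, ks=(1, 5, 15)):
    training, validation, testing = partition(rows)
    models = {k: train_knn(training, k) for k in ks}
    # Model selection uses validation data only.
    best_k = max(ks, key=lambda k: accuracy(models[k], validation))
    # Testing data is touched once, for the selected model.
    return best_k, accuracy(models[best_k], testing)

# Hypothetical "features with labels": one noisy numeric feature.
rng = random.Random(42)
rows = []
for _ in range(200):
    x = rng.uniform(0, 1)
    label = 1 if x + rng.gauss(0, 0.1) > 0.5 else 0
    rows.append((x, label))

best_k, test_accuracy = select_and_evaluate(rows)
print(best_k, test_accuracy)
```

Keeping the testing data out of model selection is what makes the final metric an honest estimate before the model moves on to offline scoring and an online A/B test.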
Applications in Production
Electronics Manufacturing Biotechnology
Process time reduction
Predictive maintenance Quality improvement
Yield increase
Product Preview
Preprocessing data for manufacturing
analytics is complex and time consuming.
Custom built preprocessing
solutions are used to gather data
in electronics manufacturing.
The problem
How do people
solve it today
Product Scope
Data-driven electronics manufacturing
enabling understanding and prediction
• Heavy machinery
• Automotive
• Consumer Devices & Networks
• Drives
• PLC
Product for Pilot Factories
Product Solution
• Hybrid SaaS factory subscriptions and applications via open marketplace
• Real-time data streams from the field and factories for R&D and production
Electronics Factories
End Products
IoT Platforms Cloud Services
Delivering Business Value
Enabled metrics data
Increased engagement 2x
Enhanced usability of MES
Increased productivity
Test time reduction
270k-290k EUR/plant
Reducing risk through higher quality data and
improving business with data preprocessing
Industrial Analytics Example:
Bosch Competition, I
4 product lines
52 stations
Every feature has timestamp
Data rows
Parts of mechanical components
# (training data) – 1 183 747
# (test data) – 1 183 748
Data columns
Anonymized features of stations
Numeric – 970
Categorical – 2 141
Bosch has to ensure that the recipes for the production of its
advanced mechanical components are of the highest quality
and safety standards. Part of doing so is closely monitoring its
parts as they progress through the manufacturing processes.
https://www.kaggle.com/
Distinct patterns of missing values of all stations
Utilization of stations
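The two views named above (distinct missing-value patterns and station utilization) can be sketched as follows. This is an illustration on a toy table, not the competition data: the station names, the values and the reading of "a missing value means the part did not visit the station" are assumptions.

```python
from collections import Counter

def missing_pattern(row, stations):
    """Tuple of booleans: True where the part has no measurement at a station."""
    return tuple(row.get(s) is None for s in stations)

def pattern_counts(rows, stations):
    """Count how many parts share each distinct missing-value pattern."""
    return Counter(missing_pattern(r, stations) for r in rows)

def station_utilization(rows, stations):
    """Fraction of parts that have a measurement at each station."""
    n = len(rows)
    return {s: sum(r.get(s) is not None for r in rows) / n for s in stations}

# Hypothetical stations and parts; None marks a missing value.
stations = ["S0", "S1", "S2"]
parts = [
    {"S0": 0.1, "S1": None, "S2": 0.7},
    {"S0": 0.3, "S1": None, "S2": 0.2},
    {"S0": None, "S1": 0.5, "S2": 0.9},
    {"S0": 0.4, "S1": None, "S2": 0.6},
]

patterns = pattern_counts(parts, stations)
utilization = station_utilization(parts, stations)
print(patterns.most_common())
print(utilization)
```

Parts that share a pattern plausibly followed the same route through the line, which is one way to group rows before modeling.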
Industrial Analytics Example:
Bosch Competition, II
ProductFamilies
https://sites.google.com/site/iotminingtutorial/
IoT Data Streams Mining
• Continuous data, dynamic models, distributed, few seconds
Streams Mining: Actors Model
Data processing pipeline Distributed processing
Kappa Architecture
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
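As a rough illustration of the Kappa idea, the sketch below keeps a single append-only log as the source of truth and derives every view by running a stream job over a replay of that log. A new job version is rolled out by replaying the log again, with no separate batch layer. The `EventLog` class, the two jobs and the sensor names are hypothetical stand-ins, not part of the deck.

```python
class EventLog:
    """Append-only, replayable log (in-memory stand-in for a durable log)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, from_offset=0):
        yield from self._events[from_offset:]

def running_mean_job(events):
    """Stream job, version 1: incremental mean per sensor."""
    state = {}
    for sensor, value in events:
        count, mean = state.get(sensor, (0, 0.0))
        count += 1
        mean += (value - mean) / count  # update without storing history
        state[sensor] = (count, mean)
    return {sensor: mean for sensor, (_, mean) in state.items()}

def running_max_job(events):
    """Stream job, version 2: maximum per sensor, built from the same log."""
    state = {}
    for sensor, value in events:
        state[sensor] = max(value, state.get(sensor, value))
    return state

log = EventLog()
for event in [("temp", 20.0), ("temp", 22.0), ("vibration", 0.5), ("temp", 24.0)]:
    log.append(event)

view_v1 = running_mean_job(log.replay())
view_v2 = running_max_job(log.replay())  # reprocessing = replaying the log
print(view_v1, view_v2)
```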
DashOpt: Data Science Intelligence
Real-Time Predictive Flow
ML & Simulation
Platforms
IoT Platforms
Preprocessed Data
IoT Data
Earth Data
Manufacturing
Data
Predictive Models
Decision Tree SVM
Neural Network Random Forest
Data Science Intelligence
Outlier Detection
• Single point anomaly detection: likelihood over distribution
• Finding anomalous groups: divergence estimation
• Methods: percentage change, T-test, Chi-square test, Generalized ESD (Extreme
Studentized Deviate) test, Seasonal Hybrid ESD, etc.
• Goal: move from detection to automated response
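A small sketch of two of the simpler ideas listed above: scoring a single point by its likelihood under a distribution fitted to baseline data, and percentage change. The Gaussian assumption, the `alpha` cut-off and the sample values are illustrative choices; the ESD-family tests named on the slide are not implemented here.

```python
from statistics import NormalDist, mean, stdev

def tail_probability(baseline, x):
    """Two-sided probability of a value at least as extreme as x,
    assuming the baseline is roughly Gaussian."""
    mu, sigma = mean(baseline), stdev(baseline)
    z = abs(x - mu) / sigma
    return 2 * (1 - NormalDist().cdf(z))

def is_outlier(baseline, x, alpha=0.001):
    """Flag x when it is very unlikely under the fitted baseline."""
    return tail_probability(baseline, x) < alpha

def percentage_change(previous, current):
    """Relative change between two consecutive readings, in percent."""
    return (current - previous) / previous * 100.0

# Hypothetical sensor readings.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
print(is_outlier(baseline, 12.0))
print(is_outlier(baseline, 10.1))
print(percentage_change(100.0, 110.0))
```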
Outlier Detection in Practice
• Too many detections of too little value
• Use methods for thresholds
• Breakout detection and Concept Drift
• For changing distributions move baselines over time
• Risk of overfitting to known anomalies, not finding unknown anomalies
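The point about moving baselines can be sketched with a rolling window: each new value is compared against the mean and standard deviation of the most recent values only, so the detector adapts after a level shift instead of flagging forever. The window size, the 3-sigma threshold, the warm-up of 5 values and the synthetic stream are assumptions for illustration.

```python
from collections import deque
from statistics import mean, stdev

class RollingDetector:
    """Flags values that deviate from a baseline that moves over time."""

    def __init__(self, window=20, threshold=3.0):
        self.values = deque(maxlen=window)  # old values fall out automatically
        self.threshold = threshold

    def update(self, x):
        """Return True if x deviates from the current baseline, then absorb x."""
        flagged = False
        if len(self.values) >= 5:  # wait for a minimal baseline
            mu, sigma = mean(self.values), stdev(self.values)
            flagged = sigma > 0 and abs(x - mu) > self.threshold * sigma
        self.values.append(x)
        return flagged

# Hypothetical stream: stable around 10, then a level shift to around 20.
stream = [10 + 0.1 * (i % 2) for i in range(30)]
stream += [20 + 0.1 * (i % 2) for i in range(30)]

detector = RollingDetector()
flags = [detector.update(v) for v in stream]
print([i for i, f in enumerate(flags) if f])
```

The shift is reported when it happens, and once the window has filled with post-shift values the new level becomes the baseline.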
Bayesian aka Active Optimization
• Examples: Design of Experiments, hyper-parameters of supervised
learning, algorithms tested with simulations
f is an unknown, expensive black-box function; the goal is to
approximately optimize f with as few experiments as possible
• No free lunch theorem
• Other bio-inspired
algorithms for optimization
exploitation and
exploration: neural
networks, genetic algorithms,
swarm intelligence, ant
colony optimisation, etc.
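A compact sketch of the loop behind this slide, in plain Python: fit a Gaussian-process surrogate to the experiments run so far, pick the next experiment by an upper-confidence-bound rule that trades off exploitation (high predicted mean) against exploration (high uncertainty), and repeat until the budget is spent. The RBF kernel, its length scale, `kappa`, the candidate grid and the toy objective are all assumptions chosen for illustration.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel on scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mean, math.sqrt(var)

def bayes_opt(f, candidates, init, budget, kappa=2.0):
    """Maximize f using at most `budget` evaluations."""
    xs = list(init)
    ys = [f(x) for x in xs]
    while len(xs) < budget:
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break

        def ucb(c):
            m, s = gp_posterior(xs, ys, c)
            return m + kappa * s  # exploitation + exploration

        x_next = max(remaining, key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))  # the expensive experiment
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

# Hypothetical black-box objective with its maximum at x = 0.3.
def objective(x):
    return -(x - 0.3) ** 2

grid = [i / 50 for i in range(51)]
best_x, best_y, tried = bayes_opt(objective, grid, init=[0.0, 0.5, 1.0], budget=10)
print(best_x, best_y)
```

Production libraries add kernel hyper-parameter fitting, better acquisition functions and continuous search spaces; this version only shows the structure of the loop.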
Bayesian Optimization in Practice
• SigOpt experience: 20 dimensions, above human capacity.
• Uber ATC experience with scaling active optimization to high
dimensions: the default works reliably for 5-7 dimensions.
• Variables are added during optimization.
• Choose fidelity using heuristics.
DashOpt: Data Science Intelligence
US Patent pending
Extensive databases of DNA sequences,
metabolism of cells and components – enzymes
etc., high-throughput experimental omics
methods
Software environment for in silico ab initio
design of cells, and in silico testing
(predictive modeling) of the cell designs in
manufacturing processes
Current State in Biotech
Already available Future state
Thinking about Value from Data Science

Aspirational Block Program Block Syaldey District - Almora
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 

Knowledge Discovery in Production

  • 1. Knowledge Discovery in Production André Karpištšenko
  • 2.
  • 3. Knowledge Discovery Requires Automation Growth of information and devices per knowledge worker 1. Digital universe x3.8 in size in 2020. Focus on the highest-value subset.* 2. 26.3B devices in 2020, up +61% from 2015 with x2.7 IP traffic increase.** 3. 700M knowledge workers***, automation worth $5.2T to $6.7T**** * IDC, Apr 2014 ** Cisco, Jun 2016 *** Teleport.org, Jun 2016 **** McKinsey, Jun 2016
  • 4. Core Dataflow Model Engine Preprocessing Dataflow System Composition: Networked Intelligence Mature Nascent Emerging networked.ai Infrastructure, Data & IoT Platforms, Advanced Analytics Platforms Input Data Info Merger Data Curator Preparer & Explorer Base Library Selector Executor Self-improvement Interpreter Output Interfaces Core Human Interfaces Knowledge Manager Knowledge Manager
  • 5. Predictive Modeling Flow Example DashOpt Feature Engineering Raw Data Raw Features Labels Feature Integration Features with Labels Data Partitioning Training Data Validation Data Testing Data Model Training Evaluate for model selection Compute offline evaluation metrics Best model Offline scoring and indexing Online/offline systems Online A/B test Label preparation Log data Scoring features Raw features Feature integration Model Performance Test Results
  • 6. Applications in Production Electronics Manufacturing Biotechnology Process time reduction Predictive maintenance Quality improvement Yield increase
  • 8. The problem: Preprocessing data for manufacturing analytics is complex and time-consuming. How do people solve it today: Custom-built preprocessing solutions are used to gather data in electronics manufacturing.
  • 9. Product Scope Data-driven electronics manufacturing enabling understanding and prediction • Heavy machinery • Automotive • Consumer Devices & Networks • Drives • PLC
  • 10. Product for Pilot Factories
  • 11. Product Solution • Hybrid SaaS factory subscriptions and applications via open marketplace • Real-time data streams from the field and factories for R&D and production Electronics Factories End Products IoT Platforms Cloud Services
  • 12. Delivering Business Value Enabled metrics data Increased engagement 2x Enhanced usability of MES Increased productivity Test time reduction 270k-290k EUR/plant Reducing risk through higher quality data and improving business with data preprocessing
  • 13. Industrial Analytics Example: Bosch Competition, I 4 product lines 52 stations Every feature has timestamp Data rows Parts of mechanical components # (training data) – 1 183 747 # (test data) – 1 183 748 Data columns Anonymized features of stations Numeric – 970 Categorical – 2 141 Bosch has to ensure that the recipes for the production of its advanced mechanical components are of the highest quality and safety standards. Part of doing so is closely monitoring its parts as they progress through the manufacturing processes. https://www.kaggle.com/
  • 14. Distinct patterns of missing values of all stations Utilization of stations Industrial Analytics Example: Bosch Competition, II Product Families
  • 15. https://sites.google.com/site/iotminingtutorial/ IoT Data Streams Mining • Continuous data, dynamic models, distributed, few seconds
  • 16. Streams Mining: Actors Model Data processing pipeline Distributed processing Kappa Architecture https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
  • 17. DashOpt: Data Science Intelligence
  • 18. Real-Time Predictive Flow ML & Simulation Platforms IoT Platforms Preprocessed Data IoT Data Earth Data Manufacturing Data Predictive Models Decision Tree SVM Neural Network Random Forest Data Science Intelligence
  • 19. Outlier Detection • Single point anomaly detection: likelihood over distribution • Finding anomalous groups: divergence estimation • Methods: percentage change, T-test, Chi-square test, Generalized ESD (Extreme Studentized Deviate) test, Seasonal Hybrid ESD, etc. • Goal: move from detection to automated response
  • 20. Outlier Detection in Practice • Too many detections of too little value • Use methods for thresholds • Breakout detection and Concept Drift • For changing distributions move baselines over time • Risk of overfitting to known anomalies, not finding unknown anomalies
  • 21. Bayesian aka Active Optimization • Examples: Design of Experiments, hyper-parameters of supervised learning, algorithms tested with simulations f is an unknown expensive black-box function with the goal to approximately optimize f with as few experiments as possible • No free lunch theorem • Other bio-inspired algorithms for optimization exploitation and exploration: neural networks, genetic algorithms, swarm intelligence, ant colony optimisation, etc.
  • 22. Bayesian Optimization in Practice • SigOpt experience: 20 dimensions, above human capacity. • Uber ATC experience: scaling active optimization to high dimensions; the default works reliably for 5-7 dimensions. • Variables are added during optimization. • Choose fidelity using heuristics.
  • 23. DashOpt: Data Science Intelligence US Patent pending
  • 24. Current State in Biotech Already available: Extensive databases of DNA sequences, metabolism of cells and components (enzymes etc.), high-throughput experimental omics methods Future state: Software environment for in silico ab initio design of cells, and in silico testing (predictive modeling) of the cell designs in manufacturing processes
  • 25. Thinking about Value from Data Science
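
Slides 19 and 20 mention single-point anomaly detection and moving baselines over time for changing distributions. A minimal sketch of that idea follows, assuming a plain z-score against a trailing window; the function name `rolling_zscore_outliers`, the window size, and the threshold are illustrative choices, not taken from the slides or from DashOpt.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_outliers(values, window=20, threshold=3.0):
    """Return indices of points that deviate strongly from a moving baseline.

    Hypothetical helper: each point is compared with the mean and standard
    deviation of the `window` points that precede it.
    """
    baseline = deque(maxlen=window)  # trailing window; oldest point drops out
    outliers = []
    for i, x in enumerate(values):
        # Only score a point once the baseline window is full.
        if len(baseline) == window:
            mu = mean(baseline)
            sigma = stdev(baseline)
            # A zero-variance baseline gives no scale to judge deviations by.
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                outliers.append(i)
        # Always append, so the baseline follows a drifting distribution.
        baseline.append(x)
    return outliers


if __name__ == "__main__":
    # Assumed toy signal: a stable cycle with one injected spike.
    signal = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [25.0] + [10.0, 10.2, 9.8, 10.1, 9.9]
    print(rolling_zscore_outliers(signal, window=10, threshold=3.0))
```

Because every point, flagged or not, enters the window, a sustained level shift is reported for a few samples and then absorbed into the baseline. The trade-off is that a single large spike temporarily inflates the standard deviation and can mask smaller anomalies right after it.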