© 2019 All rights reserved. Schaffhausen Institute of Technology
Mauro Pezzè,
Schaffhausen Institute of Technology
Self-healing
cloud systems
Cloud in finance
The cloud is transforming the banking
industry as banks adopt cloud solutions to help
deliver against increasing customer expectations.
“The cloud and emerging technologies such as AI
and machine learning serve as both a catalyst and
a reason to change for the financial industry,”
Financial industry adopts cloud solutions
IBM Expert Advice April 2019
runtime failures
Unavoidable
Expensive
10 hours average downtime per year (IWGCR)
$1.25B to $2.5B total cost of unplanned application downtime per year
Fortune
$0.5M to $1M average cost of a critical app failure per hour
IDC
Finance software is not bug free 1..
Less than a week into 2016, HSBC became the first bank
to suffer a major IT outage. Millions of the bank’s
customers were unable to access online accounts.
Services only returned to normal after a two-day outage.
The bank’s chief operating officer Jack Hackett blamed a
‘complex technical issue’ with its internal systems.
Finance software is not bug free ..2..
In August 2015 a reported 275,000 individual payments
failed to be processed by HSBC, which left many
potentially without pay before the Bank Holiday weekend.
The cause of this major failure was a problem with its
electronic payment system for its business banking users
which affected salary payments.
Finance software is not bug free ..3..
In April 2015, Bloomberg’s London office suffered a software
glitch resulting in their trading terminals going down for two
hours.
In a statement Bloomberg said: “Service has been fully
restored. We experienced a combination of hardware and
software failures in the network, which caused an
excessive volume of network traffic.”
Finance software is not bug free ..4..
In June 2015 about 600,000 payments failed to enter the
accounts of RBS overnight, including wages and benefit
payments. Many took several days to come through. The
bank’s chief officer said a ‘technology fault meant we could
not ingest a file from a third-party provider’…
In 2012, 6.5 million RBS customers experienced an outage
due to batch scheduling software, a glitch for which the
bank was subsequently fined 56 million pounds.
Self-healing
(cloud) systems
Preventing, tolerating, and removing failures by:
• Predicting failures
• Locating bugs
• Working around failures
• Fixing bugs
State-of-the-art (Cloud)
healing solutions
Monitoring tools:
• kube-state-metrics
• metrics-server
• Envoy
• Helm charts
Self-healing tools:
• Liveness/Readiness probes
• Health indicators
• Pod phase, probe, restart
• …
Limitations
performance interference
no knowledge of system status
no knowledge of applications
Tools:
• Monasca: monitoring
• Aodh: alarming
• Congress: policy-based governance
• Mistral: workflow
• Senlin: clustering service
• Vitrage: root cause analysis
• Watcher: optimisation
• Masakari: compute healing advice
• Freezer-dr: compute healing advice
• Doctor: fault management
• Fault Genes Working Group: fault classification and recovery strategy
• Craton: fleet management
Features
monitoring
hardware/system recovery
Pod recovery
STAR: moving on
From (limitations):
• Performance interference
• No knowledge of system status
• No knowledge of applications
To (features):
• Limited performance interference
• Knowledge of application composition
• Holistic hierarchical system view
Figure: timeline from normal state to error state. Labels: fault activation, failure prediction, failure alert, failure localisation, faulty component, healing, failure.
STAR architecture (figure): the Sensor monitors the System and feeds the Failure Predictor, Fault Localizer, and Healer; the Actuator acts on the System.
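The loop in this diagram can be sketched in a few lines of Python. This is only an illustration, assuming the stages run in the order sensor, failure predictor, fault localizer, healer, actuator; the function names, the KPI dictionary, and the 0.9 threshold are hypothetical placeholders, not STAR's actual interfaces.

```python
from typing import Callable, Dict, Optional

KPIs = Dict[str, float]  # hypothetical: KPI name -> latest value


def control_step(
    sense: Callable[[], KPIs],
    predict: Callable[[KPIs], bool],
    localize: Callable[[KPIs], str],
    heal: Callable[[str], str],
    actuate: Callable[[str], None],
) -> Optional[str]:
    """Run one monitor/predict/localize/heal/act iteration.

    Returns the healing action that was applied, or None when no
    failure was predicted.
    """
    kpis = sense()                # sensor: monitor the system
    if not predict(kpis):         # failure predictor: is an alert needed?
        return None
    component = localize(kpis)    # fault localizer: likely faulty component
    action = heal(component)      # healer: choose a workaround
    actuate(action)               # actuator: apply it to the system
    return action


# Toy stand-ins for the five stages (illustrative only).
applied = []
result = control_step(
    sense=lambda: {"cpu": 0.97, "mem": 0.40},
    predict=lambda k: max(k.values()) > 0.9,
    localize=lambda k: max(k, key=k.get),
    heal=lambda c: f"restart service using {c}",
    actuate=applied.append,
)
print(result)
```

Passing the stages as plain callables keeps each one replaceable, which mirrors how the talk swaps failure predictor 1.0, 2.0, and 3.0 inside the same loop.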
(Cloud) Monitors
Monitored layers: Linux Server, OpenStack, Clearwater
Cross-layer partial monitoring with built-in facilities
Failure predictor 1.0
Data analytics
Machine learning
Failure predictor 1.0
Data analytics
Figure: the Sensor passes monitored KPIs, for example KPI2 = (Packets Received, R7), to the Anomaly Detector; the detected anomalies go to the Anomaly Classifier, which issues failure alerts with failure type and fault location.
Failure predictor 1.0
Deep Autoencoder
Figure: inputs v1 … v10; hidden layers h(1) … h(5) with 8, 6, 3, 6, and 8 units; reconstructed outputs v̂1 … v̂10.
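An autoencoder detects anomalies through reconstruction error: it is trained on normal KPI vectors, so a vector it reconstructs poorly is suspicious. The sketch below shows that principle with a linear autoencoder (PCA computed with NumPy's SVD) as a deliberately simpler stand-in for the deep autoencoder on the slide. The synthetic data, the 3-unit bottleneck, and the max-error threshold are assumptions made for illustration.

```python
import numpy as np


def fit_linear_autoencoder(X: np.ndarray, k: int):
    """Fit a k-dimensional linear encoder/decoder (PCA) on normal data."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(X: np.ndarray, mean: np.ndarray,
                         components: np.ndarray) -> np.ndarray:
    """Squared distance between each sample and its reconstruction."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return ((centered - reconstructed) ** 2).sum(axis=1)


rng = np.random.default_rng(0)
# Hypothetical normal behaviour: 10 KPIs driven by 3 latent factors plus noise.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
normal = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

mean, components = fit_linear_autoencoder(normal, k=3)
# Assumed rule: anything reconstructed worse than the worst training sample.
threshold = reconstruction_error(normal, mean, components).max()

# A sample that breaks the learned correlations between KPIs.
anomalous = normal[:1].copy()
anomalous[0, 7] += 5.0
is_anomalous = reconstruction_error(anomalous, mean, components) > threshold
print(is_anomalous)
```

A deep autoencoder replaces the two matrix products with nonlinear layers, but the detection rule (compare reconstruction error against a threshold learned on normal data) stays the same.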
Failure predictor 1.0
Figure legend: spurious anomalous KPI, failure-prone anomalous KPI
Failure predictor 1.0
Precise failure prediction and fault
localization, but requires (extensive) training
with seeded faults
Failure predictor 2.0
Machine learning
Failure predictor 2.0
One-class Support Vector Machine with RBF kernel
Figure: KPI vectors (KPI1(t), KPI2(t), KPI3(t), KPI4(t), …, KPIn*m(t)) sampled over time (09:00, 10:00, …, 16:00)
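A one-class SVM learns the region occupied by normal KPI vectors and flags anything outside it, which is why no seeded faults are needed for training. The sketch below uses scikit-learn's `OneClassSVM` with an RBF kernel. The synthetic 4-KPI data and the `nu` and `gamma` settings are assumptions made for illustration, not the configuration used in STAR.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical training set: 300 KPI vectors sampled during normal operation.
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 4))

# nu bounds the fraction of training samples treated as outliers;
# gamma="scale" lets scikit-learn derive the RBF width from the data.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal)

# predict() returns +1 for samples that resemble the training data, -1 otherwise.
far_away = np.full((1, 4), 10.0)
print(model.predict(far_away))
print((model.predict(normal) == 1).mean())
```

In practice the KPI columns would usually be rescaled to comparable ranges first, because the RBF kernel is sensitive to feature scale.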
Failure predictor 2.0
Precise failure prediction
NO fault localization
NO seeded faults, but still
(extensive) training
Failure predictor 3.0
Energy-based models
Deep Learning
Failure predictor 3.0
Figure: the Sensor passes monitored KPIs to the Free Energy Calculator, which issues failure alerts
Failure predictor 3.0
Baseline model with normal data
Free Energy Gtrain(t), computed from the KPIs (layers v and h)
Time   KPI1 (M1, R1)   …   KPIn*m (Mn, Rm)
5’     2500.00         …   4645.33
10’    2500.00         …   3833.20
15’    2500.00         …   3981.20
20’    …               …   …
Failure predictor 3.0
Predicting failures in error state
Free Energy Gfaulty(t), computed from faulty data (layers v and h)
Time   KPI1 (M1, R1)   …   KPIn*m (Mn, Rm)
5’     2500.00         …   4645.33
10’    2776.47         …   3833.20
15’    2776.47         …   3981.20
20’    …               …   …
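The slides do not say which energy-based model computes the free energy. Assuming a restricted Boltzmann machine with Bernoulli units, which would match the v and h layers in the figures, the free energy of a visible vector v is F(v) = -Σ b_i v_i - Σ_j log(1 + exp(c_j + Σ_i v_i W_ij)). The sketch below evaluates that formula; the weights and inputs are hypothetical.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy(v: Sequence[float], W: List[List[float]],
                b: Sequence[float], c: Sequence[float]) -> float:
    """Free energy of a Bernoulli RBM.

    F(v) = -sum_i b[i]*v[i] - sum_j softplus(c[j] + sum_i v[i]*W[i][j])
    W has one row per visible unit and one column per hidden unit.
    """
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = 0.0
    for j, cj in enumerate(c):
        activation = cj + sum(vi * W[i][j] for i, vi in enumerate(v))
        hidden_term += softplus(activation)
    return -visible_term - hidden_term


# Hypothetical parameters: 3 visible units (KPIs scaled to [0, 1]), 2 hidden units.
W = [[2.0, -1.0], [2.0, -1.0], [-1.0, 2.0]]
b = [0.0, 0.0, 0.0]
c = [0.0, 0.0]

g_train = free_energy([1.0, 1.0, 0.0], W, b, c)  # pattern this toy model favours
g_other = free_energy([0.0, 0.0, 0.0], W, b, c)  # all-zero input
print(g_train, g_other)
```

A predictor built this way would compare the free energy of incoming KPI vectors with the values seen on normal data (Gtrain) and raise an alert when they drift apart; the exact decision rule is not given in the slides.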
Failure predictor 3.0
Precision: 95.64%
Recall: 99.98%
Performance
training time ~ 24 seconds
16 GB RAM laptop
3840 NVIDIA CUDA cores
input size: 350 KPIs
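For reference, precision is the share of raised alerts that were real failures, and recall is the share of real failures that were alerted. The sketch below computes both from confusion counts; the counts are hypothetical and are not the data behind the figures above.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical counts: 90 correct alerts, 10 false alarms, 30 missed failures.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=90.00% recall=75.00%
```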
Failure predictor 3.0
Precise failure prediction
NO fault localization
Negligible overhead
Online incremental training
Fault Localizer
KPI ranking
Fault Localizer
Figure: the anomalies behind the failure alerts feed a Graph Generator; the generated graphs feed a Graph Ranker.
node: KPI = (M, R)
edge: KPIi → KPIj, Granger causality with probability wij
Example nodes: (Retransmitted Packets, VM), (Retransmitted Packets, Server), (Db latency, Server), (Memory Usage, Server), (# of Connections, Server), (# of Processes, Server)
Timeline: 09:00, 10:00, 15:40 (Failure Alert)
Ranking: (M1, R1), (M70, R5), (M15, R5), (M7, R5)
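The slides do not name the algorithm behind the Graph Ranker. As one plausible choice, the sketch below runs a weighted PageRank over a causality graph whose edges KPIi → KPIj carry a weight wij. The edge weights are invented for illustration, and PageRank itself is an assumption rather than STAR's documented method.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source KPI, target KPI, weight w_ij)


def weighted_pagerank(edges: List[Edge], damping: float = 0.85,
                      iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank on a weighted directed graph."""
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    out_weight = {n: 0.0 for n in nodes}
    for s, _, w in edges:
        out_weight[s] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for s, t, w in edges:
            new_rank[t] += damping * rank[s] * w / out_weight[s]
        rank = new_rank
    return rank


# Hypothetical causality graph; the weights are made up for illustration.
edges = [
    ("(Memory Usage, Server)", "(Db latency, Server)", 0.9),
    ("(# of Processes, Server)", "(Db latency, Server)", 0.6),
    ("(Db latency, Server)", "(Retransmitted Packets, Server)", 0.8),
    ("(Retransmitted Packets, Server)", "(Retransmitted Packets, VM)", 0.7),
]
scores = weighted_pagerank(edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With edges oriented this way, score accumulates at KPIs that many others Granger-cause. Reversing the edges would instead favour KPIs that cause many others, so the orientation is a design decision the slides leave open.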
Fault Localizer
Figure: the Fault Localizer pipeline (Cloud, Sensor, Actuator, Anomaly Detector, Anomaly Classifier, Graph Generator, Graph Ranker) with two additional labels: Fault Localization and Fault Injection
Fault Localizer
Precise localisation
No training
No overhead
Healer
NLP (and more)
for learning automatic
workarounds
Healer
Figure: an automatic workaround generator uses natural language annotations to produce automatic workarounds
Contextual NLP: word embedding and word mover distance
Example words: danger, threat, search, found
FROM "a thread is found" TO "search from danger"
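Word mover's distance measures how far the words of one text must travel in embedding space to become the words of another. Computing it exactly needs an optimal-transport solver, so the sketch below uses the relaxed variant, a lower bound in which every word moves to its nearest counterpart. The 2-D embeddings are hypothetical toy values and stop words are assumed to be removed beforehand.

```python
import math
from typing import Dict, List, Sequence

Embedding = Dict[str, Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relaxed_wmd(doc_a: List[str], doc_b: List[str], emb: Embedding) -> float:
    """Relaxed word mover's distance with uniform word weights.

    Each word moves all its mass to the nearest word of the other text;
    the larger of the two directions is the tighter lower bound.
    """
    def one_way(src: List[str], dst: List[str]) -> float:
        return sum(min(euclidean(emb[w], emb[u]) for u in dst) for w in src) / len(src)

    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))


# Hypothetical 2-D embeddings; real ones would come from a trained model.
emb = {
    "threat": (1.0, 0.0), "danger": (1.0, 0.1),
    "found": (0.0, 1.0), "search": (0.1, 1.0),
    "lunch": (-1.0, -1.0),
}
d_close = relaxed_wmd(["threat", "found"], ["danger", "search"], emb)
d_far = relaxed_wmd(["threat", "found"], ["lunch"], emb)
print(d_close, d_far)
```

Texts that share no words can still come out close, here because "threat" sits near "danger" and "found" near "search", which is the property that makes the measure useful for matching differently worded annotations.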
Healer
NLP to automatically identify
workarounds
The STAR approach: FAILURE PREDICTION
extensive experience with
• data analytics
• machine learning
• deep learning
excellent results on large scale industrial systems for
• packet loss/corruption/latency
• CPU hogs
• memory leaks
• excessive workload
The STAR approach: FAULT LOCALIZER
extensive experience with
• machine learning
• KPI ranking
excellent results on large scale industrial systems for
• packet loss/corruption/latency
• CPU hogs
• memory leaks
• excessive workload
The STAR approach: HEALER
experience with
• NLP (Natural Language Processing) to identify automatic workarounds
excellent results on small scale systems
Plans
From classic cloud to highly dynamic cloud configurations
(Microservices and Kubernetes)
• Predictor 3.0: deep learning
• Dynamically evolving system models
From functional and performance issues to cybersecurity breaches
• Empirical studies on cybersecurity breaches
From simple to pervasive automatic workarounds
• NLP on pervasive contradicting annotations
• Image and video processing
SIT research today
Two research chairs in software
engineering / verification / security
Bertrand Meyer
SIT Professor of Software Engineering and Provost
Mauro Pezzè
SIT Professor of Software Quality and Cybersecurity
Software Quality and Cybersecurity (SQC)
Program
Environment
People
Reliability & Protection
Outputs
• Software
• Papers
• PhD theses
• Patents
• Technology transfer
Come join us!
UNIVERSITY • RESEARCH • TECHPARK • ECOSYSTEM • R&D CENTERS • STARTUPS

Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 

Kürzlich hochgeladen (20)

Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 

Mauro Pezzè - Self-healing cloud systems at SIT Insights in Technology 2019

  • 1. © 2019 All rights reserved. Schaffhausen Institute of Technology. Mauro Pezzè, Schaffhausen Institute of Technology: Self-healing cloud systems
  • 2. Cloud in finance. The cloud is transforming the banking industry as banks adopt cloud solutions to help deliver against increasing customer expectations. “The cloud and emerging technologies such as AI and machine learning serve as both a catalyst and a reason to change for the financial industry.” (Financial industry adopts cloud solutions, IBM Expert Advice, April 2019)
  • 3. Runtime failures
  • 4. Unavoidable
  • 5. Expensive. 10 hours average downtime per year (IWGCR). $1.25B to $2.5B total cost of unplanned application downtime per year (Fortune). $0.5M to $1M average cost of a critical app failure per hour (IDC).
  • 6. Finance software is not bug free (1). Less than a week into 2016, HSBC became the first bank to suffer a major IT outage. Millions of the bank’s customers were unable to access online accounts. Services only returned to normal after a two-day outage. The bank’s chief operating officer Jack Hackett blamed a ‘complex technical issue’ with its internal systems.
  • 7. Finance software is not bug free (2). In August 2015 a reported 275,000 individual payments failed to be processed by HSBC, which left many potentially without pay before the Bank Holiday weekend. The cause of this major failure was a problem with its electronic payment system for its business banking users, which affected salary payments.
  • 8. Finance software is not bug free (3). In April 2015, Bloomberg’s London office suffered a software glitch resulting in their trading terminals going down for two hours. In a statement Bloomberg said: “Service has been fully restored. We experienced a combination of hardware and software failures in the network, which caused an excessive volume of network traffic.”
  • 9. Finance software is not bug free (4). In June 2015 about 600,000 payments failed to enter the accounts of RBS overnight, including wages and benefit payments. Many took several days to come through. The bank’s chief officer said a “technology fault meant we could not ingest a file from a third-party provider”. In 2012, 6.5 million RBS customers experienced an outage due to batch scheduling software, a glitch for which the bank was subsequently fined 56 million pounds.
  • 10. Self-healing (cloud) systems: preventing, tolerating, and removing failures by predicting failures, locating bugs, working around failures, and fixing bugs.
  • 11. State-of-the-art (cloud) healing solutions. Monitoring tools: kube-state-metrics, metrics-server, Envoy, Helm charts. Self-healing tools: liveness/readiness probes; health indicators; Pod phase, probe, restart; … Limitations: performance interference; no knowledge of system status; no knowledge of applications. Tools: Monasca (monitoring); Aodh (alarming); Congress (policy-based governance); Mistral (workflow); Senlin (clustering service); Vitrage (root cause analysis); Watcher (optimisation); Masakari (compute healing advice); Freezer-dr (compute healing advice); Doctor (fault management); Fault Genes Working Group (fault classification and recovery strategy); Craton (fleet management). Features: monitoring; hardware/system recovery; Pod recovery.
  • 12. STAR: moving on from limitations (performance interference, no knowledge of system status, no knowledge of applications) to features (limited performance interference, knowledge of application composition, holistic hierarchical system view).
  • 13. STAR (timeline figure; labels: normal state, error state, failure, time, prediction, fault activation, healing, failure alert, faulty component, localisation)
  • 14. STAR (architecture figure; labels: system, sensor, actuator, monitor, failure predictor, fault localizer, healer)
  • 15. STAR (cloud) monitors: cross-layer partial monitoring with built-in facilities (Linux Server, OpenStack, Clearwater)
  • 16. STAR failure predictor 1.0: data analytics, machine learning
  • 17. STAR failure predictor 1.0, data analytics (architecture figure; labels: system, sensor, actuator, monitored KPIs, failure predictor, anomaly detector, anomalies, anomaly classifier, failure alerts, failure type, fault location, fault localizer, healer; example KPI2 = (Packets Received, R7))
  • 18. STAR failure predictor 1.0, deep autoencoder (architecture figure as on slide 17; network figure: inputs v1 … v10, five hidden layers h(1) … h(5) with 8, 6, 3, 6, and 8 units, reconstructed outputs v̂1 … v̂10)
  • 19. STAR failure predictor 1.0 (architecture figure as on slide 17): spurious anomalous KPI versus failure-prone anomalous KPI
  • 20. STAR failure predictor 1.0: precise failure prediction and fault localization, but (extensive) training with seeded faults
  • 21. STAR failure predictor 2.0: machine learning
  • 22. STAR failure predictor 2.0: one-class support vector machine with RBF kernel, applied to the KPI vectors KPI1(t), KPI2(t), KPI3(t), KPI4(t), …, KPIn*m(t) sampled over time (09:00, 10:00, …, 16:00); architecture figure as on slide 17
  • 23. STAR failure predictor 2.0: precise failure prediction, NO fault localization, with NO seeded faults, but still (extensive) training
  • 24. STAR failure predictor 3.0: energy-based models, deep learning
  • 25. STAR failure predictor 3.0 (architecture figure; labels: system, sensor, actuator, monitored KPIs, failure predictor, free energy calculator, failure alerts, fault localizer, healer)
  • 26. STAR failure predictor 3.0: free energy Gtrain(t), baseline model with normal data (figure: layers v and h). KPIs over time:
      Time | KPI1 (M1, R1) | … | KPIn*m (Mn, Rm)
      5’   | 2500.00       | … | 4645.33
      10’  | 2500.00       | … | 3833.20
      15’  | 2500.00       | … | 3981.20
      20’  | …             | … | …
  • 27. STAR failure predictor 3.0: free energy Gfaulty(t), predicting failures in error state (figure: layers v and h). Faulty data:
      Time | KPI1 (M1, R1) | … | KPIn*m (Mn, Rm)
      5’   | 2500.00       | … | 4645.33
      10’  | 2776.47       | … | 3833.20
      15’  | 2776.47       | … | 3981.20
      20’  | …             | … | …
  • 28. STAR failure predictor 3.0: precision 95.64%, recall 99.98%. Performance: training time ~24 seconds on a 16 GB RAM laptop with 3840 NVIDIA CUDA cores; input size: 350 KPIs.
  • 29. STAR failure predictor 3.0: precise failure prediction, NO fault localization, negligible overhead, online incremental training
  • 30. STAR fault localizer: KPI ranking
  • 31. STAR fault localizer (architecture figure; labels: cloud, sensor, actuator, failure predictor, anomaly detector, anomalies, anomaly classifier, failure alerts, fault localizer, graph generator, graphs, graph ranker, healer). Graph: node: KPI = (M, R); edge: KPIi → KPIj, Granger causality with probability wij. Example nodes: (Retransmitted Packets, VM), (Retransmitted Packets, Server), (Db latency, Server), (Memory Usage, Server) / (# of Connections, Server) / (# of Processes, Server). Timeline: 09:00, 10:00, 15:40 failure alert. Ranking: (M1, R1), (M70, R5), (M15, R5), (M7, R5).
  • 32. STAR fault localizer (architecture figure as on slide 31, with the additional labels fault injection and fault localization)
  • 33. STAR fault localizer: precise localisation, no training, no overhead
  • 34. STAR healer: NLP (and more) for learning automatic workarounds
  • 35. STAR healer (architecture figure; labels: system, sensor, actuator, failure predictor, anomaly detector, anomalies, anomaly classifier, failure alerts, fault localizer, graph generator, graphs, graph ranker, healer, automatic workaround generator, natural language annotations, automatic workarounds). Contextual NLP: word embedding and word mover distance (figure words: danger, threat, search, found; phrases: “a thread is found”, “search from danger”; FROM, TO).
  • 36. STAR healer: NLP to automatically identify workarounds
  • 37. The STAR approach, failure prediction (architecture figure; labels: system, sensor, actuator, monitor, failure predictor, fault localizer, healer): extensive experience with data analytics, machine learning, and deep learning; excellent results on large scale industrial systems for packet loss/corruption/latency, CPU hogs, memory leaks, and excessive workload.
  • 38. The STAR approach, fault localizer (architecture figure as on slide 37): extensive experience with machine learning and KPI ranking; excellent results on large scale industrial systems for packet loss/corruption/latency, CPU hogs, memory leaks, and excessive workload.
  • 39. The STAR approach, healer (architecture figure as on slide 37): experience with NLP (Natural Language Processing) to identify automatic workarounds; excellent results on small scale systems.
  • 40. Plans. From classic cloud to highly dynamic cloud configurations (Microservices and Kubernetes): Predictor 3.0 (deep learning); dynamically devolving system models. From functional and performance issues to cybersecurity breaches: empirical studies on cybersecurity breaches. From simple to pervasive automatic workarounds: NLP on pervasive contradicting annotations; image and video processing.
  • 41. SIT research today. Two research chairs in software engineering / verification / security: Bertrand Meyer, SIT Professor of Software Engineering and Provost; Mauro Pezzè, SIT Professor of Software Quality and Cybersecurity. Software Quality and Cybersecurity (SQC) program: environment, people, reliability & protection. Outputs: software, papers, PhD theses, patents, technology transfer.
  • 42. Come join us! University, research, techpark, ecosystem, R&D centers, startups.
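
Slides 24 to 29 describe failure predictor 3.0 as an energy-based model that compares the free energy of the monitored KPIs against a baseline built from normal data. The slides do not give the model itself, so the following is only a minimal sketch under stated assumptions: a restricted Boltzmann machine with binary hidden units, already-trained parameters W, a, b (hypothetical values in the usage note), KPIs rescaled to a small numeric range, and an alert rule of k standard deviations from the baseline mean. None of these names or thresholds come from the talk.

```python
import math
import statistics


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy(v, W, a, b):
    """Free energy of an RBM with binary hidden units (assumed model):
    F(v) = -sum_i a[i]*v[i] - sum_j softplus(b[j] + sum_i v[i]*W[i][j]).
    v: KPI vector, W: visible-by-hidden weights, a/b: visible/hidden biases.
    """
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(
        softplus(bj + sum(vi * W[i][j] for i, vi in enumerate(v)))
        for j, bj in enumerate(b)
    )
    return -visible_term - hidden_term


def fit_baseline(normal_samples, W, a, b):
    """Mean and standard deviation of the free energy over normal KPI samples."""
    energies = [free_energy(v, W, a, b) for v in normal_samples]
    return statistics.mean(energies), statistics.pstdev(energies)


def is_failure_alert(v, W, a, b, mean, std, k=3.0):
    """Hypothetical alert rule: free energy more than k std devs from baseline."""
    return abs(free_energy(v, W, a, b) - mean) > k * std
```

A possible use: call `fit_baseline` once on KPI vectors collected during normal operation, then call `is_failure_alert` on each new KPI vector. The threshold `k` trades false alarms against missed predictions and would need tuning on real data.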

Editor's notes

  1. • Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training.
     • Improving supervised learning techniques for performance deviation analysis, leveraging user-based SLA violations as labels for each application, e.g. distribution of response times below a certain threshold.
     • Analyzing distributions of response time at service level and exploiting hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.
     • Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from Weave Scope as an adjacency list of containers, and issuing fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.
     • Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  12. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  13. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  14. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  15. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  16. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  17. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  18. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  19. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  20. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
  21. Building a model of the normal behavior for each application from collections of pods running the same application, by relying on fast deep learning techniques (Deep Belief Networks, Deep Convolutional Neural Networks) trained in a semi-supervised fashion, without relying on faulty data for training  Improving supervised learning techniques for performance deviation analysis, leveraging userbased SLA violation as labels for each application , eg. distribution of response times below a certain threshold  Analyzing distributions of response time at service level and exploit hypothesis testing and regression techniques to predict behavior and detect deviations from the norm. Salacia will implement fast algorithms based on standard machine learning techniques for fast and robust nonlinear regression.  Localising faults by analyzing the relation between the health status at application level and the application topology retrieved from weave scope as an adjacency list of containers, and issuing 11/30 fault alerts that indicate the culprit application and/or pod, by exploiting the information on the application topology.  Activating self-healing procedures, which will leverage self-healing functionalities of Kubernetes to implement self-healing actions on the pods that Salacia localises as responsible for the faulty behavior at application level.
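The fault localisation step described in the slide text (health status combined with an adjacency list of containers) can be illustrated with a minimal sketch. This is not Salacia's actual algorithm, which the slides do not specify. It assumes a hypothetical topology in which each container maps to the containers it calls, a set of containers already flagged as anomalous, and a simple heuristic: a flagged container is a likely culprit when none of the containers it depends on is flagged. The function name, the container names, and the heuristic are all illustrative assumptions.

```python
from typing import Dict, List, Set


def localise_culprits(topology: Dict[str, List[str]], anomalous: Set[str]) -> List[str]:
    """Return anomalous containers that have no anomalous downstream dependency.

    topology  : adjacency list, container -> containers it calls (assumed shape)
    anomalous : containers whose health status deviates from the learned norm
    """
    culprits = []
    for node in sorted(anomalous):
        deps = topology.get(node, [])
        # If a dependency is itself anomalous, this node is more likely a
        # victim of propagation than the origin of the fault.
        if not any(dep in anomalous for dep in deps):
            culprits.append(node)
    return culprits


# Hypothetical application topology (not taken from the slides).
topology = {
    "frontend": ["orders", "catalog"],
    "orders": ["payments", "db"],
    "catalog": ["db"],
    "payments": [],
    "db": [],
}

# Hypothetical output of the anomaly detection step.
anomalous = {"frontend", "orders", "payments"}

culprits = localise_culprits(topology, anomalous)
print(culprits)  # ['payments']
```

In this example the anomaly in "payments" propagates to its callers "orders" and "frontend", so only "payments" is reported. The heuristic returns nothing when every anomalous container sits on a dependency cycle, so a real implementation would need an additional tie-breaking rule for that case.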