‘Machine learning’ is one of those cringe-inducing phrases, almost (if not already) taboo in the world of high-tech SaaS. Applying true machine learning to an organization’s products, however, can bring real benefit to the business, its clients, and the industry as a whole. From credit card fraud investigations to the way a car is built, machine learning has permeated everyday life without a common understanding of what it is or how to implement it.
Machine Learning: Addressing the Disillusionment to Bring Actual Business Benefit
2. Machine Learning: Addressing the Disillusionment to Bring Actual Business Benefit
Jon Mead
Technical Services Director, North America
Egress Software Technologies, Inc.
June 14, 2019
3. About the Speaker
Jon Mead
Technical Services Director North America, Egress Software
An experienced technical engineer, Jon has worked across corporate
and government organizations to effectively deploy and manage
SaaS technologies in complex environments. As Technical Services
Director for North America, Jon provides expert technical support
and guidance to Egress clients as they achieve key compliance and
business objectives. Working closely with strategic personnel at
Egress, Jon plays an integral part in the development and delivery of
the company’s innovative data security platform that empowers
users to send, receive and manage information without risk.
4. A leader in intelligent, user-centric data security
• A decade of success in sophisticated defense, government and private sector data privacy.
• Identify, Classify, Secure, Control, Monitor, Audit & Report
• 2000+ Enterprise customers across industry: Banking and insurance, Government, Healthcare, Non-profit, Professional services, Industry regulators, Utilities
• US Headquarters in Boston, MA.
• Vetted, certified products and services (NIST, ISO, NATO, Common Criteria)
About Egress
6. Machine Learning: Where do we begin?
» Define the real-world problem
Is there a problem to solve?
Can we solve the problem?
Should we solve the problem?
What data do we need to solve this problem?
Can/Should we use Machine Learning?
7. The rise of mistake-driven breaches – 2018 Verizon Data Breach Report*
*53,308 security incidents, 2,216 data breaches, 65 countries, 67 contributors. https://www.verizonenterprise.com/verizon-insights-lab/dbir/
8. Machine Learning: An Example
» Define the real-world problem
✓ Is there a problem to solve?
✓ Can we solve the problem?
✓ Should we solve the problem?
✓ What data do we need to solve this problem?
✓ Can/Should we use Machine Learning?
9. Business Problem:
How does an organization handle real-world risks to data as it travels over
untrusted networks to potentially untrusted recipients?
Can an organization consider human error and/or malicious behavior with
that data?
Ultimately, how can an organization avoid data breaches and demonstrate
compliance with rigorous data protection regulations, such as CCPA, in the
real world?
Egress: A Problem worth Machine Learning?
10. Machine Learning Process
• Define the objective of the Problem Statement
• Data Gathering
• Data Preparation
• Exploratory Data Analysis
• Building a Machine Learning Model
• Model Evaluation & Optimization
• Predictions
11. Business Machine Learning Process
• Define the Business Objective (Problem)
• Source the appropriate data
• Split the data in a meaningful way
• Select the evaluation metric(s)
• Define all features that may be created from the data
• Train the model
• Feature selection
• Production system
• Feed the model
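The "split the data" and "select the evaluation metric(s)" steps above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it assumes each record carries a hypothetical `timestamp` field, holds out the most recent records for evaluation, and uses precision and recall as example metrics.

```python
from typing import Dict, List, Sequence, Tuple


def chronological_split(
    records: List[Dict], train_fraction: float = 0.8
) -> Tuple[List[Dict], List[Dict]]:
    """Sort by the (assumed) 'timestamp' field and hold out the newest records."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Splitting by time rather than at random is one way to make the split "meaningful" when the production system will always predict on data newer than anything it was trained on.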
12. Define the Business Objective
Source the appropriate data
Split the data
Select the evaluation metrics
Define all features
Train the model
Feature Selection
Create Production Version
13. Machine Learning in practice
Machine Learning in production
Common Pitfalls when deploying from practice to
production?
How are these pitfalls defined?
Deploying Machine Learning: Common Pitfalls?
14. Sampling Bias
Data Leakage
Unknown Unknowns
Scaling and Normalization
Impact of Outliers
Fitting Data
Overfitting the Model
Social Engineering
Deploying Machine Learning: Common Pitfalls?
16. Use Tags and Labels to organize structured data
Unstructured Data – How do we organize?
How can we prevent data leakage in our machine
learning model?
Deploying Machine Learning: Data Leakage
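One common form of data leakage is the same entity appearing in both training and evaluation data, so the model is scored on examples it has effectively already seen. The sketch below shows one possible guard, assuming each record has a hypothetical `sender` field: every record from a given sender is routed to the same side of the split using a stable hash. It illustrates the general idea only and is not a technique described in the talk.

```python
import hashlib
from typing import Dict, List, Tuple


def assign_split(group_key: str, test_fraction: float = 0.2) -> str:
    """Map a group key to 'train' or 'test' deterministically via a hash."""
    digest = hashlib.sha256(group_key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # value in [0, 1]
    return "test" if bucket < test_fraction else "train"


def split_by_group(
    records: List[Dict], key: str = "sender"
) -> Tuple[List[Dict], List[Dict]]:
    """Keep all records sharing the same key value on one side of the split."""
    train, test = [], []
    for record in records:
        target = test if assign_split(record[key]) == "test" else train
        target.append(record)
    return train, test
```

Because the assignment depends only on the key, re-running the split on new data keeps each sender on the side it was originally assigned to.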
17. What are the unknown unknowns in Machine
Learning?
Why are unknown unknowns a problem for Machine
Learning?
How can we address unknown unknowns in our
machine learning model?
Deploying Machine Learning: Unknown Unknowns
18. What is the impact of Scaling in Machine Learning
and how can it hurt our model?
What is normalization and why should we consider it
when working with Machine Learning?
Deploying Machine Learning: Scaling and Normalization
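To make the scaling and normalization questions above concrete, here is a small sketch of two standard rescalings, z-score standardization and min-max scaling. The function names are illustrative. The parameters are fitted on training values only and then reused on new values, which is the usual way to avoid letting evaluation data influence the transformation.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_standardizer(train_values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, std) from training data; fall back to std=1 if constant."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    return mu, (sigma if sigma > 0 else 1.0)


def standardize(values: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Rescale values to zero mean and unit variance (relative to training data)."""
    return [(v - mu) / sigma for v in values]


def fit_min_max(train_values: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) observed in the training data."""
    return min(train_values), max(train_values)


def min_max_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map the training range [lo, hi] onto [0, 1]; unseen values may fall outside."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]
```

Note that a production value larger than anything seen in training scales to more than 1 under min-max scaling, which is one way outliers and scaling interact once a model leaves the lab.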
22. Without enough data, organizations are at risk of
overfitting the machine learning model
Using all the data in the world does not mean that
the developed model is accurate, or even viable
Complication is impressive, but simplicity is brilliance
Deploying Machine Learning: Overfitting
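The overfitting point can be illustrated with a toy comparison, using made-up data in which two training labels are noisy. A model that memorizes the small training set (a 1-nearest-neighbour lookup) scores perfectly on the data it has seen but worse on held-out data, while a simple threshold rule, chosen by hand here purely for illustration, generalizes better.

```python
from typing import Callable, List, Tuple

# Made-up (feature, label) pairs; the labels for x=3 and x=8 are deliberate noise.
TRAIN: List[Tuple[int, int]] = [
    (1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1),
]
VALID: List[Tuple[int, int]] = [
    (3, 0), (8, 1), (2, 0), (7, 1), (10, 1), (0, 0),
]


def memorizer(x: int) -> int:
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(TRAIN, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def threshold_rule(x: int) -> int:
    """A deliberately simple model: predict 1 when x is at least 5."""
    return 1 if x >= 5 else 0


def accuracy(model: Callable[[int], int], data: List[Tuple[int, int]]) -> float:
    """Fraction of examples the model labels correctly."""
    return sum(1 for x, y in data if model(x) == y) / len(data)


train_gap = accuracy(memorizer, TRAIN) - accuracy(memorizer, VALID)
```

A large gap between training and held-out accuracy, such as `train_gap` here, is a typical warning sign that the model has fitted noise rather than signal.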
23. What is the impact of Social Engineering in Machine
Learning?
How can models defend against social engineering
attacks?
Deploying Machine Learning: Social Engineering
25. Original Business Problem:
How does an organization handle real-world risks to data as it travels over
untrusted networks to potentially untrusted recipients?
Can an organization consider human error and/or malicious behavior with
that data?
Ultimately, how can an organization avoid data breaches and demonstrate
compliance with rigorous data protection regulations, such as CCPA, in the
real world?
Egress: How did we employ Machine Learning?
26. » Apply protection and rights management
on-the-fly based on risk
» Protect against the accidental
sharing of data
» Auto-encrypt messages for
other Egress clients
» Increases user engagement
and adoption
Risk-Based Protection: What?
27. » Analyses previous email communications
to protect from accidental sends
» Calculates a risk score based on domain,
user behaviour and system info
» Applies protection based on
sensitivity of data and risk score
» Uses any email protection,
including TLS, O365, Voltage, etc.
Risk-Based Protection: How?
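The slide above describes a risk score built from domain, user behaviour and system information, combined with data sensitivity to choose a protection level. The sketch below shows the general shape such logic could take. Every field name, weight, threshold and protection label is hypothetical and chosen for illustration; none of it reflects how the Egress product actually computes risk.

```python
from dataclasses import dataclass


@dataclass
class EmailContext:
    """Hypothetical signals for one outgoing email."""
    recipient_domain_seen_before: bool  # domain signal
    prior_emails_to_recipient: int      # user-behaviour signal
    sent_from_managed_device: bool      # system-info signal
    sensitivity: str                    # "low", "medium" or "high"


def risk_score(ctx: EmailContext) -> int:
    """Return an illustrative risk score from 0 (low) to 100 (high)."""
    score = 0
    if not ctx.recipient_domain_seen_before:
        score += 40
    if ctx.prior_emails_to_recipient == 0:
        score += 40
    elif ctx.prior_emails_to_recipient < 5:
        score += 20
    if not ctx.sent_from_managed_device:
        score += 20
    return min(score, 100)


def choose_protection(ctx: EmailContext) -> str:
    """Pick a (hypothetical) protection level from sensitivity and risk."""
    score = risk_score(ctx)
    if ctx.sensitivity == "high" or score >= 70:
        return "encrypt"
    if ctx.sensitivity == "medium" or score >= 30:
        return "tls"
    return "none"
```

In a learned system the hand-set weights above would be replaced by values estimated from historical data; the fixed numbers only make the flow from signals to score to protection easy to follow.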
28. Use historical behavior to detect anomalies
Parallel processing and cloud AI enable
“cognitive” processing of vast quantities of
collected data
“Graph” databases: Link relationships and past
behaviour to quickly detect anomalies and
pattern changes
Outcomes change with learning, time, and data
Analysis of user “cliques” (groups) to detect and
prevent accidents
A New Way: Machine Learning to Detect Errors
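The idea of linking relationships and past behaviour to spot anomalies can be sketched with a tiny in-memory graph. This is a simplified stand-in for a graph database, with hypothetical function names and example addresses: it records which recipients have appeared together on past emails and flags any recipient on a new email who has never appeared alongside any of the others.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def build_graph(history: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Link every pair of recipients that shared a past email."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for recipients in history:
        for a, b in combinations(sorted(set(recipients)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def unusual_recipients(graph: Dict[str, Set[str]], recipients: List[str]) -> List[str]:
    """Return recipients with no past link to anyone else on this email."""
    flagged = []
    for recipient in recipients:
        others = [o for o in recipients if o != recipient]
        known = graph.get(recipient, set())
        if others and not any(o in known for o in others):
            flagged.append(recipient)
    return flagged


HISTORY = [
    ["ann@a.example", "bob@a.example"],
    ["ann@a.example", "bob@a.example", "cat@a.example"],
]
GRAPH = build_graph(HISTORY)
```

For instance, `unusual_recipients(GRAPH, ["ann@a.example", "bob@a.example", "dan@b.example"])` flags only `dan@b.example`, the address outside the established group.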
29. Data Leakage
Scaling with Machine Learning
Selecting Appropriate Fitting Data
Social Engineering
That’s great… but what about all those pitfalls?
30. Data Leakage
Identified left-out data
Unsupervised Probabilistic Machine Learning
Historical Behavior with real-time comparison
Egress Data Leakage Resolution
31. Scaling with Machine Learning: Serverless Technologies
What is Serverless?
Why use serverless?
Benefits from the serverless architecture in practice with Machine
Learning
Egress Addresses Scaling with Machine Learning
32. Fitting Data Problem
Data Selection and Testing application
Build several models to develop the Golden Model
Run parallel models in fitting and in product
Feed the Machine
Egress Selection of Appropriate Fitting Data
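Running several models in parallel against a "Golden Model" is often done as a shadow evaluation: the golden model's decision is the one served, while candidate models see the same inputs and their agreement is recorded. The following is a generic sketch of that pattern with hypothetical names, not a description of the Egress system.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Model = Callable[[Any], Any]


def shadow_run(
    golden: Model, candidates: Dict[str, Model], inputs: Sequence[Any]
) -> Tuple[List[Any], Dict[str, float]]:
    """Serve the golden model's outputs; report each candidate's agreement rate."""
    agreement = {name: 0 for name in candidates}
    served = []
    for x in inputs:
        decision = golden(x)
        served.append(decision)
        for name, model in candidates.items():
            if model(x) == decision:
                agreement[name] += 1
    total = len(inputs)
    rates = {
        name: (count / total if total else 0.0)
        for name, count in agreement.items()
    }
    return served, rates
```

Agreement with the golden model is only one possible comparison; where true outcomes become available later, candidates can instead be scored against those outcomes before any of them is promoted.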
33. Organizational Domain Relationship Model
Behavior-Based Risk Assessment: Why did we use a
problematic approach?
How did we mitigate the behavior-based risk assessment
model – Eager Update and User-Models
Egress Defending against Social Engineering / Malicious Data Manipulation
34. Future: What does this mean for our Clients
[Figure: timeline of data privacy and data security regulations, 2017 to 2020 (?). Labels shown: NYDFS 23 NYCRR 500*; GDPR; CA AB375; NAIC Model; SC H4655; Colorado (3 CCR 704-1); VT 4:4 Vt Code R. 8:8-4; CO House Bill 18-1128; US state Amended Laws. Milestones shown: Feb 2018, Phase 2; Transition ends, full compliance; Sept 2018, Phase 3.]
35. Thank you!
Talk to us at the Egress stand.
E: info@egress.com
T: 1-800-732-0746
W: www.egress.com
Twitter: @EgressSoftware
"Despite what most SaaS companies are saying, Machine Learning requires time and
preparation. Whenever you hear the term AI, you must think about the data behind it." -
Alexandre Gonfalonieri, February 2019