Applying Auto-Data Classification Techniques for Large Data Sets
SESSION ID: PDAC-W02
#RSAC
Anchit Arora
Program Manager
InfoSec, Cisco
The proliferation of data and increase in complexity
Timeline labels (1995, 2006, 2014, 2020): 9 to 5 in the office; Emergence of Internet & mobility; The Human Network; The Internet of Everything; BYOD & Externalization
Complexity
• Complex work models: always accessible, remote & mobile workers
• Definition of perimeter: Cloud, Customer & partners
• Users choose devices (BYOD)
Pace
• Enterprise data collection to increase 40 to 60% per year*
• Experts predict the amount of data generated annually to increase 4300% by 2020*
Volume
• Big data architectures, low storage cost, increase of data retention
• 80% of data generated today is unstructured
• Data generated worldwide will reach 44 zettabytes by 2020*
* Numbers and statistics from Gartner, Gigaom Research, CSC, Seagate
Auto-classification: The why and what
Desired business outcome: At Cisco we want to add sensitivity context to structured and unstructured data, so that controls can be applied more effectively
Scope: Our aim is an automated classification capability for all structured data systems, plus the ability to better govern and control the unstructured data that is created when records are exported from structured data systems, by associating a label/field with each record set
Use-case: From structured to unstructured
[Diagram labels: SoR, SoE, structured data system (SoR), Indexer, API, Classification Engine (algorithms and dictionaries), UI, Export (E) & tag]
Index: all existing and newly written data is indexed and classified based on the algorithm and dictionary defined for the SoR
Classification: provide classification information to the user, or an access policy based on class to the application
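The flow on this slide (index and classify records against a per-SoR dictionary, then tag data on export) can be sketched roughly as follows. This is only an illustration, not Cisco's engine: the dictionary entries, the default class "Confidential", the record IDs, and the function names are all assumptions made up for the example.

```python
from typing import Dict, List

# Hypothetical dictionary for one SoR: keyword -> sensitivity class.
DICTIONARY: Dict[str, str] = {
    "customer": "Highly Confidential",
    "topology": "Restricted",
}

# Classes from least to most sensitive. "Confidential" as the default
# class is an assumption; the slides only name the two higher classes.
ORDER: List[str] = ["Confidential", "Highly Confidential", "Restricted"]


def classify(text: str) -> str:
    """Return the most sensitive class whose keyword appears in the text."""
    lowered = text.lower()
    best = ORDER[0]
    for keyword, label in DICTIONARY.items():
        if keyword in lowered and ORDER.index(label) > ORDER.index(best):
            best = label
    return best


def build_index(records: Dict[str, str]) -> Dict[str, str]:
    """Index every record ID together with its classification."""
    return {rec_id: classify(body) for rec_id, body in records.items()}


def export(rec_ids: List[str], records: Dict[str, str],
           index: Dict[str, str]) -> List[dict]:
    """Export records with the class attached, so the tag travels with the data."""
    return [
        {"id": rec_id, "body": records[rec_id], "classification": index[rec_id]}
        for rec_id in rec_ids
    ]


# Hypothetical records standing in for rows in a structured system.
records = {
    "BUG-1": "Crash seen in customer network topology diagram",
    "BUG-2": "Typo in CLI help text",
}
index = build_index(records)
exported = export(["BUG-1", "BUG-2"], records, index)
for row in exported:
    print(row["id"], row["classification"])
```

Keeping the tag inside each exported row is the point of the "Export (E) & tag" step: once the data leaves the structured system, the label is the only sensitivity context that remains.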
An unstructured data use-case: box.com
Box.com is an external cloud platform used by Cisco for collaboration and storage of data
Security questions to ask:
• What is this data?
• What’s the source of the data?
• Who owns this data?
• What’s the sensitivity of the data?
• Is all data equally sensitive? (This is the essence of optimal security.)
• What’s the level of security required?
Should we ask the user to govern security?
Can we expect the user to make the right security decision with all the complexity involved in decision making?
The user needs to be very knowledgeable to make the right decision
The answer is no. However, many systems are designed to have users govern security:
• Recognize data categories in systems with unstructured data
• Classify data in any data system
• Set data security policy
• Securely export data out of the system
Making the shift from user governed to data owner governed
How to make the shift to a data owner model?
[Diagram labels: Governance of Data by End User; Governance of Data by Data Owner; Data Management; Policy Enforcement; Data Protection capabilities; Data Intelligence & monitoring capabilities]
[Diagram labels: Data Taxonomy; Recognize Data Type; Classify Sensitivity; Tag]
Across various data types: Engineering, Customer, Finance, HR
Conceptual approach
[Diagram stages: Discover (find data objects), Recognize (identified), Classify (Data Sensitivity 1, Data Sensitivity 2, Data Sensitivity 3, Data Sensitivity 4)]
[Diagram sources: large unstructured generic data repositories (classification mostly unknown); structured data systems (SoR)]
Structured data case study: Engineering & Customer data protection in context of bug information
A case study: Bug information
Millions of bugs + product bugs, 3 approaches available to protect:
1. Treat all bugs equally, and apply ‘very strict’ controls on all bugs
• In heterogeneous data models, most data is ‘Over’-protected
• Limits business ability and User experience
2. Treat all bugs equally, and apply ‘loose’ controls on all bugs
• Results in ‘Under’-protected data
3. Apply the right amount of protection on a bug, based on sensitivity
• Balanced security and cost applied – just the right amount of security!
Setting the foundation for auto-class
A sensitive software bug in CDETS:
• Category: Is a bug
• Product development lifecycle: Sustaining
• Severity: Sev1
• Status: Open
• Found by Customer
• Customer network topology
• Belongs to hardware
Inventory Process
Identify
• Identify the most sensitive IP and IP’s appropriate owner(s)
Define
• Define data use and access rules for the most sensitive IP
Translate
• Translate rules into IT enforceable policies
The inventory process engages the business to build out the data taxonomy and a model of the sensitivity
The proof is in the numbers!!
Manual approach
Parameter | Value
Average time to classify a single bug | 5 minutes
Total number of bugs | 7 Million
Time to classify | 35 Million minutes
Cost/min of SME analyst | $ 0.83/Min
Cost to classify | $ 29 Million
Additional costs to consider for manual:
• Training: For consistent user behavior
• Change to business: Cleaning legacy
• Change to applications and Infrastructure
Auto-Classification approach
Parameter | Value
Average time to classify a single bug* | 0.002 minutes
Total number of bugs | 7 Million
Time to classify | 14,000 Minutes
Estimated cost for Infrastructure and resources required to classify | $ 0.25 Million
Accuracy Results: 83%
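The totals on this slide follow directly from the per-bug figures. The short check below reproduces them; the variable names are illustrative, and the two ratios at the end are derived here rather than stated on the slide.

```python
BUGS = 7_000_000  # total number of bugs (from the slide)

# Manual approach: 5 minutes per bug at $0.83 per SME analyst minute.
manual_minutes = BUGS * 5
manual_cost = manual_minutes * 0.83

# Auto-classification approach: 0.002 minutes per bug,
# with an estimated $0.25 million for infrastructure and resources.
auto_minutes = BUGS * 0.002
auto_cost = 250_000

# Derived comparisons (not stated on the slide).
time_ratio = manual_minutes / auto_minutes
cost_ratio = manual_cost / auto_cost

print(f"manual: {manual_minutes:,} min, ${manual_cost:,.0f}")
print(f"auto:   {auto_minutes:,.0f} min, ${auto_cost:,}")
print(f"time ratio about {time_ratio:,.0f}x, cost ratio about {cost_ratio:.0f}x")
```

The manual cost works out to about $29.05 million, which the slide rounds to $29 million. The comparison leaves out the additional manual costs listed on the slide (training, legacy cleanup, application changes), so it likely understates the gap.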
The most sensitive data is just a small portion
< 1% Restricted
2.5% Highly Confidential
How did we execute the methodology?
A 6 step workflow, for structured data (SoR): Engage, Attribute, Develop, Validate, Integrate, Protect
AS-IS: New SoR integration for Auto-Class
# | Phase | Scope
1 | Engage | Identify the SoR and engage stakeholders to communicate expectations and R&R; identify data workflows (user stories) and data categories; plan and establish the scope of the SoR integration
2 | Attribute | Analyze the data, database fields, and records, and build a data sensitivity model / algorithm to be able to classify the data
3 | Develop | Develop the attribution and scoring algorithm in the classification engine and index the datasets
4 | Validate | Validate and tune the classification results of the classification engine to ensure accuracy of the output
5 | Integrate | Integrate classification data with the source system
6 | Protect | Plan and implement protective measures in the source system for sensitive data classes
Building an attribution model
All available source system built-in attributes: Attribute A, Attribute B, Attribute C, … Attribute L, Attribute M, Attribute N, … Attribute X, Attribute Y, Attribute Z, …
Selected attributes and values
Extracted entities from free-text fields and attachments
Attribution model: weights, scoring equation, values and scores, classification rules
[Other diagram labels: data freshness, contextual information, extracted entities]
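One way to read the attribution model (weights, values and scores, a scoring equation, and classification rules) is as a weighted sum mapped to a class by thresholds. The sketch below assumes that reading. The slides do not give the actual equation, so every weight, value score, threshold, and the "Confidential" fallback class here is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical weight per selected attribute (weights sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "severity": 0.4,
    "status": 0.2,
    "found_by": 0.3,
    "freshness": 0.1,
}

# Hypothetical score between 0 and 1 for each attribute value.
VALUE_SCORES: Dict[str, Dict[str, float]] = {
    "severity": {"Sev1": 1.0, "Sev3": 0.3},
    "status": {"Open": 1.0, "Closed": 0.2},
    "found_by": {"Customer": 1.0, "Internal": 0.4},
    "freshness": {"recent": 1.0, "old": 0.3},
}

# Classification rules as (minimum score, class), most sensitive first.
RULES: List[Tuple[float, str]] = [
    (0.9, "Restricted"),
    (0.7, "Highly Confidential"),
    (0.0, "Confidential"),
]


def score(record: Dict[str, str]) -> float:
    """Scoring equation: weighted sum of per-attribute value scores."""
    return sum(
        weight * VALUE_SCORES[attr].get(record.get(attr, ""), 0.0)
        for attr, weight in WEIGHTS.items()
    )


def classify(record: Dict[str, str]) -> str:
    """Apply the threshold rules to the record's score."""
    total = score(record)
    for threshold, label in RULES:
        if total >= threshold:
            return label
    return RULES[-1][1]


sensitive = {"severity": "Sev1", "status": "Open",
             "found_by": "Customer", "freshness": "recent"}
middle = {"severity": "Sev1", "status": "Open",
          "found_by": "Internal", "freshness": "old"}
routine = {"severity": "Sev3", "status": "Closed",
           "found_by": "Internal", "freshness": "old"}

for name, rec in [("sensitive", sensitive), ("middle", middle), ("routine", routine)]:
    print(name, round(score(rec), 2), classify(rec))
```

Unknown or missing attribute values score 0.0 in this sketch, which biases toward the lowest class; a production model might instead flag such records for review.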
How to create a similar solution for your organization?
Engage
• System identification
• Stakeholder identification
• Source system data fields
• Field analysis
• Field type analysis
• Data record analysis
• Define dictionary
• Candidate fields
• Feasibility
• Socialization
Attribute
• Field value assignment
• Field correlation
• Weight scoring
• Sensitivity scoring
Develop
• Classification engine infrastructure setup
• Classification engine configuration
• Coding of classification algorithm
Validate
• Sample size scoping
• Sample size indexing
• Validation of sample set
• Statistical validation of sample set
• Tune
• Result socialization
Integrate
• Design
• User stories
• Source system tagging (application tagging)
• Stakeholder socialization
Protect
• Access control
• Behavior monitoring
• Source system secure design
• Source system compliance
• Export control
• Import control
• Data Loss
Now what? - Prevent, Detect and Educate
17
Data
Visibility
Prevent
DetectEducate
• Restrict access to the application and
through search
• Fine grain access based on data
classification
• Tag source systems and docs w/
classification metadata
• Focus on most sensitive data
• Integration with DLP solutions
• Data science
Policy Driven,
Context-Based
Access Control
Access
Visibility
Control
Restricted
Why
• Bug Status: Open
• Bug Severity: Critical
• Keywords: Customer:
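A policy-driven, classification-based access check that also returns the "why" behind a decision might look like the sketch below. The group names, the policy table, and the choice to leave unlisted classes unrestricted are assumptions for illustration; the slides do not describe the actual enforcement logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Decision:
    allowed: bool
    reasons: List[str]


# Hypothetical policy: which user groups may read each sensitive class.
# Classes not listed are treated as unrestricted (an assumption that
# mirrors the slide's advice to focus on the most sensitive data).
POLICY: Dict[str, Set[str]] = {
    "Restricted": {"bug-owners"},
    "Highly Confidential": {"bug-owners", "engineering"},
}


def decide(classification: str, why: List[str], user_groups: Set[str]) -> Decision:
    """Allow or deny access and carry the classification reasons along."""
    allowed_groups = POLICY.get(classification)
    if allowed_groups is None:
        return Decision(True, ["no restriction for class " + classification])
    if user_groups & allowed_groups:
        return Decision(True, ["user group permitted for " + classification])
    return Decision(False, ["classified " + classification] + why)


# The reasons mirror the example on the slide.
why = ["Bug Status: Open", "Bug Severity: Critical"]
denied = decide("Restricted", why, {"engineering"})
granted = decide("Highly Confidential", [], {"engineering"})
print(denied.allowed, denied.reasons)
print(granted.allowed, granted.reasons)
```

Returning the reasons with a denial supports the Educate goal: the user sees which attributes made the record sensitive instead of an unexplained refusal.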
Q&A
Anchit Arora
Program Manager
InfoSec, Data Security Analytics Team
ancarora@cisco.com
